NEW DEVELOPMENTS IN PSYCHOLOGICAL CHOICE MODELING

ADVANCES IN PSYCHOLOGY 60

Editors:
G. E. STELMACH
P. A. VROON

NORTH-HOLLAND AMSTERDAM . NEW YORK . OXFORD . TOKYO
NEW DEVELOPMENTS IN PSYCHOLOGICAL CHOICE MODELING
Edited by

Geert DE SOETE, University of Ghent, Belgium
Hubert FEGER, Free University of Berlin, F.R.G.
Karl C. KLAUER, Free University of Berlin, F.R.G.

1989
NORTH-HOLLAND AMSTERDAM. NEW YORK . OXFORD. TOKYO
ELSEVIER SCIENCE PUBLISHERS B.V., Sara Burgerhartstraat 25, P.O. Box 211, 1000 AE Amsterdam, The Netherlands. Distributors for the United States and Canada: ELSEVIER SCIENCE PUBLISHING COMPANY, INC., 655 Avenue of the Americas, New York, N.Y. 10010, U.S.A.
ISBN: 0 444 88057 7
© ELSEVIER SCIENCE PUBLISHERS B.V., 1989. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science Publishers B.V./Physical Sciences and Engineering Division, P.O. Box 1991, 1000 BZ Amsterdam, The Netherlands. Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science Publishers B.V., unless otherwise specified. No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Printed in The Netherlands.
CONTENTS

List of contributors  vii
Introduction  1
Order invariant unfolding analysis under smoothness restrictions. W. J. Heiser  3
An analytical approach to unfolding. H. Feger  33
GENFOLD2: A general unfolding methodology for the analysis of preference/dominance data. W. S. DeSarbo & V. R. Rao  57
Maximum likelihood unidimensional unfolding for a probabilistic model without distributional assumptions. P. M. Bossuyt & E. E. Roskam  77
Latent class models for the analysis of rankings. M. A. Croon  99
The wandering ideal point model for analyzing paired comparisons data. G. De Soete, J. D. Carroll, & W. S. DeSarbo  123
Analysis of covariance structures and probabilistic binary choice data. Y. Takane  139
Two classes of stochastic tree unfolding models. J. D. Carroll, W. S. DeSarbo, & G. De Soete  161
Probabilistic multidimensional analysis of preference ratio judgments. J. L. Zinnes & D. B. MacKay  177
Testing probabilistic choice models. P. M. Bossuyt & E. E. Roskam  207
On the axiomatic foundations of unfolding, with an application to political party preferences of German voters. B. Orth  221
Unfolding and consensus ranking: A prestige ladder for technical occupations. R. van Blokland-Vogelesang  237
Unfolding the German political parties: A description and application of multiple unidimensional unfolding. W. H. van Schuur  259
Probabilistic multidimensional scaling models for analyzing consumer choice behavior. W. S. DeSarbo, G. De Soete, & K. Jedidi  291
Probabilistic choice behavior models and their combination with additional tools needed for applications to marketing. W. Gaul  317
Author index  339
Subject index  341
LIST OF CONTRIBUTORS
P. M. Bossuyt, Center for Clinical Decision Making, Erasmus University, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands.
J. D. Carroll, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974, U.S.A.
M. A. Croon, Psychology Department, Tilburg University, Tilburg, The Netherlands.
W. S. DeSarbo, Graduate School of Business, Marketing and Statistics Departments, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
G. De Soete, Department of Psychology, University of Ghent, Henri Dunantlaan 2, 9000 Ghent, Belgium.
H. Feger, Institute for Psychology, Free University Berlin, Habelschwerdter Allee 45, 1000 Berlin 33, FR Germany.
W. Gaul, Institute of Decision Theory and Operations Research, Faculty of Economics, P.O. Box 6380, 7500 Karlsruhe 1, FR Germany.
W. J. Heiser, Department of Data Theory, University of Leiden, Middelstegracht 4, 2312 TW Leiden, The Netherlands.
K. Jedidi, Marketing Department, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.
K. C. Klauer, Institute for Psychology, Free University Berlin, Habelschwerdter Allee 45, 1000 Berlin 33, FR Germany.
D. B. MacKay, School of Business, Indiana University, Bloomington, Indiana 47405, U.S.A.
B. Orth, Department of Psychology, University of Hamburg, Von-Melle-Park 6, 2000 Hamburg 13, FR Germany.
V. R. Rao, Johnson Graduate School of Management, Cornell University, Ithaca, New York 14853, U.S.A.
E. E. Roskam, Mathematical Psychology Group, University of Nijmegen, Montessorilaan 3, 6500 HE Nijmegen, The Netherlands.
Y. Takane, Department of Psychology, McGill University, 1205 Docteur Penfield Avenue, Montreal, PQ, Canada H3A 1B1.
R. van Blokland-Vogelesang, Department of Psychology, Free University, Van der Boechorstraat 1, Room 1B-69, P.O. Box 7161, 1007 MC Amsterdam, The Netherlands.
W. H. van Schuur, Department of Statistics and Measurement Theory, Faculty of Social Sciences, University of Groningen, Oude Boteringestraat 23, 9712 GC Groningen, The Netherlands.
J. L. Zinnes, National Analysts, 400 Market Street, Philadelphia, Pennsylvania 19106, U.S.A.
INTRODUCTION
Historically, two of the most important contributions to psychological choice modeling are undoubtedly Thurstone's (1927) Law of Comparative Judgment and Coombs' (1950, 1964) unfolding theory. The framework that Thurstone's Law of Comparative Judgment provides for representing inconsistent choices is still the point of departure for much of the current work in probabilistic choice modeling. In 1987 the journal Communication & Cognition published a special issue on probabilistic choice models. Several of the papers in this special issue exemplify how many of the recent probabilistic choice models are still in one way or another related to Thurstone's general Law of Comparative Judgment. An entirely different approach to modeling individual choice was offered by Coombs in his unfolding theory. Coombs' unfolding principle gave rise to many different unidimensional and multidimensional unfolding models, as illustrated in the 1988 special issue on unfolding of the German journal of social psychology Zeitschrift für Sozialpsychologie. The editors of both special issues wanted to make the contributions in these issues available to a broader audience. Since the papers in the two special issues are often very much related to each other, in that some of the recent stochastic choice models are based on a geometric unfolding model or, equivalently, that some of the recent unfolding models are probabilistic, it was decided to bundle the contributions into a single edited volume. Most papers have been substantially revised since their initial publication in either Communication & Cognition or Zeitschrift für Sozialpsychologie. The resulting volume is fairly representative of the current work in psychological choice modeling. The papers by Heiser, Feger, and DeSarbo and Rao concentrate on devising efficient methods for fitting deterministic unfolding models to nonmetric (Heiser, Feger) or metric (DeSarbo & Rao) data.
In the papers by Bossuyt and Roskam, Croon, De Soete et al., Takane, Carroll et al., and Zinnes and MacKay new choice models are developed. Whereas Bossuyt and Roskam propose a new
unidimensional probabilistic unfolding model, De Soete et al. and Zinnes and MacKay elaborate new multidimensional probabilistic unfolding models. Takane proposes a family of stochastic models where the within-subject and the between-subject inconsistency are explicitly modeled. An attempt to formulate discrete probabilistic analogs of the unfolding model is reported by Carroll et al. Next come two papers that deal with the problem of assessing the validity of choice models. Bossuyt and Roskam discuss one approach to testing the assumptions of probabilistic models, while Orth explains and illustrates an axiomatization of the (deterministic) Coombsian unfolding model. The remaining contributions of the volume contain some important applications of psychological choice modeling in the fields of political science and marketing research. Van Blokland-Vogelesang illustrates the use of an unfolding technique for constructing a prestige ladder, whereas van Schuur applies a specific unidimensional unfolding model to political science data. DeSarbo et al. and Gaul discuss probabilistic choice models and related tools that are applicable in consumer research. As will be apparent from the various contributions in this volume, important progress has been made in psychological choice modeling in the last few years. However, many problems remain to be solved and it is our sincere hope that this volume might stimulate other researchers to work on some of these problems.
References

Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
New Developments in Psychological Choice Modeling, G. De Soete, H. Feger and K. C. Klauer (eds.), © Elsevier Science Publishers B.V. (North-Holland), 1989
ORDER INVARIANT UNFOLDING ANALYSIS UNDER SMOOTHNESS RESTRICTIONS

Willem J. Heiser
University of Leiden, The Netherlands

Unfolding analysis is shown to have firm roots in the Thurstonian attitude scaling tradition. Next the nonmetric multidimensional approach to unfolding is described, and characterized in terms of objectives proposed for attitude scaling by Guttman. The nonmetric approach is frequently bothered by a phenomenon called degeneration, i.e., the occurrence of extremely uninformative solutions with good or even perfect fit. A new way to resolve this problem, while keeping the method order invariant, follows from the introduction of smoothness restrictions on the admissible model values. The effectiveness of requiring smoothness is illustrated with an example of political attitude scaling, and with a two-dimensional analysis of differential power attribution among children. Cross validation and resampling techniques can be used for establishing the stability of the unfolding results.
This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 220-235.

1. Introduction

Applications of the unfolding model, using any one of its associated techniques, have been remarkably scarce in social psychology, especially in view of the fact that this methodology has such a classic precursor: the Thurstonian attitude scaling approach (Thurstone, 1929, 1931; Thurstone & Chave, 1929; see also Thurstone, 1959). Thurstone transferred the unimodal response model familiar from psychophysics to the study of attitudes and opinions, more generally of affectively loaded responses. The attitude score of a subject was defined as the mean or the median scale value of the attitude statements endorsed. The selection and the allocation
of scale values to the statements was usually done in a preliminary study, in which judges had to compare them with respect to their "favorability". The reader is referred to Edwards (1957) for an extensive discussion of the Thurstonian approach, including its quality criteria and various early variants. In modern terms, it can be characterized as a way to perform an external unfolding analysis (a name coined by Carroll, 1972), with the method of equal appearing intervals - or the method of paired comparisons - as the first stimulus scaling step, and the computation of the mean or median as a primitive method to find the ideal point, i.e., the location of an imaginary statement that would get maximal support from any particular subject, or group of subjects. After the Second World War, Thurstonian attitude measurement became more and more a curiosity. The assumed possibility to obtain unique, common scale values in the first step of the judgment-endorsement procedure had always been a matter of debate. The early evidence in a variety of attitude domains, such as attitude "toward the Negro" (Hinckley, 1932), "toward a particular candidate for political office" (Beyle, 1932), "toward war" (Ferguson, 1935), and "toward one's own country" (Pintner & Forlano, 1937), seemed to be positive in the sense that very high correlations were found between sets of scale values obtained from groups of judges with widely different attitudes. However, starting with Hovland and Sherif (1952) the influential social judgment school (Sherif & Hovland, 1961; Sherif, Sherif, & Nebergall, 1965) cast serious doubts on the validity of trying to separate "cognitive" judgments - presumably elicited in the first step - from "affective" judgments - presumably elicited in the second step. Objections were raised against some of the standard practices, such as eliminating judges with extreme categorizing behavior.
Evidence was found for meaningful and systematic assimilation and contrast effects, reflected in local distortions of the stimulus scale. In addition, the social judgment school called attention to other aspects of attitudinal responses, i.e., the range of statements strongly endorsed (“the latitude of acceptance”), the subset of statements strongly rejected (“the latitude of rejection”, not necessarily consisting of statements in consecutive positions along the scale), and areas of neutrality (forming “the latitude of noncommitment” in between the regions of acceptance and rejection).
It is important to notice that, despite these criticisms and amendments, the major constituents of the Thurstonian approach remained intact. The statements were scaled in a separate judgment procedure. Attitude was conceived as a subject specific response function with respect to these scale values. Although other aspects than the location of the peak were deemed important, it was still assumed - and empirically verified - that response strength tapers off as a function of the distance from the "own stand as an anchor point" (Sherif et al., 1965). Meanwhile, Likert's short-cut (Likert, 1932) had become increasingly popular. It involves the reduction of the judgment to an a priori classification of the statements into two about equally sized classes: the favorable ones and the unfavorable ones. By adjusting the scoring direction of the responses accordingly, and by using "refinements" borrowed from test theory, the concept of a statement scale value seemed to be superfluous. Indeed, it has become common practice to ask subjects directly for their evaluations of the attitude object. Only Likert's response format survived, and statement scaling was abandoned altogether. Guttman's (1941, 1944, 1947, 1950) contributions are much less easily summarized in a few sentences. At least three novelties that he introduced into the field of attitude measurement should be mentioned:
a. A method for finding a scale based on the endorsement alone;
b. Posing reproducibility as an explicit criterion for scale construction;
c. Scaling the response categories, rather than the statements themselves.
It is of some historical interest to notice that the desirability of (a), called the "response approach" by Torgerson (1958, pp. 45-48), had already been expressed at the very introduction of Thurstone's method: "Ideally, the scale should perhaps be constructed by means of voting only.
It may be possible to formulate the problem so that the scale values of the statements may be extracted from the records of actual voting. If that should be possible, then the present procedure of establishing the scale values by sorting will be superseded.” (Thurstone & Chave, 1929, p. 56). Guttman achieved (a) by using (b): the construction should be such that “from a person’s rank alone we can reproduce his response to each of the items in a simple fashion” (Guttman, 1947, p. 249). But at the same time -
although this would not have been strictly necessary - he switched from the concept of a statement point (i.e., a stimulus scale value) to the idea of characterizing each statement as a set of category points (i.e., response alternative scale values). In addition he assumed that all category points for a single statement would ideally be ordered along the scale in their "natural" order, from "strongly disagree" via "indifferent" to "strongly agree". So in Guttman scaling each subject is characterized by a score, and each statement by some monotonically increasing curve, for which frequently a step function is used as a first approximation. By contrast, and in line with the Thurstonian tradition, the unfolding technique represents each statement as a point along a scale, and each subject as some unimodal or single-peaked curve, for which frequently the location of the peak is considered to be the parameter of most interest. The approach of this paper will be to stick to aims (a) and (b), to replace (c) with a less restrictive requirement, and to bring in again the allocation of scale values to the objects of judgment. Undoubtedly, Coombs (1950, 1964) contributed much to the conceptual development of the single-peaked response model, including coining the generic name unfolding. In particular, he convincingly argued that one should refrain from making strong assumptions about the measurement level of human judgments - within, but especially also across persons - and that metric information should be obtained through the study of scalability. However, his methods for actually fitting scaling models to any set of data at hand lacked the rigor of optimizing a single loss function (as the reproducibility criterion is called nowadays). The Nonmetric Multidimensional Scaling (NMDS) approach to unfolding, to be discussed in Section 2, does enjoy this property. However, it is frequently bothered by a phenomenon called degeneration, as shall be clarified in Section 3.
Then Section 4 proposes a new approach to resolve this difficulty, based on the idea of requiring a smooth succession of reproduced values. Next, the method will be applied in Section 5 to some political attitude data, and to a small example concerning the perceived importance of power characteristics by different groups of children in a classroom setting. Finally, Section 6 discusses some of the diagnostics that can be used in connection with an unfolding analysis.
2. The Nonmetric Multidimensional Scaling Approach to Unfolding
The earlier formulations of the unimodal response model were all one-dimensional, perhaps for reasons of simplicity, or just "another manifestation of psychologists' peculiar evaluation monomania, reducing all information to this one dimension as if people think of themselves and other objects exclusively in terms of how good or how bad they are" (McGuire, 1985, p. 242, referring to McGuire, 1984). The model can be formulated q-dimensionally right from the start, with q = 1 merely a special case. At our disposal is a table P with elements p_ij, each row of which corresponds to a particular subject, or group of subjects, i (i = 1, . . . , n), whereas each column corresponds to a particular statement, or other piece of psychological material, j (j = 1, . . . , m). P might contain a measure of preference or response strength, or the proportion of people in group i voting for alternative j, or any other indication of the attraction of object j for source i. The first objective is to assign a point y_j to each object. In the one-dimensional case y_j is just one real-valued number that can be marked off on a line; in the two-dimensional case y_j is characterized by two coordinate values that can be plotted in a plane; in the q-dimensional case y_j is a location in a q-dimensional space (less easy to visualize and talk about, but the principles and notation remain the same). We may now view the response strength of source i as a function of the y_j. Under the unimodal response model it is assumed that this function has a single peak, i.e., it decreases monotonically in all directions with respect to some central point x_i. In addition, it is assumed that the location of the peak is specific for each source. Since response strength is maximal at the position of the central point, x_i is usually called the ideal point for source i. So the model associates objects with points, and sources with single-peaked curves or surfaces that are shifted with respect to each other.
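As a minimal numerical sketch of this setup (the function names, parameter values, and the particular negative exponential decay are illustrative assumptions, not prescribed by the model), a table P of single-peaked response strengths can be generated from hypothetical ideal points and object points as follows:

```python
import math

def euclidean(x, y):
    # ordinary Euclidean distance between an ideal point x and an object point y
    return math.sqrt(sum((xa - ya) ** 2 for xa, ya in zip(x, y)))

def response_strength(x, y, alpha=1.0, beta=1.0):
    # single-peaked response: maximal (= beta) where the ideal point coincides
    # with the object point, decreasing in every direction away from it
    return beta * math.exp(-alpha * euclidean(x, y))

# two hypothetical sources (ideal points) and three objects on a line (q = 1)
ideal_points = [(0.0,), (2.0,)]
object_points = [(0.0,), (1.0,), (3.0,)]
P = [[response_strength(x, y) for y in object_points] for x in ideal_points]
```

Each row of P peaks at the object nearest that source's ideal point and tapers off with distance, which is all the unimodal model requires.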
These shifts, or translations, are very important. Imagine, for instance, a set of unimodal curves precisely on top of each other; then any relocation of the object points along the line, although destroying the common shape, would still account for the same information. One could make the curves more skewed, double-peaked, monotonically increasing, any shape at all, by suitable reexpressions of the values against which they are plotted. But, when the curves are shifted along the object scale, the freedom of
simultaneous change of shape is reduced enormously. It was Coombs (1950) who first clearly demonstrated this property of shifted single-peakedness. Similar properties of shifted monotonically increasing curves have been studied in depth by Levine (1970, 1972). So far the description characterizes what is common to all unfolding techniques (though some are confined to the one-dimensional case). The MDS approach now proceeds as follows. Attention is restricted to those single-peaked curves and surfaces that are a decreasing function of the distance d(x_i, y_j) of the object point y_j from the ideal point x_i. This is almost always the ordinary Euclidean distance

d(x_i, y_j) = [ Σ_a (x_ia − y_ja)² ]^(1/2)    (1)
defined here on the coordinate values x_ia and y_ja for ideal points and object points respectively, where a = 1, . . . , q. A major consequence of this restriction is that the response function will always be symmetric. Suppose we connect all points that have equal attractivity for a given source. Such a contour line is called an isochrest in this context, in analogy with "isobar" and "isotherm" for lines of equal atmospheric pressure and equal temperature on a map of physical locations (Heiser & De Leeuw, 1981). In the MDS approach to unfolding the isochrests are assumed to be sets of concentric circles (or spheres, or hyperspheres, for q > 2) centered at the ideal point, due to their dependence on the distance function (1). At this juncture, the set of single-peaked functions could be restricted still further, for instance by choosing the explicit model

π_ij = β_i exp( −α_i d(x_i, y_j) )    (2)

Here π_ij denotes the predicted response strength, the decay function is of the negative exponential type, the parameter β_i represents the maximum of the function (attained when the ideal point x_i coincides with the object point y_j), and the parameter α_i represents the dispersion or tolerance of source i. Both α_i and β_i are assumed to be strictly positive. Note that α_i would be a parameter of interest to workers in the tradition of the social judgment school, as it indicates the size of the latitude of acceptance relative to the latitude of rejection. From (2) it follows that the logarithm of predicted response strength is linear in the distances, and a metric
unfolding technique could be based on this model feature (cf. Heiser, 1986, for a more detailed discussion hereof). Obviously, there are many more conceivable relationships between data and distances than the one expressed in (2). The nonmetric approach attempts to embrace them all by introducing an intermediate type of quantities called the pseudo-distances (a term from Kruskal, 1977). In the unfolding situation, where we deal with row specific functions, they are defined as follows. Suppose the location of the object points is fixed, and consider a candidate ideal point x_i, also fixed. In order to evaluate how well the distances in this particular configuration correspond to the i'th row of the data, we compute the minimum value of the raw stress

σ_i² = Σ_j ( d̂_ij − d(x_i, y_j) )²    (3)

over all values of d̂_ij satisfying the monotonicity restrictions

d̂_ij ≥ d̂_ik  if  p_ij ≤ p_ik.    (4)

The pseudo-distances are those quantities for which the minimum of raw stress is attained. So the pseudo-distances increase whenever the preferences decrease, (4), while approximating the current distances as well as possible in the least squares sense, (3). Because of the latter property they are denoted with d̂_ij. For each source a set of pseudo-distances can be calculated for any given candidate configuration; this procedure, which serves as a subprocedure of NMDS, is called monotonic, or isotonic, regression (Barlow, Bartholomew, Bremner, & Brunk, 1972; Kruskal, 1964a, 1964b). Now it is easy to define the NMDS approach to unfolding. It is to find a joint configuration of object points and ideal points that minimizes the mean normalized stress. A normalization factor is needed because we want to avoid a collapse of all points into a single location, which would make all distances zero, and consequently all pseudo-distances zero. Monotonicity requirement (4) would still be satisfied, and the mean raw stress would be minimal, but obviously such a result is trivial because it does not depend on the data. More will be said about normalization in the next section. Summarizing, the aim is to minimize

S² = (1/n) Σ_i min_{d̂_i ∈ Γ_i} [ Σ_j ( d̂_ij − d(x_i, y_j) )² / NORM_i ]    (5)
over all q-dimensional joint configurations. The notation d̂_i ∈ Γ_i is used to denote the domain of the inner minimizations: all admissible pseudo-distances satisfying (4) for source i. Conventionally, stress is the square root of the right-hand side of (5); this is reflected in the notation S² at the left-hand side, but of no consequence for the calculations. Computer programs that execute the minimization of (5) do exist, e.g., KYST (Kruskal, Young, & Seery, 1973) and MINIRSA (Roskam, 1979). If we compare the present specification with the three general aims mentioned in the Introduction, it becomes clear that:
a. The method depends on the endorsements alone, which enter into the definition of the pseudo-distances (4);
b. The explicit reproducibility criterion is the mean normalized stress (5), and the data are reproduced in terms of the optimal distances;
c. The response categories are scaled for each source separately (the optimal pseudo-distances) and the statements are scaled by assigning object points to them. In addition the subjects are scaled by assigning ideal points to them.
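The monotone (isotonic) regression subprocedure described above can be sketched with a pool-adjacent-violators pass. This is a hypothetical minimal implementation, not the code of KYST or MINIRSA, and it breaks ties in preference arbitrarily rather than enforcing equal pseudo-distances for tied entries:

```python
def isotonic_increasing(values):
    # pool-adjacent-violators: least-squares fit subject to a non-decreasing order
    merged = []  # blocks of [mean, count]
    for v in values:
        merged.append([float(v), 1])
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2 = merged.pop()
            m1 = merged.pop()
            n = m1[1] + m2[1]
            merged.append([(m1[0] * m1[1] + m2[0] * m2[1]) / n, n])
    out = []
    for mean, count in merged:
        out.extend([mean] * count)
    return out

def pseudo_distances(preferences, distances):
    # order objects from most to least preferred; along that order the
    # pseudo-distances must be non-decreasing (restriction (4));
    # ties in preference are broken arbitrarily in this sketch
    order = sorted(range(len(preferences)), key=lambda j: -preferences[j])
    fitted = isotonic_increasing([distances[j] for j in order])
    dhat = [0.0] * len(distances)
    for rank, j in enumerate(order):
        dhat[j] = fitted[rank]
    return dhat
```

For a row with preferences (3, 2, 1) and current distances (2, 1, 3), the first two distances violate the required order and are pooled to their mean, giving pseudo-distances (1.5, 1.5, 3).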
On the one hand the method puts heavy trust in the data, since it does not rest upon the availability of some a priori given set of object points (in the terminology of Carroll, 1972, the analysis is internal). On the other hand it does not squeeze the data dry, since only the row-wise ordinal relationships (4) are used. The exact numerical values of the responses are immaterial; every set of transformed values that preserves the row-wise order will result in precisely the same joint configuration of points. Thus the analysis is order invariant. One might wonder what price - apart from extensive computation - has to be paid to achieve all this. It turns out that very frequently degenerate solutions are obtained; i.e., solutions that fit well or even perfectly, but nevertheless account for little of the original information in the data.
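Given pseudo-distances from the monotone regression, the mean normalized stress can be sketched as follows; taking NORM_i as the row-wise sum of squared distances is only one of the normalization choices discussed in the text, and the function names are illustrative assumptions:

```python
import math

def euclidean(x, y):
    # ordinary Euclidean distance between coordinate tuples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_normalized_stress(dhat, X, Y):
    # dhat[i][j]: pseudo-distances for source i (output of the monotone
    # regression); X: ideal points; Y: object points.  NORM_i is taken here
    # as the row-wise sum of squared distances -- an assumption of this sketch.
    total = 0.0
    for i, xi in enumerate(X):
        d = [euclidean(xi, yj) for yj in Y]
        raw = sum((dh - dj) ** 2 for dh, dj in zip(dhat[i], d))
        norm = sum(dj ** 2 for dj in d)
        total += raw / norm
    return total / len(X)
```

When the pseudo-distances coincide with the actual distances the stress is zero; any discrepancy raises it in proportion to the squared residuals of the row.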
3. Degeneration

Many methods of multidimensional data analysis get into trouble when the conditions for optimal performance are violated. This need not be very alarming as long as good diagnostics are available. For nonmetric unfolding the situation seems to be quite different. It gets into trouble almost all the time, even in circumstances that look favorable. It is not so much a matter of computational trouble, but of disappointingly uninformative results. The extreme form of uninformativeness is known under the name degeneration. This section tries to clarify what forms of degeneracy can be expected, and why it can happen so often.
[Figure 1 residue: three panels showing the one-plus-one configuration (X versus A...Z), the two-plus-two configuration (X and X/A versus A and B...Z), and the two-plus-three configuration (X/A and X/Z versus A, B...Y, and Z).]
Figure 1. Degeneracies in one dimension.
Kruskal and Carroll (1969) provided the first careful exposition of the problem. Good accounts can also be found in Borg (1981) and Heiser
(1981). The key observation is that the unfolding technique has no problem at all representing a data table with constant elements - quite contrary to the square symmetric MDS case. By collapsing all ideal points into one location and all object points onto another, a one-plus-one configuration is obtained (see Figure 1, in which X is the label for the cluster of ideal points and A...Z the label for the cluster of object points). So in the unfolding model equal object-to-source distances can be embedded in one dimension, whereas in symmetric MDS the equal distance case requires n - 1 dimensions. Generally, a data table does not have constant elements, of course, but under (4) equal pseudo-distances are allowed for any set of data. So if NORM in (5) is chosen simply as the sum of squared distances or the sum of squared pseudo-distances, a properly working algorithm can always find a zero stress solution of the one-plus-one form. All variance in P is thus reduced to zero variance in the (pseudo-)distances. This observation suggested at once that both a collapse into one location (the original motivation for normalization) and a collapse into two locations (of the one-plus-one form) could be avoided by a clever choice of NORM. Choosing the variance seemed to be particularly suited: configurations that resemble either one of the situations have distances with small variance, and because NORM is in the denominator of normalized stress they would be "unattractive" to the algorithm. The early experiments with nonmetric unfolding also included an unconditional approach. Unconditional simply means that raw stress is defined for the whole table, not row-wise as in (3), that the monotone regression is performed on all distances, and that the normalization is done across rows as well. In this case normalization on the variance cannot exclude the occurrence of a two-plus-two configuration, which is illustrated in the middle panel of Figure 1. Suppose A is the object of last choice for a source or group of sources X/A. Then the remaining sources X can be placed halfway between A and the remaining objects B...Z, which yields equal distances, and X/A can be placed at any location to the right of X, which yields some small distances and at least one larger one. Therefore the variance will be strictly larger than zero. The two-plus-two is, for that matter, not the only possibility here. If we allow equal pseudo-distances within sources, and merely require distinct values at some place in the table, then there is an endless number of solutions of the type object implosion (see Figure 2, top). This degeneracy has n
distinct object-to-source distances, constant for any given source, and it is totally uninformative. In conclusion, unconditional normalization on the variance does not help.
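The one-plus-one degeneracy is easy to verify numerically. In this small sketch (the configuration values are arbitrary), every object-to-source distance is equal, so the constant pseudo-distances satisfy the monotonicity restriction for any data table, raw stress vanishes, and yet the sum of squared distances stays positive:

```python
import math

def euclidean(x, y):
    # ordinary Euclidean distance between coordinate tuples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# one-plus-one configuration: every ideal point collapsed onto one location,
# every object point collapsed onto another
ideal_points = [(0.0, 0.0)] * 4
object_points = [(1.0, 0.0)] * 6

distances = [[euclidean(x, y) for y in object_points] for x in ideal_points]

# with constant pseudo-distances dhat_ij = d_ij = 1, restriction (4) holds for
# any preference table, raw stress is zero, and the sum of squared distances
# (one choice of NORM) is strictly positive -- the degenerate solution
raw_stress = sum((d - 1.0) ** 2 for row in distances for d in row)
norm = sum(d ** 2 for row in distances for d in row)
```

This is exactly why normalizing on the sum of squares cannot rule out the collapse, and why the variance was tried instead.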
[Figure 2 residue: three panels showing the object implosion (top), the object circle around the ideal-point cluster X (middle), and the triangle-star configuration (bottom).]
Figure 2. Two-dimensional degeneracies.
Returning to the row-conditional case, there does not seem to exist a universal degeneracy anymore if NORM is chosen as the variance. Yet in practice one frequently finds something close to the object circle (see Figure 2, middle), which differs from the previous cases in that it is an idealization, not a strict counterexample. Why the object circle is so attractive, even when NORM is the row-wise variance, is not completely understood. Heiser (1981) has given some heuristic arguments, but these do not seem to be conclusive. A particularly challenging observation is this: suppose { d i j } is a given set of distances, and consider the transformation { d ; } t {di, + ai}, i.e., we add a constant to each row of the distance mamx. Then the pseudo-distances associated with { d ; } change with precisely that additive constant too, and the raw stress remains the same. Because the variance is invariant under addition of a constant, the mean normalized stress remains the same as well. So all distance matrices that differ from each other by row-wise increments fit equally well! By letting ai grow without bound, { d ; } approximates the equal distance case. This fact would certainly merit a more thorough mathematical analysis, in order to reveal which further conditions cause the object circle to remain such an attractive object. The last two examples of degeneration, due to Heiser (1985), follow from certain simple conditions on the data. They are therefore not complerely uninformative, because they diagnose the conditions whenever these occur, but nevertheless the bulk of the empirical information is lost. Figure 1 (bottom) shows the rwo-plus-rhreeconfiguration, which is a zero stress solution, with positive variances, for any set of data in which either one of two different objects is always ranked last. Suppose A is the object of last choice for the group of sources X/A,and similarly Z for group X/Z;thus B * - * Y are the remaining objects. 
Then the two-plus-three configuration locates X/A at equal distance from all objects except A, which is further away, and X/Z at equal distance from all objects except Z, which in turn is more remote. Is it strange to suppose that there are merely two objects of last choice? Well, it is a structural property of the one-dimensional unimodal model (Coombs, 1964, Ch. 5). So the NMDS approach to unfolding cannot distinguish among a great number of distinct cases, some of which may be "perfect" data! Figure 2, bottom part, shows the triangle-star configuration, which is a universal zero stress solution for any set of data that contains precisely three objects
Smooth Order Invariant Unfolding Analysis
of last choice. Evidently, normalization on the variance is not a safeguard against these slightly complicated clustering effects.

Degeneration is a phenomenon typical for internal unfolding. When the object points are restricted to known locations (external unfolding), some or all of the ideal points may occasionally tend to cluster around the centroid. But in this case such a contraction can only happen if the responses are unrelated to the a priori scale, and the true state of affairs is much more readily diagnosed. In the Thurstonian tradition the "criterion of relevance" (Torgerson, 1958, pp. 88-93) served this purpose, and a modern unfolding method will always return high stress values whenever there is no systematic, unimodal response with respect to the fixed object configuration. In order to keep the internal approach alive, it is necessary to impose some additional restrictions on the model. The next section will show how this can be done without having to sacrifice either the internal character of the analysis or its order invariance.

4. Smoothness Restrictions on the Pseudo-Distances
Degeneration is not easily characterized in terms of a single property of the joint configuration of object and ideal points. In the triangle-star we have seven clusters in a particular arrangement; in the object circle we have one cluster of ideal points and an irregular distribution of object points along the circle; in the two-plus-two we have clusters and isolates, and so on. However, all degeneracies reviewed in the previous section can be characterized as having a distance distribution in which all mass is concentrated at one or two values. If the joint configuration had a more homogeneous scatter, then the distance distribution would have a variety of distinct values. It would be more smooth. This forms the basis of the method proposed here. It is not possible, or at least not easy, to impose restrictions directly on the distances, since these are already a function of the point coordinates. But it is possible to constrain the pseudo-distances, which are the target of approximation by the distances. If the pseudo-distances are "smooth", the distances cannot be "rough", because the fitting process is of the least squares type. We need only to consider the situation for any source i, because the pseudo-distances are determined independently for each of them. The
following notation is helpful. Let r_1, ..., r_m denote the rank numbers that would order the i'th row of the data in strictly descending order. The ordinary monotonicity requirements (4) can now be written as:

γ(r_1) ≤ γ(r_2) ≤ ... ≤ γ(r_m).    (6)

So the object that is ranked first should get a smaller value than the object that is ranked second, and so on. The γ's could be called the admissible pseudo-distances; they indicate the range of values that the pseudo-distances are allowed to take. In these terms a degeneracy is characterized by, for some s:

d̂(r_1) = d̂(r_2) = ... = d̂(r_s) ≤ d̂(r_{s+1}) = ... = d̂(r_m).    (7)
Obviously, under (6) the outcome (7) is admissible. If we agree to use r_0 for indicating the lower bound of the pseudo-distances, then the case s = 0 represents the situation of a single block of equal values, and the case s ≥ 1 two blocks of equal values. The notion of smoothness not only implies that a sequence like (7) should contain fewer equalities and more inequalities, but also that the successive values change gradually. It is therefore natural to consider the steps

θ_s = γ(r_s) − γ(r_{s−1}),    (8)

for s = 1, ..., m, where it is understood that γ(r_0) = 0. The only restriction of the monotonic approach is that the steps be non-negative. The approach taken here uses the additional restrictions

|θ_s − θ_{s−1}| ≤ θ̄,    (9)

where θ̄ is the mean step, and θ_0 = 0. So the change of successive steps is bounded by the mean step θ̄: after a small (large) step, there cannot immediately follow a large (small) one. This controls the "acceleration" of the pseudo-distances; for a large step to be possible, it has to be preceded by a sequence of increasing steps, and to be succeeded by a sequence of decreasing steps (the latter consequence follows from the absolute values in (9)). There are other ways to obtain a less concentrated distance distribution, so it might be helpful to underline what is deemed important. In the first place, while monotonicity is a condition on the first order differences, smoothness is defined here as a condition on the second order differences
only; this keeps things tractable. Secondly, these second-order differences are bounded by an internally determined bound: the mean step. This makes the restrictions scale free, i.e., if some γ satisfies (9) then any homogeneously rescaled version of it satisfies (9) as well. The idea of using the mean step as a bound originates from the consideration that in an equally-spaced sequence each step equals the mean step. Now as soon as some steps deviate from the mean step there should be some compensation for it in another part of the sequence. Translated back in terms of distances among points in a "non-degenerate" configuration: there may be a few small distances (for instance if an ideal point lies on the edge of the configuration), but then there generally have to be some intermediate and quite a lot of large distances as well. Alternatively, there might be numerous small distances (for instance if an ideal point lies near a dense region of object points), but then there have to be at least some larger ones. Thirdly, the constraints (9) in particular also hold for s = 1, the first step, which corresponds to the value of the first pseudo-distance. In a precursor of the present method Heiser and Meulman (1983a) in fact used s = 1 as the only constraint in addition to monotonicity, and obtained quite satisfactory results. Apparently it is essential that an ideal point is not allowed to drift away too far from the most preferred object. But in some cases the single constraint on the first step appeared to be too weak. Finally, note that the analysis remains nonmetric, or order invariant, because the steps are not evaluated with respect to the numerical values of the data, but only with respect to each other (given the ordering of the data, reflected in r_1, ..., r_m).
This distinguishes the present method from approaches using nonlinear function fitting (Muller, 1983), which do rely on the metric properties of the data, and keeps it within the optimal scaling family (Young, 1981). Figure 3 gives an example of the smoothness restrictions in action for the case of equal distances, all of (arbitrary) value 7.0. Under monotonicity alone, the pseudo-distances would all coincide with the distances. The smoothness requirement does not cause change in the middle range, but it is clear that the big jump at the beginning is now more gradually built up. It might come as a surprise that there is also an upward deviation at the end of the sequence. This is caused by the role of the mean step. If the last three pseudo-distances were chosen closer to the distances, the mean step would be diminished, so that the constraint on the
step differences (9) would become tighter, and the build-up would require more steps. Because raw stress equals the sum of squared distances of the o-points to the *-points in Figure 3, the decrease of the large deviations at the beginning outweighs the small increase at the end. This effect of the mean step is considered to be desirable in the present context.
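The monotonicity condition (6) and the step-difference bound (9) are easy to state compactly in code. The following is a minimal feasibility check, not the constrained least squares fitting that SMACOF-3B performs; the function name is invented here, and ties are ignored:

```python
import numpy as np

def is_smooth(gamma, tol=1e-9):
    """Check monotonicity (6) and the step-difference bound (9)
    for a candidate sequence of admissible pseudo-distances."""
    g = np.concatenate(([0.0], np.asarray(gamma, dtype=float)))
    theta = np.diff(g)                 # steps (8), with gamma(r_0) = 0
    if np.any(theta < -tol):           # (6): steps must be non-negative
        return False
    mean_step = theta.mean()
    theta0 = np.concatenate(([0.0], theta))          # theta_0 = 0
    # (9): successive steps may differ by at most the mean step
    return bool(np.all(np.abs(np.diff(theta0)) <= mean_step + tol))
```

An equally-spaced sequence such as 1, 2, 3, 4, 5 passes, while the degenerate constant sequence 7, 7, 7, 7, 7 of type (7) fails, because its first step (7) exceeds the mean step (1.4). The check is also scale free in the sense discussed above: `is_smooth(g)` and `is_smooth([10 * x for x in g])` always agree.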
Figure 3. Smooth succession of pseudo-distances (o) when all distances (*) are equal.

Some care has to be taken in considering the shape of a plot like Figure 3, because the spacing along the horizontal axis is arbitrary. The raw data might be spaced differently, which would alter the shape. It is also for this reason that it is better not to call what is involved here a smooth transformation, but to call it a smooth succession, or a smooth distribution. Smoothness is defined entirely along the vertical axis. The interested reader is referred to Heiser (1985) for details on the computation, the treatment of ties, and the application of this approach to
square symmetric NMDS. The computational complications arising from the additional restrictions are not at all trivial, but a full discussion would lead us outside the scope of this paper. For the examples to be presented in the next section use was made of the FORTRAN program SMACOF-3B, which has been designed to minimize the normalized stress (5) under the smoothness restrictions, with the sum of squared pseudo-distances as the NORM. The reasons for using this NORM, as well as the rationale of algorithm construction, are explained in De Leeuw and Heiser (1977). The smoothness restriction could be implemented in other NMDS programs just as easily.
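With the sum of squared pseudo-distances as the NORM, the loss has the usual ratio form, raw stress over NORM. A minimal sketch follows; equation (5) itself is not reproduced in this excerpt, so the exact form below is an assumption, and the function name is mine:

```python
import numpy as np

def normalized_stress(dhat, d):
    """Raw stress sum((dhat - d)^2) divided by NORM = sum(dhat^2)."""
    dhat = np.asarray(dhat, dtype=float)
    d = np.asarray(d, dtype=float)
    return ((dhat - d) ** 2).sum() / (dhat ** 2).sum()
```

A perfect fit gives zero; the values 0.025 and 0.087 reported for the applications below are on this kind of scale.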
5. Applications in the Attitude Domain

5.1 Study of the 1960 Presidential Campaign

During the month preceding the 1960 presidential election, Sherif et al. (1965) assessed pro-Republican and pro-Democratic attitudes in the Pacific Northwest and the Southwest of the United States. The statements used are presented in Table 1. They are "deliberately designed to prevent a ceiling effect, that is a restriction of the range of positions caused by too moderate items at the extremes" (o.c., pp. 27-28). Four tasks were used in this order:
1. To select the one statement most acceptable to the subject;
2. To indicate any other statement(s) not objectionable or also acceptable;
3. To select the one statement most objectionable;
4. To indicate any other statement(s) that might be found objectionable.
On the basis of the responses on task (1), subjects were classified according to the position they chose as most acceptable. Next, for each statement proportions of endorsement on each of the three other tasks were calculated for all groups. When these proportions are plotted against the rank numbers of the statements in the natural pro-Republican to pro-Democratic order (o.c., Figure 2.1a), the single-peakedness of the acceptance responses is striking. Also, the rejection responses turn out to be
Table 1. Statements used in the presidential election study (source: Sherif et al., 1965). The bold-faced phrases are used for labeling the results.

A. The election of the Republican presidential and vice-presidential candidates in November is absolutely essential from all angles in the country's interests.
B. On the whole the interests of the country will be served best by the election of the Republican candidates for president and vice-president in the coming election.
C. It seems that the country's interests would be better served if the presidential and vice-presidential candidates of the Republican party are elected this November.
D. Although it is hard to decide, it is probable that the country's interests may be served if the Republican presidential and vice-presidential candidates are elected in November.
E. From the point of view of the country's interests, it is hard to decide whether it is preferable to vote for presidential and vice-presidential candidates of the Republican party or the Democratic party in November. ("Middle of the road".)
F. Although it is hard to decide, it is probable that the country's interests may be better served if the Democratic presidential and vice-presidential candidates are elected in November.
G. It seems that the country's interests would be better served if the presidential and vice-presidential candidates of the Democratic party are elected this November.
H. On the whole the interests of the country will be served best by the election of the Democratic candidates for president and vice-president in the coming election.
I. The election of the Democratic presidential and vice-presidential candidates in November is absolutely essential from all angles in the country's interests.
single-dipped, as one would hope them to be. An internal unfolding analysis will not uncover very much that is substantively new here, because the data are well understood. Yet it is of considerable interest to see how the smooth approach handles the situation. It may confirm the "natural order" of the statements, and it provides a way of establishing scale values "by means of the voting only". Sherif et al.'s
Table 5.4, of the form statements x groups and based on the answers of 1237 subjects, was translated into column-wise rank numbers, transposed, and analyzed with SMACOF-3B in one dimension. The statements "R. absolutely essential" and "D. absolutely essential" are indeed the only ones that are ever ranked last. So we know for sure that an ordinary nonmetric unfolding program must give us the two-plus-three solution with zero stress. In Figure 4 the SMACOF-3B result is depicted, with the nine subject groups labeled from RRRR to DDDD in an obvious manner. Normalized stress of this solution is 0.025, which must be considered to be almost perfect. The program was run under ordinary conditions; i.e., it used a standard initial configuration in two dimensions (Heiser, 1981), converged to a two-dimensional solution, determined its principal components, and used the first of these as initialization for the iterations in one dimension. So the correct order was recovered without any external help. What is less than perfect is that groups RRRR, RRR, and DDDD are not closest to their most preferred statement; this is the prime reason for the departure of stress from zero. It does show that for a good appreciation of an unfolding analysis it is necessary to regard the joint scale, and not the object scale and source scale separately. In conclusion, the smooth nonmetric unfolding analysis does not degenerate, and gives meaningful scale values on a joint bipolar scale. Although the minor deviations from the expected pattern certainly deserve further scrutiny, it seems that scalability is confirmed in a rather strong sense. In practical terms: the answers to task (1) are sufficient for predicting the answers to tasks (2), (3) and (4), and for studying further predictions from the judgment-involvement theory, such as assimilation of communications close to one's own ideal point.

5.2 Power in the Classroom
In the fifties the Research Center for Group Dynamics at the University of Michigan conducted a broad investigation of the social relationships among children in classroom groups. Part of that research was reported by Gold (1958). His study concerned the way in which children attributed power to their peers. From pilot interviews 17 characteristics were selected which appeared as matters of concern in the children's conversations. These are reproduced in Table 2. Next, 152 boys and girls from
Figure 4. One-dimensional SMACOF-3B analysis of the 1960 U.S. presidential campaign study (source: Sherif et al., 1965).
Elementary School - from about five to twelve years old - were asked, among other things, to say whether each characteristic was important, sometimes important, or not important when he or she had to decide whether or not to do something for another child (in fact, the experimental
method was somewhat more involved than this description suggests). The children were classified by their age, their sex, and the sex of the target child. Thus eight groups were created: younger boys with male target, younger boys with female target, younger girls with male target, and so on. Gold's Table 2, containing for each group the rankings by proportion of times each item was rated important, will now serve to illustrate the use of nonmetric internal unfolding. The primary aim of the original study was to show the existence of priority differences for the eight groups concerned, and to predict group-specific attributions of power characteristics to high power children.

Table 2. Properties used in the classroom study (source: Gold, 1958). The bold-faced phrases are used for labeling the results.

1. Smart at school.
2. Has good ideas about how to have fun.
3. Good at making things.
4. Good at games with running and throwing.
5. Knows how to fight.
6. Strong.
7. Acts friendly.
8. A good person to do things with.
9. Asks you to do things in a nice way.
10. Doesn't start fights and doesn't tease.
11. Knows how to act so people will like him.
12. Plays with you a lot.
13. Likes to do the same things you like to do.
14. Nice looking.
15. Has things you'd like to have.
16. Gives you things.
17. Does things for you.
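Rankings by proportion, as in Gold's Table 2, can be produced from an endorsement table in a few lines. A hypothetical sketch (the function name and the tie-free assumption are mine; the primary approach to ties mentioned later is a separate matter):

```python
import numpy as np

def to_rank_rows(proportions):
    """Rank each row of an endorsement-proportion table so that the
    most endorsed item receives rank 1 (ties not handled)."""
    p = np.asarray(proportions, dtype=float)
    order = np.argsort(-p, axis=1)   # columns sorted by descending proportion
    ranks = np.empty_like(order)
    rows = np.arange(p.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, p.shape[1] + 1)
    return ranks
```

For a single group endorsing three items with proportions 0.9, 0.1, 0.5, this yields the rank numbers 1, 3, 2.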
An unfolding analysis enables us to explore in more detail the structure of the priority differences, and to discern how the three factors are operative, if at all. The two-dimensional SMACOF-3B result is presented in Figure 5. The groups are labeled with three-character abbreviations: young/old in the first position, then own sex (male/female), and sex of the target child
(male/female) last. Normalized stress of this solution is 0.087, which is satisfactory according to current standards. At the left side of the plot we find properties which are universally rejected: smart, strong, making, nice looking. These are ranked low or very low by all groups. Similarly for fight and games, except that these are ranked quite high in the ymf group, the younger boys with respect to girls. At the other side, has things is also generally not very popular; only the yff group gives it an intermediate rank. So systematic group differences seem to be strongest in the direction upper-left to lower-right. No single background factor seems to
Figure 5. Joint configuration obtained with SMACOF-3B for the power attribution data (source: Gold, 1958).
have a clear effect. But notice that the four groups at the upper-left side all have a mixed subject-target sex, and the four groups at the lower-right side have a matched subject-target sex. In the former type of group the properties good person, does things, gives things, and perhaps asks are relatively more important, while in the latter type of group same things,
fun, and how to act (except for ymm) are valued highly. This contrast emerges from the unfolding analysis as the most discriminatory one. Figure 6 presents a typical plot of the fitted values against the ranked data, for the yfm group. Some of the distances in the middle range are not quite what they should have been, but the small and the large ones fit well. The seventh rank is shared by two properties. Their fitted distances are quite different, while the corresponding pseudo-distances are more similar (but not exactly equal; the so-called primary approach to ties has been used, which allows them to become unequal). One should always take effects like this into account when the joint configuration is examined in greater detail. The internal unfolding provides a convenient way to explore the structure of the group differences. Evidently this structure is not one-dimensional. Even if we omitted the universally rejected properties from the analysis - which is frequently a good idea anyhow - we would expect to see within the matched sex group close similarity in importance judgments between older boys and older girls, but a marked difference between younger boys and younger girls (in terms of plays and fun versus no fights and how to act). Another way of expressing the group differences is to say that the boys show greater variation of judgment pattern across age and sex of target. This does not necessarily imply that there are no differences among the girl groups in an absolute sense, as all our conclusions are dependent on the particular selection of properties used.

6. Discussion

A new way of specifying nonmetric unfolding has been presented that
avoids a recurrent problem in the existing methods, i.e., degeneration to uninformative constellations. Mere monotonicity appears to be too weak a requirement for scale construction of this type, but the addition of smoothness restrictions seems to repair most of the damage. The extra restrictions do cause some loss of generality, of course. In particular, if a "non-smooth" configuration were constructed and used to generate artificial data, then the smooth method will not exactly "recover" the original positions of the points. Hopefully this limitation will not be a severe drawback in actual applications. In Section 3 it was shown that an
Figure 6. Fitted pseudo-distances (o) and distances (*) for group yfm in power attribution data.
unconditional nonmetric analysis is certainly bound to fail under the ordinary specifications. Whether or not the smooth approach has something to offer here as well is unknown, but it does not seem to be quite impossible. A prime diagnostic for the success of the analysis is trying to predict fresh data. Whenever there are enough sources available, we can use the valuable short-cut of splitting the observations into random halves. The conclusions reached from the analysis of the first half can then be cross-validated on the second half. For unfolding one possibility is to use an internal analysis first, and to keep the object points fixed in the second, external analysis. We should find unimodal responses in the set-aside data. The PREFMAP-3 program of Meulman, Heiser, and Carroll (1986) could be used for this in most circumstances. In the case of presence-absence data a logistic regression method is to be recommended, as shown in Ter Braak and Looman (1986). A second possibility arises when the unit of analysis is an aggregate of the unit of observation (as in the examples of Section 5). If a "source" provides in fact the mean (or modal, or median) response of a group of subjects, it is feasible to randomly split within groups. Now the cross-validation involves the prediction of the omitted observations from our knowledge of group membership (and the ideal points for groups). The measure to use would be the
mean prediction error. Another type of diagnostic follows from the study of the stability of the solution. Again the split-half idea can be employed, but it is more efficient to resample a number of times from all available data (Efron, 1982). This way one gets multiple solutions, from which confidence intervals can be obtained for all parameters of interest. One such resampling strategy, the bootstrap, has been applied to metric unfolding by Heiser (1981, Ch. 6), in NMDS by Heiser and Meulman (1983b), and in nonmetric unfolding by Heiser and Meulman (1983a). As in cross-validation, one may sample within or across sources. In the latter case the stability results only pertain to objects, and to stress. In the former case, which unfortunately has never been tried, stability information is obtained for the sources as well, and for the joint relationships. Another resampling strategy, the jackknife, has been adapted for the MDS situation by De Leeuw and Meulman (1986). If applied to unfolding, their method would also yield an assessment of the stability of the joint configuration. The distinction between units of observation and units of analysis can be made with respect to the objects too. That is, we may have some classification of the statements into theoretically homogeneous groups, and apply the above ideas to these groups. Grouping either subjects or statements (or both) into identifiable classes does not only give us the opportunity to start the study of stochastic variability. It will also frequently be helpful for the interpretation of the obtained configuration. It transfers a basic concept that is common in Q-methodology (Stephenson, 1953), generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) and facet theory (Borg, 1976; Guttman, 1959) to unfolding.
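Across-source bootstrap resampling of the kind just described can be sketched as follows; the function names are invented for illustration, and `fit_unfolding` stands for whatever unfolding routine is refitted on each replicate:

```python
import numpy as np

def bootstrap_replicates(data, n_boot, fit_unfolding, seed=0):
    """Resample whole sources (rows) with replacement and refit;
    the spread of the replicate solutions indicates stability."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    solutions = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # one bootstrap sample of sources
        solutions.append(fit_unfolding(data[idx]))
    return solutions
```

Confidence intervals for any parameter of interest can then be read off the empirical distribution of the replicate solutions.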
References

Barlow, R. E., Bartholomew, D. J., Bremner, J. M., & Brunk, H. D. (1972). Statistical inference under order restrictions. New York: Wiley.
Beyle, H. C. (1932). A scale for the measurement of attitude toward candidates for elective governmental office. American Political Science Review, 26, 527-544.
Borg, I. (1976). Facetten- und Radextheorie in der multidimensionalen Skalierung. Zeitschrift für Sozialpsychologie, 7, 231-247.
Borg, I. (1981). Anwendungsorientierte Multidimensionale Skalierung. Berlin: Springer.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. 1, pp. 105-155). New York: Seminar Press.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 148-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measures: Theory of generalizability for scores and profiles. New York: Wiley.
De Leeuw, J., & Heiser, W. J. (1977). Convergence of correction matrix algorithms for multidimensional scaling. In J. Lingoes (Ed.), Geometric representations of relational data (pp. 735-752). Ann Arbor: Mathesis Press.
De Leeuw, J., & Meulman, J. (1986). A special jackknife for multidimensional scaling. Journal of Classification, 3, 97-112.
Edwards, A. L. (1957). Techniques of attitude scale construction. New York: Appleton-Century-Crofts.
Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics (SIAM Monograph #38), Philadelphia.
Ferguson, L. W. (1935). The influence of individual attitudes on construction of an attitude scale. Journal of Social Psychology, 6, 115-117.
Gold, M. (1958). Power in the classroom. Sociometry, 21, 50-60.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In P. Horst et al. (Eds.), The prediction of personal adjustment (pp. 319-348). New York: Social Science Research Council.
Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139-150.
Guttman, L. (1947). The Cornell technique of attitude and opinion measurement: The basis for scalogram analysis. In S. A. Stouffer et al. (Eds.), Measurement and prediction (Vol. 4, pp. 46-90). Princeton, NJ: Princeton University Press.
Guttman, L. (1959). A structural theory for intergroup beliefs and action. American Sociological Review, 24, 318-328.
Heiser, W. J. (1981). Unfolding analysis of proximity data. Doctoral Thesis, University of Leiden.
Heiser, W. J. (1985). Multidimensional scaling by optimizing goodness-of-fit to a smooth hypothesis. Internal Report RR-85-07, Dept. of Data Theory, University of Leiden.
Heiser, W. J. (1986). Shifted single-peakedness, unfolding, correspondence analysis, and horseshoes. Internal Report RR-65-05, Dept. of Data Theory, University of Leiden.
Heiser, W. J., & De Leeuw, J. (1981). Multidimensional mapping of preference data. Mathématiques et Sciences Humaines, 19, 39-96.
Heiser, W. J., & Meulman, J. (1983a). Analyzing rectangular tables by joint and constrained multidimensional scaling. Journal of Econometrics, 22, 139-167.
Heiser, W. J., & Meulman, J. (1983b). Constrained multidimensional scaling, including confirmation. Applied Psychological Measurement, 22, 139-167.
Hinckley, E. D. (1932). The influence of individual opinion on construction of an attitude scale. Journal of Social Psychology, 3, 283-296.
Hovland, C. I., & Sherif, M. (1952). Judgmental phenomena and scales of attitude measurement: Item displacement in Thurstone scales. Journal of Abnormal and Social Psychology, 47, 822-832.
Kruskal, J. B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-28.
Kruskal, J. B. (1964b). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 115-129.
Kruskal, J. B. (1977). Multidimensional scaling and other methods for discovering structure. In K. Enslein, A. Ralston, & H. S. Wilf (Eds.), Statistical methods for digital computers (Vol. 3, pp. 296-339). New York: Wiley.
Kruskal, J. B., & Carroll, J. D. (1969). Geometrical models and badness-of-fit functions. In P. R. Krishnaiah (Ed.), Multivariate analysis II (pp. 639-671). New York: Academic Press.
Kruskal, J. B., Young, F. W., & Seery, J. B. (1973). How to use KYST, a very flexible program to do multidimensional scaling and unfolding. Internal Report, Bell Laboratories, Murray Hill, NJ.
Levine, M. V. (1970). Transformations that render curves parallel. Journal of Mathematical Psychology, 7, 410-443.
Levine, M. V. (1972). Transforming curves in curves with the same shape. Journal of Mathematical Psychology, 9, 1-16.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, no. 140.
McGuire, W. J. (1984). Search for the self: Going beyond self-esteem and the reactive self. In R. A. Zucker et al. (Eds.), Personality and the prediction of behavior (pp. 73-120). New York: Academic Press.
McGuire, W. J. (1985). Attitude and attitude change. In G. Lindzey & E. Aronson (Eds.), Handbook of social psychology (3rd ed., pp. 233-346). Reading, MA: Addison-Wesley.
Meulman, J., Heiser, W. J., & Carroll, J. D. (1986). PREFMAP-3 user's guide. Technical Memorandum, AT&T Bell Laboratories, Murray Hill, NJ.
Muller, M. W. (1983). Multidimensional unfolding of preference data by maximum likelihood. Doctoral Thesis, University of South Africa, Pretoria.
Pintner, R., & Forlano, G. (1937). The influence of attitude upon scaling of attitude items. Journal of Social Psychology, 8, 39-45.
Roskam, E. E. (1979). A survey of the Michigan-Israel-Netherlands integrated series. In J. Lingoes, E. E. Roskam, & I. Borg (Eds.), Geometric representations of relational data (pp. 289-312). Ann Arbor: Mathesis Press.
Sherif, M., & Hovland, C. I. (1961). Social judgment: Assimilation and contrast effects in communication and attitude change. New Haven: Yale University Press.
Sherif, C. W., Sherif, M., & Nebergall, R. E. (1965). Attitude and attitude change: The social judgment-involvement approach. Philadelphia: W. B. Saunders.
Stephenson, W. (1953). The study of behavior: Q-technique and its methodology. Chicago: University of Chicago Press.
Ter Braak, C. J. F., & Looman, C. W. N. (1986). Weighted averaging, logistic regression and the Gaussian response model. Vegetatio, 65, 3-11.
Thurstone, L. L. (1929). Theory of attitude measurement. Psychological Review, 36, 222-241.
Thurstone, L. L. (1931). The measurement of social attitudes. Journal of Abnormal and Social Psychology, 26, 249-269.
Thurstone, L. L. (1959). The measurement of values. Chicago: University of Chicago Press.
Thurstone, L. L., & Chave, E. J. (1929). The measurement of attitude. Chicago: University of Chicago Press.
Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.
Young, F. W. (1981). Quantitative analysis of qualitative data. Psychometrika, 46, 357-387.
This Page Intentionally Left Blank
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
AN ANALYTICAL APPROACH TO UNFOLDING
Hubert Feger
Free University of Berlin, F.R. Germany
An analytical procedure is described which determines, in the one- and multidimensional case, the positions of all boundary hyperplanes relative to each other. Thereby the positions of all ideal points and object points in a set of isotonic regions are also determined as uniquely as possible. A criterion is given to determine the minimum number of dimensions necessary to represent the observed rank orders perfectly. Furthermore, rules are offered to derive quantitative information for solutions with one or more dimensions. How to treat data with error is explained and illustrated by examples. The discussion emphasizes the constructive, not approximate, nature of the solutions and the meaning of the dimensionality of a solution.
1. Notation
Let A, …, N be a set of objects. A data source a, …, n rank orders these objects with respect to preference, closeness, or similar criteria. A rank order ABCD of these objects indicates that A is preferred to B, B to C, etc. Transitivity is implied. The purpose of the unfolding procedure (Coombs, 1964) is to find a representation of a, …, n and A, …, N in a common (Euclidean) space, in which the data sources and the objects are represented as points. These points will be denoted by the same letters as the data sources or objects. The a, …, n are sometimes called ideal points, and the A, …, N the object points. The distances of all a, …, n to all A, …, N have to be chosen such that the following monotone relation holds for every source: the stronger the preference of this source for an object, the shorter the distance between the points representing this source and this object. The position of an ideal point is characterized by an isotonic region in the representational space. An isotonic region is a subspace in which the ideal point a can assume every position without changing the rank order
of the distances of a to A, …, N. There are open and closed isotonic regions. An isotonic region is closed if it is surrounded by a boundary in every direction of the space. A boundary between A and B, written A|B, is a subspace dividing the space such that on the A side of A|B all points are closer to A, and on the B side closer to B. If the space is one-dimensional, i.e., k = 1, the boundary has been called the midpoint; for k = 2, it is the boundary line (identical with the mid-perpendicular of the line segment AB); more generally, the boundary is a hyperplane. Data may be complete or incomplete. They are defined as complete if (1) no rank order contains any ties, and (2) there exists a specified number of dimensions, k, such that for every rank order observed there exists an isotonic region in a k-dimensional space, and for every smallest isotonic region (one that is not divided by a boundary) a rank order has been observed. Further, the researcher interprets observations as data with or without error. For data with error, quite often a representation is desired with a smaller number of dimensions than would be necessary for an error-free reproduction of the data. The spatial representation of rank orders is often called a solution. We will call a representation a qualitative solution if all minimal regions are known. We call a solution quantitative if information about the relative size of distances is made explicit.
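The monotone relation above can be made concrete in coordinates (a sketch with invented names, not part of the original paper): a source's rank order is obtained by sorting the objects by distance from its ideal point, and the boundary A|B is the locus of points equidistant from A and B.

```python
import math

def rank_order(ideal, objects):
    """objects: dict mapping an object label to its coordinate tuple.
    Returns the labels sorted by Euclidean distance from the ideal
    point, nearest (most preferred) first."""
    return sorted(objects, key=lambda o: math.dist(ideal, objects[o]))
```

For three collinear objects A = 0, B = 2, C = 5, an ideal point at 0.5 yields the rank order ABC; moving the ideal point across the midpoint A|B = 1 swaps the first two positions, exactly as the isotonic regions predict.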
2. A Criterion for Determining the Number of Dimensions
The order information contained in all observed rank orders can be written completely as a set of contingency tables. One may count how often A precedes B (AB) and how often B precedes A (BA) in all rank orders. Further, one may count how often AB appears in the same rank order together with AC, or how often BA and AC, AB and CA, or BA and CA are observed together. For a k-dimensional solution, one has to consider all combinations of sequences in k + 1 pairs. Thus, for k = 1, all combinations of two pairs are analyzed, leading to fourfold tables of the type:
          AB     BA
    AC
    CA

(rows: sequence of the pair A, C; columns: sequence of the pair A, B; each cell contains the frequency with which the two sequences occur together in the same rank order).
With N objects in k dimensions, one has to establish

    C( N(N-1)/2, k+1 )

contingency tables, i.e., the number of ways of choosing k + 1 of the N(N-1)/2 pairs (for N = 7 and k = 1 this gives C(21, 2) = 210 tables; cf. Section 6). In a one-dimensional solution, every pair of boundaries defines one closed and two open isotonic regions, thus three regions. A contingency table with two pairs of sequences provides four cells, every cell describing a combination of sequences. If all four cells contain a frequency > 0, then a k = 1 solution is not possible. At least one cell should have a frequency of 0, indicating that a certain kind of rank order was not observed. In general, to represent a set of rank orders perfectly in a k-dimensional space, there has to be a cell with zero frequency in every contingency table. This criterion is necessary but not sufficient because, as will be seen, every contingency table with a zero frequency implies the existence of certain isotonic regions, and the regions implied by different contingency tables will in general not be compatible. This may be demonstrated by the impossibility of finding a one-dimensional solution for the rank orders ABC, BCA, CAB.
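The zero-cell criterion can be checked mechanically. The following sketch (not from the original paper; all function names are invented) builds the fourfold tables for k = 1 from a list of rankings and tests whether every table has an empty cell:

```python
from itertools import combinations

def pair_direction(ranking, x, y):
    """Return (x, y) if x precedes y in the ranking, else (y, x)."""
    return (x, y) if ranking.index(x) < ranking.index(y) else (y, x)

def fourfold_tables(rankings, objects):
    """For every two pairs of objects, count how often each combination
    of pair sequences occurs in the same rank order."""
    pairs = list(combinations(objects, 2))
    tables = {}
    for p, q in combinations(pairs, 2):
        counts = {}
        for r in rankings:
            key = (pair_direction(r, *p), pair_direction(r, *q))
            counts[key] = counts.get(key, 0) + 1
        tables[(p, q)] = counts
    return tables

def satisfies_k1_criterion(rankings, objects):
    """Necessary (but not sufficient) condition for a one-dimensional
    solution: every fourfold table must contain at least one empty cell."""
    return all(len(c) < 4 for c in fourfold_tables(rankings, objects).values())
```

For the rank orders ABC, BCA, CAB the criterion is satisfied even though no one-dimensional solution exists, which illustrates that it is only necessary.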
3. Isotonic Regions in the One-Dimensional Case
If the frequency count leads to a contingency table of the form:

          AB     BA
    AC     +      +
    CA    zero    +

(+ denoting a nonzero frequency),
this indicates that no isotonic region is needed in which A is closer to an ideal point than B while, at the same time, C is closer than A, i.e., the rank order CAB has not been observed. Let the other cells of the table not be empty. Then, to provide an isotonic region for all rank orders observed, the configuration of the boundaries A|B and A|C has to be
written A|B - A|C, with the B side of A|B oriented to the A side of A|C. The rule is: if the cell with the sequences AB, CA has a zero frequency, then A|B and A|C define a closed isotonic region, with the second letters of the pairs as those sides of the boundaries facing each other. The rule also holds for sequences in which the pairs are defined by four different letters, e.g., AB and CD. If a contingency table of the form
          AB     BA
    AC    zero    +
    CA     +      +
is observed, this implies A|B - C|A, with an impossible position for A, indicating a violation of the unfolding model.

4. Implications of Boundary Positions
Three points extension: Let A|B - A|C be derived from a contingency table. Then A|B - A|C - B|C follows. Let A|B - B|C be derived. Then A|B - A|C - B|C is implied. Proof for the first part: A|B - A|C localizes A to the A side of A|B and C to the C side of A|C. But C has to be more "to the right" than B because A|B precedes A|C. And B has to be on the B side of A|B. Then the only sequential order of A, B, C can be A - B - C, necessitating A|B - A|C - B|C.
Four points extension: Let A|B - C|D and A|C - B|D be derived. Then (1) A|B - A|D - C|D and A|C - A|D - B|D, and (2) A|B - (B|C or C|B) - A|D and A|C - (B|C or C|B) - B|D. The orientation of A|D to B|C is independent (quantitative) information.
Proof: Only the following four positions of A|B, A|C, B|D, C|D relative to each other are possible:
1. A|B - A|C - B|D - C|D,
2. A|C - A|B - C|D - B|D,
3. A|B - A|C - C|D - B|D,
4. A|C - A|B - B|D - C|D.
In all four positions, the three points extension implies the validity of the four points extension. E.g., in (1) A|B - A|C results in A|B - A|C - B|C, and B|D - C|D leads to B|C - B|D - C|D, therefore A|B - A|C - B|C - B|D - C|D. Further, A|C - B|D in (1) implies A|B - A|D - B|D. Similar arguments can be given for the other positions. From these implications it can be derived that any procedure to find a solution (including algorithms to find an optimal solution in the presence of error) by additively combining solutions for partial sets of points will in general fail. Consider the case of three boundaries ordered A|B - A|C - B|C, implying A - B - C as the solution. To be compatible, the analysis of the pairs AD, BD, CD should lead to the following results:
    ORDER OF BOUNDARIES     IMPLIED ORDER              SOLUTION
    (1) A|D - B|D           A|B - A|D - B|D            A - B - D
    (2) A|D - C|D           A|C - A|D - C|D            A - C - D
    (3) B|C - C|D           B|C - B|D - C|D            B - C - D

resulting in the overall solution A - B - C - D, with A - B - C the same as inferred from A|B - A|C - B|C. Thus, the position of boundaries is determined directly as well as indirectly.
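The three points extension can be spot-checked numerically (a sketch under the assumption of collinear object points with coordinates a < b < c; not from the paper): the boundaries are the midpoints, and their order should always come out as A|B - A|C - B|C.

```python
import random

def boundary(x, y):
    """One-dimensional boundary between two object points: their midpoint."""
    return (x + y) / 2.0

def check_three_points_extension(trials=1000, seed=0):
    """Sample random collinear configurations a <= b <= c and verify that
    the midpoints are ordered A|B - A|C - B|C, as the extension claims."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = sorted(rng.uniform(-10.0, 10.0) for _ in range(3))
        ab, ac, bc = boundary(a, b), boundary(a, c), boundary(b, c)
        if not (ab <= ac <= bc):
            return False
    return True
```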
5. Quantitative Information in the One-dimensional Case
Two results to extract information on the relative size of some distances between object points for k = 1 will be given. The first rule is new; the second is an extension and a proof of work initiated by Coombs (1964).

Rule I: Let all those minimal regions in which point C could be located be placed on the A side of A|B. Then AC < BC. The proof is trivial; the example for complete data will show that rule I is independent of rule II.
Rule II: Let the orientation of boundaries A|B - C|D be derived from the data. Then the distance between the points that are oriented towards each other (the two inner points) is shorter than the distance between the two other (outer) points, i.e., BC < AD.
Proof: Three cases will be distinguished. (1) The boundaries are defined by three different points, and the inner points are identical, e.g., A|C - B|C. Since AA = BB = … = NN = 0 in Euclidean space, rule II is valid. Case (2) is as case (1), except that the two inner points differ, e.g., A|B - A|C. Then A|B - A|C - B|C is implied, as is the order of the points in the solution, A - B - C; therefore AC > AB, BC. (3) The boundaries are defined by four points, e.g., A|B - C|D. With this orientation, the following sequences of object points are compatible:
a. A - B - C - D,
b. A - C - B - D,
c. A - C - D - B,
d. C - A - B - D.
(a) and (b) imply BC < AD directly. In (c), A|B - C|D implies

    (1/2)AB < AC + (1/2)CD,

because CD is bisected by C|D. Thus AB < 2AC + CD; since in (c) BC = AB - AC and AD = AC + CD, it follows that AB - AC < AC + CD, i.e., BC < AD. Case (d) follows by the symmetric argument.
6. A One-dimensional Example for Complete Data
Table 1 reports 20 rank orders of N = 7 objects. All 210 contingency tables contained one cell with frequency zero, e.g.:

          AB     BA
    AC     +      +
    CA    zero    +

giving A|B - A|C and implying A|B - A|C - B|C;

          AB     BA
    BC     +      +
    CB    zero    +

giving A|B - B|C and implying A|B - A|C - B|C;

          AC     CA
    BC     +      +
    CB    zero    +

giving A|C - B|C and implying A|B - A|C - B|C.

All betweenness information is consistent and leads to the qualitative solution represented in Figure 1.

Table 1. Rank orders for the complete one-dimensional
example:

    ABCDEFG   BACDEFG   BCADEFG   CBADEFG   CBDAEFG
    CDBAEFG   DCBAEFG   DCBEAFG   DCEBAFG   DECBAFG
    DECBFAG   EDCFBAG   EDFCBAG   EFDCBGA   EFDCGBA
    EFDGCBA   FEDGCBA   FEGDCBA   FGEDCBA   GFEDCBA
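As a consistency sketch (invented here, not part of the paper), one can verify that the rank orders of Table 1 behave like rankings generated by sweeping an ideal point along a one-dimensional J-scale: each of the 21 object pairs reverses its order exactly once along the list.

```python
from itertools import combinations

# Table 1 rank orders, transcribed from the text.
TABLE_1 = [
    "ABCDEFG", "BACDEFG", "BCADEFG", "CBADEFG", "CBDAEFG",
    "CDBAEFG", "DCBAEFG", "DCBEAFG", "DCEBAFG", "DECBAFG",
    "DECBFAG", "EDCFBAG", "EDFCBAG", "EFDCBGA", "EFDCGBA",
    "EFDGCBA", "FEDGCBA", "FEGDCBA", "FGEDCBA", "GFEDCBA",
]

def precedes(r, x, y):
    return r.index(x) < r.index(y)

def pair_flip_counts(rankings):
    """For every pair of objects, count how often its order changes
    between consecutive rankings in the list."""
    counts = {}
    for x, y in combinations("ABCDEFG", 2):
        flips = sum(
            precedes(r1, x, y) != precedes(r2, x, y)
            for r1, r2 in zip(rankings, rankings[1:])
        )
        counts[(x, y)] = flips
    return counts
```

Each pair flipping exactly once is what forces every fourfold table to contain a zero cell, as stated above.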
Some contingency tables may contain more than one zero frequency, e.g., the one for the pairs AG and DF:

          AG     GA
    DF     +    zero
    FD    zero    +

The zero cell GA, DF leads to G|A - F|D, while the zero cell AG, FD implies A|G - D|F. To resolve this seeming inconsistency, both boundaries are located at the same point.
Figure 1. Qualitative solution for the complete one-dimensional example. The parentheses refer to the rank order in Table 1 that is located in the respective isotonic region. In two cases, boundaries are located at the same point.
A general rule to identify the position of object points (applicable with k > 1 as well) is to assign the letter of one side of a boundary to all isotonic regions on that side. This is done for both sides of a boundary and for all boundaries. If a letter is used x times, then all regions containing this letter x times are admissible locations of this object point; e.g.:
[Schematic example omitted: the side letters of each boundary (e.g., A and B for A|B) are written into all regions on the corresponding side, for both sides of every boundary. Regions in which the letter of an object point occurs the maximal number of times are the admissible locations of that point.]
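The letter-count rule can be sketched in code (all names invented; boundary positions are given as indices on the qualitative scale, with the left-side letter first):

```python
def letter_counts(boundary_positions, n_regions):
    """boundary_positions: dict (X, Y) -> index i, meaning boundary X|Y
    lies between region i and region i+1, with X on the left side.
    Returns, per region, a dict of letter -> count."""
    counts = [dict() for _ in range(n_regions)]
    for (left, right), i in boundary_positions.items():
        for r in range(n_regions):
            letter = left if r <= i else right
            counts[r][letter] = counts[r].get(letter, 0) + 1
    return counts

def admissible_regions(letter, counts):
    """Regions in which the letter occurs the maximal number of times."""
    m = max(c.get(letter, 0) for c in counts)
    return [r for r, c in enumerate(counts) if c.get(letter, 0) == m]
```

For the boundary order A|B - A|C - B|C (four regions), the rule locates A in the leftmost region, C in the rightmost, and leaves B qualitatively undetermined between A|B and B|C, on either side of A|C.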
In the example with complete data, one finds the sequence A - B - C - D - E - F - G as the qualitative solution. To find a quantitative solution, one may use all inequalities mentioned above, but also the additivities implied in the qualitative solution, e.g.:

    AC = AB + BC,
    AD = AC + CD, etc.

We report one solution for the combined system of inequalities and additive equations in Figure 2.
Figure 2. A quantitative solution for the one-dimensional complete data. The size of the intervals is not drawn in proportion to the magnitude of the values.
7. Data with Error: the Grade Expectations Example
When applying scaling models researchers quite often want to represent data in fewer than the number of dimensions required for a complete reproduction of all data. But the degree of correspondence between model and data should nevertheless be as close as possible, at least with respect to a given criterion. This criterion guides the choice among those representations which are possible in the space of reduced dimensionality. The criterion determines which parts or aspects of the observations may be neglected in the solution and thus be declared as “error”. The algorithm described below uses two criteria, an overall goodness of fit S and a fit s required to be satisfied for every isotonic region. While a variant of S is used in all MDS computer programs, s is introduced here
to give the interpretation of details of a solution a firm base. The algorithm is illustrated by the grade expectations data (Coombs, 1964, p. 92). In all analyses, 62 rank orders are taken into account; these include rank orders with ties but exclude rankings with zero frequency (e.g., number 18 in Coombs' Table 5.4). For theoretical reasons a one-dimensional solution has to be found. For each of the m minimal regions, s < 6 is required, and S = Σ s ≤ 40 over the m regions. What is counted in s and S is the number of inversions necessary to transform the observed rank orders into the rank orders derived from the solution. The contingency tables show that an error-free one-dimensional solution is not possible. The contingency table
          AB     BA
    AC    52      4
    CA     0      ·

(· denoting a frequency not recoverable here)
can be interpreted as a matrix containing weights for the arcs of a path between boundaries. E.g., the cell AB, CA represents the error-free partial solution A|B - A|C, while the cell BA, AC leads to A|C - A|B with error weight 4. This means that 4 observed rank orders would have to be changed from BAC… to, e.g., ABC… or BCA… to fit the solution A|C - A|B perfectly. Fixing s < 6 prevents the solution A|B - C|A (weight 52), which is not permitted anyhow, and B|A - A|C is also not allowed. The contingency table thus becomes a matrix in which the unacceptable values are deleted. Considering also the contingency tables for the pairs AB, BC and for the pairs AC, BC, and combining all three contingency tables into one matrix, yields the admissible partial paths:

    A|B -0- A|C,   A|B -0- B|C,   A|C -4- A|B,   A|C -0- B|C,

where the number on an arc is its error weight.
From this information the algorithm searches for a graph in the form of a path whose nodes are the boundaries. Each boundary occurs exactly once in this path. The arcs of the path carry weights, and the sum of the weights along the path should be the minimum S possible. In the example one finds A|B -0- A|C -0- B|C. The direct arc A|B - B|C is replaced by the combination of A|B - A|C and A|C - B|C because its weight is not larger than the sum of the parts replacing it. The path A|C -4- A|B -0- B|C is not acceptable for two reasons: (1) it is not minimal; (2) it implies AC < AB, in contradiction to the order A - B - C, which is also implied. Continuing by including the boundaries A|D, B|D, C|D enlarges the weight matrix accordingly, from which the shortest path is determined.
Choosing AB, CA means neglecting BA, AC; and choosing AD, CB rejects DA, BC. With this in mind, one cell in every row and column is selected to build the path; the selected cell is indicated by an asterisk. The problem of finding a solution for data with error has thus been transformed into finding a shortest path, for which algorithms and computer programs exist. The size of the problem has been considerably reduced by using a small value for s. This might prevent finding a globally optimal solution; if such a solution is the goal, then s = S might be chosen. But this increases the possibility, as will be shown later, of obtaining solutions of which some details are systematically distorted.
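The search described above, a minimum-weight path visiting every boundary exactly once, can be sketched by brute force for small problems (a hypothetical illustration, not the paper's program):

```python
from itertools import permutations

def min_weight_boundary_path(boundaries, weight):
    """weight[(p, q)] = error weight if boundary p immediately precedes
    boundary q; arcs absent from the dict are forbidden (e.g., ruled out
    by the s-threshold). Returns (best path, its total weight)."""
    best_path, best_cost = None, float("inf")
    for order in permutations(boundaries):
        cost = 0
        for p, q in zip(order, order[1:]):
            w = weight.get((p, q))
            if w is None:          # forbidden arc: discard this ordering
                cost = None
                break
            cost += w
        if cost is not None and cost < best_cost:
            best_path, best_cost = list(order), cost
    return best_path, best_cost
```

With the admissible partial paths of the example (arcs A|B to A|C and to B|C with weight 0, A|C to A|B with weight 4, A|C to B|C with weight 0), the brute-force search recovers A|B - A|C - B|C with total weight 0. For realistic sizes one would substitute a proper shortest-path or TSP-path algorithm, as the text suggests.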
[Figure 3 reproduced only schematically: each solution is a sequence of boundaries. C and A begin A|B - C|D - …, M begins A|B - B|E - …; all three end … - E|G - F|G. The sequences differ in the ordering of the middle boundaries.]
Figure 3. Solutions for the grade expectations data. C refers to the solution given by Coombs (1964, Table 5.8, with δ = 2). A refers to the solution obtained by the algorithm described in the text. M reports the MINIRSA solution (the author of this MDS program is E. Roskam, University of Nijmegen; the revised version of February 1971 was used; d = .106 with 200 iterations, SFORM2).
A solution (found by hand) is given in Figure 3. It is slightly better than the one reported in Coombs (1964), but only in terms of the minimum number of inversions. Coombs sought the maximum number of observed rankings to be reproduced by the solution, and his solution is better in this respect. While Coombs was looking for one majority of subjects to support a solution, the algorithm tried to find a large majority for every structural decision, exploiting alternating majorities. MINIRSA is a multidimensional scaling program carefully developed and tested by E. Roskam. It was started here with k = 3. For k = 1 a stress equivalent of d = 0.106 was obtained. As the other solutions in Figure 3 show, better solutions are possible. Part of the MINIRSA solution is … A|F - B|C …, with the implication BF < AC. The corresponding contingency table is
          BC     CB
    AF
    FA
clearly demanding B|C - A|F with AC < BF, as implemented in the two other solutions. This may be considered a serious systematic distortion (the same is true, e.g., for B|F - C|D). All three solutions agree in the determination of the qualitative J-scale as A - B - C - D - E - F - G. All evidence in all contingency tables points toward this solution. In contrast, any evidence for a quantitative J-scale is weak, especially for comparisons of the intervals defined by adjacent points in the J-scale. Often, only one rank order determines whether XY < or > YZ. The implications of the boundary positions (the three and four points extensions) prove to be very valuable for the analysis of incomplete data, such as the grade expectations data. They can also be used in constructing the admissible paths within the algorithm to find a solution for data with error.
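The error counts s and S used in this section are inversion counts. A minimal sketch (function name invented) of counting the inversions between an observed rank order and the rank order derived from a solution:

```python
from itertools import combinations

def inversions(observed, derived):
    """Number of object pairs whose order differs between the two
    rankings, i.e., the minimum number of pairwise reversals needed
    to transform one ranking into the other (a Kendall distance)."""
    return sum(
        (observed.index(x) < observed.index(y))
        != (derived.index(x) < derived.index(y))
        for x, y in combinations(derived, 2)
    )
```

E.g., transforming BAC into ABC requires one inversion, and the full reversal CBA requires three.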
8. Isotonic Regions in the Multidimensional Case
For k = 2, a cell in a contingency table is defined by three pairs. If these three pairs are related to only three points, e.g., AB, AC, BC, then the configuration of the boundaries A|B, A|C, B|C forms a "star", *ABC,
because A|B, A|C, B|C are mid-perpendiculars of the triangle ABC, intersecting in one point. There are four topologically different possibilities of how these boundaries may intersect (see Figure 4); only one is compatible with the unfolding model. This one (Figure 4 I) provides six smallest open isotonic regions with boundaries that differ with respect to the points facing each other. All incompatible configurations contain at least one region representing intransitive preferences; e.g., in Figure 4 II the region marked with an X corresponds to C > A, B > C, but A > B.
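That the three boundaries of a star meet in a single point can be checked numerically: the mid-perpendiculars of triangle ABC intersect at the circumcenter, which is equidistant from A, B, and C (a sketch using the standard circumcenter formula; not from the paper):

```python
def circumcenter(A, B, C):
    """Intersection point of the three mid-perpendiculars of triangle ABC,
    i.e., of the boundaries A|B, A|C, B|C (2-D coordinates)."""
    (ax, ay), (bx, by), (cx, cy) = A, B, C
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux, uy
```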
Figure 4. Topologically different configurations of the boundaries A|B, A|C, B|C.
If the three pairs defining a cell are related to four or more points, e.g., AB, AC, AD, then A|B, A|C, A|D form a boundary triangle (BT). To differentiate between different forms of a BT, the orientation of a boundary is defined. If this orientation is important, A|B means that A is oriented outwards, B|A that B is oriented outwards and A inwards (see Figure 5). The cell with zero frequency in the contingency table determines the form of the BT. If this cell is, e.g., BA, CA, DA, then the form is B|A, C|A, D|A, i.e., the A side inwards (see Figure 6).
[Two panels: left, A is outwards and B is inwards; right, A is inwards and B is outwards.]
Figure 5. Two different orientations of a boundary in a boundary triangle.
Coombs (1964, Fig. 7.3) reports 12 rank orders for four points on a circle. Contingency tables in this case (three or more points on a circle) contain more than one cell with zero frequency. E.g., for the Coombs data, the cells AB, CA, AD and BA, AC, DA are zero-frequency cells.
Every boundary is oriented inwards and outwards; then these boundaries intersect in one point. Observed rank orders: (1) ABCD, (2) DABC, (3) CABD, (4) DCAB, (5) BACD, (6) BDAC, (7) CBAD. [The contingency table for A|B, A|C, A|D and the drawing of the resulting form of the boundary triangle are omitted here.]
Figure 6. Identification of the form of a boundary triangle.
If three boundaries form a BT, this may be a minimal region or not; if not, it is decomposable and the decomposition leads to information on the position of intersection points relative to each other. A BT is not a minimal region if it contains a pair of boundaries which have one point in common, and this point is for both boundaries oriented inwards or for
both boundaries oriented outwards. In these cases, another boundary intersects the BT. Let the derived form of a BT be B|A, C|A, D|A. For the pair B|A, C|A the point A is inwards, i.e., B|A - A|C, which implies B|A - B|C - A|C. Thus B|C intersects A|D; this intersection is written AD|BC. It lies between the intersection of A|D with A|B and with A|C, i.e., between *ABD and *ACD. This is written *ABD - AD|BC - *ACD. From the pair B|A, D|A one derives *ABC - AC|BD - *ACD; from the pair C|A, D|A one derives *ABC - AB|CD - *ABD. Let the form of a BT be A|C, B|D, C|D. From B|D - B|C - D|C the intersection of B|C and A|C, thus *ABC, is obtained, and *ACD - *ABC - AC|BD is the information on the location of the intersections.

9. Quantitative Information in the Multidimensional Case
Rule I: If a point A is located exclusively in those isotonic regions for which "X closer than Y" is true, then AX < AY. This is true for a space with an arbitrary number of dimensions, and the proof is trivial. An illustration is given by a BT of the form A|B, A|C, D|C: C|D does not intersect those open regions in which A is located, therefore AC < AD.

Rule II (comparison of diagonals): Let a relative position of the stars *ABC, *ABD, *ACD, *BCD be observed as in the omitted diagram, which implies a particular intersection of two pairs of boundaries; then the distance between the inner points is shorter than the distance between the outer points (here: AC < BD).
To demonstrate the validity of this rule, a well-known fact is used: the intersection of the mid-perpendiculars of a triangle lies inside the triangle if all angles are acute, on the hypotenuse if one angle equals 90°, and outside the triangle if one angle is obtuse. Let the quadrilateral be a rectangle with diagonals AC = BD, and extend BD in the direction of B. The angle ABC becomes acute, and *ABC moves toward B, while *ACD remains at the midpoint of AC. Because the angle BCD becomes obtuse, *BCD moves toward A, and because the angle BAD becomes obtuse, *ABD moves toward C. Then the assumed configuration of stars results. This is also true if BD is extended in the direction of D. On the other hand, if AC is extended in either or both directions, the opposite configuration, with *ABC and *ABD facing *ACD and *BCD, results.
Rule 111 (comparison of opposite sides in a quadrilateral): Let the observed configuration of stars and two other intersections be:
[Diagram omitted: the stars *ABC, *BCD, *ABD and the intersections AB|CD and AD|BC in the configuration described below.]
then AB < CD and BC < AD. From AD|BC two BTs may be constructed, one with C|D, passing through *BCD and *ACD, the other with A|B, passing through *ABC and *ABD. The boundary closer to AD|BC, i.e., C|D, represents a side, namely CD, that is longer than the side represented by A|B, which is AB. To see the validity of this rule, one may start with a parallelogram ABCD in which A|D and B|C, as well as A|B and C|D, are parallel. If C and D move toward each other, A|D and B|C incline toward each other over A|B to form the BT A|B, D|A, B|C.
10. Constructing a Multidimensional Solution
A solution is complete, i.e., all qualitative and quantitative information has been retrieved from the data (assuming the model is valid), if the position of all intersections of boundaries relative to each other is known. All intersections in which A|B participates lie on the same line, namely A|B itself. The order in which they are located on A|B can be inferred from all BTs containing A|B. A solution in k = 2 consists of the set of all boundaries and the information on the positions of the intersections on these boundaries. This will be demonstrated using an example of Coombs (1964, p. 164; his Figure 7.8 does not contain all intersections). First, the positions of all intersections on A|B are determined, considering all BTs with A|B.
From (1) A|B, A|C, D|A one derives *ABC - *ABD - AB|CD.
From (2) A|B, A|C, E|A one derives *ABC - *ABE - AB|CE.
From (3) A|B, A|C, D|B one derives *ABC - *ABD - AB|CD, which is the same information as obtained from (1); the comparison of (1) and (3) thus provides a first consistency check.
All information for A|B combined leads to:

    *ABC - *ABD - AB|CD - *ABE - AB|CE - AB|DE.
Every other boundary cuts A|B exactly once, at the point, of course, where its intersection with A|B is located. This makes it possible to construct the complete lattice of boundaries. To construct a solution in k > 2, the following decomposition rule may be used. If the analysis leads to the conclusion that a k-dimensional space is needed (or preferred) to represent the data, then the zero (or minimum) cell of a contingency table is defined by k + 1 pairs of points. Then every k-tuple of these pairs can be selected and represented as a configuration of boundaries in a k-dimensional space as usual. E.g., let k = 3 and the zero cell be AE, BE, CE, DE. This is equivalent to four boundary triangles: (1) A|E, B|E, C|E; (2) A|E, B|E, D|E; (3) A|E, C|E, D|E; (4) B|E, C|E, D|E. Of course, A|E, B|E, C|E may be decomposed into three lines A|E - E|B, A|E - E|C, B|E - E|C. From the four boundary triangles, the three-dimensional configuration of the points can be inferred to be a tetrahedron containing E as an inner point. For data with error and k ≥ 2, one strategy is to find all acceptable solutions for k = 2, then search for the optimal combination of two-dimensional spaces to form a solution in k = 3, etc. To find the best-fitting solution in k = 2, one first determines for every boundary line separately the acceptable sequences of intersections and then tests for compatibility. As a small example for k = 2 and data with error, a reanalysis of the McElwain and Keats data (see Coombs, 1964, p. 175, for the data) will be reported. The authors collected 304 rank orders of children's preferences for four radio stations A, B, C, D. A solution in k = 1 leads to many errors. With N = 4 objects, 16 boundary triangles (and four stars) are to be determined. E.g., for the pairs AB, AC, AD the cell with the lowest
frequency is BA, AC, DA, with s = 1. The corresponding boundary triangle (drawing omitted) implies for the line A|C: *ABC - AC|BD - *ACD. The same sequence of intersections on the line A|C is implied by the cell CA, BC, DC with s = 0. The sequences of intersections on all lines, selected because their s-values were lowest and all sequences were compatible with one solution, are:
    AB: *ABD - *ABC - AB|CD
    AC: *ABC - AC|BD - *ACD
    AD: *ABD - *ACD - AD|BC
    BC: *ABC - *BCD - AD|BC
    BD: *ABD - AC|BD - *BCD
    CD: *ACD - *BCD - AB|CD

This leads to the solution

    A --- B
    |     |
    D --- C
with AC < BD, AB < CD, AD < BC; this is equivalent to the solution found by McElwain and Keats: only two (DBAC, DCBA) of the 304 rank orders are not represented in the solution (McElwain and Keats do not explicitly state the quantitative information).
11. Discussion
What determines the dimensionality of the solution space? A set of rank orders may be characterized by those conditions it satisfies. E.g., ABC, BAC, BCA, CBA satisfies the condition: (A,C)B is empty. (A,C)B means either AC or CA preceding B. The points in parentheses will be called conditionals, and their number will be denoted by c. Data for which a k = 1 solution exists satisfy one or more conditions of the type with two conditionals. Data fitting a k = 2 space satisfy one or more conditions of the type "(A,B,C)D is empty", or "(BC,D)A is empty" and "(BA,D)C is empty" for the data in Coombs (1964, p. 157). In general, the number of dimensions necessary for an error-free representation is c - 1. With increasing c the restraints on the data are relaxed, i.e., more and more rankings are compatible with a solution. Expressed differently, the dimensionality of a solution is an index of agreement among the rank orders. This agreement is not identical with an average of Kendall's tau or his coefficient of concordance W. The kind of agreement is expressed by the formula given above, indicating which objects will not be preferred under the condition stated. One may, of course, characterize the positive side of the agreement: e.g., for data perfectly represented in k = 1 there exists in every triple one object which in all rankings is preferred to at least one of the two other objects. There is no necessity to represent dimensions in a solution as axes onto which the points project their positions. It is in this respect that the present approach departs fundamentally from earlier multidimensional procedures, including the one developed by Coombs and his coworkers. But Coombs' basic idea is maintained: the essence of a solution is the configuration of isotonic regions. The present approach allows an exact specification of what is determined by the interaction of the data and the model, and of what kind of information is not available.

The problems of degeneracy are thus transformed into the task of listing all possible variants of solutions. The well-known Monte Carlo studies on the recovery of preestablished solutions or on the uniqueness of representations only generate an impression of the extent to which an algorithm might fail. And these studies are difficult to evaluate because they used approximate algorithms. It has, to our knowledge, never been shown, and it is probably not true in
general, that these approximate procedures generate solutions containing all the information which is definitely and uniquely determined by the model and the data. A procedure was outlined to handle data with error. It may again be pointed out that "error" in this case results from the desire of the analyst to use fewer dimensions than necessary, not from a lack of agreement in repeated measurements. Of course, other criteria for the optimality of a solution may be used than the one offered here. But minimizing the stress, as in MDS programs, does not, as was demonstrated in an example, prevent serious distortions, especially of the quantitative aspects of a solution, which a researcher cannot detect. The result of the analytical approach is a statement about which topological configurations of the points in a Euclidean space are compatible with the data if the model is assumed to be valid. Usually, further quantitative information not implied in the topological structure can be derived. But a numerical representation, e.g., of the coordinates of the points, is not given. To offer just one could be misleading; the isotonic regions of a solution instead give limits for the admissible sets of numerical representations, and that is what the model can provide without additional assumptions.
References

Coombs, C. H. (1964). A theory of data. New York: Wiley.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
GENFOLD2: A GENERAL UNFOLDING METHODOLOGY FOR THE ANALYSIS OF PREFERENCE/DOMINANCE DATA

Wayne S. DeSarbo
University of Michigan, U.S.A.

Vithala R. Rao
Cornell University, U.S.A.

This paper is a brief description of the GENFOLD2 methodology, a set of multidimensional unfolding models and algorithms for the analysis of preference or dominance data (cf. DeSarbo & Rao, 1984, 1986). GENFOLD2 allows one to perform internal or external, constrained or unconstrained, conditional or unconditional, and metric or nonmetric analyses, and it provides the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models, including simple, weighted, and general unfolding analysis. An alternating least-squares algorithm is utilized in the estimation of the specified parameters. The methodology is illustrated in this paper with a set of preference data for over-the-counter pain relievers. Some future research directions are also identified.

1. Introduction
From a managerial perspective, MDS methods are typically used to provide descriptions of the preferences and/or perceptions of a sample of consumers toward a set of items in a product category. While these methods can assist in identifying “best” locations for existing or new products in the perceptual space, they offer little guidance on how to specifically alter

This paper is a revised version of an article published in Zeitschrift für experimentelle und angewandte Psychologie, 1988, 35.
existing products or design new products. This problem of “reverse transformation”, i.e., making inferences about desired product attributes from an inspection of the resulting MDS map, has limited the use of MDS methods and has plagued applied researchers for some time; see Green (1975) for a discussion of this issue. Note that, in general, there will not be a unique “reverse mapping”, since many combinations of product features and other marketing-mix attributes may map into a specific perceptual product position. The objective of this paper is to present the GENFOLD2 methodology (DeSarbo & Rao, 1984, 1986), developed to address the “reverse mapping” problem in the context of spatial analyses of preferential data. GENFOLD2 (GENeral UnFOLDing Analysis, Version 2) analyzes, in the Carroll and Arabie (1980) classification, two-mode, polyadic, two-way, ratio, interval, or ordinal scale, unconditional or conditional (assumptions concerning the comparability of the data), complete data. GENFOLD2, like traditional unfolding models, is a spatial distance model which allows for the estimation of two sets of points in the same space, and it allows for a variety of different model specifications. GENFOLD2 is an improved, modified version of GENFOLD (DeSarbo & Rao, 1983), utilizing a more efficient algorithm and providing joint space solutions which are “nondegenerate.” One particular option, involving the reparameterization of stimulus and/or row coordinates, enables the researcher to “manipulate” the derived spaces in answering various questions of relevance to applied work. We first review the relevant literature on the analytical problem of unfolding. GENFOLD2 is then presented in some detail, and the algorithm employed in the estimation of the parameters is discussed. The GENFOLD2 methodology is illustrated with a small set of data on preference judgments for over-the-counter pain relievers. Finally, some directions for future research are discussed.

2. Brief Review of Literature
The literature on preference models has focused on two distinct types of spatial models - Tucker’s (1960) vector model and Coombs’ (1964) unfolding model. Both models assume that subjects arrive at their preference judgments by considering a multidimensional set of stimulus
characteristics, but differ in their assumptions about how subjects combine stimulus information to arrive at a judgment. Davidson (1972, 1973) and Carroll (1972, 1980) compare these two types of models and discuss the assumptions and implications of each. Examining the unfolding-type (distance) spatial models, Bennett and Hays (1960) first generalized Coombs’ (1950) unidimensional unfolding model to the multidimensional case using the Euclidean distance metric. Here, subjects are represented as ideal points in the same multidimensional space as the stimuli. Several authors have proposed algorithms for estimating stimulus scale values and ideal-point coordinates from preference judgments assumed to follow the unfolding model (Lingoes, 1972, 1983; Bennett & Hays, 1960; Roskam, 1973; Young & Torgerson, 1967; Kruskal, Young, & Seery, 1973; Kruskal & Carroll, 1969; Schonemann, 1970; Carroll, 1972, 1980; Spence, 1979; Greenacre & Browne, 1982; Heiser, 1981; Takane, Young, & De Leeuw, 1977). This approach of estimating both ideal-point and stimulus coordinates is known as internal analysis, as opposed to external analysis methods, which estimate only ideal points given the stimulus coordinates (obtained, perhaps, from an analysis of similarities). Carroll (1972, 1980) has introduced PREFMAP and PREFMAP2, a series of models and algorithms to perform analyses of preference data. His methods allow the user to select between internal or external, metric or nonmetric, and unfolding or vector model analyses. Three different (nested) unfolding models can be estimated in PREFMAP and PREFMAP2: the simple unfolding model (which weights the dimensions in the space equally); the weighted unfolding model (which provides for unequal, possibly negative weights for the dimensions); and the general unfolding model (which allows for an idiosyncratic orthogonal rotation of the space for each subject).
There is controversy in the literature over the desirability of constraining the weights for the dimensions of the weighted unfolding model to be positive. Carroll (1972) claims that in the weighted unfolding model a negative w_it (the weight on the t-th dimension for the i-th individual) has a clear interpretation: if w_it is negative, the ideal point for individual i indicates the least preferred, rather than the most preferred, value, and the farther a stimulus is along that dimension from the ideal point, the more highly preferred is the stimulus. He thus argues for not constraining the
weights to be positive. Other authors, such as Srinivasan and Shocker (1973) and Davison (1976), dispute the value of unconstrained analyses. Srinivasan and Shocker (1973) present a nonmetric external unfolding analysis with this model using linear programming methods, including nonnegativity constraints for the dimension weights. The same constraints are provided in a metric procedure using quadratic programming described by Davison (1976). Spence (1979) presents an interesting generalization of the external unfolding model allowing for linear constraints on the stimulus space as well as on the ideal points of individuals. In a similar vein, Heiser (1981) formulates an internal unfolding analysis that allows restrictions to be placed on the relationship between ideal points and stimuli, to avoid typical degenerate solutions. The nature of the constraints used by Heiser (1981) does not call for the use of external information (e.g., stimulus features or individual characteristics). In this paper, we present GENFOLD2, a methodology for the GENeral UnFOLDing analysis of preferential data. This methodology was introduced by DeSarbo and Rao (1984, 1986) to accommodate a number of different unfolding model specifications. GENFOLD2 can handle various scales of data (i.e., ratio, interval, or ordinal), and unconditional as well as conditional preference data. Further, GENFOLD2 subsumes several of the previously published unfolding models, such as the simple, weighted, and general unfolding models. The specification of reparameterizations of stimulus and/or subject coordinates is extremely flexible, in the sense that the user can relate stimulus coordinates to known characteristics of the stimuli, and individuals’ ideal points to their background variables. Thus, the derived spaces can be “manipulated” to yield pragmatically useful results.
3. The GENFOLD2 Methodology

The full GENFOLD2 model is essentially a type of general unfolding model which accommodates, for example, Carroll’s (1972) simple, weighted, and general unfolding models as special cases. It also allows for the reparameterization of stimulus coordinates and/or individual ideal points. The underlying premises for the stimulus and ideal-point reparameterizations are that the physical or other characteristics of stimuli should in
some way “determine” the stimulus coordinates, and that individual characteristics (e.g., age, gender, education, etc.) should in some way “determine” the ideal points. These premises are useful in specifying the relationships on the stimulus space and ideal points. Although our formulation specifies these relationships to be linear in the parameters, one could easily approximate nonlinearities in the constraints by including higher-order terms (e.g., squares and cross products) if deemed essential. We will now describe the full model with the following notation. Let:

i = 1, ..., I subjects;
j = 1, ..., J stimuli;
t = 1, ..., T dimensions;
l = 1, ..., L subject descriptor variables;
k = 1, ..., K stimulus descriptor variables;
Δ_ij = the “dispreference value” (inversely related to preference) the i-th subject has for the j-th stimulus;
Δ = the I × J matrix [Δ_ij];
x_jt = the t-th coordinate of stimulus j;
y_it = the t-th coordinate of subject i’s ideal point;
x_j = (x_j1, ..., x_jT)′, a T × 1 vector of the j-th stimulus’ coordinates;
y_i = (y_i1, ..., y_iT)′, a T × 1 vector of ideal-point coordinates for the i-th individual;
X = the J × T matrix [x_jt];
Y = the I × T matrix [y_it];
W_i = subject i’s linear (symmetric) transformation matrix;
a_i = subject i’s multiplicative constant;
b_i = subject i’s additive constant;
c_i = subject i’s exponent;
d_ij² = the squared distance between subject i and stimulus j;
e_ij = error;
A_il = the l-th descriptor variable for subject i;
A = the I × L matrix [A_il];
α_lt = the importance or impact of the l-th descriptor variable on dimension t;
α = the L × T matrix [α_lt];
B_jk = the k-th descriptor variable for stimulus j;
B = the J × K matrix [B_jk];
γ_kt = the importance or impact of the k-th descriptor variable on dimension t;
γ = the K × T matrix [γ_kt].
Then, the full GENFOLD2 model can be written as:

Δ_ij = Δ̂_ij + e_ij,   (1)

where:

Δ̂_ij = a_i [ (x_j − y_i)′ W_i (x_j − y_i) ]^c_i + b_i.

The stimulus space and the individuals’ ideal points are optionally reparameterized by the relationships:

Y = Aα and X = Bγ,   (2)
where α and γ are matrices of order L × T and K × T, respectively, that are estimated. As in CANDELINC (Carroll, Pruzansky, & Kruskal, 1980) and in three-way multivariate conjoint analysis (DeSarbo, Carroll, Lehmann, & O’Shaughnessy, 1982), these constraints can aid in the
interpretation of the dimensions derived (cf. Bentler & Weeks, 1978; Bloxom, 1978; Noma & Johnson, 1977; De Leeuw & Heiser, 1980; Lingoes, 1980) and can replace the post-analysis property-fitting methods often used to attempt to interpret results. GENFOLD2 attempts to estimate the desired set of constrained and/or unconstrained parameters described (i.e., some subset of W_i, X, Y, a_i, b_i, c_i, α, γ), given Δ and T (the number of dimensions), using an alternating least-squares algorithm, in order to minimize the weighted sum-of-squares objective function:

Φ = Σ_i Σ_j δ_ij (Δ_ij − Δ̂_ij)²,   (3)
where the δ_ij’s are defined by the user to weight the Δ_ij values differently. There has been considerable research attempting to cure unfolding of its tendency toward degenerate solutions, which occur in multidimensional unfolding in a number of ways; see DeSarbo and Rao (1984) for a discussion of these approaches. The degeneracy problem in unfolding is handled in GENFOLD2 through the weights δ_ij in the loss function (3). We share Heiser’s (1981) implicit theory that a possible cause of degeneracy is the error or noise in the data, and we thus provide the user the flexibility of specifying the δ_ij differently. For example, one may define the weights as powers (with exponent p) of functions of the row ranks r(Δ_ij) (from smallest = 1 to largest = J) of the Δ_ij, respectively for the two cases of no preprocessing or specific preprocessing of the Δ_ij-values (expression (4); see DeSarbo & Rao, 1984, for the exact definitions). Other weighting options are also possible. For example, one could specify δ_ij = 1, ∀ i, j, so that the “weighted” loss function reduces to the unweighted one. Or one could specify a bimodal or step weighting function where, say, the first three and the last three choices are weighted highly and all others receive low weights. The choice of the “appropriate” weighting function depends upon such factors as the preprocessing options and scale assumptions of the data, the assumptions about the conditionality of the data, the assumptions
concerning the reliability of the different data values, and trial and error. Also, different δ_ij could be specified depending upon the assumptions made concerning the reliability of the Δ_ij collected. In addition, the value of p usually needs to be decided by trial and error, although our experience indicates that p = 2 appears to work well.

Table 1. Features of the GENFOLD2 algorithm

Preprocessing of Δ: Row center; Row center and standardize; Row and column center; Double center Δ and row standardize; Remove geometric mean from rows or columns; Normalize columns or rows to unit sum of squares.

Method for generating starting values (e.g., for X): Random start; External analysis (i.e., given X); Values given for all; A “close” start on X (i.e., an MDPREF solution); “Close” values on parameters (i.e., using PREFMAP2 with X given by MDPREF).

Type of unfolding model: Simple unfolding; Weighted unfolding; General unfolding.

Type of data scale: Ratio; Interval; Ordinal.

Type of analysis: External (X given); Internal.

Constraints on Y: Yes; No.

Constraints on X: Yes; No.

Constraints on W_i: Symmetric W_i; Diagonal W_i; options on nonnegativity constraints.

Restrictions on c_i: Unconstrained; c_i = c (constant) ∀ i; c_i = 1 ∀ i.

Specifications of a_i and b_i: a_i = 1, b_i = 0, ∀ i; a_i = 1, b_i = b, ∀ i; a_i = 1, b_i unconstrained; a_i = a, b_i = 0, ∀ i; a_i = a, b_i = b, ∀ i; a_i = a, b_i unconstrained, ∀ i; a_i unconstrained, b_i = 0, ∀ i; a_i unconstrained, b_i = b, ∀ i; a_i and b_i unconstrained, ∀ i.
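To make the model concrete, the prediction equation and the weighted loss (3) can be sketched in a few lines of Python with NumPy. This is our own minimal sketch, not the GENFOLD2 program; the function names are hypothetical:

```python
import numpy as np

def genfold2_predict(X, Y, W, a, b, c):
    """Model dispreferences Delta_hat[i, j] = a_i * q_ij**c_i + b_i,
    with q_ij = (x_j - y_i)' W_i (x_j - y_i).
    X: (J, T) stimulus coordinates; Y: (I, T) ideal points;
    W: (I, T, T) per-subject transformation matrices; a, b, c: (I,)."""
    I, J = Y.shape[0], X.shape[0]
    Delta_hat = np.empty((I, J))
    for i in range(I):
        diff = X - Y[i]                                 # rows are x_j - y_i
        q = np.einsum('jt,tu,ju->j', diff, W[i], diff)  # quadratic form per stimulus
        Delta_hat[i] = a[i] * q ** c[i] + b[i]
    return Delta_hat

def weighted_loss(Delta, Delta_hat, delta):
    """Expression (3): Phi = sum_ij delta_ij * (Delta_ij - Delta_hat_ij)**2."""
    return float(np.sum(delta * (Delta - Delta_hat) ** 2))
```

With W_i = I, a_i = c_i = 1, and b_i = 0 this reduces to simple unfolding, where Δ̂_ij holds squared Euclidean distances; the δ array passed to `weighted_loss` plays the role of the user-chosen weights discussed above.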
General Description of the Algorithm: The algorithm for estimating the various parameters in the GENFOLD2 model uses an alternating least-squares method at its core, but it includes various options making it highly flexible and versatile. Several features built into the program are listed in Table 1. The exact details of the computations are found in DeSarbo and Rao (1984). The technical details of estimating X and Y (or α and γ) within the alternating least-squares cycle, for the special case of the simple unfolding model, are described in Appendix I.

4. An Illustration
A sample of I = 30 undergraduate business students of the University of Pennsylvania was asked to take part in a small study designed to measure preferences for various brands of existing over-the-counter (OTC) analgesic pain relievers. These respondents were initially questioned as to the brand(s) they currently use (as well as frequency of use) and their personal motivations for choosing such brand(s) (e.g., ingredients, price, availability, etc.). They were then presented fourteen existing OTC analgesic brands: Advil, Anacin, Anacin-3, Ascriptin, Bayer, Bufferin, Cope, CVS Buffered Aspirin (a generic), Datril, Excedrin, Nuprin, Panadol, Tylenol, and Vanquish. Initially, they were presented colored photographs of each brand and its packaging, together with the price per 100 tablets, ingredients, package claims, and manufacturer. Each subject/consumer was requested to read this information and could return to it at any time during the experiment if he/she so wished. After a period of time, they were asked to make likelihood-to-buy/use judgments on each of the fourteen brands on an eleven-point scale (0 = definitely would not buy/use, 10 = definitely would buy/use). We conducted the GENFOLD2 analysis of Δ in T = 1, 2, and 3 dimensions for the simple unfolding model with the reparameterization option X = Bγ, assuming interval scale, row conditional input data. As such, each vector of input data was standardized to zero mean and unit variance per subject. The brand design matrix B (not shown in the paper) has also been standardized to zero mean and unit variance. This reparameterization specification was preferred since B contains features that consumers stated (in a pretest) were important in their choice of a
specific OTC analgesic brand. All consumers were encouraged to read the information contained in the color photographs of each brand and its packaging prior to their judgments. Based on an examination of the associated variance-accounted-for statistics and the interpretability of the respective solutions, the T = 2 dimensional solution was selected (weighted R² = 0.921) as the most parsimonious one. Figure 1 depicts the derived joint space of the fourteen brands (labeled A-N) and the thirty ideal points (labeled as “*”s). Subjects’ preferences appear to be quite diverse, spanning all quadrants of the space. However, there does appear to be a somewhat larger concentration of ideal points around the two ibuprofen brands A and K and the acetaminophen brands C, I, L, and especially M. The model fit was extremely good across the thirty subjects; for fifteen of the subjects the variance accounted for was over 0.95, and for the remaining fifteen it was between 0.90 and 0.95.

Table 2. Correlations between design variables (B) and derived stimulus coordinates (X) for the GENFOLD2 simple unfolding model

Feature variable    Dimension I    Dimension II
1                       .905          -.533
2                       .273          -.573
3                       .050           .181
4                      -.316           .716
5                       .815          -.860
6                      -.399           .900
7                      -.089           .145
The γ impact coefficients are also represented in Figure 1 as vectors, given the “regression-like” manner in which they impact the brand coordinate locations. Based upon these vectors, the locations of the brands, and the correlations between X and B presented in Table 2, we can easily interpret the dimensions. (These correlations will vary according to the particular orthogonal rotation utilized. No rotation was utilized for this solution. Even so, the correlations between dimensions for X, Y, and γ are
low: Cor(X1, X2) = 0.152; Cor(Y1, Y2) = 0.135; and Cor(γ1, γ2) = 0.022.) Dimension I separates the lower-cost, higher-maximum-dosage aspirin brands from the higher-cost, lower-maximum-dosage aspirin substitutes. The second dimension separates the OTC analgesics that contain caffeine from those that do not. Thus, consumer preferences appear to be based upon aspirin vs. non-aspirin and caffeine vs. no caffeine. It is interesting to note the lack of brands in quadrant two of the figure, since there are presently no aspirin-substitute brands with caffeine available on the market.
Figure 1. GENFOLD2 joint space for the brands, ideal points, and product features. [Figure not reproduced here.]

Brand symbols: A = Advil; B = Anacin; C = Anacin-3; D = Ascriptin; E = Bayer; F = Bufferin; G = Cope; H = CVS; I = Datril; J = Excedrin; K = Nuprin; L = Panadol; M = Tylenol; N = Vanquish.

Feature symbols: 1 = Mg. of Aspirin; 2 = Mg. of Acetaminophen; 3 = Mg. of Ibuprofen; 4 = Mg. of Caffeine; 5 = Mg. of Buffered Compounds; 6 = Price; 7 = Max. Dosage.
5. Future Research

We have presented a description of the GENFOLD2 unfolding model and our alternating weighted least-squares algorithm for fitting it. The methodology was illustrated using a small set of preference data for fourteen brands of pain relievers. In other papers on this algorithm (cf. DeSarbo & Rao, 1984, 1986) we have shown how the model can be employed to investigate policy simulations and to derive the optimal positioning of product features, tackling the “reverse mapping” problem described earlier in this paper. Although we believe that the algorithm is ready for use in several research situations, more work needs to be done to investigate its behavior under various experimental and real-world conditions. Several questions can be pursued in future research. While the weighted loss function does appear, tentatively, to provide nondegenerate solutions, obvious questions are raised as to why. What really causes degenerate solutions in unfolding? Is it a particular (and common) form of error structure found in most data sets? Does it result from a poorly determined model or a flat objective (loss) function response surface? Our approach seems to relieve the symptoms of the disease, but we still do not really know for certain what the disease is. More research is needed. Another related question concerns the choice of the weighting function δ_ij. While some guidelines can be established to rule out certain general forms of δ_ij, the choice of a specific δ_ij (and especially of p) remains a trial-and-error procedure. While applications suggest p = 2 for the δ_ij defined in expression (4), more experience with the procedure must be obtained on more data sets before this recommendation can be made general. Finally, experience with more real data sets is required in order to address the issues raised here and to properly evaluate GENFOLD2 as a reliable methodology.
Appendix I. A Technical Description of the GENFOLD2 Algorithm for the Simple Unfolding Model

The simple unfolding model, with options for a reparameterization of X and Y, can be stated as:

Δ_ij = Δ̂_ij + e_ij,   (A-1)
where:

Δ̂_ij = a_i d_ij² + b_i,   (A-2)

and

d_ij² = Σ_t (x_jt − y_it)²,   (A-3)

with:

X = Bγ   (A-4)

and

Y = Aα.   (A-5)
The algorithm utilized to estimate the parameter values X (or γ) and Y (or α) uses an alternating weighted least-squares formulation to minimize the loss function:

Φ = Σ_i Σ_j δ_ij (Δ_ij − Δ̂_ij)²,   (A-6)

where δ_ij is the weighting function described in DeSarbo and Rao (1984). Assuming that the preprocessing, starting values, and control parameters (see DeSarbo & Rao, 1984) have been stipulated, the algorithm cycles between two major estimation phases:
Phase 1. A Quasi-Newton Gradient Procedure to Estimate X (or γ) and Y (or α)

A quasi-Newton unconstrained algorithm (Davidon, 1959; Fletcher & Powell, 1963) is utilized to estimate the joint space so as to minimize Φ, holding the a_i and b_i values fixed at their current values. The partial derivatives of the loss function with respect to these parameters are:
∂Φ/∂x_jt = −4 Σ_i δ_ij a_i (Δ_ij − Δ̂_ij)(x_jt − y_it),   (A-7)

∂Φ/∂y_it = 4 Σ_j δ_ij a_i (Δ_ij − Δ̂_ij)(x_jt − y_it),   (A-8)

∂Φ/∂γ_kt = Σ_j B_jk ∂Φ/∂x_jt,   (A-9)

∂Φ/∂α_lt = Σ_i A_il ∂Φ/∂y_it.   (A-10)

For the sake of convenience, let us assume that the relevant parameters to be estimated are contained in the vector Θ, and that ∇Φ is the vector of partial derivatives for this set of parameters. Let:
r = T(L + K);

H_n = an r × r positive definite symmetric matrix at the n-th iteration;

h_n = the optimal step length at iteration n;

S_n = the search direction at iteration n.

The steps of the iterative algorithm used are as follows:
1. Start with given values Θ_1 and an r × r positive definite symmetric matrix H_1 = I (the identity matrix) initially. Set n = 1.

2. Compute ∇Φ_n at the point Θ_n and set

S_n = −H_n ∇Φ_n.   (A-11)

Note that at the first iteration the search direction is the steepest descent direction, −∇Φ_1, since H_1 = I.

3. Find the optimal step length h_n* in the direction S_n. This is done through the use of a quadratic-interpolation line search procedure. Then we set:

Θ_{n+1} = Θ_n + h_n* S_n.   (A-12)

4. This new solution Θ_{n+1} is tested for optimality and for the maximum number of minor iterations. That is, we check whether

(a) (Φ_n − Φ_{n+1}) < TOL, or

(b) n > MINOR (the user-specified maximum number of such “minor” iterations).

If either of these two conditions holds, the procedure is terminated. If neither holds, we proceed to step (5).

5. Update the H matrix as:

H_{n+1} = H_n + M_n + N_n,   (A-13)

where:

M_n = h_n* S_n S_n′ / (S_n′ Q_n),   (A-14)

N_n = −(H_n Q_n Q_n′ H_n) / (Q_n′ H_n Q_n),   (A-15)

Q_n = ∇Φ_{n+1} − ∇Φ_n.   (A-16)

6. Set n = n + 1 and go to step (2).

Gill, Murray, and Wright (1981) provide a derivation of this procedure as well as its convergence properties. The use of this quasi-Newton method has been favorably compared with other gradient search procedures, such as steepest descent and conjugate gradient methods (Himmelblau, 1972). It was found empirically that the approximate second-derivative information can speed up convergence, especially near the optimal solution. In addition, since the first step of this algorithm is a steepest descent step, one automatically takes advantage of steepest descent when initially far from the optimal solution (empirical research demonstrates that steepest descent is best used in early iterations, far from the optimum). Note that there is an indeterminacy with respect to the parameters X and Y, in that one can define:
X* = XT and Y* = YT,

where T is an orthogonal transformation (T′T = TT′ = I), and still produce the same Δ̂_ij values as defined in (A-2). This particular indeterminacy is important when conducting configuration-matching analyses to compare the solutions of two different simple unfolding analyses.
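For readers who want to experiment, steps (1)-(6) of Phase 1 can be sketched as follows. This is our own toy illustration, not the GENFOLD2 program: an Armijo backtracking search stands in for the quadratic-interpolation line search, and the objective is a simple quadratic rather than the unfolding loss.

```python
import numpy as np

def dfp_minimize(f, grad, theta0, tol=1e-12, max_iter=500):
    """Quasi-Newton descent with the Davidon-Fletcher-Powell update
    (A-13)-(A-16); Armijo backtracking replaces the quadratic line search."""
    theta = np.asarray(theta0, dtype=float)
    H = np.eye(theta.size)              # H_1 = I: first step is steepest descent
    g = grad(theta)
    for n in range(max_iter):
        S = -H @ g                      # search direction (A-11)
        h, slope = 1.0, g @ S
        while f(theta + h * S) > f(theta) + 1e-4 * h * slope and h > 1e-12:
            h *= 0.5                    # backtrack until sufficient decrease
        theta_new = theta + h * S       # (A-12)
        if f(theta) - f(theta_new) < tol:
            return theta_new            # convergence test, step 4(a)
        g_new = grad(theta_new)
        Q = g_new - g                   # (A-16)
        s = theta_new - theta           # = h * S, so outer(s, s)/(s@Q) is (A-14)
        H = H + np.outer(s, s) / (s @ Q) \
              - (H @ np.outer(Q, Q) @ H) / (Q @ H @ Q)   # (A-14), (A-15)
        theta, g = theta_new, g_new
    return theta

# Toy objective: f(x, y) = (x - 3)^2 + 2 (y + 1)^2, minimized at (3, -1).
f = lambda t: (t[0] - 3) ** 2 + 2 * (t[1] + 1) ** 2
grad = lambda t: np.array([2 * (t[0] - 3), 4 * (t[1] + 1)])
```

As the text notes, the first iteration is a pure steepest descent step, and the H matrix accumulates approximate second-derivative information over subsequent iterations.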
Phase 2. A Weighted Least Squares Procedure to Estimate a_i and b_i

Let us first define d̂*_ij as the current squared distance between subject i and stimulus j, computed from the current estimates of the coordinates, and Δ*_ij as the corresponding (preprocessed) data value.   (A-17)

Then, current estimates of a_i and b_i can be obtained by performing I separate regressions of the Δ*_ij on the d̂*_ij and a column of 1’s:

(b̂_i, â_i)′ = (L_i′ L_i)⁻¹ L_i′ M_i,   (A-18)

where:

L_i = (1, d̂*_i),   M_i = (Δ*_ij),

with d̂*_i a J × 1 vector of the d̂*_ij for subject i, and 1 a J × 1 vector of 1’s. Thus, the algorithm cycles back and forth between Phases 1 and 2 until either convergence in the value of the loss function is achieved, or until more major iterations (cycles) are used than the user-stipulated maximum.
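Phase 2 thus amounts to I ordinary regressions, one per subject. A minimal sketch of our own follows; for clarity it omits the δ_ij weights, which would enter (A-18) as a diagonal weight matrix:

```python
import numpy as np

def phase2_ab(Delta_star, d2_star):
    """Per-subject OLS of Delta*_ij on the current squared distances and a
    column of 1's, as in (A-18): returns the slopes a_i and intercepts b_i.
    Delta_star, d2_star: (I, J) arrays."""
    I, J = Delta_star.shape
    a, b = np.empty(I), np.empty(I)
    for i in range(I):
        L = np.column_stack([np.ones(J), d2_star[i]])   # L_i = [1, d*_i]
        coef, *_ = np.linalg.lstsq(L, Delta_star[i], rcond=None)
        b[i], a[i] = coef                               # (b_i, a_i)'
    return a, b
```

Because each regression has only two coefficients, this phase is cheap relative to the gradient phase, which is one reason the alternating scheme is practical.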
References

Bennett, J. F., & Hays, W. L. (1960). Multidimensional unfolding: Determining the dimensionality of ranked preference data. Psychometrika, 25, 27-43.
Bentler, P. M., & Weeks, D. G. (1978). Restricted multidimensional
scaling models. Journal of Mathematical Psychology, 17, 138-151.
Bloxom, B. (1978). Constrained multidimensional scaling in N spaces. Psychometrika, 43, 397-408.
Borg, I., & Lingoes, J. C. (1980). A model and algorithm for multidimensional scaling with external constraints on the distances. Psychometrika, 45, 25-38.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. I). New York: Seminar Press.
Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 234-289). Bern: Huber.
Carroll, J. D., & Arabie, P. (1980). Multidimensional scaling. Annual Review of Psychology, 31, 607-649.
Carroll, J. D., Clark, L. A., & DeSarbo, W. S. (1984). The representation of three-way proximity data by single and multiple tree structure models. Journal of Classification, 1, 25-74.
Carroll, J. D., Pruzansky, S., & Kruskal, J. B. (1980). CANDELINC: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika, 45, 3-24.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 148-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Davidon, W. C. (1959). Variable metric method for minimization. Argonne National Laboratory Report No. ANL-5990.
Davidson, J. A. (1972). A geometrical analysis of the unfolding model: Nondegenerate solutions. Psychometrika, 37, 193-216.
Davidson, J. A. (1973). A geometrical analysis of the unfolding model: General solutions. Psychometrika, 38, 305-336.
Davison, M. L. (1976). Fitting and testing Carroll's weighted unfolding model for preferences. Psychometrika, 41, 233-247.
De Leeuw, J., & Heiser, W. (1980).
Multidimensional scaling with restrictions on the configuration. In P. R. Krishnaiah (Ed.), Multivariate analysis V (pp. 501-522). Amsterdam: North-Holland.
DeSarbo, W. S., & Carroll, J. D. (1981). Three-way metric unfolding. In Proceedings of the 1981 TIMS/ORSA Market Measurement
Conference. Providence, Rhode Island: Management Science.
DeSarbo, W. S., & Carroll, J. D. (1983). Three-way unfolding via weighted least-squares. Unpublished memorandum, AT&T Bell Laboratories, Murray Hill, NJ.
DeSarbo, W. S., Carroll, J. D., & Green, P. E. (1984). An alternating least-squares procedure for the estimation of missing preference data in product concept testing. Unpublished memorandum, AT&T Bell Laboratories, Murray Hill, NJ.
DeSarbo, W. S., Carroll, J. D., Lehmann, D., & O'Shaughnessy, J. (1982). Three-way multivariate conjoint analysis. Marketing Science, 1, 323-350.
DeSarbo, W. S., & Rao, V. R. (1983). A constrained unfolding model for product positioning. In Proceedings of the 1983 ORSA/TIMS Marketing Science Conference, Los Angeles, California.
DeSarbo, W. S., & Rao, V. R. (1984). GENFOLD2: A set of models and algorithms for the GENeral UnFOLDing analysis of preference/dominance data. Journal of Classification, 1, 147-186.
DeSarbo, W. S., & Rao, V. R. (1986). A constrained unfolding methodology for product positioning. Marketing Science, 5, 1-19.
Fletcher, R., & Powell, M. J. D. (1963). A rapidly convergent descent method for minimization. Computer Journal, 6, 163-168.
Gill, P. E., Murray, W., & Wright, M. H. (1981). Practical optimization. New York: Academic Press.
Green, P. E. (1975). Multivariate tools for applied multivariate analysis. New York: Academic Press.
Greenacre, M. J., & Browne, M. W. (1982). An alternating least-squares algorithm for multidimensional unfolding. Paper presented at the 1982 Joint Meeting of the Psychometric and Classification Societies, Montreal, Canada.
Heiser, W. J. (1981). Unfolding analysis of proximity data. Doctoral dissertation, University of Leiden, The Netherlands.
Himmelblau, D. M. (1972). Applied nonlinear programming. New York: McGraw-Hill.
Kruskal, J. B., & Carroll, J. D. (1969). Geometric models and badness-of-fit functions. In P. R. Krishnaiah (Ed.), Multivariate analysis II. New York: Academic Press.
Kruskal, J. B., Young, F. W., & Seery, J. B. (1973). How to use KYST,
a very flexible program to do multidimensional scaling and unfolding.
Unpublished memorandum, Bell Laboratories, Murray Hill, NJ.
Lingoes, J. C. (1972). A general survey of the Guttman-Lingoes nonmetric program series. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. I). New York: Seminar Press.
Lingoes, J. C. (1983). The Guttman-Lingoes nonmetric program series. Ann Arbor: Mathesis Press.
Noma, E., & Johnson, J. (1977). Constraining nonmetric multidimensional scaling configurations. Technical Report No. 60, University of Michigan, Human Performance Center.
Roskam, E. E. (1973). Fitting ordinal relational data to a hypothesized structure. Technical Report No. 73MA06, University of Nijmegen, The Netherlands.
Schonemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika, 35, 349-366.
Spence, I. (1979). A general metric unfolding model. Paper presented at the 1979 Psychometric Society Meetings, Monterey, CA.
Srinivasan, V., & Shocker, A. D. (1973). Linear programming techniques for multidimensional analysis of preferences. Psychometrika, 38, 337-369.
Takane, Y., Young, F. W., & De Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least-squares method with optimal scaling features. Psychometrika, 42, 7-67.
Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications. New York: Wiley.
Young, F. W., & Torgerson, W. S. (1967). TORSCA: A Fortran IV program for Shepard-Kruskal multidimensional scaling analysis. Behavioral Science, 12, 498.
This Page Intentionally Left Blank
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
MAXIMUM LIKELIHOOD UNIDIMENSIONAL UNFOLDING IN A PROBABILISTIC MODEL WITHOUT PARAMETRIC ASSUMPTIONS

Patrick M. Bossuyt
Erasmus University, Rotterdam, The Netherlands

Edward E. Roskam
University of Nijmegen, The Netherlands

This paper presents a new probabilistic unidimensional unfolding procedure for paired comparisons data. This procedure is related to a probabilistic unfolding theory in which a nonparametric random ideal coordinate assumption is added to the familiar unidimensional unfolding assumptions. The procedure can be used to find a maximum likelihood sequencing of alternatives or their midpoints, based on choices of a single subject or a group of subjects. It requires a seriation strategy and the calculation of maximum likelihood binomial probability estimates under order restrictions. Algorithms are presented for both purposes. The unfolding procedure can easily be modified to suit related probabilistic unfolding theories.
1. Introduction

The large appeal of Coombs' (1950, 1964) unfolding theory can likely be attributed to the attractive plausibility of its main ideas. According to the unfolding theory, a subject in a choice situation compares the available alternatives with an ideal alternative and chooses the alternative least dissimilar from this ideal. The ideal can be subjective, but some
The research reported in this paper was supported by Grant No. 40-30 of the Dutch Foundation for the Advancement of Pure Science (Z.W.O.). This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 282-294.
intersubjective cognitive structure is expected to exist in the pattern of dissimilarities between the alternatives. Representing dissimilarities as distances, Coombs proposed an "unfolding" procedure based on these two notions. This procedure is elegant in its simplicity and leads to the construction of an underlying unidimensional sequencing of the alternatives and a partial order on the distances, out of a set of subjective preference rankings. Greenberg (1965) proposed a closely related procedure to be used with paired comparisons data.

In spite of the simplicity of both procedures, successful applications of Coombs' and Greenberg's unfolding procedures are infrequent. Both procedures require all choices or rankings to be perfectly consistent with a pattern of distances in the underlying unidimensional space. In practice this appears to be a very strong necessary condition. Violations of this condition, however small they may be, and however likely they are to occur, cannot be handled in a satisfactory way. Several authors have relaxed the consistency requirement by adding probabilistic assumptions to the unfolding theory. Examples of probabilistic unfolding theories for binary choices have been presented in the literature by authors such as Bechtel (1968), Coombs, Greenberg & Zinnes (1961), DeSarbo & Hoffman (1986), Croon (in press), Ramsay (1980), Schonemann & Wang (1977), Sixtl (1973), and Zinnes & Griggs (1974).

In this paper a new unfolding procedure for paired comparisons data is presented. The procedure is related to a simple, nonparametric probabilistic unidimensional unfolding theory. In a way, it is the probabilistic successor of Greenberg's (1965) proposal. The procedure differs from Greenberg's in that it requires a conditional estimation of binomial choice probabilities, subject to a hypothesized underlying sequencing of the interalternative midpoints.
A maximum likelihood seriation strategy is then adopted in finding the most plausible underlying sequencing. The second section of this paper contains a description of what we will call the probabilistic midpoint unfolding theory. It describes the assumptions and the resulting ordinal restrictions on the choice probabilities for the case of a single subject. In the third section these results are extended to data from a population of subjects. In the fourth section the estimation of choice probabilities under order restrictions is discussed, and an algorithm for finding maximum likelihood estimates is presented. In
the fifth section a branch and bound scheme is proposed for finding the best underlying sequencing, using the maximum likelihood principle. The paper concludes with a brief comparison of model and procedure with other probabilistic unfolding models for paired comparisons.
2. Probabilistic Midpoint Unfolding

2.1 A Single Subject

The theory is intended for the familiar paired comparisons task in which the "no-choice" option has been eliminated. This means that each pair of elements {x, y} of a set S of s alternatives has been presented n_xy times, of which x has been chosen k_xy times and y k_yx times, where k_xy + k_yx = n_xy. As data, or model of the data, we then have a binary choice frequency structure <S, k>.
2.1.1 Assumptions

The following set of assumptions defines the theory.

A.1 (Choice based on dissimilarities) In making a choice, the subject has picked the alternative least dissimilar to the ideal alternative z.
A.2 (Subjective metric) The dissimilarities in A.1 can be represented by a distance function. Let S' = S ∪ {z}. A metric d can then be defined on S' × S' such that x is chosen out of {x, y} if and only if the distance between the ideal z and x does not exceed the distance between z and y: d_zx < d_zy.

A.3 (Unidimensionality) The metric space (S', d) can be mapped into a metric line. For every two elements x, y ∈ S', there exist real-valued coordinates x, y such that the distance between these elements can be expressed as d_xy = |x - y|. (We use no additional notation to distinguish between a point on the metric line and its coordinate.)

A.4 (Random ideal coordinate) The coordinate of the ideal alternative on the metric line is a random variable Z with a cumulative distribution function H(x) = Pr(Z ≤ x).
A.5 (Nonidentical coordinates) All distances between the alternatives in A.2 are nonzero: for every two distinct elements x, y ∈ S': d_xy > 0.
The assumptions A.1 to A.3 define the unidimensional unfolding theory as proposed by Coombs (1950, 1964). Choices are seen as resulting from a comparison of dissimilarities, and these dissimilarities can be represented as distances. The assumption A.4 is added in order to accommodate small inconsistencies in unidimensional unfolding. Here we assume that the origin of the inconsistencies can be found in a random variability in the distances d_zx, which is itself a consequence of a random uncertainty in the location of the ideal on the metric line. For a review of alternative probabilistic assumptions in unfolding see Croon (in press).

A similar set of assumptions has been used by a number of authors who have proposed a related theory on probabilistic unfolding (Bechtel, 1968; Jansen, 1981; Sixtl, 1973). However, these authors specified the exact functional form of the cumulative distribution function in their version of assumption A.4. Bechtel (1968), for example, assumed that a cumulative normal distribution function was always appropriate, whereas Jansen (1981) and Sixtl (1973) proposed the logistic function. Those strong parametric assumptions will not be needed in the present approach, because only the assumed existence of a function H together with its monotonic nondecreasing property will be used. The fifth assumption A.5 is added to avoid problems in the representation of the dissimilarities.

Together the assumptions A.1 to A.5 define what will be called the probabilistic midpoint unfolding (PMU) theory. In line with the use of terminology advocated elsewhere (Bossuyt & Roskam, 1987), a "PMU model" will be a structure of the appropriate type in which the assumptions of the theory are satisfied. In this case a PMU model will be a binary choice frequency structure for which there exist a set of coordinate values and a cumulative distribution function such that assumptions A.1 to A.5 are satisfied.
It is difficult if not impossible to define a set of necessary and sufficient conditions on a choice frequency function k to guarantee the existence of a PMU model. However, such a set of conditions can be defined on a structure of choice probabilities. This result will be the kernel of our approach in constructing a PMU model. Given a choice frequency structure we will look for maximum likelihood estimates of the
choice probabilities satisfying this set of necessary and sufficient conditions. If estimates of these probabilities are available the problem of finding values for the alternative coordinates and a distribution function for the ideal coordinate can be solved easily.
2.1.2 Binary Choice Probabilities

Assumption A.4 of the midpoint unfolding theory is basically a probabilistic choice assumption. It implies that each choice out of a pair of alternatives {x, y} can be regarded as the result of an independent Bernoulli trial, where x has a binary choice probability (BCP) p_xy of being chosen. As a consequence, the choice frequency k_xy is a value from a binomial distribution with parameters (n_xy, p_xy). The structure <S, p> will be called a BCP structure. The following relation holds for these binary choice probabilities:
p_xy = Pr(|Z - x| ≤ |Z - y|).    (1)
This is a simple result of the four assumptions made earlier. Relation (1) can be reformulated using the concept of a midpoint. The midpoint between two points x and y is defined as the point m_xy of the metric line for which the distances to x and to y are equal. Its coordinate value is then defined as m_xy = (1/2)(x + y). If we refer in the following to the midpoints in S, all midpoints m_xy between nonidentical elements x, y of S will be meant. Equation (1) now becomes:

x < y  ⇒  p_xy = Pr(2Z ≤ x + y) = Pr(Z ≤ m_xy) = H(m_xy),
          p_yx = 1 - H(m_xy).    (2)
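Equation (2) is easy to exercise numerically. The sketch below assumes a logistic form for H purely for illustration; the theory itself requires only that H be a nondecreasing distribution function, and the coordinates are hypothetical.

```python
import math

def logistic_H(t):
    # Illustrative CDF for the random ideal coordinate Z.
    # The PMU theory leaves H unspecified; the logistic form is an assumption.
    return 1.0 / (1.0 + math.exp(-t))

def p_choice(cx, cy, H=logistic_H):
    # Equation (2): for coordinates cx < cy,
    #   p_xy = Pr(2Z <= cx + cy) = Pr(Z <= m_xy) = H(m_xy),
    #   p_yx = 1 - H(m_xy),  with midpoint m_xy = (cx + cy) / 2.
    m = 0.5 * (cx + cy)
    return H(m) if cx < cy else 1.0 - H(m)

# Hypothetical coordinates of four alternatives on the metric line.
coords = {"a": -2.0, "b": -0.5, "c": 1.0, "d": 3.0}
p_ab = p_choice(coords["a"], coords["b"])   # probability that a is chosen from {a, b}
p_ba = p_choice(coords["b"], coords["a"])
```

Because H is nondecreasing, probabilities computed this way are automatically monotonic with the ordering of the midpoints, which is the content of midpoint monotonicity below.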
The conditions on the choice probabilities to be derived depend on the possible sequencings of the coordinates of the alternatives and the midpoints on the metric line. In order to formulate them properly we will introduce some additional terminology. The unfolded order will be defined as a permutation of the alternatives in S that is monotonic with respect to the ranking of the corresponding coordinate values on the metric line in a PMU model. The unfolded midpoint order is defined the same way. The latter is a permutation of the midpoints in the set S that is monotonic with the ranking of the corresponding midpoint coordinates.
Obviously every permutation of the alternatives in a set S is a possible unfolded order. This means that, given appropriate data, a PMU model may exist in which this permutation is monotonic with the ranking of the coordinate values. However, not all permutations of the midpoints in S are possible unfolded midpoint orders. A permutation of the midpoints is a possible unfolded midpoint order if there exists a set of alternative coordinates such that the midpoint coordinates can be defined in the usual way. We will call a permutation of the midpoints that satisfies this condition a midpoint order. This implies that a permutation of the midpoints in a set S is a midpoint order if, for all midpoints m_vw, m_xy in S, there exist coordinate values for the elements v, w, x, y that satisfy:

m_vw precedes m_xy in the permutation  ⇔  v + w - x - y ≤ 0.    (3)
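The forward direction of (3) can be illustrated directly: given hypothetical coordinates, each midpoint has coordinate (v + w)/2, and sorting the midpoints yields the induced midpoint order. A minimal sketch (coordinate values are assumptions for illustration):

```python
from itertools import combinations

def induced_midpoint_order(coords):
    # coords: alternative -> coordinate on the metric line.
    # Returns the pairs {x, y} sorted by their midpoint m_xy = (x + y)/2;
    # by (3), m_vw precedes m_xy exactly when v + w - x - y <= 0.
    mids = {frozenset(pair): 0.5 * (coords[pair[0]] + coords[pair[1]])
            for pair in combinations(sorted(coords), 2)}
    return sorted(mids, key=mids.get)

order = induced_midpoint_order({"a": 0.0, "b": 1.0, "c": 3.0})
# midpoints: m_ab = 0.5, m_ac = 1.5, m_bc = 2.0
```

The converse is the hard part: not every permutation of the midpoints arises from some set of coordinates, which is what the feasibility check of Section 4.3 must detect.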
Through equation (1) and the nondecreasing property of the cumulative distribution function H, there exists a monotonic relation between the choice probabilities in a PMU model and the unfolded midpoint order. In general, a BCP structure will be said to satisfy midpoint monotonicity if there exists a midpoint order such that the probabilities can be ranked monotonically with it: p_vw ≤ p_xy whenever m_vw precedes m_xy in the midpoint order. Midpoint monotonicity proves to be a necessary and sufficient condition on a BCP structure for the existence of a PMU model.
Theorem 1. The following two statements are equivalent:

1. The BCP structure <S, p> for a binary choice frequency structure <S, k> satisfies midpoint monotonicity.
2. There exists a PMU model for <S, k>.
Proof: The first condition follows from the second as a simple consequence of relation (2). To show that the second follows from the first, take a set of coordinates satisfying the inequalities (3) derived from the sequencing of the midpoints in the midpoint order. Then define an arbitrary cumulative distribution function such that (2) is satisfied for all midpoints in S. As S is finite, this poses no problem. □

Theorem 1 will be central in our unfolding technique. If the unfolded midpoint order is given, the choice probabilities are known to be monotonic with respect to this order. Consequently, the estimates of these
probabilities will have to satisfy the corresponding ordinal restrictions. An algorithm to calculate these estimates is presented later on. If the unfolded midpoint order is not known, we can, for every possible midpoint order, obtain the corresponding probability estimates subject to midpoint monotonicity. A midpoint order is then defined to be a maximum likelihood unfolded midpoint order in S if there is no other midpoint order for which the maximum likelihood estimates under midpoint monotonicity result in a higher value of the likelihood for <S, k>.

One more result will be of use. If there exists a PMU model and the choice probabilities are organized in a matrix with the row and column indices arranged in the unfolded order, then the elements in each row of the resulting matrix do not increase from the left toward the main diagonal and do not decrease from the main diagonal to the right. This pattern has been called characteristic monotonicity by Dijkstra, Van der Eijk, Molenaar, Van Schuur, Stokman, and Verhelst (1980). An example of a BCP matrix satisfying characteristic monotonicity can be found in Table 2. Basically, a BCP structure <S, p> satisfies characteristic monotonicity if there exists a permutation of the alternatives in S such that for each triple of alternatives w, x, y: p_wx ≤ p_wy ≤ p_xy whenever w precedes x and x precedes y in this permutation. Midpoint monotonicity then implies characteristic monotonicity. The reverse does not hold. For an example, take the choice probabilities in Table 2. If we rank them and take the corresponding monotonic permutation of the midpoints, the result is not a midpoint order, because the resulting inequalities (3) on the coordinates are inconsistent. However, if the set S contains five elements or less, characteristic monotonicity always implies midpoint monotonicity.

2.2 A Population of Subjects

In most paired comparisons applications the alternatives are presented to more than one subject.
This occurs when the population of interest for the analysis consists of m subjects. Given this, there exists a wide variety of research designs for this multiple paired comparisons task. Not all designs use the same sampling procedure for the subjects. We will distinguish between the following two sampling schemes.
(SS.1) All pairs are presented at least once to each subject.
(SS.2) On each presentation of a pair of alternatives a subject is randomly sampled from the population. Each subject i has a probability p_i of being selected, with

Σ_{i=1}^{m} p_i = 1.
Both sampling schemes imply that we have as data a set of binary choice frequency structures <S, k_i> (i = 1, ..., m) as described earlier.
2.2.1 Assumptions

We start by assuming that for each subject i (i = 1, ..., m) the assumptions A.1 to A.5 defined earlier hold. This means that for each subject there exists a PMU model with coordinates x_i, y_i, m_xy,i and a cumulative distribution function H_i for the ideal coordinate. It is typical for the unfolding theory to assume some additional structure relating the metrics in the individual PMU models. This assumption follows from the premise that a considerable degree of intersubjectivity is to be expected in the dissimilarity pattern for the alternatives. In deterministic unidimensional unfolding (Coombs, 1964) either one of the following two assumptions is made.

A.6 (Joint unfolded order) There exists a permutation of the elements in S that is an unfolded order for each subject i (i = 1, ..., m).

A.7 (Joint unfolded midpoint order) There exists a permutation of the midpoints in S that is an unfolded midpoint order for each subject i (i = 1, ..., m).
It will be clear that A.7 implies A.6 but not conversely. Both follow from the stronger assumption that the metrics of all subjects are proportionally related. The latter is frequently assumed in probabilistic unfolding; it will not be needed in the present approach. The assumptions A.1-A.6 or A.1-A.7 define the joint probabilistic midpoint unfolding (JPMU) theory. In the following subsection we will examine necessary and sufficient conditions on the binary choice probabilities for the existence of a JPMU model.
2.2.2 Binary Choice Probabilities

We will have to distinguish between situations in which the first sampling scheme SS.1 has been followed and situations in which the second scheme SS.2 has been adopted. We start with the former. Suppose the assumptions A.1 to A.7 hold. In that case there exists a PMU model for each subject. Assumption A.7 then implies that midpoint monotonicity is satisfied in all individual BCP structures for the joint unfolded midpoint order. Obviously midpoint monotonicity is also satisfied in every individual BCP structure if assumption A.6 holds, but a joint unfolded midpoint order does not necessarily exist. Yet through assumption A.6 all individual unfolded midpoint orders have to be related. More specifically, characteristic monotonicity has to be satisfied in each BCP structure for the joint unfolded order.

These results are still valid in case sampling scheme SS.2 has been followed. Yet the construction of a JPMU model may be severely handicapped if a large number of the n_xy,i are zero. In that case several probabilities p_xy,i cannot be estimated. In the extreme situation where every subject has chosen out of only one pair of alternatives, all midpoint orders will be equivalent in terms of likelihood, because only one subject-dependent choice probability can be estimated. To deal with these situations we will follow a different approach. If sampling scheme SS.2 has been adopted, the binary choice probability p_xy that x is chosen by a subject selected at random can be expressed as

p_xy = Σ_{i=1}^{m} p_i p_xy,i.    (4)
We will now formulate the necessary conditions for the existence of a JPMU model on the "joint" choice probabilities p_xy. Since we assume that a PMU model exists for each subject, the individual BCP structures satisfy midpoint monotonicity. If there exists a joint unfolded midpoint order (A.7), the monotonicity is preserved in the joint BCP structure through addition (equation (4)). In a similar way it can be shown that the joint BCP structure satisfies characteristic monotonicity for the joint unfolded order if assumption A.6 holds.
Recapitulating, the following strategies may be followed. If assumption A.6 holds, there are two sampling-dependent strategies.

(SS.1) For each subject, find the unfolded midpoint order using midpoint monotonicity on the choice probabilities, under the condition that characteristic monotonicity holds within each BCP structure for the joint unfolded order.
(SS.2) Find the joint unfolded order by using characteristic monotonicity on the joint choice probabilities.

If assumption A.7 is assumed to hold, these strategies are altered as follows.

(SS.1) Find the joint unfolded midpoint order by using midpoint monotonicity on every individual BCP structure.
(SS.2) Find the joint unfolded midpoint order by using midpoint monotonicity on the joint choice probabilities.
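The condition these strategies test is straightforward to check for a candidate permutation. A minimal sketch of the characteristic monotonicity check (the probability values below are hypothetical):

```python
def characteristic_monotonicity(p, order):
    # p[w][x] holds the binary choice probability p_wx for w preceding x
    # in the candidate unfolded order; checks p_wx <= p_wy <= p_xy for
    # every triple w, x, y appearing in that order in the permutation.
    n = len(order)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                w, x, y = order[i], order[j], order[k]
                if not (p[w][x] <= p[w][y] <= p[x][y]):
                    return False
    return True

# Hypothetical BCP values consistent with the order a, b, c.
p = {"a": {"b": 0.2, "c": 0.3}, "b": {"c": 0.4}}
```

With these values, `characteristic_monotonicity(p, ["a", "b", "c"])` holds; raising p_ab above p_ac would violate the triple condition.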
3. Estimation of the Probabilities

In this section a general algorithm will be described to find the maximum likelihood estimates of binomial probabilities under order restrictions. As data we have a binary choice frequency structure <S, k>. Let T be a set of ordered pairs (x, y) of the set of alternatives S. For each pair of alternatives x, y ∈ S, either (x, y) is a member of T, (y, x) is a member of T, or neither (x, y) nor (y, x) is a member of T. The probabilistic assumption A.4 implies that the binary choice frequencies k_xy are values from a binomial distribution with parameters (n_xy, p_xy). Let q_xy be the choice proportion of x in {x, y}, q_xy = k_xy/n_xy. If a function g assigns estimates of the BCP p in T, then the log likelihood of the choice frequencies in T is equal to the function

L_T(k : g) = Σ_T n_xy [q_xy ln g_xy + (1 - q_xy) ln(1 - g_xy)]    (5)

plus an additive constant; Σ_T denotes summation over all elements (x, y) of T.

Let R be a reflexive, transitive binary relation on T. The relation R then establishes a partial order on the set T. Assume the estimates g_xy of
the BCP p_xy in T are known to satisfy the following restrictions:

(x, y) R (v, w)  ⇒  g_xy ≤ g_vw.    (6)
The problem of finding the maximum likelihood estimates of the probabilities p_xy in T, conditional on R, consists of finding the function f that maximizes the likelihood (5) within the set of all functions g satisfying the restrictions (6). If the ordinal restrictions are satisfied by the choice proportions, the latter are the conditional maximum likelihood estimates. If they are not, some other function f satisfying the order restrictions and maximizing the likelihood has to be found. We define two functions n and q on the power set of T. For each subset B ⊂ T,

n_B = Σ_B n_xy,    q_B = (Σ_B n_xy q_xy) / n_B,

which implies that n_B contains the sum of presentations and q_B the weighted average of the choice proportions in B. The basic principles of the algorithm are embodied in Lemma 1 and Lemma 2.
Lemma 1. If, within a subset B ⊂ T, all estimates are equal, then the likelihood L_B is maximized for f_xy = q_B for all (x, y) ∈ B.
Proof: Let f_xy = s for a real s within B. Obviously, if f_xy = s, the function f satisfies the restrictions (6) in B. The likelihood L_B then can be expressed as a function of s:

L_B(k : s) = Σ_B n_xy [q_xy ln s + (1 - q_xy) ln(1 - s)]
           = n_B [q_B ln s + (1 - q_B) ln(1 - s)].    (7)

It is well known that the function L_B(s) reaches a unique maximum at s = q_B. □
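A quick numerical check of Lemma 1, with hypothetical frequencies: the pooled log likelihood (7), evaluated over a grid of common values s, peaks at the weighted average q_B.

```python
import math

def pooled_loglik(s, data):
    # (7): sum over B of n_xy [q_xy ln s + (1 - q_xy) ln(1 - s)]
    #    = sum over B of [k_xy ln s + (n_xy - k_xy) ln(1 - s)].
    return sum(k * math.log(s) + (n - k) * math.log(1.0 - s) for k, n in data)

data = [(3, 10), (7, 10), (5, 10)]                       # hypothetical (k_xy, n_xy)
q_B = sum(k for k, _ in data) / sum(n for _, n in data)  # weighted average = 0.5
best = max((i / 1000.0 for i in range(1, 1000)),
           key=lambda s: pooled_loglik(s, data))
```

Here `best` coincides with q_B up to the grid resolution.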
For purposes of what follows one last piece of terminology needs to be introduced. We will call a partition of a subset B ⊂ T into k subsets B_i (k > 1) an R-consistent partition of B if for every two subsets B_i, B_j with q_Bi < q_Bj there are no elements (v, w) of B_i and (x, y) of B_j for which (x, y) R (v, w). Call such a partition the greatest R-consistent partition of B if there does not exist an R-consistent partition for any of the subsets B_i in this partition.
Lemma 2. Let g_xy = q_B for all (x, y) in a subset B ⊂ T. The following two statements are equivalent.

1. There exists an R-consistent partition of B.
2. There exists a function f on T satisfying (6) such that f increases the likelihood in B: L_B(k : f) > L_B(k : g).
Proof: First we show that (2) follows from (1). Because of

q_B = Σ_{i=1}^{k} (n_Bi / n_B) q_Bi

and the convexity of the function (7), the result follows. To show that (1) follows from (2), create a partition of B by assigning two elements (x, y), (v, w) to the same subset B_i if and only if f_xy = f_vw. Since f satisfies the restrictions (6), the resulting partition is R-consistent. Set h_xy = q_Bi for all (x, y) in each subset B_i in this partition. Then, by Lemma 1, L_B(k : h) > L_B(k : g). □

The following theorem now can be proven.

Theorem 2. If for a subset B ⊂ T either

1. there exists an R-consistent partition of B, this partition is the greatest R-consistent partition, and f_xy = q_Bi for all (x, y) in each subset B_i in this partition, or
2. there does not exist an R-consistent partition, and f_xy = q_B for all (x, y) in B,

then the function f maximizes the likelihood in B.
Proof: Suppose there exists a function g on the elements of B such that L_B(k : g) > L_B(k : f). Through Lemma 2, the latter implies that there exists an R-consistent partition of some subset B_i. Since this contradicts the assumptions, such a function g does not exist: f maximizes the likelihood in B. □

By Theorem 2 the problem of finding the function satisfying (6) and maximizing (5) can be solved by finding the greatest R-consistent partition of T, if there exists one. In the algorithm we propose, the set T is partitioned (not necessarily R-consistently) into two subsets, say T_1 and T_2. Initially T_1 contains only one element. Then, one by one, the elements of T_2 are transferred to T_1, and each time the greatest R-consistent partition of the new T_1 is found. If, finally, T_1 = T, the maximum likelihood estimates in T have been found. A more detailed description of this algorithm can be found in Bossuyt (1987).

Both the definition of the likelihood and the binary relation R refer to the case of a single subject. However, the extension to the case of a group of subjects with sampling scheme SS.1 or SS.2 is straightforward. For sampling scheme SS.2, the algorithm is applied to the joint frequencies. For sampling scheme SS.1, the algorithm is repeated for each of the m frequency structures, with the same set T and the same relation R.

It would be interesting to have a statistical test of the hypothesis that the probabilities satisfy the order restrictions as defined by (6) against the alternative that they do not. A generalized likelihood ratio test seems indicated, since the maximum of the likelihood can be calculated both conditionally and unconditionally. Unfortunately the distribution of the statistic under the null hypothesis cannot be traced easily. For large sample frequencies, this distribution is a weighted chi-square distribution, but the weights for characteristic and midpoint monotonicity are hard to obtain (Robertson, Wright, & Dykstra, 1988). To overcome this difficulty we suggest a nonparametric estimation of the relevant quantiles of these distributions using order statistics. This can be done by calculating the value of the likelihood ratio for a large number of binary choice frequency structures generated by Monte Carlo simulations with parameters satisfying the order restrictions. A test with an approximate size α can then be based on the estimated 1 - α quantile of the distribution of these values.
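When R happens to be a total order on T, the greatest R-consistent partition can be found by the classical pool-adjacent-violators scheme: adjacent blocks whose weighted averages violate the required ordering are merged, exactly as in Lemma 1 and Theorem 2. The sketch below covers only this total-order special case, with hypothetical frequencies; the paper's own algorithm handles general partial orders.

```python
def pava(q, n):
    # Conditional ML estimates of binomial probabilities under the total
    # order "estimates must be nondecreasing along the list" (pool adjacent
    # violators). Each block plays the role of a subset B with level q_B.
    blocks = []                        # each block: [weighted sum, weight, size]
    for qi, ni in zip(q, n):
        blocks.append([qi * ni, ni, 1])
        # Merge while the previous block's mean exceeds the new block's mean.
        while len(blocks) > 1 and \
                blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, w, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w
            blocks[-1][2] += c
    fit = []
    for s, w, c in blocks:
        fit.extend([s / w] * c)        # q_B repeated for every pair in the block
    return fit

# Hypothetical choice proportions and presentation counts along the order.
estimates = pava([0.2, 0.6, 0.4], [10, 10, 10])
```

Here the violating adjacent pair (0.6, 0.4) is pooled into one block with level 0.5, while 0.2 is left untouched.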
4. The Maximum Likelihood Unfolded Order

In this section a branch and bound algorithm is described to find the maximum likelihood unfolded midpoint order in the case of a single subject and the maximum likelihood unfolded order in the case of a group of subjects. Again the extension to the remaining cases will be comparatively straightforward. A branch and bound algorithm guarantees that the resulting solution is optimal because it evaluates all possible permutations at least implicitly. The branch and bound principle will be described first in its general form. In the remaining subsections the details for the case of the unfolded order and the unfolded midpoint order will be specified.

4.1 Branch and Bound

The algorithm first calculates the value of the likelihood under no order restrictions, L_max. Then by some suboptimal method an initial permutation is generated. The corresponding conditional estimates are found, and the value of the likelihood function is calculated. If this value equals L_max, the initial permutation is a maximum likelihood solution and the algorithm stops. If the likelihood for the initial permutation is lower than L_max, its value is stored as L_cut, and the initial permutation is stored as a provisional solution.

Next the algorithm generates a permutation tree. An example of a permutation tree for a set of five elements is given in Figure 1. Let r* be the number of elements in the required permutation. Except for the branch at level 1, each branch at level r in this tree corresponds to a subset of permutations in which the (r - 1) leftmost elements are as specified by the labels on the nodes along the path. For example, a path through branch a at level 2, branch b at level 3, and branch c at level 4 in Figure 1 corresponds to the subset of all permutations in which abc are the three leftmost elements: abcde and abced. Consequently, the branch at level 1 corresponds to the set of all permutations of the elements in S, and a path down to level r* corresponds to one permutation only.
The algorithm looks for possible improvements on the initial solution in the following way. Starting from level 1, it consecutively tries to establish a path along branches between nodes down to level r*. From a branch at level r, all branches to the branches at level r + 1 are examined
[Figure 1. A partial look at a permutation tree for five elements.]
for feasibility. One device for evaluating feasibility common to all branch and bound schemes is to calculate the upper bound of the likelihood in the subset of permutations that corresponds to the path down to the branch at level r + 1. This maximum can be calculated using the restrictions on the estimates that are shared by all permutations in the corresponding subset. If the upper bound of the likelihood is lower than the current cutoff value L_cut, the branch is discarded. No element of the corresponding subset of permutations will lead to an improvement on the current provisional solution. The algorithm then continues to evaluate the remaining branches from level r to r + 1. If the upper bound is higher than or equal to the current cutoff value L_cut, the procedure is repeated at level r + 1.

If the algorithm arrives at level r*, the upper bound is equal to the likelihood for the probabilities corresponding to a single permutation. If this likelihood is equal to the cutoff criterion, the permutation is equal to the provisional solution in terms of the likelihood. It is stored, and the search continues. If the likelihood is higher than the cutoff criterion, the permutation replaces the provisional solution(s) and the corresponding value of the likelihood function becomes the new cutoff criterion L_cut. If this new criterion equals L_max, the provisional solution is a maximum likelihood solution and the search stops.
If all branches from level r to r + 1 have been evaluated, the algorithm backtracks to level r - 1 along the path and checks whether all branches from this level have been evaluated. If not, the next branch is examined. Otherwise, the algorithm backtracks to the branch at level r - 2 along the path. If, ultimately, the algorithm has backtracked to level 1 and all branches have been examined for feasibility, the current provisional solution has to be a maximum likelihood solution.

4.2 Characteristic Monotonicity

When looking for a maximum likelihood permutation of the alternatives in S, the permutation tree contains as many levels as there are elements in S. An initial solution can be found by selecting the smallest choice proportion, say q_ab, and taking a and b to be the first two elements in the permutation. A sequencing of the remaining alternatives can be based on the rule: v precedes w if p_av ≤ p_aw. This rule has been described by Greenberg (1965), who refers to a suggestion from Coombs.

Before calculating the upper bound of the likelihood to examine the feasibility of a branch, the algorithm first checks for the presence of permutations in the corresponding subset that have already been implicitly evaluated. Since the permutations abcde and edcba will lead to equivalent values of the likelihood, only one of them needs to be evaluated. If all permutations in a subset have been evaluated at an earlier stage, the branch can be discarded. The upper bound under characteristic monotonicity in a subset of permutations is calculated using the algorithm described in the previous section. We will illustrate the construction of the set of ordered pairs T and the binary relation R by an example, with S = {a, b, c, d, e}. Suppose a path along the branches a, b has turned out to be feasible and the branch c at level 4 has to be examined. The set T is then composed of the ordered pairs (a, b), (a, c), (a, d), (a, e), (b, c), (b, d), (b, e), (c, d), (c, e).
The relation R contains as elements
Maximum Likelihood Unidimensional Unfolding
4.3 Midpoint Monotonicity
When looking for a maximum likelihood midpoint order, the permutation tree has $\frac{1}{2}s(s-1)$ levels: the number of midpoints in S. An initial solution is found by taking a maximum likelihood permutation under characteristic monotonicity and using the following rule: $m_{wx}$ precedes $m_{wy}$ if w precedes x and x precedes y in the unfolded order. If this permutation of the midpoints is a midpoint order, a maximum likelihood solution has been found. If the number of elements in S does not exceed five, the permutation will always be a midpoint order. If the number of elements in S exceeds five and the permutation is not a midpoint order, some midpoint order consistent with the maximum likelihood permutation under characteristic monotonicity is arbitrarily selected. Three devices are used to evaluate the feasibility of a branch. A branch is discarded
a. if all permutations in the subset have been explicitly or implicitly evaluated, or
b.
if the subset does not contain any midpoint orders, or
c. if the value of the upper bound in the subset is lower than the current cutoff criterion.
Device (b) is invoked because not every permutation of the midpoints is a midpoint order. To check this, the algorithm takes the inequalities (3) that are shared by all elements in the subset of permutations to be examined and sees if there exists a solution. For this purpose we use an algorithm by Chernikova (1965), modified by Nagels and Elzinga (Roskam, 1987). If there is no solution, the subset does not contain any midpoint orders and the branch is discarded. Though this algorithm always produces a maximum likelihood solution, it soon becomes very time-consuming as the number of midpoints increases. In that case some suboptimal modifications could turn out to be necessary. One modification consists of evaluating only those midpoint orders that are consistent with the maximum likelihood order under characteristic monotonicity. This leads to a considerable reduction in the size of the permutation tree, but the amount of time necessary to evaluate all the branches might still lead to problems. For a large set of alternatives S
a suboptimal pairwise interchange strategy could be used.
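A minimal sketch of such a suboptimal pairwise interchange strategy, with the likelihood again left as a placeholder `loglik` function (the stopping rule and acceptance criterion are our own illustrative choices):

```python
def pairwise_interchange(perm, loglik):
    """Local search: swap pairs of elements as long as the (placeholder)
    likelihood improves; returns a local optimum, not necessarily the
    global maximum likelihood solution."""
    perm = list(perm)
    best = loglik(perm)
    improved = True
    while improved:
        improved = False
        for i in range(len(perm) - 1):
            for j in range(i + 1, len(perm)):
                perm[i], perm[j] = perm[j], perm[i]      # tentative swap
                val = loglik(perm)
                if val > best:
                    best, improved = val, True           # keep the swap
                else:
                    perm[i], perm[j] = perm[j], perm[i]  # undo the swap
    return perm, best
```

Each accepted swap strictly increases the likelihood, so the procedure terminates, but only at a local optimum of the interchange neighborhood.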
5. An Example

Greenberg (1965) asked 163 housewives to choose from all pairs of nine phrases describing possible attitudes toward the Volkswagen automobile. The phrases ranged from excellent (A), through indifferent (E), to terrible (I). Greenberg attempted to find an underlying midpoint order by applying midpoint monotonicity to the resulting choice proportions. Unfortunately, his attempt was not successful. As can be concluded from an inspection of Table 1, the choice proportions do not satisfy midpoint monotonicity; they do not even satisfy characteristic monotonicity. Greenberg attributed this to "sampling error".

Table 1. Choice proportions, based on the proportions collected by Greenberg (1965).
      A      B      C      D      E      F      G      H      I
A   0.500  0.859  0.804  0.822  0.693  0.669  0.620  0.583  0.491
B   0.141  0.500  0.798  0.730  0.626  0.577  0.589  0.503  0.429
C   0.196  0.202  0.500  0.663  0.577  0.521  0.466  0.436  0.374
D   0.178  0.270  0.337  0.500  0.515  0.460  0.417  0.387  0.245
E   0.307  0.374  0.423  0.485  0.500  0.350  0.350  0.227  0.209
F   0.331  0.423  0.479  0.540  0.650  0.500  0.264  0.239  0.141
G   0.380  0.411  0.534  0.583  0.650  0.736  0.500  0.215  0.153
H   0.417  0.497  0.564  0.613  0.773  0.761  0.785  0.500  0.153
I   0.509  0.571  0.626  0.755  0.791  0.859  0.847  0.847  0.500
By multiplying the proportions reported in Greenberg (1965) by 163 we obtained a set of choice frequencies. The choice proportions based on these frequencies (Table 1) are not entirely equal to the proportions in Greenberg (1965), which seems to imply that not all subjects made all choices. Greenberg used sampling scheme SS.1 for his subjects. However, since the individual choice frequencies are unknown, we proceed as if sampling scheme SS.2 had been adopted. It is reasonable to assume that there exists a joint unfolded order (A.6) of the nine phrases within Greenberg's population of housewives. Therefore we looked for the
maximum likelihood joint unfolded order using characteristic monotonicity. Not surprisingly, this order corresponded to the a priori order of the nine phrases. The maximum likelihood estimates can be found in Table 2. The corresponding value of the -2 log likelihood ratio was 0.431. This value of the test statistic is lower than the .95 quantile of the distribution under characteristic monotonicity (13.315), estimated in a series of 500 Monte Carlo simulations.

Table 2. Choice probabilities estimated under characteristic monotonicity, based on the proportions in Table 1.
      A      B      C      D      E      F      G      H      I
A   0.500  0.859  0.813  0.813  0.693  0.669  0.620  0.583  0.491
B   0.141  0.500  0.798  0.730  0.626  0.583  0.583  0.503  0.429
C   0.187  0.202  0.500  0.663  0.577  0.521  0.466  0.436  0.374
D   0.187  0.270  0.337  0.500  0.515  0.460  0.417  0.387  0.245
E   0.307  0.374  0.423  0.485  0.500  0.350  0.350  0.233  0.209
F   0.331  0.417  0.479  0.540  0.650  0.500  0.264  0.233  0.149
G   0.380  0.417  0.534  0.583  0.650  0.736  0.500  0.215  0.149
H   0.417  0.497  0.564  0.613  0.767  0.767  0.785  0.500  0.149
I   0.509  0.571  0.626  0.755  0.791  0.851  0.851  0.851  0.500
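Assuming, as under sampling scheme SS.2, independent binomial counts with N = 163 presentations per pair, the -2 log likelihood ratio comparing observed proportions (Table 1) with probabilities fitted under the order restrictions (Table 2) can be sketched as follows. The function and the illustrative cells are our own, not code from the paper; fitted probabilities are assumed to lie strictly between 0 and 1.

```python
from math import log

def minus2_log_lr(observed, fitted, n_per_pair):
    """-2 log likelihood ratio for independent binomial choice
    proportions: observed vs. fitted probability per pair, each pair
    presented n_per_pair times (163 in the example above).
    Pairs with identical observed and fitted values contribute zero."""
    g2 = 0.0
    for p_obs, p_fit in zip(observed, fitted):
        if p_obs > 0:
            g2 += p_obs * log(p_obs / p_fit)
        if p_obs < 1:
            g2 += (1 - p_obs) * log((1 - p_obs) / (1 - p_fit))
    return 2 * n_per_pair * g2
```

For example, the (A,C) cell contributes through the observed 0.804 (Table 1) against the fitted 0.813 (Table 2); summing over all pairs of Tables 1 and 2 yields the statistic reported above.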
6. Discussion

The probabilistic unfolding models for binary choices can be divided into two categories. The first category contains the random configuration models (Croon, in press). These models are inspired by the Thurstonian scaling approach. They assume that either the ideal point coordinates and/or the alternative coordinates, or the ideal-alternative distances, are random variables. Examples are the models proposed by Bechtel (1968), Coombs, Greenberg & Zinnes (1961), Croon (in press), Ramsay (1980), and Zinnes & Griggs (1974). The second category contains models that are inspired by other scaling approaches. Examples are the Bradley-Terry-Luce approach (Schonemann & Wang, 1972), the Rasch model (Sixtl, 1973; Jansen, 1981), and the Fechnerian scaling model (see Bossuyt & Roskam, 1985). The present model belongs to the first category. It lacks an advantage of the other models in this category, since it does not provide exact
estimates of the ideal and alternative coordinates. Instead, the maximum likelihood unfolded order or unfolded midpoint order can be used to define a solution space for the coordinate values. However, other models acquire this advantage at the cost of strong parametric assumptions. Bechtel (1968) for example has specified a model much the same as ours, but he assumes that the distribution function is normal. As Sixtl (1973) has argued, this assumption is likely to be violated in most choice situations. A disadvantage of all existing probabilistic unfolding models in both categories is that they assume that there exists a joint metric. There is considerable evidence that this condition is not always met in practice. Sherif and Sherif (1967, 1969) for example demonstrated that there are situations in which a joint unfolded order exists without an intersubjective agreement on the interalternative dissimilarities. The procedure presented in this paper offers a way out of this difficulty by presenting the user with the choice between two assumptions: the existence of a joint unfolded order versus the existence of a joint unfolded midpoint order. If necessary a generalized likelihood ratio test with approximate size can be used to test the corresponding hypotheses on the binary choice probabilities. It is not difficult to extend the applicability of the approach proposed in this paper to related theories on probabilistic unidimensional unfolding, or probabilistic choice theories not involving the concept of an ideal alternative. In fact, any theory for which necessary ordinal conditions on the choice probabilities can be formulated lends itself to this strategy. This approach has been successful in a series of experiments designed to evaluate the appropriateness of probabilistic unidimensional unfolding models for paired comparisons data (Bossuyt & Roskam, 1985).
References

Bechtel, G. G. (1968). Folded and unfolded scaling of preferential pair comparisons. Journal of Mathematical Psychology, 5, 333-357.
Bossuyt, P. M. (1987). An algorithm for finding the maximum likelihood estimates of partially ordered binomial probabilities. Unpublished internal report, Mathematical Psychology Group, K. U. Nijmegen.
Bossuyt, P. M., & Roskam, E. E. (1985). A nonparametric test of
probabilistic unfolding models. Paper presented at the 4th European Meeting of the Psychometric Society, Cambridge, Great Britain.
Bossuyt, P. M., & Roskam, E. E. (1987). Testing probabilistic choice models. Communication & Cognition, 1, 5-16.
Chernikova, N. V. (1965). Algorithm for finding a general formula for the non-negative solutions of a system of linear inequalities. U.S.S.R. Computational Mathematics and Mathematical Physics, 5, 228-233.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., Greenberg, M., & Zinnes, J. (1961). A double law of comparative judgment for the analysis of preferential choice and similarity data. Psychometrika, 26, 165-171.
Croon, M. (in press). A comparison of statistical unfolding models. Psychometrika.
DeSarbo, W. S., & Hoffman, D. L. (1986). Simple and weighted unfolding threshold models for the spatial representation of binary choice data. Applied Psychological Measurement, 10, 247-264.
Dijkstra, L., Van der Eijk, C., Molenaar, I. W., Van Schuur, W. H., Stokman, F. N., & Verhelst, N. (1980). A discussion on stochastic unfolding. Methoden & Data Nieuwsbrief, 5, 158-175.
Greenberg, M. G. (1965). A method of successive cumulations for the scaling of pair-comparison preference judgments. Psychometrika, 30, 441-448.
Jansen, P. G. W. (1981). Spezifisch objektive Messung im Falle nichtmonotoner Einstellungsitems. Zeitschrift für Sozialpsychologie, 12, 169-185.
Ramsay, J. O. (1980). The joint analysis of direct ratings, pairwise preferences and dissimilarities. Psychometrika, 45, 149-166.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. New York: John Wiley & Sons.
Roskam, E. E. (1987). ORDMET3: An improved algorithm to find the maximin solution to a system of linear inequalities. Internal report 87 MA 06, Mathematical Psychology Group, K. U. Nijmegen.
Schonemann, P. H., & Wang, W. M. (1972). An individual differences model for the multidimensional analysis of preference data. Psychometrika, 37, 275-309.
Sherif, M., & Sherif, C. W. (1967). The own categories procedure in attitude research. In M. Fishbein (Ed.), Readings in attitude theory and measurement. New York: John Wiley & Sons.
Sherif, M., & Sherif, C. W. (1969). Social psychology. New York: Harper & Row; Tokyo: Weatherhill.
Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235-248.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
LATENT CLASS MODELS FOR THE ANALYSIS OF RANKINGS

Marcel A. Croon
Tilburg University

In this paper several latent class models for the analysis of rank order data are developed and discussed. These models try to accommodate the rationale of individual choice models to the situation in which a large number of respondents is sampled from a non-homogeneous population. By considering these individual choice models as statistical error theories, they may be seen to fall within the domain of general latent structure analysis and, as such, may provide a viable alternative to the more traditional scaling methods for the analysis of rankings.
1. Introduction

For the analysis of rank order data several more or less traditional methods are available. A first, very broad class of data analysis methods belongs to the domain of what is commonly called scaling techniques and encompasses various methods which all essentially aim at a geometrical or pictorial representation of the data. This first class of methods can be further subdivided into two subclasses, depending upon the geometric model on which the methods are based. Here we refer, of course, to the well-known distinction between vector and distance models. Unfolding analysis belongs to the subclass of distance models, since its main objective is to represent subjects and stimuli as points in a joint space in such a way that the rank order of the distances between one particular subject point and the stimulus points optimally reflects the observed preference ranking of the stimuli by the corresponding subject. [This paper is a revised version of an article published in Zeitschrift für experimentelle und angewandte Psychologie, 1988, 35, 1-22.] Vector
models, on the other hand, usually represent subjects by means of vectors or directions in the joint space, while the stimuli remain mapped onto points. In these models the orthogonal projections of the stimulus points on the subject vectors are assumed to be related to the observed evaluation scores or rankings. Quite often these geometric scaling models succeed in adequately representing and summarizing the essential information in the data. Occasionally, however, situations arise in which these scaling methods seem less attractive and appropriate. This is for instance the case when a large sample of respondents is asked to rank a small number of stimuli on an evaluation criterion. In such a situation it is very likely that almost all, or at least a majority, of all possible rankings will indeed occur in the sample. As will be shown in a later section of this paper, distance or vector scaling models have some difficulties in adequately representing such abundant data, notwithstanding the fact that, when fitting a scaling model, a large number of parameters is estimated. In these situations, some relief may possibly be given by the use of alternative methods, some of which will be developed in this paper. A second class of methods is more closely connected with the interest mathematical psychologists and economists have shown in the development of individual choice models. A landmark in this tradition is undoubtedly Luce's (1959) monograph Individual choice behavior, in which, starting from a not too unacceptable axiom, the author derives a fairly simple unidimensional choice model. The same model was described some years earlier in a more statistically oriented way by Bradley and Terry (1952). Even much earlier, the German set-theoretician Zermelo (1929) gave it some consideration in solving some chess tournament problems.
(Amazingly enough, quite recently another set-theoretician (Jech, 1983) arrived in a seemingly independent way at the same model when analyzing similar tournament problems.) Although the BTL model, as it has been known since then, has been used primarily for the analysis of paired comparison data, it can easily be adapted for the analysis of rank order data. Luce (1959) himself devoted some pages of his monograph to this extension, but his remarks were mainly of a theoretical nature. Similar theoretical remarks on the analysis of rank order data by means of individual choice models can be found in Block and Marschak (1960) and in Luce and Suppes (1965). From a more statistical point of view, the adaptation of the basic rationale of the BTL
model to the analysis of rankings was treated by Pendergrass and Bradley (1960), Fienberg and Larntz (1976), and Beaver (1977). A related reference is Plackett (1975). As will be shown in the next section, all these models for the analysis of rankings lead to still manageable expressions for the ranking probabilities in terms of a small number of parameters which represent the stimulus scale values on an underlying unidimensional continuum. Due to the relative simplicity of these expressions, the maximum likelihood estimates of the unknown parameters can be determined for most data sets by a rapidly converging algorithm such as the Newton-Raphson procedure. After obtaining these estimates, various statistical tests can be performed in order to determine whether the proposed model provides an acceptable fit for the data at hand. Unfortunately, it is precisely at this point that most users will be disappointed with the final result. Quite frequently, and especially so in the case of large samples of respondents, these statistical tests will indicate a very bad model fit, necessitating the conclusion that ultimately the proposed model does not apply to the data. The reason for this unfortunate state of affairs is, however, easy to give. The BTL model is a model for individual choice behavior. If we apply this model to the rankings observed in a random sample of respondents from a particular population, we implicitly assume that all members of this population perceive and evaluate the stimuli in essentially the same way. This strong assumption of complete homogeneity in the population is certainly untenable in social psychological applications of the BTL model. People consistently differ in their stimulus evaluations, and any analysis which does not leave room for these interindividual differences to show up is doomed to fail and to misrepresent the data.
In this paper an attempt is made to accommodate the rationale of the BTL model to the case in which respondents are sampled from a non-homogeneous population by linking this choice model to latent class analysis. Originally, latent class analysis, and latent structure analysis in general, was proposed by Lazarsfeld (see, e.g., Lazarsfeld & Henry, 1968) to explain associations between observed variables in terms of unobserved latent variables. In our application of it we assume that the non-homogeneous population can be partitioned into several subpopulations, each of them homogeneous with respect to the stimulus evaluations. In this way each subpopulation defines a latent class which is
characterized by a particular set of stimulus scale values.
2. Latent Class Models for Rankings

We first introduce some notation. Since we assume in the sequel that only a finite number n of stimuli are used in a ranking experiment, we may represent these stimuli by the first n natural numbers. Hence, if S denotes the stimulus set, we have

$$S = \{1, 2, \ldots, i, \ldots, n\}.$$

Furthermore, since we will only consider the case in which the subjects are required to rank the entire set of stimuli on some evaluation criterion, the ranking given by any particular subject may be represented by an ordered n-tuple r:

$$r = (r_1, r_2, \ldots, r_k, \ldots, r_n),$$

in which $r_1$ is the number of the stimulus ranked first by the subject. In general, $r_k$ is the stimulus occupying the k-th position in the subject's ranking. The probability that a randomly selected subject will give the ranking r will be represented by $p_r$ or, if necessary, more explicitly by $p(r_1, r_2, \ldots, r_n)$.
Next we will discuss two adaptations of the basic BTL model to the ranking task. Both models lead to manageable expressions for the ranking probabilities $p_r$ in terms of some unknown parameters, which may be interpreted as the stimulus scale values. Since the first ranking model we will discuss is related to the strict random utility formulation of the BTL model, we will from now on refer to it as the strict utility ranking (SU for short) model, and since our second ranking model is based on the model proposed by Pendergrass and Bradley (1960) for the analysis of triple rankings, we will refer to it in the sequel as the Pendergrass-Bradley (PB for short) model.
2.1 The Strict Utility Ranking Model

A first adaptation of the BTL model starts from the well-known observation that the BTL model is compatible with a particular random utility model. Yellott (1977) is one of the most relevant references in this
respect. Suppose that the presentation of an arbitrary stimulus i results, on the part of the subject, in a subjective impression or evaluation, the strength of which may be represented by a real number $u_i$. This real number is not to be considered as an unknown constant, but as a realization of a random variable $U_i$. Then, the BTL model is compatible with the random utility model which assumes that the random variables $U_i$ follow independent double-exponential distributions with constant scale parameter but with differing location parameters, which correspond to the stimulus scale values. Since we may assume, without loss of generality, that the constant scale parameter is equal to one, this random utility model leads to the following density function for the random variable $U_i$:

$$f(u_i) = \exp\{-(u_i - a_i) - \exp[-(u_i - a_i)]\}.$$
In this expression the location parameter $a_i$ represents the scale value of stimulus i. If we denote, for paired comparisons data, the probability that stimulus i is preferred to stimulus j by $p(i,j)$, we may derive

$$p(i,j) = \mathrm{Prob}(U_i \ge U_j) = \frac{e^{a_i}}{e^{a_i} + e^{a_j}}.$$

Note that in this and also in the following derivations the assumption that the different random variables involved are independent of each other is crucial. If $p(i,j,k)$ denotes the probability that in a ranking task with only three stimuli the ranking $(i,j,k)$ will be given, then we obtain under this random utility model:

$$p(i,j,k) = \mathrm{Prob}(U_i \ge U_j \ge U_k) = \frac{e^{a_i}}{e^{a_i} + e^{a_j} + e^{a_k}} \cdot \frac{e^{a_j}}{e^{a_j} + e^{a_k}}.$$
These results are well-known and can be found for instance in Bradley (1965) and in Yellott (1980). In the general case of a ranking task which involves n stimuli, we may derive the following expression for the ranking probabilities $p_r$:
$$p_r = \mathrm{Prob}(U_{r_1} \ge U_{r_2} \ge \cdots \ge U_{r_n}) = \prod_{k=1}^{n-1} \frac{e^{a_{r_k}}}{\sum_{m=k}^{n} e^{a_{r_m}}}. \qquad (1)$$
In order to elucidate the true nature of this at first sight impressive expression, we give a concrete version of it for the case of n = 4 stimuli. For instance,

$$p(3,1,4,2) = \frac{e^{a_3}}{e^{a_1} + e^{a_2} + e^{a_3} + e^{a_4}} \cdot \frac{e^{a_1}}{e^{a_1} + e^{a_2} + e^{a_4}} \cdot \frac{e^{a_4}}{e^{a_2} + e^{a_4}}.$$

This expression illustrates the role played by Luce's choice axiom in its derivation: the probability of the ranking (3,1,4,2) can be thought of as the product of the probabilities that a particular item will be selected from a set of available alternatives. The first term in this product corresponds to the probability that item 3 will be chosen from the set {1,2,3,4}; the second term represents the probability that item 1 will be chosen from {1,2,4}; and finally, the third term is the probability that item 4 will be chosen from {2,4}. In this model for ranking probabilities, it is implicitly assumed that the ranking of the stimuli takes place by means of a sequence of selections of items from the sets of alternatives which remain available at each choice point. Furthermore, at each selection point the choices are assumed to be governed by the same set of stimulus scale values.
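This sequential-selection interpretation translates directly into a short computation. The sketch below is our own illustration, not code from the paper; the scale values are supplied as a dictionary `a` mapping stimulus labels to the $a_i$.

```python
from math import exp

def su_ranking_prob(ranking, a):
    """Strict utility ranking probability: the product of successive
    choice probabilities, each choice made from the set of stimuli
    still available at that point."""
    remaining = list(a)                       # all stimuli still available
    prob = 1.0
    for r in ranking:
        denom = sum(exp(a[s]) for s in remaining)
        prob *= exp(a[r]) / denom             # choose r from what is left
        remaining.remove(r)                   # r is no longer available
    return prob
```

Summed over all n! rankings these probabilities equal one, which provides a simple check on an implementation.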
2.2 The Pendergrass-Bradley Model

A second approach to the adaptation of the BTL model to the analysis of rankings has been proposed by Pendergrass and Bradley (1960), who however only discuss the case of triple rankings (i.e., n = 3) extensively. These authors assume there exist strictly positive real numbers $v_i$ such that

$$p(i,j,k) = \frac{v_i^2 v_j}{s},$$
in which $s = v_1^2(v_2 + v_3) + v_2^2(v_1 + v_3) + v_3^2(v_1 + v_2)$. One easily sees that s equals the sum of the $v_i^2 v_j$ terms over all permutations of the symbols i, j and k. In this model the ranking probability $p(i,j,k)$ is given as the product of three paired comparisons probabilities: $p(i,j,k) = p(i,j) \cdot p(i,k) \cdot p(j,k)$.
By defining $a_i = \ln v_i$, this model can be reparametrized as follows:

$$p(i,j,k) = \frac{\exp(2a_i + a_j)}{s}.$$
Using the assumption that a ranking probability can be defined as the product of the paired comparisons probabilities which are induced by the ranking, the generalization of this model to the case in which n stimuli are to be ranked leads to the following expression for the ranking probability:

$$p_r = \frac{1}{s} \exp\Big\{\sum_{k=1}^{n-1} (n-k)\, a_{r_k}\Big\}, \qquad (2)$$

in which s again is the sum of all terms which occur in the numerator of some ranking probability, the sum being taken over all permutations of the stimulus indices. A possible advantage of the Pendergrass-Bradley approach resides in the fact that, as Fienberg and Larntz (1976) have shown, it allows for a log linear representation, so that its theoretical analysis and its practical application may benefit from the general results available from the theory of log linear models. As described so far, both ranking models are as yet unidentified, since the scale values are defined only up to a translation along the real axis. One usually solves this identification problem by imposing the following linear constraint on the scale values:

$$\sum_{i=1}^{n} a_i = 0.$$
This constraint fixes the origin of the scale at zero. In the sequel we will always assume implicitly that this type of restriction has been imposed on the scale values. This leaves n - 1 independent scale values to be estimated.
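A sketch of the Pendergrass-Bradley ranking probability with an explicit normalization over all n! rankings (feasible only for small n); the function and the brute-force normalization are our own illustration, not the authors' implementation.

```python
from math import exp
from itertools import permutations

def pb_ranking_prob(ranking, a):
    """Pendergrass-Bradley ranking probability: numerator
    exp(sum_k (n-k) * a_{r_k}), i.e. position weights n-1, n-2, ..., 0,
    normalized by the sum s of the numerators over all n! rankings."""
    n = len(a)

    def numer(r):
        # position k (0-based) receives weight n-1-k
        return exp(sum((n - 1 - k) * a[s] for k, s in enumerate(r)))

    s = sum(numer(r) for r in permutations(a))
    return numer(ranking) / s
```

For n = 3 and $a_i = \ln v_i$ this reduces to $v_i^2 v_j / s$, the triple-ranking form given above.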
2.3 Estimating the Parameters

For a given set of observed rankings, both models for the analysis of rankings allow the determination of the maximum likelihood estimates of the stimulus scale values by means of a Newton-Raphson iteration procedure. In this paper we will not dwell on the technical aspects of this procedure. It suffices here to say that in all our applications of this procedure to real data, the algorithm converged very rapidly. Even in the case of very bad starting values for the unknowns, convergence was generally reached in fewer than 10 iterations. However, it should be stressed that the Newton-Raphson estimation algorithm only converges if maximum likelihood estimates exist. As an example of a situation in which these estimates do not exist, consider the case when, with n = 4 stimuli, only the following rankings are observed: (1,2,3,4), (1,2,4,3), (2,1,3,4), and (2,1,4,3). In this example the subset {1,2} dominates the subset {3,4}, in the sense that each item from the dominating subset is always ranked before each item from the dominated subset. In such cases the likelihood function achieves its maximum at the boundary of the parameter space: the scale values of the items in the dominating set tend to plus infinity, whereas the scale values of the items in the dominated set tend to minus infinity. So, in order for the maximum likelihood estimates to be defined, no dominating subsets of items should exist. For a similar condition in the case of paired comparisons, see Mattenklott, Sehr and Mieschke (1982).
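The dominance condition just described is easy to check directly for small stimulus sets; the following brute-force sketch is our own illustration, not part of the authors' procedure.

```python
from itertools import combinations

def has_dominating_subset(rankings):
    """True if some proper subset of items is, in every observed
    ranking, placed entirely before every item outside the subset;
    maximum likelihood estimates of the scale values then fail to
    exist. Exhaustive over subsets, so intended for small n only."""
    items = set(rankings[0])
    for size in range(1, len(items)):
        for subset in map(set, combinations(items, size)):
            if all(max(r.index(i) for i in subset) <
                   min(r.index(j) for j in items - subset)
                   for r in rankings):
                return True
    return False
```

Applied to the four rankings above it detects the dominating subset {1,2}; adding a single ranking that interleaves the two subsets removes the degeneracy.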
2.4 Latent Class Models for Rankings

Both versions of the BTL model for rankings can be used in the formulation of a latent class model for the analysis of rankings from non-homogeneous populations. Basic to these latent class models is the assumption that the non-homogeneous population can be divided into a set of T homogeneous subpopulations or latent classes, each of them characterized by a distinctive set of stimulus scale values which are assumed to govern the ranking choices of the respondents belonging to that particular
class. So, instead of one set of scale values, we now have T sets of scale values which, in due course, have to be estimated from the data. If we denote the probability that ranking r is given within latent class t by $p_{rt}$, then both expression (1) and expression (2) can easily be adapted to accommodate the existence of different latent classes. The analogue of expression (1) becomes
$$p_{rt} = \prod_{k=1}^{n-1} \frac{e^{a_{r_k t}}}{\sum_{m=k}^{n} e^{a_{r_m t}}}, \qquad (3)$$

whereas for expression (2) we have

$$p_{rt} = \frac{1}{s_t} \exp\Big\{\sum_{k=1}^{n-1} (n-k)\, a_{r_k t}\Big\}. \qquad (4)$$
If we denote the probability that a randomly selected subject belongs to latent class t by $\pi_t$, we obtain the following expression for the probability $p_r$ that ranking r is observed when sampling is from the entire population:

$$p_r = \sum_{t=1}^{T} \pi_t\, p_{rt}.$$
Obviously, the parameters $\pi_t$ satisfy the following constraint:

$$\sum_{t=1}^{T} \pi_t = 1.$$
As a consequence, the total number of independent parameters to be estimated equals $(n-1)T + (T-1) = nT - 1$. A necessary condition for the latent class model to be identified is that the number of independent unknowns is smaller than, or equal to, the number of independent rankings:

$$nT - 1 \le n! - 1, \quad \text{or} \quad T \le (n-1)!.$$
So, for instance, for n = 4, the number of latent classes should be smaller than or equal to 6. However, this condition is by no means sufficient to ensure identifiability of the model.
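The mixture probability and the parameter count can be sketched as follows; the per-class ranking models are passed in as placeholder functions, so the sketch (our own illustration) works with either expression (3) or (4) as the within-class model.

```python
from math import factorial

def mixture_ranking_prob(ranking, pis, class_probs):
    """p_r = sum_t pi_t * p_rt: class probabilities pi_t combined with
    per-class ranking models, each given as a callable ranking -> p_rt."""
    return sum(pi * p(ranking) for pi, p in zip(pis, class_probs))

def n_free_parameters(n, T):
    """(n - 1) scale values per class plus (T - 1) class probabilities."""
    return n * T - 1

def necessary_identification_condition(n, T):
    """nT - 1 <= n! - 1, i.e. T <= (n-1)!; necessary, not sufficient."""
    return n_free_parameters(n, T) <= factorial(n) - 1
```

For n = 4 and T = 6 the parameter count is 23, exactly the number of independent rankings, so T = 6 is the largest value satisfying the necessary condition.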
The estimation of the unknown scale values and of the latent class probabilities would not pose any new difficulty if we only knew which latent class each respondent belonged to. If this were the case, then we could determine for each class how frequently each ranking was generated by subjects belonging to it, and on the basis of these observed frequencies $f_{rt}$ all unknown parameters could be estimated. Unfortunately, latent class membership is, in our context, an unobserved variable, which implies that the data at our disposal should be considered as "incomplete data" in the sense defined by Dempster, Laird and Rubin (1977). Consequently, the estimation of the unknown parameters should preferably proceed by means of the EM algorithm proposed by these authors. In this algorithm each iteration consists of two steps: an E-step and an M-step. During the E-step the missing data are estimated on the basis of the observed data and of the currently available provisional estimates of the model parameters. During the M-step maximum likelihood estimates of the model parameters are determined anew, using the completed data resulting from the preceding E-step. By alternating E- and M-steps a sufficient number of times, one may hope to achieve convergence to the global maximum likelihood solution. Although for relatively simple estimation problems, which are characterized by concave likelihood surfaces, the EM algorithm usually converges to the global maximum, no such reassuring statement can be made for more involved estimation problems, for which the likelihood surface may have several local maxima. In these latter cases, different runs of the estimation procedure, each starting from different initial parameter values, may give some comfort to the user, provided that these runs converge to what seems to be essentially the same final solution. Another relevant remark concerns the rate with which the EM algorithm converges.
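One iteration of the E- and M-step alternation described above can be sketched as follows; the per-class Newton-Raphson update of the scale values is model-specific and is therefore left as a placeholder callback, and the function signatures are our own illustrative choices, not the authors' implementation.

```python
def em_iteration(freqs, pis, ranking_probs, update_scale_values):
    """One EM iteration for a latent class ranking model.

    freqs               : dict ranking -> observed frequency f_r
    pis                 : current class probabilities pi_t
    ranking_probs       : callable (ranking, t) -> p_rt under the
                          current scale values (expression (3) or (4))
    update_scale_values : placeholder for the per-class Newton-Raphson
                          M-step on the scale values (not shown)
    """
    N = sum(freqs.values())
    T = len(pis)
    # E-step: membership weights w_rt and completed frequencies f_rt
    f_hat = {}
    for r, f_r in freqs.items():
        joint = [pis[t] * ranking_probs(r, t) for t in range(T)]
        total = sum(joint)
        f_hat[r] = [f_r * j / total for j in joint]   # f_rt = w_rt * f_r
    # M-step: new class probabilities; scale values via the callback
    new_pis = [sum(f_hat[r][t] for r in freqs) / N for t in range(T)]
    update_scale_values(f_hat)
    return new_pis, f_hat
```

Iterating this function until the log likelihood stabilizes (or a maximum iteration count is reached) gives the estimation procedure used in this paper, up to the model-specific M-step.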
In general, convergence is quite slow, at least if the rate of convergence is measured by the number of iterations required before the convergence criterion is reached. In the case at hand, the E-step of the EM algorithm consists of the determination of the frequencies $f_{rt}$ with which each ranking r is observed within each latent class t. During this step the provisional estimates $a_{it}$ of the stimulus scale values are used to determine the ranking probabilities $p_{rt}$. Depending upon which choice model is implemented, expression (3) or (4) is used in this respect. Then the following weights $w_{rt}$ are computed:
$$w_{rt} = \frac{\pi_t\, p_{rt}}{\sum_{u=1}^{T} \pi_u\, p_{ru}},$$

in which the provisional estimates $\pi_t$ of the latent class probabilities are used. These weights represent the conditional probabilities that a particular ranking r originated from latent class t. Finally, the unobserved frequency $f_{rt}$ is estimated by

$$\hat{f}_{rt} = w_{rt} \cdot f_r,$$
where $f_r$ is the observed frequency of ranking r in the entire sample. During the M-steps of our iterative procedure, the maximum likelihood estimates of the stimulus scale values and of the latent class probabilities are determined anew. The new estimates of the latent class probabilities are easily computed in the following way:

$$\hat{\pi}_t = \frac{1}{N} \sum_{r} \hat{f}_{rt},$$

in which the summation runs over all rankings and N is the total sample size. The determination of the new scale values is of course somewhat more involved, since it requires a Newton-Raphson iteration procedure for each latent class separately. By alternating the E- and M-steps a sufficiently large number of times one may hope to reach in the end the maximum of the likelihood function. In our implementation of the EM algorithm two stop criteria were used. In the first place, the iteration process was discontinued whenever the difference between two successively evaluated log likelihoods was smaller than a preset tolerance; secondly, the iteration process was stopped after 250 iterations. If in the latter case there was an indication that the likelihood function might still increase substantially, a new iteration process was performed, starting from the previously obtained estimates of the parameters. Moreover, if there was any suspicion that the final solution might represent a local maximum of the likelihood function, a new iteration process was performed, starting from different initial estimates of the parameters. In the preceding pages, we tacitly assumed that the number T of latent classes was known beforehand. From a practical point of view this is certainly never the case. Instead, one would rather consider the parameter T
110
Croon
as an additional unknown to be estimated by the analysis. An obvious way to proceed in this respect is by means of statistical model tests. This amounts to estimating the model parameters under different hypotheses on the number of latent classes and subsequently comparing the value of the log likelihood function for each model with the value of that function under an appropriate null model. In the case of a latent class analysis of rankings, the appropriate null model assumes that the n! different rankings define the categories of a multinomial random variable, implying that for each ranking r its theoretical probability p_r can be estimated by f_r/N. If we denote the value of the log likelihood function under this null model by F_0 and the value of the log likelihood function under the model with t latent classes by F_t, then standard results from the theory of log likelihood ratio tests imply that, under the hypothesis that t latent classes suffice to explain the data, the test statistic

L = 2(F_0 − F_t)

follows asymptotically a chi square distribution with degrees of freedom equal to n! − nt. Large values of this test statistic lead to the rejection of the hypothesis of t latent classes. In this case one should repeat the analysis with (t + 1) latent classes. As the final estimate of T one takes the smallest value of t for which the test statistic becomes nonsignificant. Although the rationale of this procedure for estimating T seems impeccable (apart from the often overlooked fact that a sequential estimation procedure is actually being carried out), the direct dependence of the test statistic on the sample size renders the conclusions based on it somewhat insecure. For large samples, the test procedure becomes so powerful as to reject any low dimensional model. For a latent class analysis this generally implies that only for sufficiently large values of t will the ensuing test statistics turn out to be nonsignificant. Similar problems have been encountered with, for instance, covariance structure analysis, where in large samples any model also tends to be rejected as inadequate. As a response to these difficulties, Jöreskog (1978) and Bentler and Bonett (1980) have recommended performing hierarchically nested model tests, which in their opinion may be more informative than the tests that compare each model with the saturated null model. In our context this amounts to testing the hypothesis of t latent classes against the hypothesis of t + 1 latent classes by means of the test statistic

L = 2(F_{t+1} − F_t),

which is asymptotically chi square distributed with n degrees of freedom.
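To make the E-step, the M-step for the class probabilities, and the likelihood ratio statistic concrete, here is a minimal sketch. It is our own illustration, not the authors' program: it assumes a strict-utility (Luce) form for the ranking probability p_rt, and the Newton-Raphson re-estimation of the scale values is only indicated by a comment.

```python
import math
from itertools import permutations

def luce_ranking_prob(ranking, a):
    # Strict utility (Luce) probability of a complete ranking: at every
    # stage the top remaining item i is chosen with probability
    # exp(a[i]) / sum(exp(a[j]) for j still available).
    prob = 1.0
    remaining = list(ranking)
    while remaining:
        denom = sum(math.exp(a[j]) for j in remaining)
        prob *= math.exp(a[remaining[0]]) / denom
        remaining.pop(0)
    return prob

def em_step(freqs, scales, class_probs):
    """One EM cycle. freqs maps each ranking r to its observed frequency
    f_r; scales[t] holds the provisional scale values a_it of class t;
    class_probs holds the provisional pi_t."""
    N = sum(freqs.values())
    T = len(class_probs)
    # E-step: weights w_rt = pi_t p_rt / sum_t' pi_t' p_rt'
    # and expected frequencies f_rt = w_rt * f_r.
    f_rt = {}
    for r, f in freqs.items():
        joint = [class_probs[t] * luce_ranking_prob(r, scales[t]) for t in range(T)]
        total = sum(joint)
        for t in range(T):
            f_rt[r, t] = f * joint[t] / total
    # M-step for the class probabilities: pi_t = (1/N) sum_r f_rt.
    # (The scale values would be re-estimated here by a Newton-Raphson
    # iteration per class, omitted in this sketch.)
    return [sum(f_rt[r, t] for r in freqs) / N for t in range(T)]

def lr_statistic(freqs, scales, class_probs):
    # L = 2 (F0 - Ft): saturated multinomial null (p_r = f_r / N)
    # against the t-class model.
    N = sum(freqs.values())
    F0 = sum(f * math.log(f / N) for f in freqs.values() if f > 0)
    Ft = sum(f * math.log(sum(class_probs[t] * luce_ranking_prob(r, scales[t])
                              for t in range(len(class_probs))))
             for r, f in freqs.items())
    return 2.0 * (F0 - Ft)
```

With n items, the resulting L would be referred to a chi square distribution with n! − nt degrees of freedom, as described above.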
3. Some Numerical Examples

We will illustrate our latent class model by analyzing a data set from the comparative cross-national study “Changing mass publics”, which is described in Barnes et al. (1979). The present author would like to thank Dr. F. Heunks of the Sociology Department of the University of Tilburg for making these data available. In this study respondents from five Western countries were asked to rank the following four political goals according to their desirability:

1. Maintain order in the nation;
2. Give people more say in the decisions of the government;
3. Fight rising prices;
4. Protect freedom of speech.

In the rest of this paper we will only use the data from the German sample, in which N = 2262 respondents gave a complete ranking of the four items. Table 1 contains the 24 possible rankings together with their observed frequencies of occurrence. For clarity's sake, we stress that the rankings run from most to least desirable. The inclusion of this particular ranking task in the study was inspired by Inglehart's theory on value orientations (see, e.g., Inglehart, 1977). This theory draws a distinction between a materialistic and a post-materialistic value orientation. Persons characterized by a materialistic value orientation are supposed to care primarily about social and economic stability and security, whereas post-materialistically oriented persons rather emphasize the humane and spiritual aspects of social life. If asked to rank the four political goals on a desirability criterion, materialists can be expected to give precedence to items 1 and 3 from the list, whereas for post-materialists items 2 and 4 should occupy the first positions in the ranking. It is obvious that for this quite simple ranking task the assumption of complete population homogeneity is untenable. If Inglehart's theory on value orientations is correct, one may expect at least two different latent classes, each of them
Table 1. Observed frequencies of the 24 rankings in the German sample

No.  Ranking  Frequency     No.  Ranking  Frequency
 1    1234      137         13    3124      330
 2    1243       29         14    3142      294
 3    1324      309         15    3214      117
 4    1342      255         16    3241       69
 5    1423       52         17    3412       70
 6    1432       93         18    3421       34
 7    2134       48         19    4123       21
 8    2143       23         20    4132       30
 9    2314       61         21    4213       29
10    2341       55         22    4231       52
11    2413       33         23    4312       35
12    2431       59         24    4321       27
corresponding with one of the ideal-typical value orientations. We first discuss the results of our analyses based on the strict utility model.

Table 2. Model fit tests for the analyses based on the SU model

T     L        df    α
1     315.05   20    0.000
2      84.32   16    0.000
3      23.68   12    0.022
4      10.59    8    0.226
Table 2 summarizes the corresponding model fit tests for the analyses, starting with T = 1 and continuing up to T = 4. This table contains the log likelihood ratio statistic L, the corresponding degrees of freedom and the associated α-level for each of the analyses. From this table we infer that the hypothesis T = 3 should be rejected at α = 0.05 but not at α = 0.01, whereas the hypothesis T = 4 cannot be rejected at α = 0.05. In the sequel we will restrict ourselves to a discussion of the solution with three latent classes. Table 3 contains the parameter estimates for this case.

Table 3. Parameter estimates for the SU model with T = 3 latent classes

Parameter    Latent Class 1       2        3
a_1t             1.99            0.59    -0.69
a_2t            -0.92           -1.07     0.63
a_3t             0.06            1.73    -0.01
a_4t            -1.13           -1.25     0.07
π_t              0.33            0.45     0.22
From the table we note that the estimates of the stimulus scale values are quite similar in classes 1 and 2. These two classes conform to our expectations as to how materialists should evaluate the stimuli. Both classes are clearly characterized by a rejection of the post-materialist items. They differ with respect to which materialist item is emphasized: in latent class 1 item 1 is preferred to item 3, whereas in latent class 2 the reverse is the case. Since item 1 seems to tap “law and order” sentiments whereas item 3 is more concerned with problems of economic stability, the following very bold and tentative hypothesis may be formulated. At least in Germany, Inglehart's conception of a fairly homogeneous block of materialists needs revision; instead, a distinction should be drawn between people who are primarily concerned about economic stability and people who give precedence to problems of social stability. On the other hand, latent class 3 may be identified as the post-materialist class, although this characterization is not as pure as one might wish. The materialist item 3, which is indeed very popular in the entire German sample, still scores rather high in this third class. Finally, we note that only 22 percent of the respondents are estimated to belong to this third class, whereas the remaining 78 percent are distributed over the two materialist classes, underlining the strong concern with problems of social and economic stability in Germany. Of course, the latter conclusion could also have been reached by inspecting the original data as given in Table 1 and counting, for instance, the number of times each item occupies the
Table 4. Model fit tests for the analyses based on the PB model

T     L        df    α
1     284.51   20    0.000
2      31.79   16    0.011
3      25.88   12    0.011
4      14.12    8    0.079
first position in the ranking. We next turn to a similar discussion of the results obtained by the analysis based on the PB model. Table 4 summarizes the model fit tests for this model. As may be seen from this table, the successive model fit tests result in a somewhat less clear picture than was the case for the SU model. First of all we note that for this model even the hypothesis T = 2 could not be rejected at the 1% level, and that moreover the incremental fit gained when moving from T = 2 to T = 3 is not significant at the 5% level: L = 5.9098 with 4 degrees of freedom. However, the incremental fit obtained by moving from T = 3 to T = 4 is clearly significant at the 5% level: L = 11.7673 with 4 degrees of freedom. So the decision as to which number of latent classes to retain is an uneasy one, due to the relatively bad fit provided by the analysis with three latent classes. Two possible explanations for this fact may be given. In the first place, it cannot be excluded that the analysis with three latent classes did not yet converge to the optimal maximum likelihood solution, despite the fact that approximately 1200 EM iterations were performed. Such a situation might arise if the likelihood surface is very flat in the neighborhood of the global maximum. A second explanation is that the estimation procedure converged to a local maximum of the likelihood function. However, several runs of the estimation procedure, each starting from different initial estimates, were performed, and all runs resulted in essentially the same final solution. Moreover, as we shall see shortly, the solution with T = 3 for the PB model is quite similar to the corresponding solution for the SU model. Table 5 contains the parameter estimates for the PB analysis with three latent classes. The similarity between the PB and the SU solutions is striking. Once again we obtain
Table 5. Parameter estimates for the PB model with T = 3 latent classes

Parameter    Latent Class 1       2        3
a_1t             0.64            0.55    -0.75
a_2t            -0.48           -0.47     0.57
a_3t             0.19            0.72    -0.06
a_4t            -0.35           -0.80     0.24
π_t              0.26            0.59     0.15
two materialistic classes, both characterized by a rejection of the post-materialistic items and differing from each other with respect to which materialistic item is given prominence. Furthermore, the third latent class of the PB solution also seems to capture the post-materialistic value orientation. The major difference between the SU and the PB solutions has to do with the estimates of the latent class probabilities. In the PB analysis only 15% of the respondents are estimated to belong to the post-materialistic class, whereas the corresponding figure in the SU analysis was 22%. Moreover, under the PB model the distribution of the materialists over the two materialistic classes is more uneven than it is under the SU model. A few additional comments on the solutions obtained for two and four latent classes may be in order here. First of all we note that in all these cases the SU and the PB solutions were very similar to each other. From the analysis with two latent classes, a pronounced materialistic and a clear post-materialistic class emerged. The analysis with four latent classes resulted for each of the models in a solution in which, in addition to two materialistic classes and one post-materialistic class, all three similar to those obtained by the analysis with three latent classes, a fourth latent class emerged in which the items 2 and 3 were highly evaluated. This class was probably called into existence to accommodate the large popularity of item 3 in the German sample. At this point it may also be of some interest to compare our latent class analyses with the results of a classical unfolding analysis. To this end we analyzed our data set with the MINIRSA program from the
MDS(X) integrated series of scaling programs produced by A. Coxon and his collaborators from the University of Edinburgh. The MINIRSA program was originally developed by Prof. E. Roskam of the University of Nijmegen. The three-dimensional solution obtained by MINIRSA provided a perfect fit to the data. However, the resulting geometric representation of the stimuli and rankings can hardly be considered informative, since in this solution the four stimuli were located at the vertices of a tetrahedron. As a matter of fact, the start configuration computed by MINIRSA immediately yielded this perfect solution. This result does not come as a surprise, since it is well known that preferential choice data for n stimuli can always be unfolded perfectly in a joint space with n − 1 dimensions. Moreover, this perfectly fitting configuration can be constructed without any reference to the data whatsoever, and it is in this sense that we may label it uninformative. On the other hand, the one-dimensional solution provided a very bad fit to the data: STRESS-HAT equal to 0.393 after 144 iterations; a result which did not come as a surprise either, since for four stimuli the unidimensional unfolding model can maximally account for seven rankings. So it seems that we have no other option than to retain the two-dimensional solution. For this solution STRESS-HAT was equal to 0.175 after 105 iterations. Since for four stimuli in two dimensions maximally 18 different rankings can be accounted for by the unfolding model (see Table 7.1 in Coombs, 1964), we cannot expect a perfect representation of our data in two dimensions. But even then the MINIRSA solution was somewhat suboptimal, since only 12 rankings were perfectly accounted for by the two-dimensional solution. These 12 rankings represented only 62% of all rankings in the German sample. This situation highlights a major problem encountered by many scaling techniques when they are applied to abundant data such as ours.
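The unidimensional bound invoked above — with four stimuli on a line, an ideal point can fall into at most C(4,2) + 1 = 7 intervals cut out by the midpoints of the stimulus pairs, hence at most seven distinct rankings — is easy to verify by brute force. The stimulus positions below are arbitrary illustrative values, not taken from the analysis:

```python
from itertools import combinations

def attainable_rankings(stimuli, grid=20000, span=10.0):
    # Slide an ideal point along the line and record the ranking of the
    # stimuli by increasing distance; distinct rankings correspond to
    # the intervals cut out by the midpoints of all stimulus pairs.
    found = set()
    for i in range(grid + 1):
        y = -span + 2.0 * span * i / grid
        found.add(tuple(sorted(range(len(stimuli)),
                               key=lambda j: abs(y - stimuli[j]))))
    return found

stimuli = [0.0, 1.0, 2.5, 4.0]          # arbitrary distinct positions
midpoints = {(a + b) / 2 for a, b in combinations(stimuli, 2)}
print(len(midpoints), len(attainable_rankings(stimuli)))   # 6 7
```

The same count for three stimuli gives C(3,2) + 1 = 4 attainable rankings out of 3! = 6.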
If relatively few stimuli are used in an investigation and if, moreover, all logically possible response patterns occur in the sample, a low dimensional scaling solution seldom provides an acceptable fit to the data, whereas a solution in a high dimensional space easily becomes uninformative. Figure 1 represents the two-dimensional MINIRSA solution after an orthogonal rotation which was performed in order to let the first dimension correspond optimally to the opposition between materialistic and post-materialistic items. Although it is extremely risky to interpret a
Figure 1. Two-dimensional MINIRSA solution. (The rankings are represented by correspondingly labeled points; the items are represented by asterisks.)
configuration which consists of only four points, we may try to relate the
results of the unfolding analysis to the results obtained by the latent class analyses. The most striking feature in Figure 1 is perhaps the fact that item 1, item 3 and the cluster composed of items 2 and 4 are approximately located at the vertices of an equilateral triangle. This stimulus configuration is undoubtedly the best fitting two-dimensional projection of the tetrahedron obtained by the procedure which computes a start configuration for the MINIRSA analysis. In Figure 1 we easily recognize the contrast between the materialistic and the post-materialistic items. Due to the orthogonal rotation we performed on the final MINIRSA solution, this contrast defines the first dimension of the configuration. The second dimension, on the other hand, shows a contrast between the two materialistic items. In a certain sense this particular triangular pattern of the stimulus configuration agrees with the results of our latent class analyses in which three latent classes were retained. The fact that the latent class analyses resulted in two distinct materialistic classes seems to be reflected in the unfolding analysis by the opposition between the materialistic items along the second axis, whereas the contrast between the materialistic and post-materialistic items along the first axis corresponds to the distinction between the two materialistic classes and the one post-materialistic class.

4. Discussion
In this paper we developed two latent class models for the analysis of rankings. Basic to these models is the assumption that a nonhomogeneous population can be broken down into a set of subpopulations which are homogeneous with respect to the way in which the stimuli are evaluated. For each subpopulation or latent class a probabilistic choice model is assumed to hold. These choice models, which generally associate a scale value with each stimulus, can be considered as a formulation of a stochastic error theory which may explain inconsistencies in the rankings generated by different members of the same subpopulation. In this respect latent class analysis of ranking data falls within the broad domain of general latent structure analysis, in which observed variables are treated as imperfect operationalizations or indicators of underlying, theoretically relevant latent variables. Moreover, our approach implies that the distribution of the respondents within each latent class over the different
rankings can be viewed as a parametric multinomial distribution, showing that the latent class approach to the analysis of rankings may also be considered as a specific instance of a finite mixture problem (Redner & Walker, 1984). Without doubt, a latent class approach is relevant for the analysis of rankings obtained in large samples of respondents, especially when only a relatively small number of stimuli is used in the ranking task. In such a situation, one may expect that almost all logically possible response patterns occur in the sample, a fact which may lead to quite suboptimal or even uninformative results if the more traditional scaling techniques are used. Furthermore, we presume our approach to be of some interest for the analysis of data from a ranking experiment in which the subjects only have to select and rank a limited number of stimuli from a larger set of available alternatives, the so-called rank k/n data in the terminology of Coombs (1964). However, there still remain some substantial problems to be solved in our approach. First of all, as has been noted already, the EM algorithm often converges very slowly, and efficient numerical procedures to accelerate the convergence process should be sought. On a more theoretical level, a comparison of the different probabilistic choice models which may be implemented in a latent class model should be carried out, eventually leading to still more general and flexible models.
References

Barnes, S. H. et al. (1979). Political action: Mass participation in five Western countries. London: Sage.
Beaver, R. J. (1977). Weighted least-squares analysis of several univariate Bradley-Terry models. Journal of the American Statistical Association, 72, 629-634.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Block, H. D., & Marschak, J. (1960). Random orderings and stochastic theories of response. In I. Olkin, S. Ghurye, W. Hoeffding, W. Madow & H. Mann (Eds.), Contributions to probability and statistics (pp. 97-132). Stanford: Stanford University Press.
Bradley, R. A. (1965). Another interpretation of a model for paired comparisons. Psychometrika, 30, 315-318.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs. I. Biometrika, 39, 324-345.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
Fienberg, S. E., & Larntz, K. (1976). Log linear representations for paired and multiple comparisons models. Biometrika, 63, 245-254.
Inglehart, R. (1977). The silent revolution. Princeton: Princeton University Press.
Jech, T. (1983). The ranking of incomplete tournaments: A mathematician's guide to popular sports. American Mathematical Monthly, 90, 246-266.
Jöreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43, 443-477.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Luce, R. D., & Suppes, P. (1965). Preference, utility and subjective probability. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 3, pp. 249-410). New York: Wiley.
Mattenklott, A., Sehr, J., & Mieschke, K. J. (1982). A stochastic model for paired comparisons of social stimuli. Journal of Mathematical Psychology, 25, 149-168.
Pendergrass, R. N., & Bradley, R. A. (1960). Ranking in triple comparisons. In I. Olkin, S. Ghurye, W. Hoeffding, W. Madow & H. Mann (Eds.), Contributions to probability and statistics (pp. 331-351). Stanford: Stanford University Press.
Plackett, R. L. (1975). The analysis of permutations. Applied Statistics, 24, 193-202.
Redner, R. A., & Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195-239.
Yellott, J. I. (1977). The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15, 109-144.
Yellott, J. I. (1980). Generalized Thurstone models for ranking: Equivalence and reversibility. Journal of Mathematical Psychology, 22, 48-69.
Zermelo, E. (1929). Die Berechnung der Turnierergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 29, 436-460.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
THE WANDERING IDEAL POINT MODEL FOR ANALYZING PAIRED COMPARISONS DATA

Geert De Soete
University of Ghent, Belgium

J. Douglas Carroll
AT&T Bell Laboratories, Murray Hill, NJ, U.S.A.

Wayne S. DeSarbo
University of Michigan, U.S.A.

A recently developed probabilistic multidimensional unfolding model for paired comparisons data is described. Unlike the stochastic multidimensional unfolding models previously proposed in the literature, the present model is a moderate utility model. After presenting the model in its most general form, some properties and special cases are discussed. Subsequently, some practical issues related to applying the model, such as parameter estimation and model testing, are addressed. Finally, an illustrative application is reported.
1. Introduction

Ever since Coombs (1950, 1964) introduced the unfolding model for representing preferential choice data, attempts have been made to reformulate the model in a stochastic way. These attempts were motivated by the uncertainty and inconsistency that typically characterize human choice behavior. Although it is in principle possible to develop probabilistic
The first author is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek”. This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 274-281.
models accounting for first choices on sets consisting of more than two stimuli, almost all efforts have been directed towards developing models for representing pairwise choice data that were obtained by means of the time-honored method of paired comparisons. While most probabilistic versions of the unfolding model were limited to the unidimensional case (Bechtel, 1976; Coombs, Greenberg, & Zinnes, 1961; Sixtl, 1973), a few attempts were undertaken to develop a probabilistic multidimensional unfolding model. Schönemann and Wang (1972; Wang, Schönemann, & Rusk, 1975) suggested a model in which the probability that subject i prefers stimulus j to stimulus k was defined as

p_ijk = 1 / (1 + exp[−c(d²_ik − d²_ij)]),   (1)

where d_ij denotes the Euclidean distance between the points representing subject i and stimulus j in an r-dimensional space. Since model (1) is based on the well-known Bradley-Terry-Luce (Bradley & Terry, 1952; Luce, 1959) model, it implies the strong stochastic transitivity condition, which states that

if p_ijk ≥ 1/2 and p_ikl ≥ 1/2, then p_ijl ≥ max(p_ijk, p_ikl).   (2)
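That model (1) cannot escape condition (2) can be checked directly: it is of Bradley-Terry-Luce form with v_j = exp(−c d²_ij), so the pairwise probability is v_j/(v_j + v_k) and (2) holds for every configuration. A small numerical check (the subject and stimulus coordinates below are our own illustrative values):

```python
import math
from itertools import permutations

def p_sw(ideal, stimuli, j, k, c=1.0):
    # Schonemann-Wang choice probability, eq. (1):
    # p_ijk = 1 / (1 + exp[-c (d_ik^2 - d_ij^2)])
    d2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return 1.0 / (1.0 + math.exp(-c * (d2(ideal, stimuli[k])
                                       - d2(ideal, stimuli[j]))))

ideal = (0.3, -0.2)                      # illustrative subject point
stimuli = [(0.0, 0.0), (1.0, 0.4), (-0.8, 1.1), (1.7, -0.9), (0.2, 2.0)]

# Strong stochastic transitivity, condition (2), holds for every triple:
for j, k, l in permutations(range(len(stimuli)), 3):
    pjk = p_sw(ideal, stimuli, j, k)
    pkl = p_sw(ideal, stimuli, k, l)
    pjl = p_sw(ideal, stimuli, j, l)
    if pjk >= 0.5 and pkl >= 0.5:
        assert pjl >= max(pjk, pkl) - 1e-12
```

The assertion never fires, however the coordinates are chosen, which is precisely the rigidity that motivates the moderate utility models discussed next.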
A quite different multidimensional stochastic model was developed by Zinnes and Griggs (1974). In this model the coordinates of both the subject and the object points are assumed to be independently normally distributed with a common variance. When a subject is presented a pair of stimuli, he or she is assumed to sample independently, for each element of the pair, a point from his or her ideal point distribution. This leads to the following choice probability:

p_ijk = Prob[F''(ν1, ν2, λ1, λ2) ≤ 1],   (3)

where F''(ν1, ν2, λ1, λ2) denotes the doubly noncentral F distribution with degrees of freedom ν1 and ν2 and noncentrality parameters λ1 and λ2, and where d_ij now indicates the Euclidean distance between the mean point of subject i and the mean point of object j. As De Soete, Carroll and DeSarbo (1986) demonstrated, this model also implies strong stochastic transitivity.
Although empirical choice proportions sometimes do satisfy strong stochastic transitivity, there is strong empirical evidence (Becker, DeGroot, & Marschak, 1963; Coombs, 1958; Krantz, 1967; Rumelhart & Greeno, 1971; Sjöberg, 1977, 1980; Sjöberg & Capozza, 1975; Tversky & Russo, 1969; Tversky & Sattath, 1979) indicating that pairwise choice proportions often violate strong stochastic transitivity in a systematic way. Empirical choice proportions seem to be influenced not only by the difference in utility between the choice objects, but also, to some extent, by the similarity or comparability of the choice alternatives. Dissimilar stimuli tend to evoke moderate choice proportions, even when the stimuli differ substantially in utility. Similar stimuli, on the contrary, tend to evoke more extreme choice proportions, even when the difference in utility is not that large. A less stringent condition which is usually satisfied by empirical choice data is moderate stochastic transitivity, which states that

if p_ijk ≥ 1/2 and p_ikl ≥ 1/2, then p_ijl ≥ min(p_ijk, p_ikl).   (4)

It can be proved that any model of the form

p_ijk = F[(u_ij − u_ik) / d_ijk],   (5)

where F is monotonically increasing with F(x) = 1 − F(−x), u_ij the utility of stimulus j for subject i, and d_ijk a (semi-)metric on the set of choice objects for subject i, implies (4) but not necessarily (2) (Halff, 1976). A model of the form (5) is called a moderate utility model. Contrary to models implying (2), moderate utility models can account for the empirically observed similarity effects. In this paper we discuss a recently developed probabilistic multidimensional unfolding model, called the Wandering Ideal Point (or WIP for short) model (De Soete et al., 1986), which is, unlike the Schönemann-Wang and Zinnes-Griggs models, a moderate utility model. The WIP model is an unfolding analogue of the wandering vector model originally proposed by Carroll (1980) and further elaborated by De Soete and Carroll (1983). In the wandering vector model, each stimulus is represented by a fixed point in a multidimensional space, while each subject is represented in the same space by a vector emanating from the origin with a terminus that follows a multivariate normal distribution. When a subject is presented a pair of stimuli, he or she samples a point from that distribution and chooses the stimulus that has the largest orthogonal projection on the vector from the origin in the direction of the sampled point.
2. The Wandering Ideal Point Model

2.1 General Formulation

In the WIP model, both the subjects and the stimuli are represented as points in a joint r-dimensional space. Whereas the stimuli 1, 2, ..., M are represented by fixed points x_1, x_2, ..., x_M, the subjects are represented by random points. More specifically, subject i (i = 1, ..., N) is represented by a random point Y_i which is assumed to follow a multivariate normal distribution

Y_i ~ N(μ_i, Σ_i).   (6)

It is assumed that the distributions of the N subject points are independent of each other, i.e.,

Covar(Y_i, Y_i') = 0 for i, i' = 1, ..., N and i ≠ i'.

According to the model, each time a pair of stimuli (j, k) is presented to subject i, he or she samples a point y_i from Y_i. Following Coombs' unfolding model, the subject prefers stimulus j to k whenever
d(y_i, x_j) < d(y_i, x_k),   (7)

where d(·,·) denotes the Euclidean distance function, i.e.,

d²(y_i, x_j) = (y_i − x_j)'(y_i − x_j).   (8)

An illustration of the WIP model is shown in Figure 1. In the figure, the sampled point y_i is closer to x_j than to x_k. Therefore, subject i would on this particular occasion prefer stimulus j to stimulus k. Since the subject always prefers the choice alternative that is closest to y_i, y_i can be considered as subject i's ideal point. However, since each time a pair of stimuli is presented, a new point y_i is sampled from Y_i, a subject's ideal point is not fixed, but “wanders” from trial to trial. Hence the name the wandering ideal point model.
Figure 1. Illustration of the WIP model. The ellipse represents the random subject point.
By squaring both sides of (7) and rearranging terms, we obtain that subject i prefers stimulus j to k whenever

(x_k − x_j)'y_i < (x_k'x_k − x_j'x_j)/2.   (9)

Consequently, the probability that subject i prefers object j to k is

p_ijk = Prob{(x_k − x_j)'Y_i < (x_k'x_k − x_j'x_j)/2}.   (10)
Since it follows from (6) that

(x_k − x_j)'Y_i ~ N((x_k − x_j)'μ_i, δ²_ijk),   (11)

where

δ²_ijk = (x_k − x_j)'Σ_i(x_k − x_j),   (12)

eq. (10) becomes

p_ijk = Φ[((x_k'x_k − x_j'x_j)/2 − (x_k − x_j)'μ_i) / δ_ijk],   (13)

where Φ denotes the standard normal distribution function. Equation (13) provides the general formulation of the WIP model.

2.2 Properties
It is easy to show that the WIP model is a moderate utility model. By defining

u_ij = x_j'μ_i − x_j'x_j/2,   (14)

eq. (13) can be rewritten as

p_ijk = Φ[(u_ij − u_ik) / δ_ijk].   (15)

Since, as a covariance matrix, Σ_i is always positive (semi-)definite, δ_ijk is a (semi-)metric and eq. (15) is of the form (5). That the choice probabilities defined by the WIP model do not necessarily satisfy strong stochastic transitivity is readily demonstrated by means of a simple counterexample in which p_ijk = 0.98 and p_ikl = 0.69, but p_ijl = 0.82. Figure 2, taken from De Soete et al. (1986), visualizes some of the properties of the WIP model. When the distances between x_j and μ_i and between x_k and μ_i (in the figure indicated as d_ij and d_ik respectively) are fixed, the probability that subject i prefers stimulus j to k varies as a function of the distance between x_j and x_k. This illustrates that extreme choice proportions are more likely to occur when the stimulus points are close, while distant object points are more likely to induce more moderate choice proportions.
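A counterexample of this kind is easy to reproduce with eq. (13) itself. The parameter values below are our own illustrative choices (not those of De Soete et al.), with μ_i at the origin and Σ_i = I; they yield p_ijk ≈ 0.99, p_ikl ≈ 0.69, and p_ijl ≈ 0.80, violating (2) while satisfying (4):

```python
import math

def phi(z):
    # Standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_wip(mu, xj, xk):
    # Eq. (13) with Sigma_i = I, so that delta_ijk = ||x_k - x_j||
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    diff = [u - v for u, v in zip(xk, xj)]
    delta = math.sqrt(dot(diff, diff))
    num = (dot(xk, xk) - dot(xj, xj)) / 2.0 - dot(diff, mu)
    return phi(num / delta)

mu = (0.0, 0.0)                                  # mean ideal point
xj, xk, xl = (2.0, 0.0), (2.5, 0.0), (0.0, 3.2)  # illustrative stimuli

p_jk, p_kl, p_jl = p_wip(mu, xj, xk), p_wip(mu, xk, xl), p_wip(mu, xj, xl)
# j and k are close, so p_jk is extreme; l is far from both, so p_jl
# falls below max(p_jk, p_kl): strong stochastic transitivity fails,
# while moderate stochastic transitivity still holds.
```

The design choice mirrors the similarity effect described above: the close pair (j, k) produces an extreme proportion, the distant comparisons only moderate ones.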
Figure 2. Probability of preferring stimulus j to k as a function of the distance between j and k, for fixed d_ij and d_ik (adapted from De Soete et al., 1986). The three curves correspond to d_ij = 1.20, 1.50, and 1.80, each with d_ik = 2.00.

2.3 Degrees of Freedom

The following parameters occur in the WIP model: the mean subject points μ_i, the subject covariance matrices Σ_i, and the stimulus points x_j. Thus, with N subjects and M stimuli, the WIP model has in its general form as defined in eq. (13)
(M + N)r + Nr(r + 1)/2   (16)

parameters. However, the model does not determine all these parameters uniquely. More specifically, the choice probabilities are invariant under the following family of transformations of the parameters:

a. Translation of the subject and the stimulus points: Adding the same arbitrary r-component vector to all subject and stimulus points does not affect the choice probabilities.

b. Central dilation of the subject and the stimulus points: Simultaneous transformations of the form
xi
+ axj
(j= 1,M)
Yi
+ aYi
(i = l,N),
where a is an arbitrary positive constant, leave the choice probabilities invariant. Note that
CLYj - N(api, a2Zj). c. Orthogonal rotation of the subject and stimulus points: Applying the same orthogonal rotation T to all stimulus and subject points does not affect the choice probabilities predicted by the model. Note that the distribution of TYi is
TYi
- N ( T p j , TZiT’).
Because of these indeterminacies, we must subtract r + 1 + r(r - 1)/2 from (16) (r for the translational indeterminacy, 1 for the scale indeterminacy, and r(r - 1)/2 for the rotational indeterminacy), in order to obtain the degrees of freedom of the general WIP model:
(M + N)r + N r(r + 1)/2 − r(r + 1)/2 − 1.    (17)
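The parameter counts are easy to tabulate in a small helper (a sketch; the function name and argument convention are ours, not from the original):

```python
def wip_df(M, N, r, cov="general"):
    """Number of free parameters (degrees of freedom) of the WIP model
    for M stimuli, N subjects, and r dimensions."""
    if cov == "general":   # eq. (17)
        return (M + N) * r + N * r * (r + 1) // 2 - r * (r + 1) // 2 - 1
    if cov == "diagonal":  # eq. (18): diagonal covariance matrices
        return (M + 2 * N) * r - r - 1
    if cov == "identity":  # identity covariance matrices
        return (N + M) * r - r * (r + 1) // 2
    raise ValueError(cov)

# N = 1 makes the diagonal case (18) coincide with the general case (17):
print(wip_df(9, 1, 2, "general"))   # 19
print(wip_df(9, 1, 2, "diagonal"))  # 19
print(wip_df(9, 1, 2, "identity"))  # 17
```

For the application in Section 4 (M = 9, N = 1, r = 2) the null model has 9·8/2 = 36 free binomial probabilities, so the goodness-of-fit tests have 36 − 19 = 17 and 36 − 17 = 19 df, matching the values reported there.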
2.4 Special Cases
In empirical applications, it might be interesting to impose restrictions on the general WIP model, either to reduce the number of parameters to be estimated or to verify specific hypotheses. The validity of a hypothesis
can be tested statistically by comparing the fit of the restricted model with the fit of the general model. First of all, various kinds of restrictions can be imposed on the covariance matrices of the subject points. The Σ_i can, for instance, be constrained to be diagonal. Due to the rotational indeterminacy mentioned earlier, setting the off-diagonal elements of the covariance matrices equal to zero only imposes real constraints on the general WIP model when N > 1. The degrees of freedom of this constrained model are

(M + 2N)r − r − 1.    (18)
Note that when N = 1, (18) equals (17). A more restrictive constraint, which is effective even when N = 1, requires all Σ_i to be identity matrices. In this case, the model has

(N + M)r − r(r + 1)/2
degrees of freedom. Besides, or in addition to, constraining the covariance matrices Σ_i, various linear constraints could be imposed on the coordinates of the stimulus points in order to relate the stimulus point locations to known characteristics of the stimuli. Similarly, the mean subject points can be related to background information on the subjects by imposing appropriate linear restrictions on the μ_i. De Soete and Carroll (1986) consider the special case where it is supposed, in analogy with the factor analysis model, that the M stimuli have r (< M) dimensions in common and that, in addition, there is a specific dimension for each stimulus. The stimulus coordinates can therefore be written as
X* = (X  I_M),

where X = (x_1, ..., x_M)′ contains the coordinates of the M stimuli on the r common dimensions and I_M is an identity matrix of order M. Assume that Y*_i, the (r + M)-dimensional random point representing subject i, is distributed as follows:

Y*_i ~ N( (μ_i′, 0_{1×M})′,  [ Σ_i  0_{r×M} ; 0_{M×r}  γ²I_M ] ),

where ₛ0ₜ denotes an s by t matrix filled with zeros. I.e., Y*_i is assumed
to have zero expectation and a variance of γ² on each specific dimension. Now, since

(x*_k − x*_j)′Y*_i ~ N((x_k − x_j)′μ_i, δ²_ijk + 2γ²),

the model becomes of the same form as eq. (15), with δ²_ijk replaced by δ²_ijk + 2γ².
3. Applying the WIP Model

3.1 Parameter Estimation
In order to apply the WIP model, one must have at one's disposal replicated paired comparisons for one or more subjects (or groups of subjects). Maximum likelihood estimates of the model parameters can be obtained by maximizing
L = ∏_{i=1}^{N} ∏_{j<k} p_ijk^{n_ijk} (1 − p_ijk)^{N_ijk − n_ijk},    (19)
where N_ijk denotes the number of times stimulus pair (j, k) was presented to subject i and n_ijk the number of times subject i preferred j to k. De Soete et al. (1986) use a generalized Fisher scoring algorithm for maximizing log L. This amounts to iteratively applying the following updating rule until no further improvement is possible:

θ^(q+1) = θ^(q) + α^(q) I(θ^(q))⁺ g(θ^(q)),    (20)

where θ is a vector containing the parameters to be estimated, q is the iteration index,
α is a step-size parameter, g is the gradient of log L, g(θ) = ∂ log L/∂θ, and I(θ) is the Fisher information matrix, I(θ) = E[g(θ) g(θ)′].
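The scoring iteration (20) is generic. As an illustration (this sketch is ours, not the authors' program), the same update with a Moore-Penrose inverse can be applied to the simpler Thurstone Case V model, p_jk = Φ(m_j − m_k), whose information matrix is likewise singular because of the translation indeterminacy; the data and function names are invented:

```python
import numpy as np
from math import erf, sqrt, pi, exp

def npdf(x):   # standard normal density
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def ncdf(x):   # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def fisher_scoring(pairs, n_pref, n_pres, n, alpha=1.0, steps=100):
    """Fisher scoring as in eq. (20) for Thurstone Case V:
    p_jk = Phi(m_j - m_k), with binomial pair comparison counts."""
    m = np.zeros(n)                  # scale values, start at zero
    for _ in range(steps):
        g = np.zeros(n)              # gradient of log L
        info = np.zeros((n, n))      # Fisher information matrix
        for (j, k), njk, Njk in zip(pairs, n_pref, n_pres):
            d = np.zeros(n)
            d[j], d[k] = 1.0, -1.0   # contrast m_j - m_k
            p = min(max(ncdf(m[j] - m[k]), 1e-10), 1.0 - 1e-10)
            f = npdf(m[j] - m[k])
            g += (njk - Njk * p) * f / (p * (1.0 - p)) * d
            info += Njk * f * f / (p * (1.0 - p)) * np.outer(d, d)
        # info is singular (translation invariance), hence the
        # Moore-Penrose inverse, as in Ramsay (1980):
        step = alpha * np.linalg.pinv(info) @ g
        m = m + step
        if np.linalg.norm(step) < 1e-10:
            break
    return m - m.mean()              # resolve translation indeterminacy

pairs = [(0, 1), (0, 2), (1, 2)]
m_hat = fisher_scoring(pairs, n_pref=[70, 85, 65],
                       n_pres=[100, 100, 100], n=3)
print(m_hat)  # estimated scale values, ordered m_0 > m_1 > m_2
```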
The classic scoring algorithm utilizes the regular inverse of the information matrix. Because the WIP model does not determine all parameters uniquely, the information matrix is not of full rank and has no regular inverse. Therefore, following Ramsay (1980), the Moore-Penrose inverse I(θ)⁺ is used.

3.2 Model Validation
One of the advantages of maximum likelihood estimation is that it enables statistical model evaluation in a straightforward way. Whenever a model ω is subsumed under a more general model Ω, the null hypothesis that ω fits the data as well as Ω can be tested by means of the statistic

U = −2 log(L̂_ω / L̂_Ω),

where L̂_ω and L̂_Ω denote the maximum of (19) for models ω and Ω respectively. Under the null hypothesis, U asymptotically follows a chi-square distribution with degrees of freedom equal to the difference between the degrees of freedom of model Ω and the degrees of freedom of model ω. The most general model against which the WIP model can be tested, referred to as the null model, only assumes that for each subject i and each pair of stimuli (j, k) the data are sampled from a binomial distribution with probability p_ijk. It is well known that the maximum likelihood estimate of p_ijk under this model is simply n_ijk/N_ijk. When the goodness-of-fit of two non-nested models needs to be compared, one can resort to Akaike's (1977) information criterion, which is defined for model ω as
AIC_ω = −2 log L̂_ω + 2ν_ω,

where ν_ω is the degrees of freedom of model ω. The AIC statistic is a badness-of-fit measure that corrects for the gain in goodness-of-fit due to an increased number of free parameters in the model. The model with the smallest value of the AIC statistic is considered to give the most parsimonious representation of the data.

4. Illustrative Application
As an illustrative application, we report the WIP analyses carried out by De Soete et al. (1986) on a data set gathered by Rumelhart and Greeno (1971), who obtained pairwise preference judgments from 234 undergraduates about nine celebrities. These celebrities consisted of three politicians (L. B. Johnson, Harold Wilson, Charles De Gaulle), three athletes (A. J. Foyt, Johnny Unitas, Carl Yastrzemski), and three movie stars (Brigitte Bardot, Sophia Loren, Elizabeth Taylor). The subjects were treated as replications of each other, so that the case N = 1 applies. Two versions of the WIP model were applied in two dimensions: the general model with a diagonal covariance matrix (which in the case of N = 1 is equivalent to using an unconstrained covariance matrix) and the WIP model with an identity matrix as covariance matrix. Both models were tested against the null model described in the previous section. The chi-square statistic for the general WIP model has 17 df and amounted to 9.8, while the chi-square statistic for testing the constrained WIP model has 19 df and amounted to 10.1. Both chi-square values are clearly nonsignificant, showing that both representations give a good account of the data. Since the WIP model with an identity covariance matrix is subsumed under the general WIP model, a likelihood ratio test can be performed to see whether the constrained model fits the data as well as the more general model. The relevant chi-square statistic has 2 df and amounted to 0.3, which is clearly not significant. This implies that the ideal point appears to wander to an equal degree in all directions of the space. The two-dimensional solution is presented in Figure 3. As is apparent from the figure, the politicians, athletes, and movie stars clearly show up as identifiable clusters. The politicians constitute the most preferred group of celebrities, whereas the movie stars are generally preferred
to the athletes. For a further discussion of this application, and a comparison with analyses of the same data according to other models, we refer the reader to De Soete et al. (1986).
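The reported nonsignificance is easy to verify; chi2_sf below is a self-contained chi-square survival function written for this illustration (assuming no statistics library is available):

```python
import math

def chi2_sf(x, k):
    """Upper-tail probability of a chi-square variable with k df,
    computed via the regularized lower incomplete gamma power series."""
    s, y = k / 2.0, x / 2.0
    if y <= 0.0:
        return 1.0
    term = 1.0 / s
    total = term
    n = 0
    while term > 1e-16 * total:   # accumulate the power series
        n += 1
        term *= y / (s + n)
        total += term
    lower = total * math.exp(-y + s * math.log(y) - math.lgamma(s))
    return 1.0 - lower

# the three likelihood-ratio tests reported for the Rumelhart-Greeno data:
print(chi2_sf(9.8, 17) > 0.05)   # general WIP vs. null model: True
print(chi2_sf(10.1, 19) > 0.05)  # constrained WIP vs. null model: True
print(chi2_sf(0.3, 2) > 0.05)    # constrained vs. general WIP: True
```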
[Figure 3 appeared here: the two-dimensional configuration, showing the centroid ideal point together with the nine celebrities: Sophia Loren, Elizabeth Taylor, Brigitte Bardot, L. B. Johnson, Harold Wilson, Charles De Gaulle, A. J. Foyt, Johnny Unitas, and Carl Yastrzemski.]
Figure 3. Representation of the Rumelhart and Greeno (1971) data according to the WIP model with identity covariance matrix.
References

Akaike, H. (1977). On entropy maximization principle. In P. R. Krishnaiah (Ed.), Applications of statistics (pp. 27-41). Amsterdam: North-Holland.
Bechtel, G. G. (1968). Folded and unfolded scaling from preferential comparisons. Journal of Mathematical Psychology, 5, 333-357.
Becker, G. M., DeGroot, M. H., & Marschak, J. (1963). Probabilities of choice among very similar objects. Behavioral Science, 8, 306-311.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, 39, 324-345.
Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 234-289). Bern: Huber.
Coombs, C. H. (1950). Psychophysical scaling without a unit of measurement. Psychological Review, 57, 145-158.
Coombs, C. H. (1958). On the use of inconsistency of preferences in psychological scaling. Journal of Experimental Psychology, 55, 1-7.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., Greenberg, M., & Zinnes, J. (1961). A double law of comparative judgment for the analysis of preferential choice and similarities data. Psychometrika, 26, 165-171.
De Soete, G., & Carroll, J. D. (1983). A maximum likelihood method for fitting the wandering vector model. Psychometrika, 48, 553-566.
De Soete, G., & Carroll, J. D. (1986). Probabilistic multidimensional choice models for representing paired comparisons data. In E. Diday et al. (Eds.), Data analysis and informatics IV (pp. 485-497). Amsterdam: North-Holland.
De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.
Halff, H. M. (1976). Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology, 14, 244-246.
Krantz, D. H. (1967). Rational distance function for multidimensional scaling. Journal of Mathematical Psychology, 4, 226-245.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
Ramsay, J. O. (1980). The joint analysis of direct ratings, pairwise preferences, and dissimilarities. Psychometrika, 45, 149-165.
Rumelhart, D. L., & Greeno, J. G. (1971). Similarity between stimuli: An experimental test of the Luce and Restle choice models. Journal of Mathematical Psychology, 8, 370-381.
Schonemann, P. H., & Wang, M.-M. (1972). An individual difference model for the multidimensional analysis of preference data. Psychometrika, 37, 275-309.
Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235-248.
Sjoberg, L. (1977). Choice frequency and similarity. Scandinavian Journal of Psychology, 18, 103-115.
Sjoberg, L. (1980). Similarity and correlation. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 70-87). Bern: Huber.
Sjoberg, L., & Capoza, D. (1975). Preference and cognitive structure of Italian political parties. Italian Journal of Psychology, 2, 391-402.
Tversky, A., & Russo, J. E. (1969). Substitutability and similarity in binary choices. Journal of Mathematical Psychology, 6, 1-12.
Tversky, A., & Sattath, S. (1979). Preference trees. Psychological Review, 86, 542-573.
Wang, M.-M., Schonemann, P. H., & Rusk, J. G. (1975). A conjugate gradient algorithm for the multidimensional analysis of preference data. Multivariate Behavioral Research, 10, 45-99.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic, multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
139
ANALYSIS OF COVARIANCE STRUCTURES AND PROBABILISTIC BINARY CHOICE DATA

Yoshio Takane
McGill University, Canada

Pair comparison judgments are often obtained by multiple-judgment sampling, which gives rise to dependencies among observations. Analysis of covariance structures (ACOVS) provides a general methodology for taking apart between-subject and within-subject variations, thereby accounting for the dependencies among observations. In this expository paper we show how various concepts underlying ACOVS can be used in constructing probabilistic choice models that take into account systematic individual differences.
1. Introduction

Stimulus comparison presents a general paradigm in diversified fields of scientific investigation (Bradley, 1976). In bioassay, the strength of life of an organism is compared with dosage levels of a drug. In psychology, econometrics, and political science, a subjective quality of a stimulus (e.g., subjective length of a line, grayness of a color, preference toward a political candidate, etc.) is compared against that of another. In statistics, loglinear analysis of a frequency table compares the strengths with which subjects belong to certain categories. In a mental test, subjects' ability is compared against the difficulty of a test item.
The work reported in this paper has been supported by Grant A6394 to the author from the Natural Sciences and Engineering Research Council of Canada. Thanks are due to Jim Ramsay for his helpful comments on an earlier draft of this paper. This paper is a revised version of an article published in Communication & Cognition, 1987, 20, 45-62.
In each case p_ij, the probability that stimulus i is chosen over stimulus j, indicates the degree to which stimulus i dominates stimulus j. However, there are two possible interpretations of p_ij, which closely parallel two sampling schemes of pair comparison data (Thurstone, 1927). In Case 1, replications (both within and across stimulus pairs) are made strictly within a single subject, and thus inconsistency in choice is attributed to momentary fluctuations in the internal state of the subject. The p_ij in this case represents the proportion of times stimulus i is chosen over stimulus j by the subject. In Case 2, on the other hand, the probability distribution is over a population of subjects. That is, the stochastic nature of choice is attributed to subject differences. The p_ij in this case represents the proportion of subjects in the population who choose stimulus i over stimulus j. Despite the difference in interpretation, basically the same class of models has been used in both cases. Typically, these models assume statistical independence among observed choice probabilities. However, in Case 1 all pair comparison judgments are made by a single subject, so for independence to hold there should be no sequential effects. This rules out the use of identifiable stimuli in Case 1 because of the memory effect. In Case 2, each subject is supposed to contribute one and only one observation. This usually ensures statistical independence. On the other hand, it requires a huge number of subjects. Pair comparison experiments thus rarely use either of these extreme sampling designs. Instead they typically employ a mixed design, in which each member of a group of subjects is asked to respond to all possible pairs of stimuli. That is, replications over different stimulus pairs are obtained within subjects, and replications within stimulus pairs are obtained across subjects.
This mixed-mode sampling scheme is analogous to the treatment-by-subject design in ANOVA and is called multiple-judgment sampling in this paper. This sampling design is especially popular in preference judgments, because researchers in this area are often interested in how preferences toward various stimuli correlate with each other, how patterns of preference are distributed in the population of subjects, and how an individual's pattern of preference can be represented in relation to others. In multiple-judgment sampling p_ij can still be interpreted as the proportion of subjects who choose stimulus i over stimulus j, as in Case 2. However, due to within-subject replications across different
stimulus pairs, observed choice probabilities are no longer statistically independent. Systematic individual differences give rise to dependencies among the observations. For example, a person who tends to prefer product A to B may also tend to prefer C to D. Models of pair comparisons in this case should take the systematic individual differences in pair comparison judgments into account. However, with notable exceptions (Bock & Jones, 1968, pp. 143-161; Bloxom, 1972; Takane, 1985), nearly all previous models of pair comparisons ignored systematic individual differences. What is needed is a general methodology for separating the systematic individual differences components in the data from strictly random components. The method particularly relevant in this context is the analysis of covariance structures (ACOVS), originally proposed by Bock and Bargmann (1966) and subsequently amplified by Joreskog (1970). As has been demonstrated recently (Takane, 1985), the ACOVS framework can be successfully used to extend conventional Thurstonian pair comparison models to multiple-judgment sampling situations. In addition, the ACOVS framework may bring considerable richness to the analysis of pair comparison data in general. The purpose of this paper is to explore and overview this possibility.

2. Thurstonian Models of Pair Comparisons
Let us begin with a brief review of Thurstonian random utility models (Thurstone, 1927, 1959). Over the past several years there have been interesting developments in this approach (Takane, 1980; Heiser & de Leeuw, 1981; Carroll, 1980; De Soete & Carroll, 1983), which lead directly to the ACOVS formulations of these models. In Thurstone's original pair comparison model each stimulus is associated with a random variable (called a discriminal process) with prescribed distributional properties. Let Y_i represent the random variable for stimulus i. It is assumed that

Y_i ~ N(m_i, s_i²),    i = 1, ..., n,    (1)

where m_i = E(Y_i) and s_i² = V(Y_i). The m_i represents the mean scale value (e.g., preference value), and s_i² the degree of uncertainty of stimulus i. When stimuli i and j are presented for comparison, random variables
corresponding to these stimuli, namely Y_i and Y_j, are generated, and the comparison is supposedly made on the realized values of the random variables at the particular time. The comparison process is supposed to take the difference between Y_i and Y_j, and either the value of Y_i − Y_j or some monotonic transformation of it is directly reported, or only its sign (whether Y_i − Y_j is positive or negative) is reported in the form of a choice (either stimulus i is chosen or stimulus j is chosen). Under the distributional assumption made above,

Y_i − Y_j ~ N(m_i − m_j, d_ij²),    (2)

where

d_ij² = s_i² + s_j² − 2s_ij,    (3)

with s_ij = Cov(Y_i, Y_j). Thus the probability that stimulus i is chosen over stimulus j is given by

p_ij = Pr(Y_i − Y_j > 0) = Φ(q_ij),    (4)

where q_ij = (m_i − m_j)/d_ij, and Φ is the standard normal distribution function. When d_ij is constant across pairs (as in Thurstone's Case 5), the choice probability is a function only of the scale values of the particular stimuli that are compared. Krantz (1967) calls this condition "simple scalability". However, numerous studies (Debreu, 1960; Krantz, 1967; Restle, 1961; Tversky & Russo, 1969; Rumelhart & Greeno, 1971; Tversky, 1972a, b; Sjoberg, 1977, 1980) reported violations of simple scalability in a variety of empirical situations. Not all stimuli are equally comparable. Equal comparability holds only when the stimuli to be compared are relatively homogeneous. When the stimuli are radically different on "irrelevant" dimensions (i.e., dimensions other than the one on which the comparison is supposedly made), they tend to be less comparable, and the choice probabilities tend to be less extreme (closer to 1/2). If, on the other hand, the stimuli are similar, they are more comparable, and consequently more extreme choice probabilities tend to result (Krantz, 1967; Tversky & Russo, 1969; Rumelhart & Greeno, 1971). Thus differential degrees of similarity among stimuli give rise to context dependencies in the stimulus comparison process, called the similarity effect. This means that d_ij in Thurstone's original model has a role to play. In particular, it has been shown (Halff, 1976) that d_ij has distance properties: d_ij satisfies the three metric axioms (minimality, symmetry, and the triangle inequality) required of a distance. The distance properties of d_ij make Thurstone's general model considerably richer in its descriptive power than those models that assume simple scalability. Specifically, Thurstone's general pair comparison model satisfies moderate stochastic transitivity (MST), but it can violate strong stochastic transitivity (SST), which is known to be equivalent to simple scalability (Tversky & Russo, 1969). It is interesting to point out that d_ij, the distance between stimuli i and j, can be interpreted as a type of dissimilarity between the stimuli.
Thus, dividing m_i − m_j by d_ij in q_ij in Thurstone's general model is consistent with the empirical evidence (mentioned earlier) indicating that more dissimilar stimuli are less comparable. Sjoberg (1977) observed a high correlation between d_ij estimated from pair comparison judgments and a direct similarity rating between stimuli i and j obtained separately. The d_ij is thus not only theoretically expected to represent the stimulus dissimilarity, but there is also some empirical evidence to support the theory. The problem is how we may recover d_ij in Thurstone's general model without overparametrizing it. Attempts to extend Thurstone's pair
comparison model beyond Case 5 are almost as old as Thurstone's original proposal of the model (Thurstone, 1927). For example, in Case 3 it is assumed that s_ij = 0 for all i and j, thereby reducing the number of parameters considerably. Case 4 was derived as a convenient numerical approximation to Case 3. However, in these cases differential comparability (d_ij) between stimuli is exclusively attributed to individual uncertainties (s_i² and s_j²). Thus, they are rather restrictive as models of contextual effects in stimulus comparison processes. A couple of significant proposals were made in the early 1980s in the way of partially recovering d_ij in Thurstone's model. Takane (1980) and Heiser and de Leeuw (1981) independently proposed the factorial model of pair comparisons (hereafter called the "THL" model), in which the covariance matrix between discriminal processes was assumed to have a lower rank approximation. That is,

S = (s_ij) = XX′,    (5)

where X is an n by b (b < n) matrix, n is the number of stimuli, and b is the rank of matrix S. This amounts to assuming

d_ij² = (x_i − x_j)′(x_i − x_j),    (6)
where x_i and x_j are the ith and jth row vectors of X, since s_ij = x_i′x_j and s_i² = x_i′x_i. That is, d_ij is assumed to be the Euclidean distance between stimuli i and j represented in a b-dimensional Euclidean space. The X then represents the matrix of stimulus coordinates. An interesting development was due to Carroll (1980) and De Soete and Carroll (1983). The model is called the wandering vector model (WVM). In this model it is assumed that stimuli are represented as points in a b-dimensional space where stimulus coordinates are given by X as in the THL model, that there is a random vector that varies over time, and that the projections of the stimuli onto this vector at a particular time determine the pair comparison judgment at that time. Under an appropriate distributional assumption on the vector we may derive the distribution of Y_i − Y_j, and the choice probability p_ij. Let u* denote the wandering vector, and let u* ~ N(v, I). Then

Y_i − Y_j = (x_i − x_j)′u* ~ N((x_i − x_j)′v, d_ij²),    (7)

where d_ij is the same as in (6). It follows that
p_ij = Pr[(x_i − x_j)′u* > 0] = ∫_{−∞}^{r_ij} φ(z) dz = Φ(r_ij),    (8)

where r_ij = (x_i − x_j)′v/d_ij. It has been shown (De Soete, 1983) that the WVM is a special case of the THL model in which not only d_ij but also m_i and m_j are constrained in a special way; i.e.,

m_i = x_i′v and m_j = x_j′v.    (9)
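A minimal sketch of eq. (8) makes the structure concrete (the configuration and function name are invented for illustration); it also exhibits the similarity effect discussed above: the nearby pair (0, 1) receives a more extreme choice probability than the distant pair (0, 2), even though the latter has the larger scale difference (x_i − x_j)′v.

```python
import numpy as np
from math import erf, sqrt

def ncdf(z):  # standard normal distribution function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def wvm_prob(X, v, i, j):
    """WVM choice probability p_ij = Phi(r_ij) of eq. (8),
    with r_ij = (x_i - x_j)'v / d_ij and d_ij Euclidean."""
    diff = X[i] - X[j]
    return ncdf(float(diff @ v) / float(np.linalg.norm(diff)))

# toy configuration: stimuli 0 and 1 are similar, stimulus 2 is distant
X = np.array([[1.0, 0.0],
              [0.8, 0.1],
              [0.0, 3.0]])
v = np.array([1.0, 0.0])   # mean of the wandering vector

# (x_0 - x_1)'v = 0.2 is small, yet p_01 comes out extreme;
# (x_0 - x_2)'v = 1.0 is large, yet p_02 stays moderate:
print(round(wvm_prob(X, v, 0, 1), 3))
print(round(wvm_prob(X, v, 0, 2), 3))
```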
Scale values of stimuli are represented in a particular direction in the space. Thus, although the THL model and the WVM were initially derived on the basis of entirely different rationales, they are quite similar to each other. Both the THL model and the WVM are designed to account for the differential comparability among the stimuli. However, these models strictly apply to either Case 1 or Case 2, where difference processes, Y_i − Y_j, and consequently observed choice probabilities, are assumed statistically independent across all pairs of stimuli. Both Takane (1980) and De Soete and Carroll (1983) developed parameter estimation procedures for their models. They both assume statistical independence among the observations, while they use data obtained by multiple-judgment sampling. As has been discussed, the independence assumption is not tenable in multiple-judgment sampling. However, the assumption is made in virtually all previous estimation procedures for Thurstonian pair comparison models (e.g., Hohle, 1966; Bock & Jones, 1968; Arbuckle & Nugent, 1973; Takane, 1980; De Soete & Carroll, 1983; De Soete, Carroll, & DeSarbo, 1986). In order to account for the statistical dependencies among observations, pair comparison models had to await analysis of covariance structure formulations (Bloxom, 1972; Takane, 1985), to which we now turn. In closing this section it might be pointed out that analogous developments (from models of simple scalability to moderate utility models) can be traced in the Bradley-Terry-Luce (Bradley & Terry, 1952; Luce, 1959) type of constant utility model approach (Restle, 1961; Tversky, 1972a, b; Strauss, 1981). However, these developments are not readily amenable to ACOVS formulations. See Indow (1975) and Luce
(1977) for insightful reviews of this line of development.
3. ACOVS Formulations

In order to reformulate the THL model and the WVM in terms of analysis of covariance structures (ACOVS; Joreskog, 1970), let us first generalize the variance structure of these models. It was originally assumed that d_ij² = (x_i − x_j)′(x_i − x_j) in these models. To this we may add g_i² + g_j² + k_ij², where g_i² and g_j² are stimulus-specific uncertainties left unaccounted for by (x_i − x_j)′(x_i − x_j), and k_ij² represents uncertainty associated with a specific stimulus pair. These quantities represent amounts of specification error at two different levels. We now generalize this to covariance structures. Let t be a vector of t_ij = Y_i − Y_j + e_ij, arranged in a specific order, where e_ij is the error random variable associated with stimulus pair ij. In a complete sampling design each subject makes judgments for all possible pairs of stimuli. In such a case t is a vector of dimensionality M = n(n − 1)/2, where n is the number of stimuli. Let A be an M by n design matrix for pair comparisons, whose rows are arranged in the same order as the elements of t. Each row of A corresponds with a specific comparison. If that comparison involves stimuli i and j and the direction of the comparison requires Y_i − Y_j (rather than Y_j − Y_i), the row has 1 in the ith column, −1 in the jth column, and zeroes elsewhere. Let y be an n-component vector of Y_i, and let e be an M-component vector of e_ij. We assume

e ~ N(0, K²),    (10)

where K² is assumed to be diagonal with its diagonal elements denoted by k_ij². It may be further assumed that k_ij² = k² for all ij. Then t may be expressed, using matrix notation, as

t = Ay + e.    (11)

The Ay takes differences between Y_i and Y_j in prescribed directions for all possible pairs of stimuli. We make a further structural assumption on y; namely,

y = Xu* + w*,    (12)
where

w* = w + m,    with w ~ N(0, G²),    (13)

and

u* = u + v,    with u ~ N(0, I).    (14)

Here m is the vector of m_i (i = 1, ..., n) and w is the random vector of stimulus specificities. The G² is usually assumed to be diagonal with ith diagonal element g_i², indicating the degree of stimulus specificity or uncertainty. The u* is the wandering vector introduced earlier. It follows that
t = A(Xu* + w*) + e ~ N[A(Xv + m), A(XX′ + G²)A′ + K²].    (15)
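The structure in (15) is easy to verify numerically. The following sketch (toy sizes, random coordinates, and variable names are our own choosing, not part of the original) builds the design matrix A and checks the implied covariance matrix of t:

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 4, 2                       # toy sizes: n stimuli in b dimensions
X = rng.normal(size=(n, b))       # stimulus coordinates
g2 = rng.uniform(0.1, 0.5, n)     # stimulus-specific uncertainties g_i^2
k2 = 0.2                          # constant pair-specific uncertainty k^2

# design matrix A: one row per pair (i, j), i < j, contrast Y_i - Y_j
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
A = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    A[row, i], A[row, j] = 1.0, -1.0

# covariance structure of t implied by eq. (15)
cov_t = A @ (X @ X.T + np.diag(g2)) @ A.T + k2 * np.eye(len(pairs))

# its diagonal reproduces (x_i - x_j)'(x_i - x_j) + g_i^2 + g_j^2 + k^2 ...
for row, (i, j) in enumerate(pairs):
    d2 = (X[i] - X[j]) @ (X[i] - X[j]) + g2[i] + g2[j] + k2
    assert np.isclose(cov_t[row, row], d2)

# ... while the off-diagonal elements are nonzero: the t_ij are dependent,
# which is the whole point of the ACOVS formulation
assert np.any(np.abs(cov_t - np.diag(np.diag(cov_t))) > 1e-8)
print(cov_t.shape)  # (6, 6): one row and column per stimulus pair
```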
When it is assumed that v = 0, then E(t) = Am, and since Am is the vector of m_i − m_j, this case corresponds with the THL model. If, on the other hand, it is assumed that m = 0, we obtain E(t) = AXv. This represents the mean structure, (x_i − x_j)′v, required of the WVM. The covariance structure, A(XX′ + G²)A′ + K², remains the same for both models. Note that the diagonals of this covariance matrix are of the form (x_i − x_j)′(x_i − x_j) + g_i² + g_j² + k_ij², which is indeed the variance structure required of both the THL model and the WVM. Note also that the off-diagonal elements of A(XX′ + G²)A′ + K² are no longer zero, implying non-independence among the elements of t. It is interesting to note that the WVM is a random effect alternative to Bechtel, Tucker, and Chang's (1971) scalar product model. In that model subjects are treated as fixed effects; i.e., for subject k, t_k = AXv_k, and v_k is explicitly estimated for each k. Analogous ACOVS formulations of the classical Case 5 and Case 3 are also possible. Although these cases are not likely to provide satisfactory descriptions of pair comparison data, they may serve as good benchmark models. In Case 5, d_ij is assumed to be constant across all combinations of i and j. The simplest way this could occur is when s_i² and s_j² are constant and s_ij is zero. In the ACOVS formulation this can be achieved by setting X = 0 and G² = s²I. Note that s_ij = 0 is not absolutely necessary to achieve d_ij = constant. It is sufficient to have s_ij = constant (Guttman,
1954). This case corresponds with X = λ1_n, where 1_n is an n-component vector of ones. However, this reduces to the previous case, since Aλ1_n = 0_M. Bock and Jones (1968), in their primitive attempt to incorporate systematic individual differences into Thurstone's pair comparison model, present a model which is essentially equivalent to the ACOVS formulation of Case 5 in which K² = 0 is also assumed. In Case 3 it is assumed that s_ij = 0 for all distinct pairs of i and j. This case can be obtained by X = 0 or X = λ1_n, and G² being diagonal (not necessarily constant). Model (15) may be fitted to the data by the maximum likelihood or the generalized least squares method (Browne, 1974, 1984) when t is directly observed. In either case some existing programs, such as LISREL (Joreskog & Sorbom, 1981), EQS (Bentler, 1985), and COSAN (McDonald, 1980), may be used for the actual computation. When only choices are observed, t has to be reduced to choice patterns. Correspondingly, the distribution of t must be converted into the probability distribution of the choice pattern. Let h denote an observed pattern, and let f be the density function of t. Then

Pr(h) = ∫_R f(t) dt,    (16)
where R is the multidimensional rectangular region formed by the direct product of intervals R_ij, where R_ij = (0, ∞) if stimulus i is chosen over stimulus j (t_ij > 0) and R_ij = (−∞, 0) if stimulus j is chosen over stimulus i (t_ij < 0). Equation (16) is generally extremely difficult to evaluate due to nonzero covariances among the elements of t. However, the first and the second order marginal probabilities are relatively easily evaluated:

p_ij = Pr(i is chosen over j) = ∫_0^∞ f_ij(t_ij) dt_ij,    (17)

where f_ij is the univariate marginal density of t_ij ~ N(m_i − m_j, (x_i − x_j)′(x_i − x_j) + g_i² + g_j² + k_ij²). (The m_i − m_j must be replaced by (x_i − x_j)′v in the WVM.) Similarly,

p_ij,qr = Pr(i is chosen over j and q is chosen over r) = ∫_0^∞ ∫_0^∞ f_ijqr(t_ij, t_qr) dt_ij dt_qr,    (18)
where f_ijqr is the bivariate marginal density of t_ij and t_qr. Muthen (1984) developed LISCOMP, a computer program for the generalized least squares estimation of the ACOVS model for categorical data using the first and the second order marginal probabilities. It has been shown (Christoffersson, 1975; Muthen, 1975) that the loss of information incurred by ignoring higher order marginal probabilities in the estimation is relatively minor. Alternatively, LISREL may be used with tetrachoric correlations, but it allows only simple least squares estimation. The ACOVS formulation of the WVM can readily be extended to the wandering ideal point (WIP) model recently proposed by De Soete et al. (1986). In the WIP model a subject is represented as a point which varies over time. The relative distances between the stimulus points and the subject point at a particular time are supposed to determine the preference relations observed at that time. The distribution of the subject point is assumed to be due to time-sampling of observations within a single subject. However, with the ACOVS formulation the model can be extended to the distribution of the ideal point over a population of subjects. Let u* be the random vector of coordinates of the subject point, and let

u* ~ N(v, D²),    (19)
where D² is a diagonal matrix. (D² can always be made diagonal by rotating the space appropriately.) Let d(u*) be the vector of one half times the squared Euclidean distances between the stimulus points and the ideal point, i.e.,
where d_i(u*) = ½(x_i - u*)'(x_i - u*). In the WIP model the distance is assumed to be inversely related to preference. Thus, we may set
y = -d(u*) + w

in (11), where w is defined in (13). Then

t = A(-d(u*) + w) + e
  = A(Xu* - ½x^(2) + w) + e
  ~ N[A(Xv - ½x^(2)), A(XD²X' + G²)A' + K²], (21)

where

x^(2) = diag(XX')1_n.
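The moment structure (21) can be sketched numerically. In the block below every concrete number (the stimulus coordinates X, the ideal point mean v, and the variance matrices D², G², K²) is an illustrative assumption, not a value from the text:

```python
import numpy as np
from itertools import combinations

# Hypothetical configuration: n = 4 stimuli in 2 dimensions.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.5, 1.5]])
n = X.shape[0]

# Pair comparison design matrix A: one row per pair (i, j),
# +1 in column i and -1 in column j (so A @ ones = 0).
pairs = list(combinations(range(n), 2))
A = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    A[row, i], A[row, j] = 1.0, -1.0

v = np.array([0.5, 0.5])        # mean of the ideal point u*
D2 = np.diag([0.3, 0.1])        # diagonal covariance of u*
G2 = 0.2 * np.eye(n)            # V(w)
K2 = 0.1 * np.eye(len(pairs))   # V(e)

x2 = np.diag(X @ X.T)           # x^(2): the vector with elements x_i'x_i

# Mean and covariance structure (21) of the ACOVS WIP model.
mean_t = A @ (X @ v - 0.5 * x2)
cov_t = A @ (X @ D2 @ X.T + G2) @ A.T + K2

# A x^(2) has elements x_i'x_i - x_j'x_j under this sign convention for A.
print(np.allclose(A @ x2, [x2[i] - x2[j] for i, j in pairs]))  # True
```

The constant ½u*'u* term of -d(u*) is omitted because it is annihilated by A, which is why the mean reduces to A(Xv - ½x^(2)).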
Note that this model differs from the WVM in that it has the additional term -½Ax^(2) in the mean structure and XD²X' (rather than XX') in the covariance structure. Reparametrization by X* = XD will make the covariance structure identical in form to that of the THL model and the WVM. However, the mean will then be A(X*v* - ½diag(X*D⁻²X*')1_n), so that we cannot get rid of D² entirely. The vector Ax^(2) has (x_i'x_i - x_j'x_j) as its elements. Due to the nonlinear nature of this term, a special computer program is necessary to fit the ACOVS WIP model. An extension to choice data may be done in a manner similar to that in the WVM.

4. Possible Generalizations
A general method for analysis of covariance and mean structures (ACOVS with structured means) was given by Joreskog (1970). The method includes, among other things, conventional factor analysis, variance-component models, path analysis, linear structural equations, etc. Our approach is a special case of this general approach. Sorbom (1981) has shown how ACOVS with structured means can be treated in a unified manner by analysis of moment structures (AMOMS) (see also Bentler, 1983). In our case the mean and covariance structures in (15) can be expressed as
M = A(X(vv' + I)X' + mm' + G²)A' + K² (22)
in terms of AMOMS, where it is further assumed that v = 0 or m = 0. Perhaps Bloxom (1972) was the first to note the importance of the ACOVS methodology in modeling pair comparison data. He developed his simplex model of pair comparisons (similar to Case 5) based on the ACOVS framework. Takane (1985), in an attempt to incorporate systematic individual differences into the THL model and the WVM, arrived at the ACOVS formulations of these models, which are similar in form to Bloxom's simplex model. Working in the general ACOVS framework opens up a number of possibilities. First of all, a variety of interesting hypotheses (assumptions) can be tested explicitly. For example, G² = a²I and/or K² = b²I may be assumed and tested, or G² = 0 and/or K² = 0 may be assumed in (15) and their empirical validity tested. Bechtel et al.'s (1971) model corresponds with m = 0, G² = 0 and K² = 0. In the THL model we may relax XX' + G² into a general positive definite matrix, S. We then have
E(t) = Am and V(t) = ASA' + K². (23)
The goodness of fit comparison between this model and the original THL model tests the adequacy of the factorial decomposition of S into XX' + G². Two particularly interesting possibilities emerge when stimulus information and/or subject information is available. Stimuli can be characterized by a set of externally supplied attribute values (Bock & Jones, 1968), by a set of features (Rumelhart & Greeno, 1971; Tversky & Sattath, 1979), or by a set of combinations of levels of manipulated factors (Sjoberg, 1975). Similarly, the subjects performing the comparisons may be characterized by their background variables, such as sex, age, socioeconomic status, level of education, etc. In the ACOVS framework these external variables can be incorporated in a relatively straightforward manner. Let B be an n by p (< n) matrix of stimulus information. There are at least a couple of ways to incorporate this information. For example, we have
Takane
t = A(Bs* + Xu* + w) + e ~ N[A(Bm* + Xv), A(BD²B' + XX' + G²)A' + K²], (24)
where s* ~ N(m*, D²). This model attempts to explain part of the stimulus variability by B and the rest by X. This is analogous to Yanai's (1970) approach to factor analysis with external criteria, in which whatever effects can be explained by the external criteria are first partialed out, and factor analysis is applied to the residual covariance matrix. This is to see if there is anything interesting left unaccounted for by the external criteria. More simplified or complicated versions of this model may be obtained, as desired, by specializing s* in (24); e.g., s* = m*, s* = Pq* + r, etc. In either case it may be further assumed that v = 0 and/or X = 0. An alternative way to incorporate B is to constrain X by X = BQ, where Q is analogous to a matrix of regression coefficients. This amounts to assuming that all that has been explained by X can be explained by B. We then have
t = A(BQu* + w) + e ~ N[ABQv, A(BQQ'B' + G²)A' + K²]. (25)
A slight generalization of this model would replace Qu* by Qu* + s, where s ~ N(0, D²). We then have
V(t) = A(B(QQ' + D²)B' + G²)A' + K².
Subject information may also be incorporated in several ways. When the information is provided in nominal variables (e.g., male or female), one possibility is to partition the data into groups and to analyze them separately (Joreskog, 1971; Muthen & Christoffersson, 1981). This allows completely different covariance structures as well as mean structures across the groups. Of course, it is entirely permissible to constrain some elements in the covariance and mean structures to be equal across the groups. In fact, the gist of the general ACOVS method is that we may explicitly test the empirical validity of such constraints. Alternatively, subject information may be incorporated in a manner similar to regression analysis. Let z_k be the q-component vector of the kth subject's background variables, and let m_k and v_k represent m and v in the THL model and the WVM, respectively, for subject k. We have
two options. We may impose a regression structure on either m_k or v_k. In the first case we have m_k = Pz_k and assume v_k = 0, so that E(t_k) = APz_k and V(t_k) = A(XX' + G²)A' + K² or A(PD²P' + XX' + G²)A' + K². (In either case XX' + G² may be replaced by a more general positive definite matrix, S.) In the second case we assume v_k = P*z_k while m_k = 0, so that E(t_k) = AXP*z_k and V(t_k) = A(XX' + G²)A' + K² or A(X(P*D*²P*' + I)X' + G²)A' + K². (Again XX' + G² may be replaced by S.) Both stimulus and subject information can be simultaneously incorporated. The resulting models are combinations of those for the stimulus information and those for the subject information. All the generalizations discussed in this section carry over to the WIP model in a relatively straightforward manner. Assuming that we have both stimulus and subject information, X = BQ and v_k = P*z_k, we obtain, in the simplest case,
E(t_k) = A(BQP*z_k - ½diag(BQQ'B')1_n), with V(t_k) = A(BQQ'B' + G²)A' + K².
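This simplest combined case can be sketched numerically. The matrices B, Q, P* and the background vector z_k below are random illustrative values, not data from the text:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p, r, q = 5, 3, 2, 2        # stimuli, attributes, dimensions, background vars
pairs = list(combinations(range(n), 2))
A = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    A[row, i], A[row, j] = 1.0, -1.0

B = rng.normal(size=(n, p))    # stimulus attribute matrix
Q = rng.normal(size=(p, r))    # regression-like weights constraining X = BQ
P_star = rng.normal(size=(r, q))
z_k = rng.normal(size=q)       # subject k's background variables
G2 = 0.2 * np.eye(n)
K2 = 0.1 * np.eye(len(pairs))

# E(t_k) = A(BQP*z_k - 1/2 diag(BQQ'B')1_n), V(t_k) = A(BQQ'B' + G2)A' + K2.
BQ = B @ Q
mean_tk = A @ (BQ @ P_star @ z_k - 0.5 * np.diag(BQ @ BQ.T))
cov_tk = A @ (BQ @ BQ.T + G2) @ A.T + K2

# With X constrained to BQ, this is the ordinary covariance structure with
# XX' replaced by BQQ'B'.
X = BQ
print(np.allclose(cov_tk, A @ (X @ X.T + G2) @ A.T + K2))  # True
```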
5. Concluding Remarks

In this paper we have shown that the ACOVS methodology is useful in probabilistic pair comparison modeling. No empirical examples were given, and the paper has remained largely expository. An obvious follow-up is to exemplify the methodological ideas described in this paper through analyses of actual data sets. Although some of the ACOVS models for pair comparisons presented in this paper can be fitted by existing programs (e.g., LISREL, LISCOMP), there are others that cannot. For example, no ready-made programs exist for parameter estimation in the ACOVS wandering ideal point models. The normality assumption on u and w, and consequently on t in (15), may not be adequate. In that case we may either transform the data or use a fitting criterion that does not assume normality. Asymptotically distribution free methods (Browne, 1984) may be useful in this context. It may appear that the proposed ACOVS models of pair comparisons have too many parameters to be estimated, particularly when the observed data are binary choices. This is indeed true for the general ACOVS
model. However, it is not true in our practical applications of the ACOVS model, since the matrix A is always a fixed matrix in the pair comparison models. The number of parameters can be further reduced, if desired, by assuming that G² and/or K² are constant diagonal matrices. There are other possible generalizations that have not been explicitly discussed in this paper. For example, an extension to multiple choice situations seems to be rather straightforward. Also, treating subjects' background variables as random effects (rather than fixed effects) is already feasible in LISREL (Joreskog & Sorbom, 1981). This case corresponds with errors-in-variables regression analysis in the ACOVS framework. The prospect of further developing the ACOVS methodology in connection with probabilistic choice models is thus bright, despite the fact that numerous tasks are yet to be accomplished.

Appendix

How the ACOVS THL model and the WVM (15) may be fitted by LISREL is not so trivial. In this appendix we explain how this is done. We also explain how (24) and (25) can be fitted by LISREL. McArdle and McDonald (1984) provide a general framework for establishing the necessary correspondence. We appreciate Michael Browne's help (personal communication) in clarifying the matter. The LISREL model consists of three submodels:
1. Structural Equation Model: η = Bη + Γξ + ζ
2. Measurement Model for y: y = Λy η + ε
3. Measurement Model for x: x = Λx ξ + δ,

where η, ξ, ζ, ε and δ denote random vectors. Aside from its distributional assumption (i.e., multivariate normality) the model is completely specified by the following eight matrices: Λy, Λx, B, Γ, Φ = V(ξ), Ψ = V(ζ), Θε = V(ε) and Θδ = V(δ). (We stick with the notational convention used by Joreskog and Sorbom (1981) as much as possible.) Throughout this appendix it is assumed that Λx = I, Φ = I, Θε = K² (a diagonal matrix) and Θδ = 0 (a zero matrix). The moment structure of y is then expressed as
M = Λy(I - B)⁻¹(ΓΓ' + Ψ)(I - B)⁻¹'Λy' + K². (A-1)
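Once the matrices are specified, (A-1) can be evaluated mechanically. The helper below is a hypothetical utility (not part of LISREL) that just implements the formula, checked here on a trivial scalar example:

```python
import numpy as np

def lisrel_moment(Lam_y, B, Gamma, Psi, K2):
    """Evaluate the moment structure (A-1):
    M = Lam_y (I - B)^-1 (Gamma Gamma' + Psi) (I - B)^-1' Lam_y' + K2."""
    T = Lam_y @ np.linalg.inv(np.eye(B.shape[0]) - B)
    return T @ (np.outer(Gamma, Gamma) + Psi) @ T.T + K2

# Tiny illustrative check with scalar "matrices": Lam_y = 1, B = 0,
# Gamma = 2, Psi = 3, K2 = 1 gives M = (4 + 3) + 1 = 8.
M = lisrel_moment(np.eye(1), np.zeros((1, 1)),
                  np.array([2.0]), 3.0 * np.eye(1), np.eye(1))
print(float(M[0, 0]))  # 8.0
```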
The following results hold:

Result 1. The moment structure of y (t in our notation) for the ACOVS THL model or the WVM is obtained by setting
Λy = [A 0], B = [0 X; 0 0], Γ = [m; v] and Ψ = [G² 0; 0 I]

(where semicolons separate block rows of a partitioned matrix).

(Proof) Λy(I - B)⁻¹ = [A AX]. Thus, (A-1) becomes

M = [A AX](ΓΓ' + Ψ)[A AX]' + K²
  = A(mm' + G² + X(vv' + I)X')A' + K²,
which is identical to (22).

Result 2. The moment structure of y (t in our notation) corresponding to (24) is obtained by setting

Λy = [A AB* 0], B = [0 0 X; 0 0 0; 0 0 0], Γ = [0; m*; v] and Ψ = [G² 0 0; 0 D² 0; 0 0 I],
where, in order to avoid confusion, our B is denoted by B*. (Note that in the above both A and B* are assumed known a priori, so that AB* can be evaluated a priori.) (Proof) Λy(I - B)⁻¹ = [A AB* AX]. Thus, (A-1) becomes

M = [A AB* AX](ΓΓ' + Ψ)[A AB* AX]' + K²
  = A(G² + B*(m*m*' + D²)B*' + X(vv' + I)X')A' + K²,
which is identical to the moment structure required by (24). The above specification is apparently not unique. For example, setting

Λy = [A 0 0]

will give the same result. This latter specification may be more general than Result 2 in that it does not assume that both A and B* are known a priori. However, in Result 3 both A and B* have to be assumed known a priori.

Result 3. The moment structure of y (t in our notation) corresponding to (25) is given by setting

Λy = [A AB* 0], B = [0 0 0; 0 0 Q; 0 0 0], Γ = [0; 0; v] and Ψ = [G² 0 0; 0 0 0; 0 0 I].
(Proof) Λy(I - B)⁻¹ = [A AB* AB*Q]. Thus, (A-1) becomes

M = [A AB* AB*Q](ΓΓ' + Ψ)[A AB* AB*Q]' + K²
  = A(G² + B*Q(vv' + I)Q'B*')A' + K²,
which is identical to the moment structure required of (25). A slight generalization can be made by setting

Ψ = [G² 0 0; 0 D² 0; 0 0 I].
The moment structure then becomes
M = A(G² + B*(Q(vv' + I)Q' + D²)B*')A' + K².
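The algebra of the appendix can be checked numerically. The sketch below builds a specification with Λy = [A 0] so that Λy(I - B)⁻¹ = [A AX], as in the proof of Result 1, and verifies that (A-1) reproduces (22); the inputs are random illustrative values, with m = 0 so that the cross terms vanish:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, r = 4, 2
pairs = list(combinations(range(n), 2))
A = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    A[row, i], A[row, j] = 1.0, -1.0

X = rng.normal(size=(n, r))
v = rng.normal(size=r)
m = np.zeros(n)                          # m = 0: cross terms vanish, as in (22)
G2 = np.diag(rng.uniform(0.1, 0.5, n))
K2 = 0.1 * np.eye(len(pairs))

# LISREL-style specification of Result 1.
Lam_y = np.hstack([A, np.zeros((len(pairs), r))])
B = np.zeros((n + r, n + r))
B[:n, n:] = X                            # so Lam_y (I - B)^-1 = [A AX]
Gamma = np.concatenate([m, v]).reshape(-1, 1)
Psi = np.block([[G2, np.zeros((n, r))],
                [np.zeros((r, n)), np.eye(r)]])

T = Lam_y @ np.linalg.inv(np.eye(n + r) - B)
M_lisrel = T @ (Gamma @ Gamma.T + Psi) @ T.T + K2

# Direct moment structure (22) of the ACOVS THL model / WVM.
M_direct = A @ (X @ (np.outer(v, v) + np.eye(r)) @ X.T
                + np.outer(m, m) + G2) @ A.T + K2
print(np.allclose(M_lisrel, M_direct))  # True
```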
References

Arbuckle, L., & Nugent, J. H. (1973). A general procedure for parameter estimation for the law of comparative judgment. British Journal of Mathematical and Statistical Psychology, 26, 240-260.
Bechtel, G. G., Tucker, L. R., & Chang, W. (1971). A scalar products model for the multidimensional scaling of choice. Psychometrika, 36, 369-387.
Bentler, P. M. (1983). Some contributions to efficient statistics in structural models: Specification and estimation of moment structures. Psychometrika, 48, 493-517.
Bentler, P. M. (1985). Theory and implementation of EQS, a structural equations program. Los Angeles: BMDP Statistical Software.
Bloxom, B. (1972). The simplex in pair comparisons. Psychometrika, 37, 119-136.
Bock, R. D., & Bargmann, R. E. (1966). Analysis of covariance structures.
Psychometrika, 31, 507-534.
Bock, R. D., & Jones, L. V. (1968). The measurement and prediction of judgment and choice. San Francisco: Holden-Day.
Bradley, R. A. (1976). Science, statistics and paired comparisons. Biometrics, 32, 213-232.
Bradley, R. A., & Terry, M. E. (1952). The rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, 39, 324-345.
Browne, M. W. (1974). Generalized least squares estimates in the analysis of covariance structures. South African Statistical Journal, 8, 1-24.
Browne, M. W. (1984). Asymptotically distribution free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Carroll, J. D. (1980). Models and methods for multidimensional analysis of preference choice (or other dominance) data. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice. Bern: Hans Huber.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
Debreu, G. (1960). Review of R. D. Luce, Individual choice behavior: A theoretical analysis. American Economic Review, 50, 186-188.
De Soete, G. (1983). On the relation between two generalized cases of Thurstone's law of comparative judgment. Mathematiques et Sciences humaines, 21, 45-57.
De Soete, G., & Carroll, J. D. (1983). A maximum likelihood method for fitting the wandering vector model. Psychometrika, 48, 553-566.
De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.
Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144-163.
Halff, H. M. (1976). Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology, 14, 244-246.
Heiser, W., & De Leeuw, J. (1981). Multidimensional mapping of preference data. Mathematiques et Sciences humaines, 19, 39-96.
Hohle, R. H. (1966). An empirical evaluation and comparison of two
models for discriminability. Journal of Mathematical Psychology, 3, 174-183.
Indow, T. (1975). On choice probability. Behaviormetrika, 2, 13-31.
Joreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika, 57, 239-251.
Joreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409-426.
Joreskog, K. G., & Sorbom, D. (1981). LISREL VI user guide. Mooresville, IN: Scientific Software.
Krantz, D. H. (1967). Small-step and large-step color differences for monochromatic stimuli of constant brightness. Journal of the Optical Society of America, 57, 1304-1316.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215-233.
McArdle, J. J., & McDonald, R. P. (1984). Some algebraic properties of the Reticular Action Model for moment structures. British Journal of Mathematical and Statistical Psychology, 37, 234-251.
McDonald, R. P. (1980). A simple comprehensive model for the analysis of covariance structures: Some remarks on applications. British Journal of Mathematical and Statistical Psychology, 33, 161-183.
Muthen, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
Muthen, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.
Muthen, B., & Christoffersson, A. (1981). Simultaneous factor analysis of dichotomous variables in several groups. Psychometrika, 46, 407-419.
Restle, F. (1961). Psychology of judgment and choice. New York: Wiley.
Rumelhart, D. L., & Greeno, J. G. (1971). Similarity between stimuli: An experimental test of the Luce and Restle choice models. Journal of Mathematical Psychology, 8, 370-381.
Sjoberg, L. (1975). Uncertainty of comparative judgments and multidimensional structure.
Multivariate Behavioral Research, 11, 207-218.
Sjoberg, L. (1977). Choice frequency and similarity. Scandinavian Journal of Psychology, 18, 103-115.
Sjoberg, L. (1980). Similarity and correlation. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice. Bern: Hans Huber.
Sorbom, D. (1981). Structural equation models with structured means. In K. G. Joreskog & H. Wold (Eds.), Systems under indirect observation: Causality, structure and prediction. Amsterdam: North-Holland.
Strauss, D. (1981). Choice by features: An extension of Luce's model to account for similarities. British Journal of Mathematical and Statistical Psychology, 34, 50-61.
Takane, Y. (1980). Maximum likelihood estimation in the generalized case of Thurstone's model of comparative judgment. Japanese Psychological Research, 22, 188-196.
Takane, Y. (1985). Probabilistic multidimensional pair comparison models that take into account systematic individual differences. Transcript of the talk given at the 50th Anniversary Meeting of the Psychometric Society, Nashville.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
Thurstone, L. L. (1959). The measurement of values. Chicago, IL: University of Chicago Press.
Tversky, A. (1972a). Choice by elimination. Journal of Mathematical Psychology, 9, 341-367.
Tversky, A. (1972b). Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
Tversky, A., & Russo, J. E. (1969). Substitutability and similarity in binary choices. Journal of Mathematical Psychology, 6, 1-12.
Tversky, A., & Sattath, S. (1979). Preference trees. Psychological Review, 86, 542-573.
Yanai, H. (1970). Factor analysis with external criteria. Japanese Psychological Research, 12, 143-153.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
TWO CLASSES OF STOCHASTIC TREE UNFOLDING MODELS J. Douglas Carroll AT&T Bell Laboratories, Murray Hill, NJ, U.S.A.
Wayne S. DeSarbo University of Michigan, U.S.A.
Geert De Soete University of Ghent, Belgium In this paper we propose two versions of stochastic choice models based on tree structure models - called “tree unfolding models”. These models can be viewed as discrete (tree structure) analogues of a recently proposed class of continuous (spatial) random utility models for paired comparisons choice data called the “wandering vector” and “wandering ideal point” models.
1. Introduction

A class of continuous spatial models for paired comparisons choice data has been proposed by Carroll, De Soete, DeSarbo and others (Carroll, 1980; DeSarbo, De Soete, & Eliashberg, 1987; DeSarbo, De Soete, & Jedidi, 1987; DeSarbo, Oliver, & De Soete, 1986; De Soete & Carroll, 1983, 1986; De Soete, Carroll, & DeSarbo, 1986; Schonemann & Wang, 1972; Wang, Schonemann, & Rusk, 1975; Zinnes & Griggs, 1974). These
Geert De Soete is supported as "Bevoegdverklaard Navorser" of the Belgian "Nationaal Fonds voor Wetenschappelijk Onderzoek". This paper is a substantially revised version of an article entitled "Stochastic tree unfolding (STUN) models" published in Communication & Cognition, 1987, 20, 63-76.
models are all variants of one form or other of what Carroll, De Soete and DeSarbo have called the "wandering vector model" and the "wandering ideal point model". More generally this class of models can be referred to as multidimensional models for probabilistic choice, or, perhaps more appropriately, stochastic multidimensional spatial choice models. They comprise an important subclass of a family of stochastic choice models called random utility models. References on this class of models can be found in Luce (1977) and McFadden (1976). In this paper we propose two versions of stochastic choice models based on tree structure models - called "tree unfolding models" (Furnas, 1980; De Soete, DeSarbo, Furnas & Carroll, 1984a, 1984b). For the history of the use of the term "unfolding" for this class of models see Carroll (1972, 1980), Furnas (1980), or Coombs (1964). Probably a better and more informative name for this class of models is "ideal point models", since they generally assume preference is related to distance from an "ideal" stimulus point by a non-increasing monotonic function. From this point of view, the current class of models would be termed "tree ideal point" rather than "tree unfolding" models. However, in keeping with the historical use of the term "unfolding", particularly in mathematical psychology and psychometrics, we shall continue to use the (perhaps) less informative but more "colorful" name "tree unfolding". Thus we call these models, generically, Stochastic Tree UNfolding (or STUN) models. We shall deal in the current paper with two classes of tree structure models, differing in the way their respective "metric" is defined given the tree structure (ultrametric versus path length metric, sometimes called an "additive" metric) and with two different stochastic formulations (one entailing simple additive i.i.d. normal error, and the other entailing stochastic assumptions in which the structure of the tree becomes a central component).

2. The Two Structural Models
Given a fixed tree (i.e., a connected graph without cycles), there are (at least) two ways to define a metric on the objects represented as nodes of that tree. For now we restrict ourselves to the case where objects are placed only at terminal nodes. (See Carroll and Chang, 1973, for a
discussion of tree structure models in which objects are associated with, and distances defined between, all nodes, including internal as well as terminal nodes, of a tree.) The first of the two types of tree metrics we are considering in the present paper is the ultrametric, in which the distance between two (terminal) nodes is defined as what is often called the "height" (see Johnson, 1967) of their "least common ancestor" (l.c.a.) internal node. The l.c.a. is the internal node at which the two first meet, or the "lowest" one which they share in common, in the hierarchy defined by the tree. It is important to note, here, that the ultrametric is dependent on the tree being a hierarchical tree; i.e., the ultrametric is based on a partial order being defined on the nodes (based on a subordinate-superordinate relationship between certain pairs of nodes). In particular, the distance between two (terminal) nodes is defined as the height of their l.c.a. internal node. Generally speaking, these height values are assumed non-negative, and (more importantly) they are assumed to respect the same partial order as the (internal) nodes with which they are associated. That is, if A ⊂ B then h(A) ≤ h(B), where A and B are two internal nodes, "⊂" can be interpreted as meaning that A is below B in the hierarchical order imposed on the tree (in set theoretical terms, "A ⊂ B" has the usual meaning that the set [of terminal nodes contained in] A is a subset of [those contained in] B), and h(·) denotes the height values. These two conditions on the heights make the induced distance satisfy the ultrametric conditions. In particular, the ultrametric inequality can be stated as:
d_ik ≤ max(d_ij, d_jk)   for all i, j, k,
which can easily be shown to be equivalent to saying that all triangles are acute isosceles (isosceles, with the two longest sides equal). The ultrametric inequality (together with non-negativity) is a special case of (but much stronger than) the triangle inequality, which must be satisfied by definition by any metric. The ultrametric inequality (which, by itself, does not require non-negative distances) together with non-negativity comprise the ultrametric conditions. As pointed out by Johnson (1967) there is, in fact, a one-to-one relation (or isomorphism) between ultrametrics and hierarchical trees, in the sense that, given an ultrametric, the hierarchical tree (and the non-negative height values of its internal nodes) are immediately defined, and vice versa (i.e., given a hierarchical
tree plus a set of height values satisfying non-negativity, a unique ultrametric is defined). In fact, we can generalize this by saying there is a one-to-one relation (isomorphism) between a set of dissimilarities (not necessarily distances, since they may not satisfy non-negativity) satisfying the ultrametric inequality (but not necessarily non-negativity) and hierarchical trees (with possibly negative height values). Finally, and most generally, an isomorphism exists between the set of all ordinally defined ultrametrics (i.e., rank orders, including ties, of dissimilarities such that any set of dissimilarities satisfying that rank order will satisfy the ultrametric inequality) and the set of all hierarchical trees (independent of the height values). Since any non-decreasing monotonic function of an ultrametric (preserving non-negativity) is also an ultrametric, this is a very important property. Satisfaction of the ultrametric inequality, since it is based only on the ordinal properties of the distances or dissimilarities, is invariant under monotonic transformation of those distances (dissimilarities). A special case of such a monotonic transformation, of course, is addition of an additive constant, which can transform non-negative values into negative ones (or vice versa). However, given values satisfying the ultrametric inequality they can always be easily transformed into values satisfying non-negativity as well (and thus the full set of ultrametric conditions) by the simple device of adding a sufficiently large positive additive constant.

The path length (or additive) metric is (superficially at least) quite different from the ultrametric. In this metric weights or lengths are associated with the links or the branches of the tree (the edges in the graph connecting nodes of the tree to one another) and distance is defined as the length of the (unique) path joining the two nodes. Trees with this metric are sometimes called "free" or "unrooted" trees (see Cunningham, 1978) because they do not have a unique root (or "most superordinate" node) as do hierarchical (or "rooted") trees. In the case of this path length metric it is quite important, however, that the branch lengths be non-negative (otherwise the resulting "distances" may not satisfy the metric axioms - in particular the triangle inequality and/or non-negativity of the resulting "distances" may not hold). Unlike ultrametric distances, path length distances are not ordinally invariant - i.e., an increasing monotonic transformation (even if it retains non-negativity) of path length distances will not necessarily be path length distances, and, in fact, may not
be distances at all (i.e., may not satisfy the metric axioms, and, in particular, the triangle inequality may be violated) (cf. De Soete, 1983). However, the path length property of these distances is invariant under addition of an additive constant, at least if the possibility of some negative lengths (and possibly negative distances) is allowed. In fact, the topology (i.e., the network structure) of the tree will not change as a result of addition of a (positive or negative) constant to these path length distances, but only some of the branch lengths (in particular, those of the branches linking terminal nodes to internal nodes of the tree, some of which may, in fact, become negative). In this sense, path length "distances" as well as ultrametric distances can be viewed as defined only on an interval scale. (The word "distances" is put in quotes here, since addition of a sufficiently large negative constant may lead to violations of the triangle inequality, or even of non-negativity, in which case these numbers will not be true distances at all, but only "dissimilarities".) However, these two types of tree metrics are not as distinct as they may seem at first. The ultrametric can easily be seen to be a special case of the path length metric, obtained by defining the branch lengths in a particular way. In particular, assuming all lengths to be non-negative, the length of the branch connecting any two nodes can be defined to be the difference in height values between the superordinate (higher) and subordinate (lower) of the two nodes, where the "height" of a terminal node is defined to be zero. On the other hand, a set of path length or additive distances defined on a tree can be decomposed into the sum of an ultrametric distance defined on the same tree (defined by appropriately specifying heights for the internal nodes) and a second set of values which are of the form δ_ij = c_i + c_j (i.e., are additively decomposable) for i ≠ j.
If the c's in this second decomposition are non-negative (and the decomposition can always be so defined that they are) these values (δ_ij) can be viewed themselves as path length distances on a very special class of tree with only one internal node (to which all the terminal nodes attach) called (by graph theorists) a "star", or (by numerical taxonomists) a "bush". However, this decomposition is not unique. In fact, even the particular hierarchical tree with which the ultrametric component is associated is not unique, since a path length tree with n terminal nodes can be converted into a hierarchical tree in n - 1 different ways, by "rooting" the (otherwise "unrooted", or nonhierarchical) tree associated with the path length
metric at any one of n - 1 different positions (at any one of n - 1 different internal nodes). Even if the particular hierarchy associated with the tree is specified, the heights, and thus the ultrametric component of the decomposition, are not unique, but are defined only up to an additive constant. For a further discussion of the interrelations between these two types of tree metrics see Carroll (1976), Carroll and Pruzansky (1980), Furnas (1980), or Carroll, Clark, and DeSarbo (1984) for the three-way case.

In the tree unfolding models for individual differences in preferences both the stimuli and the subjects are represented as terminal nodes in a tree. Like the spatial unfolding or "ideal point" models, a subject's preferences are assumed to be inversely related to the distances between the node representing that subject (the tree analogue of an "ideal point" for that subject) and the nodes representing the stimuli (the tree "stimulus points"). These distances are defined either as ultrametric or path length distances. In the papers by De Soete, DeSarbo, Furnas and Carroll (1984a, 1984b) a penalty function approach is described for fitting these models to a subjects by stimuli matrix of preference scores, sometimes referred to as a rectangular matrix of two-mode proximities between two sets of entities (in this case, subjects and stimuli). In the present paper we present stochastic versions of these models which are appropriate for paired comparisons preference data - the kind of data assumed in the "wandering vector" and "wandering ideal point" models.
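Both tree metrics described above can be computed from an explicit tree. The sketch below (the node labels and branch lengths are purely illustrative) derives path length distances by traversing the unique connecting paths, and checks the ultrametric condition that the two largest sides of every triangle are equal:

```python
from itertools import combinations

def path_length_distances(edges):
    """Distances on a weighted tree: the distance between two nodes is the
    total branch length along the unique path joining them."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    dist = {}
    for start in adj:                      # depth-first search from every node
        stack, seen = [(start, 0.0)], {start}
        while stack:
            node, d = stack.pop()
            dist[(start, node)] = d
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, d + w))
    return dist

def is_ultrametric(nodes, dist):
    """Ultrametric inequality: d_ik <= max(d_ij, d_jk) for all triples,
    i.e. the two largest sides of every triangle are equal."""
    for i, j, k in combinations(nodes, 3):
        sides = sorted([dist[(i, j)], dist[(j, k)], dist[(i, k)]])
        if sides[1] != sides[2]:
            return False
    return True

# An unrooted tree: internal nodes u, v; terminal nodes a, b, c, d.
edges = [('u', 'a', 1.0), ('u', 'b', 3.0), ('u', 'v', 2.0),
         ('v', 'c', 1.0), ('v', 'd', 2.0)]
d = path_length_distances(edges)
print(d[('a', 'c')])                              # 1 + 2 + 1 = 4.0
print(is_ultrametric(['a', 'b', 'c', 'd'], d))    # False: not ultrametric

# Branch lengths chosen so every terminal sits at the same depth below each
# ancestor (height differences as branch lengths): this IS an ultrametric.
edges_u = [('u', 'a', 1.0), ('u', 'b', 1.0), ('u', 'r', 2.0),
           ('v', 'c', 1.5), ('v', 'd', 1.5), ('v', 'r', 1.5)]
d_u = path_length_distances(edges_u)
print(is_ultrametric(['a', 'b', 'c', 'd'], d_u))  # True
```

The second tree illustrates the point made above: an ultrametric is a path length metric whose branch lengths are height differences, with all terminal nodes at height zero.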
However, unlike many continuous stochastic preference models, where it is possible to fit the model to a single matrix of paired comparisons data (either replicated over subjects, or amalgamated over different individual subjects, but with those subjects treated as replications), at least one of the two classes of stochastic tree unfolding models (called SSTUN, for the “Simple” or “Special” STUN model) requires more than one subject (or paired comparisons matrix) to yield a non-trivial solution. (With only a single such matrix the tree obtained will always have the topology of a simple linear order corresponding to the best one dimensional solution for that preference matrix.) The second class of models (called GSTUN, for the “General” STUN model), however, can, in principle at least, recover the entire tree structure for stimuli, even from a single subject.
Stochastic Tree Unfolding Models
3. The SSTUN Model

In the simplest form of the Stochastic Tree Unfolding Model, called the SSTUN model, the paired comparisons are assumed to be generated by a process very closely related to Thurstone’s (1927) Case V model, but with the “tree” preference scale values defined by the tree distances. That is, for individual i on trial t, in which stimuli j and k are to be compared, the probability p_{i,jk}^t = P_i^t(j ≻ k) (the probability that i prefers stimulus j to k on trial t) is assumed to be generated by the following process:

p_{i,jk}^t = P(d_{ij}^t < d_{ik}^t),   (2)

with

d_{ij}^t = d̄_{ij} + ε_{ij}^t,   (3a)

d_{ik}^t = d̄_{ik} + ε_{ik}^t,   (3b)

where ε_{ij}^t and ε_{ik}^t are independently normally distributed with mean zero and variance σ², and where d̄_{ij} denotes the (ultrametric or additive) tree distance between the nodes representing subject i and stimulus j. This is exactly equivalent to the Thurstone Case V model with subject i’s mean “discriminal process” equal to d̄_{ij}, and with all subjects having a common variance σ² of the discriminal process. Note that we may drop the t superscript in (2), because p_{i,jk} is independent of t, since the ε_{ij}^t’s are assumed to be identically distributed over both t and j. This independence of t holds for all the models to be considered here, so the t will henceforth generally be omitted. Note also that, without loss of generality, we may take σ to be a constant which, with appropriate scaling of the d̄_{ij}’s, can be set to σ = 1/√2, in which case

p_{i,jk} = Φ(d̄_{ik} − d̄_{ij}).
A preliminary version of a procedure which uses the De Soete, DeSarbo,
Furnas and Carroll penalty function procedure for fitting the structural tree unfolding model as a central component has been devised and successfully applied to some marketing data (cf. DeSarbo, De Soete, Carroll, & Ramaswamy, 1988).
Carroll, DeSarbo, & De Soete
This “simple” version of stochastic tree unfolding does not, however, utilize the tree structure in any inherent manner in generating the stochastic components. The distances d̄_{ij} in this case (or other numbers assumed related in an inverse linear fashion to preference scale values) could have been generated by any process whatever: the structural model and the stochastic component are simply (as it were) “grafted” onto one another without any essential theoretical link interconnecting them. Furthermore, this leads to a Thurstone Case V model for each subject, which is known to entail strong stochastic transitivity for each subject; i.e., if p_{i,jk} and p_{i,kl} are both equal to or larger than 1/2, then p_{i,jl} ≥ max(p_{i,jk}, p_{i,kl}). There is considerable evidence in the literature, however, that in many realistic situations strong stochastic transitivity does not obtain. At best, a weaker condition known as moderate stochastic transitivity [in which, under the same conditions, p_{i,jl} ≥ min(p_{i,jk}, p_{i,kl})] can be expected to hold. The more general models to be discussed below, called General Stochastic Tree UNfolding (GSTUN) models, satisfy moderate (but not strong) stochastic transitivity. Furthermore, these models do indeed utilize the topology of the tree structure in a central way.

4. The GSTUN Model
Given a fixed tree (hierarchical in the ultrametric case, or a rootless, nonhierarchical tree, or “free tree,” in the path length case), we can define a matrix associated with that tree which we call the “Path Matrix” and shall denote as P. We give this matrix that name because, in the case of a path length metric, the matrix can be viewed as defining the unique path connecting every pair of terminal nodes i and j. In the case of an ultrametric it does not define such a path, but rather defines the “least common ancestor” (l.c.a.) node for every pair i and j, which can be viewed as defining the “path” connecting the two (if we think of the pair as interconnecting directly via their l.c.a. node). In the case of a path length metric the Path Matrix P is a matrix whose rows correspond to pairs of terminal nodes i and j, where (in the present case of the tree unfolding models) i corresponds to a subject and j to a stimulus. The columns of P correspond to branches in the tree. The general entry in P, which we shall designate as p_{(ij)q} (for the entry in the
row corresponding to node pair (i, j) and to branch q) will be 1 if and only if branch q is included in the path connecting i to j, and 0 otherwise. This matrix is thus a binary “indicator” matrix, indicating which branches are involved in the path interconnecting each pair of nodes i and j. Given a set of branch lengths h_1, h_2, ..., h_Q, which we may represent as a Q-dimensional (column) vector h, the distances d_{ij} (i = 1, ..., I, j = 1, ..., J), which can be “packed” into another column vector d of I × J components, can be defined via the matrix equation

d = Ph.   (4)
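Equation (4) is straightforward to realize for a concrete tree. In the following sketch the tree, its branches, and the branch lengths are all hypothetical, invented for illustration; rows of P correspond to (subject, stimulus) pairs and columns to branches:

```python
import numpy as np

# Hypothetical free tree: terminal nodes i (a subject), A, B (stimuli),
# and internal nodes u, v; each edge is one branch (one column of P).
branches = [("i", "u"), ("A", "u"), ("u", "v"), ("B", "v")]
paths = {                      # branches on the path from i to each stimulus
    "A": {("i", "u"), ("A", "u")},
    "B": {("i", "u"), ("u", "v"), ("B", "v")},
}

# Binary indicator matrix: rows are the pairs (i, A) and (i, B).
P = np.array([[1 if b in paths[s] else 0 for b in branches]
              for s in ("A", "B")], dtype=float)
h = np.array([1.0, 0.5, 2.0, 0.5])   # hypothetical branch lengths
d = P @ h                            # eq. (4): d = Ph
```

Here d recovers the path length distances: 1.0 + 0.5 from i to A, and 1.0 + 2.0 + 0.5 from i to B.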
Now let us suppose that h, rather than being a fixed vector of branch lengths, is a random vector. Furthermore, let us assume that, for individual i, the distribution of h is

h_i ~ N(μ_i, Σ_i);   (5)

then, on a particular paired comparisons trial in which subject i is comparing stimulus j to stimulus k, we have

p_{i,jk} = P_i(j ≻ k) = P(δ_{i,jk} > 0),   (6)

where δ_{i,jk} = d_{ik} − d_{ij}. Under the assumptions we have made in this General Stochastic Tree Unfolding Model, the distribution of δ_{i,jk} is

δ_{i,jk} ~ N(δ̄_{i,jk}, δ²_{(ij)(ik)}),   (7)

with

δ̄_{i,jk} = (p_{(ik)} − p_{(ij)}) μ_i,   (8)

while

δ²_{(ij)(ik)} = (p_{(ik)} − p_{(ij)}) Σ_i (p_{(ik)} − p_{(ij)})′,   (9)
where p_{(ij)} and p_{(ik)} are the row vectors corresponding to the (i, j) and (i, k) rows of the Path Matrix P, respectively. Since Σ_i is a covariance matrix, it is positive definite (or semidefinite), so that δ²_{(ij)(ik)} is the squared generalized Euclidean distance between rows (i, j) and (i, k) of P in the metric of Σ_i. Consequently, we obtain

p_{i,jk} = Φ(δ̄_{i,jk} / δ_{(ij)(ik)}),   (10)

where Φ denotes the standard normal distribution function. Since δ is a (Euclidean) metric, model (10) is a moderate utility model (see Halff, 1976). It should be evident that exactly the same development holds for the ultrametric case, except that the columns of the Path Matrix P correspond to Q internal nodes (rather than Q branches), with p_{(ij)q} being 1 if and only if node q is the l.c.a. of i and j. It is clear, however, if one considers the structure of the path matrix P, that the distribution of the choice probabilities for subject i depends only on the distribution of those components of h that affect the distances from the node for subject i to the stimulus nodes. The distributional parameters for subject i involving components of h that do not affect these distances are indeterminate without further constraints.
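Model (10) can be sketched directly from a path matrix row pair, a mean vector, and a branch-length covariance matrix (all values below are hypothetical), including the convention of returning 1/2 when the generalized distance vanishes (the ultrametric case in which both pairs share the same l.c.a., discussed later in the text):

```python
import numpy as np
from scipy.stats import norm

def gstun_prob(p_ij, p_ik, mu, Sigma):
    """Eq. (10): p_{i,jk} = Phi(delta_bar / delta), where delta_bar is
    the mean of d_ik - d_ij and delta is the generalized Euclidean
    distance between the two path-matrix rows in the metric of Sigma."""
    c = np.asarray(p_ik) - np.asarray(p_ij)
    delta_bar = c @ mu
    var = c @ Sigma @ c
    if var == 0.0:            # identical rows (shared l.c.a. case):
        return 0.5            # choice probability taken as 1/2 by fiat
    return norm.cdf(delta_bar / np.sqrt(var))

p_ij = np.array([1.0, 1.0, 0.0, 0.0])   # hypothetical row for pair (i, j)
p_ik = np.array([1.0, 0.0, 1.0, 1.0])   # hypothetical row for pair (i, k)
mu = np.array([1.0, 0.5, 2.0, 0.5])     # mean branch lengths
p_jk = gstun_prob(p_ij, p_ik, mu, np.eye(4))   # j closer on average
```

With Σ = I this reduces to the ordinary Euclidean distance between the two rows, one of the special cases considered below.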
5. Some Special Cases of GSTUN

While the general model with μ_i and Σ_i completely unconstrained is of at least theoretical interest, this completely general model is much too general for practical application to real data. As already noted, in some cases some of the parameters are intrinsically undefined. In all cases it has more parameters than observed data values, and therefore cannot be uniquely fitted. Thus, without further constraints, GSTUN should be taken as providing a broad theoretical framework within which a large number of interesting special cases can be viewed, rather than as a tractable statistical model in and of itself. Let us now, however, consider some cases which are of particular interest. Most important is the case in which μ_i = μ and Σ_i = Σ for all i. Imposing these constraints is the simplest way of avoiding the intrinsic indeterminacies already alluded to above.
These constraints on the μ_i’s and Σ_i’s will, in fact, be assumed henceforth (unless otherwise indicated) in the present paper. Now, let us consider two further constraints of interest on Σ (now assumed common over subjects).

1) Σ diagonal (i.e., the h_q’s are independently normally distributed, but with different variances). In this case Σ = diag(σ_q²), so that δ_{(ij)(ik)} is the weighted Euclidean metric with weights σ_q²; i.e.,

δ²_{(ij)(ik)} = Σ_{q=1}^{Q} σ_q² (p_{(ij)q} − p_{(ik)q})².

2) Σ = σ²I. This is case (1) above, with the variances of the h_q’s all equal. We may, without loss of generality, assume the common variance to be one, thus making Σ = I, since we may absorb an arbitrary scale constant into the definition of μ. In this case, of course, δ_{(ij)(ik)} is the ordinary Euclidean distance between p_{(ij)} and p_{(ik)}; i.e.,

δ²_{(ij)(ik)} = Σ_{q=1}^{Q} (p_{(ij)q} − p_{(ik)q})².
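The two special cases reduce δ²_{(ij)(ik)} to familiar quantities. A short sketch with invented path-matrix rows and branch variances:

```python
import numpy as np

def delta_sq_weighted(p_ij, p_ik, var_q):
    """Case (1), Sigma = diag(var_q): squared weighted Euclidean distance."""
    return float(np.sum(var_q * (p_ij - p_ik) ** 2))

def delta_sq_plain(p_ij, p_ik):
    """Case (2), Sigma = I: ordinary squared Euclidean distance."""
    return float(np.sum((p_ij - p_ik) ** 2))

p_ij = np.array([1.0, 1.0, 0.0, 0.0])   # hypothetical path-matrix rows
p_ik = np.array([1.0, 0.0, 1.0, 1.0])
w = np.array([0.5, 1.0, 2.0, 1.0])      # hypothetical branch variances
```

With all weights equal to one, the weighted form coincides with the ordinary squared Euclidean distance, as the text indicates.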
It might be noted that the GSTUN model (and all its special cases) can be viewed as a special case of the wandering vector model (WVM) (Carroll, 1980; De Soete & Carroll, 1983), but with a different Q-dimensional stimulus space defined for each subject (corresponding to the submatrix of P associated with the distances from that subject’s node to the J stimulus nodes), with centroid vector μ_i and covariance matrix Σ_i for that subject (assuming here, for the moment, the most general form of GSTUN). This observation should be viewed, however, as being primarily of theoretical interest. In particular, it demonstrates (in a somewhat different way) that GSTUN “inherits” many important properties of the WVM. One notable feature of the GSTUN model in the ultrametric case is that (regardless of the definition of Σ) the variance term δ²_{(ij)(ik)} will be identically zero for any pair of pairs (i, j) and (i, k) sharing the same l.c.a. node. This is because the rows p_{(ij)} and p_{(ik)} must be identical in such cases, and thus the distance between those rows (however defined) will be zero. Moreover, since we have already assumed that, for the i, j, and k under consideration, subject i meets both stimuli j and k at the same internal
node, it follows that, in this ultrametric case, d̄_{ik} = d̄_{ij}. Consequently d̄_{ik} − d̄_{ij} is zero, and p_{i,jk} = Φ(0/0). Thus p_{i,jk} is undefined, but could be defined by fiat as equal to 1/2. It should be further noted, however, that since ultrametric distances can, as discussed earlier, be obtained as a special case of the path length metric, this makes it possible to fit the latter model with constraints on the μ’s that make the expected values of the distances d̄_{ij} conform to the ultrametric conditions.
6. The Relation between the GSTUN and SSTUN Models
As the names imply, the General STUN model, GSTUN, should include the Simple or Special STUN model, SSTUN, as a special case. It turns out this is true, under “general” conditions on the tree structure, for the GSTUN path length model (or the special case of that model, alluded to above, in which the average distances are constrained to satisfy the ultrametric conditions). This relies on the fact that, in the case of this model, the covariance matrix Σ_i for individual i can be so chosen as to make the covariance matrix of the additive error components (added to the structural model) equal to an identity matrix (or a scalar times an identity matrix). (Note that we have now gone back to the most general case in which each subject is allowed a separate covariance matrix. We assume, however, that the centroid μ of h is the same for all i.) Details follow. To demonstrate that the SSTUN model is a special case of the GSTUN model (with path length metric and general covariance matrix Σ_i, different for each subject), let us first define P_i to be the submatrix of the path matrix P corresponding to those pairs of nodes including i (i.e., i paired with all J stimuli). Then, for given h, the J-dimensional (column) vector d_i defined as

d_i = P_i h   (11)

contains the distances from i to the J stimuli. Under the assumptions made in the GSTUN model, the distribution of d_i will be

d_i ~ N(d̄_i, P_i Σ_i P_i′),   (12)

where d̄_i = P_i μ (the vector of expected values of the tree distances for subject i).
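The distributional claim in eq. (12) can be checked by simulation: drawing many branch-length vectors h and forming d_i = P_i h should reproduce mean P_i μ and covariance P_i Σ_i P_i′. A sketch with an invented subject submatrix and parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
P_i = np.array([[1.0, 1.0, 0.0, 0.0],    # hypothetical subject submatrix:
                [1.0, 0.0, 1.0, 0.0],    # J = 3 stimulus pairs,
                [1.0, 0.0, 1.0, 1.0]])   # Q = 4 branches
mu = np.array([1.0, 0.5, 2.0, 0.5])      # mean branch lengths
Sigma = np.diag([0.04, 0.01, 0.09, 0.01])  # branch-length covariances

h = rng.multivariate_normal(mu, Sigma, size=20000)  # random branch lengths
d = h @ P_i.T                                       # eq. (11), per draw
```

The empirical mean of d approximates P_i μ and the empirical covariance approximates P_i Σ_i P_i′, as in eq. (12).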
In order to make the GSTUN model equivalent to the SSTUN model (since we may choose P and μ so that the expected values are already equal), it is necessary and sufficient to make the covariance matrices equal. The covariance matrix (of the distribution of “distances” over trials) in the SSTUN model is (without loss of generality, since we may choose σ² = 1)

Σ_i^{(S)} = I,   (13)

while, as we have already seen, for the GSTUN model it is

Σ_i^{(G)} = P_i Σ_i P_i′   (14)

(where Σ_i is a Q × Q matrix of covariances of the branch lengths for subject i, while Σ_i^{(S)} and Σ_i^{(G)} are J × J matrices of covariances of the d_{ij}’s). Given a fixed tree (and metric) for subject i, and thus a fixed path matrix P_i, sufficient (but not, generally, necessary) conditions for Σ_i^{(G)} = Σ_i^{(S)} = I are that P_i P_i′ be nonsingular, so that (P_i P_i′)^{−1} exists (which is necessary), and that

Σ_i = P_i′ (P_i P_i′)^{−2} P_i.   (15)
The definition of Σ_i in eq. (15) is sufficient, but not necessary, as there will generally be other definitions of Σ_i, differing primarily in which generalized inverse of P_i′P_i is used to define Σ_i. In this case, the matrix P_i′(P_i P_i′)^{−2} P_i defining Σ_i is the Moore-Penrose generalized inverse. For the case of the GSTUN model with an ultrametric, the matrix P_i P_i′ will be singular (p_{(ij)} and p_{(ij′)} will be identical if (i, j) and (i, j′) share the same l.c.a. node). In fact, it can be proved that, under the conditions of this model (which we might call GSTUN(u), for GSTUN with an ultrametric), the matrix P_i P_i′ will always be singular, except for the relatively uninteresting case in which the tree has a structure equivalent to a (single) linear order for the stimuli, because many rows in P_i will be replicated. This might lead one to assume that the ultrametric case of SSTUN cannot be generated as a special case of GSTUN(u). This, however, is not the case! In fact, GSTUN(u) with Σ = σ²I will be exactly equivalent (in terms of the predicted choice probabilities, in any case) to SSTUN, if we resolve the indeterminacy mentioned earlier “by fiat,” by defining p_{i,jk} = 1/2 in the case in which
(subject) i meets (stimuli) j and k both at the same internal node. The choice probabilities predicted by GSTUN(u) in this case will then be identically the same as those predicted by SSTUN! Thus, in fact, GSTUN(u) (with a scalar covariance matrix) is indistinguishable from SSTUN. (Of course, GSTUN(u) is still more general, as it need not be restricted to a scalar covariance matrix, and, in that case, its properties are distinctly different from those of SSTUN, even if SSTUN is generalized to allow different variances or a general covariance matrix.) If we assume GSTUN(pl), i.e., GSTUN with the path length metric, under very general conditions the matrix P_i P_i′ will be nonsingular. (In fact, it can be proved that it will always be nonsingular so long as no branches in the tree are omitted altogether; i.e., so long as there are no nodes that are identified and thus collapsed into a single node.) In fact, even weaker conditions suffice: all that is needed to guarantee nonsingularity of P_i P_i′ is that no two stimuli are placed at the same terminal node. Even the “degenerate” star or bush tree mentioned earlier, with only a single internal node, leads to a nonsingular P_i P_i′ (in fact, in this case P_i is a square J × J matrix which is itself nonsingular). It would thus seem to follow that the GSTUN(pl) model is the only completely general model (in this sense), and thus, we feel, is to be favored over GSTUN(u) on theoretical grounds. As discussed earlier, it is possible to fit GSTUN(pl) with constraints making the expected distances ultrametric, so we do not feel that this theoretical argument favoring GSTUN(pl) necessarily rules out what is essentially an ultrametric model (in its “central tendency”). In fact, the SSTUN model with ultrametric distances can be generated as a special case of GSTUN(pl) with ultrametric constraints on d̄ = E(d) (which implies certain constraints on μ).
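The sufficiency of eq. (15) is easy to check numerically: for any P_i with full row rank, Σ_i = P_i′(P_iP_i′)^{−2}P_i gives P_iΣ_iP_i′ = I. A sketch with a small invented path submatrix:

```python
import numpy as np

# Hypothetical path submatrix for one subject (J = 3 pairs, Q = 4
# branches), chosen so that P P' is nonsingular (full row rank).
P = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 1.0]])

PPt_inv = np.linalg.inv(P @ P.T)
Sigma = P.T @ PPt_inv @ PPt_inv @ P    # eq. (15): P'(PP')^{-2} P
check = P @ Sigma @ P.T                # should equal the identity
```

Here Sigma coincides with the Moore-Penrose generalized inverse of P′P, consistent with the remark in the text.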
References

Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. B. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. 1). New York: Seminar Press.
Carroll, J. D. (1976). Spatial, non-spatial and hybrid models for scaling. Psychometrika, 41, 439-463.
Carroll, J. D. (1980). Models and methods for multidimensional analysis
of preferential choice (or other dominance) data. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice. Bern: Huber.
Carroll, J. D., & Chang, J. J. (1973). A method for fitting a class of hierarchical tree structure models to dissimilarities data, and its application to some body parts data of Miller’s. Proceedings of the 81st Annual Convention of the American Psychological Association, 8, 1097-1098.
Carroll, J. D., Clark, L. A., & DeSarbo, W. S. (1984). The representation of three-way proximities data by single and multiple tree structure models. Journal of Classification, 1, 25-74.
Carroll, J. D., & Pruzansky, S. (1980). Discrete and hybrid scaling models. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice. Bern: Hans Huber.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Cunningham, J. P. (1978). Free trees and bidirectional trees as representations of psychological distance. Journal of Mathematical Psychology, 17, 165-188.
DeSarbo, W. S., De Soete, G., Carroll, J. D., & Ramaswamy, V. (1988). A new stochastic ultrametric tree unfolding methodology for assessing competitive market structure and deriving market segments. Applied Stochastic Models and Data Analysis, 4, 185-204.
DeSarbo, W. S., De Soete, G., & Eliashberg, J. (1987). A new stochastic multidimensional unfolding model for the investigation of paired comparison consumer preference/choice data. Journal of Economic Psychology, 8, 357-384.
DeSarbo, W. S., De Soete, G., & Jedidi, K. (1987). Probabilistic multidimensional scaling models for analyzing consumer choice behavior. Communication & Cognition, 20, 93-116.
DeSarbo, W. S., Oliver, R. L., & De Soete, G. (1986). A probabilistic multidimensional scaling vector model. Applied Psychological Measurement, 10, 78-98.
De Soete, G. (1983). Are nonmetric additive tree representations of numerical proximity data meaningful? Quality & Quantity, 17, 475-478.
De Soete, G., & Carroll, J. D. (1983). A maximum likelihood method for fitting the wandering vector model. Psychometrika, 48, 553-566.
De Soete, G., & Carroll, J. D. (1986). Probabilistic multidimensional
choice models for representing paired comparisons data. In E. Diday, Y. Escouffier, L. Lebart, J. Pages, Y. Schektman, & R. Tomassone (Eds.), Data analysis and informatics IV (pp. 485-497). Amsterdam: North-Holland.
De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.
De Soete, G., DeSarbo, W. S., Furnas, G. W., & Carroll, J. D. (1984a). Tree representations of rectangular proximity matrices. In E. Degreef & J. Van Buggenhaut (Eds.), Trends in mathematical psychology. Amsterdam: North-Holland.
De Soete, G., DeSarbo, W. S., Furnas, G. W., & Carroll, J. D. (1984b). The estimation of ultrametric and path length trees from rectangular proximity data. Psychometrika, 49, 289-310.
Furnas, G. W. (1980). Objects and their features: The metric representation of two class data. Unpublished doctoral dissertation, Stanford University.
Halff, H. M. (1976). Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology, 14, 244-246.
Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241-254.
Luce, R. D. (1977). Thurstone’s discriminal processes fifty years later. Psychometrika, 42, 461-489.
McFadden, D. (1976). Quantal choice analysis: A survey. Annals of Economic and Social Measurement, 5, 363-390.
Schonemann, P. H., & Wang, M.-M. (1972). An individual difference model for the multidimensional analysis of preference data. Psychometrika, 37, 275-309.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
Wang, M.-M., Schonemann, P. H., & Rusk, J. G. (1975). A conjugate gradient algorithm for the multidimensional analysis of preference data. Multivariate Behavioral Research, 10, 45-99.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic, multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
PROBABILISTIC MULTIDIMENSIONAL ANALYSIS OF PREFERENCE RATIO JUDGMENTS

Joseph L. Zinnes
National Analysts, Philadelphia, PA, U.S.A.

David B. MacKay
Indiana University, U.S.A.

A probabilistic multidimensional model is described for analyzing preference ratio judgments. This model combines the unfolding model of Coombs with the probabilistic model of Hefner, in which stimuli and individuals are represented by multivariate normal distributions. A simple procedure is described for approximating the maximum likelihood estimates of the location and variance parameters of the model. Two simulations show how well this procedure works, especially when there is considerable variability in the data.

1. Introduction
For some time now we have been confronted by a seemingly insolvable problem: how can we study, in a serious and convincing way, individual choice among interesting, multi-attribute stimuli when those stimuli are clearly identifiable? The problem with identifiable stimuli is that individual choices of those stimuli cannot be replicated a large number of times. It is easy enough, at least if one has sufficient reinforcers available, to ask subjects to indicate over and over again whether they can detect a signal buried in background noise, or whether they can identify which of two tones was presented, etc. But it is quite a different matter to ask subjects
This research was supported by National Science Foundation Grant SES-8120871. This paper is a revised version of an article published in Communication & Cognition, 1987, 20, 17-43.
Zinnes & MacKay
over and over again which of two specific cars they prefer or which of two specific houses they would buy. Subjects can readily identify these stimuli and therefore can readily recall their previous responses. This is precisely the same problem that occurs in the testing field. Large numbers of replicated choices are important to the study of choice behavior because of the nature of the class of choice models that we believe are relevant. These models are inherently probabilistic and have large numbers of parameters to estimate. To estimate these parameters accurately, and also to carry out sensitive statistical tests, namely those which discriminate between alternative choice models, generally requires a considerable number of replicated choices. At present we see only one way out of this predicament, and that is to study individual choice behavior by collecting numerical judgments of preferences rather than choice data. Unlike choice data, these numerical judgments are obtained by having subjects indicate both which stimulus they prefer and by how much. The value of numerical judgments has been pointed out by numerous writers (Anderson, 1982; Eisler, 1982). They contain, under appropriate conditions, more information than simple choice responses, and therefore they make it possible, at least in principle, to obtain accurate parameter estimates using few if any replications of the individual judgments. This, at least, is our hope for the present. That hope does, of course, depend on a leap of faith: that numerical judgments and choice responses will be compatible, that the same underlying model will apply, and therefore that the estimates of the parameters obtained from numerical judgments are precisely the same as those that would have been obtained had it been possible to replicate choice responses a large number of times. This is rather a large assumption. It is one we expect to investigate more fully in the future.
Thus, in this paper, we pursue only the question of how to estimate, using numerical judgments, the parameters of a specific choice model, when it is reasonable to assume that that choice model is appropriate. The specific numerical judgment discussed in this paper is a ratio judgment, or what we call a preference ratio judgment. We assume that stimuli are presented pairwise to the subjects and that the subjects are asked to indicate how much they prefer one stimulus over another. The instructions that we have used in our own experimental work attempt to
Analysis of Preference Ratio Judgments
make it clear to the subject that what is wanted is a ratio judgment. A response of two, for example, indicates that one stimulus is preferred twice as much as the other. To make sure that these instructions are understood, the subjects are given warm-up trials involving very simple stimuli, such as lines of different lengths. The subjects then practice making ratio judgments concerning the relative lengths of pairs of lines. Although we confine ourselves in this paper to preference ratio judgments, it should not be concluded that this is the only type of judgment that could have been used to extract numerical information from subjects. In fact, the ubiquitous rating judgment has been used in the preference domain to do just this (Bechtel, 1976; Saaty, 1980; Scheffé, 1952; Sjöberg, 1967). Our use of the ratio judgment stems from our belief that it is the most appropriate judgment to use if our underlying preference model is indeed correct. In this model, the utility or desirability that a person has for a stimulus is represented by a Euclidean distance. Thus, choosing between two stimuli is conceptually equivalent to comparing two distances in a Euclidean space. Since distances in the model are determined only up to a multiplicative transformation, it would not make sense to compare the differences between two distances, as would be suggested by a rating judgment. This is the case because differences are not invariant over multiplicative transformations; their magnitude is thus totally arbitrary within the model. It would make sense, however, to determine the ratio between a pair of distances, because that value is invariant over a multiplicative transformation. Within the model, therefore, the preference ratio judgment is a meaningful judgment. There is another issue concerning our use of the ratio judgment.
Even though we assume that subjects are carefully instructed to perform a ratio judgment, it does not follow that they will actually carry out those instructions. It has been shown (Birnbaum, 1982) that under some conditions subjects apparently respond to stimulus differences even when they are instructed to respond to their ratios. Birnbaum has, however, provided some evidence indicating that when stimuli can be represented as distances, subjects appear to respond to the stimulus ratios when instructed to do so. It is, therefore, not unreasonable for us to assume that the preference ratio instruction does indeed generate a preference ratio response, that is, a response based on the ratio of two distances. However, the final determination of the precise conditions under which this
assumption (and the others to be described in the following section) is valid will have to wait for more detailed experimental tests. The preference model we use in this paper is a probabilistic, multidimensional version of Coombs’ unfolding model (Coombs, 1964). The probabilistic aspects are based on a model first put forth by Hefner (1958). The essential idea of Hefner’s model is to represent each stimulus in terms of a multivariate normal distribution and subjects’ decision processes in terms of random samples from these distributions. The Hefner model, or closely related models, have been used in connection with a number of different types of data. They have been used to explain same-different judgments (Zinnes & Kurtz, 1968; Zinnes & Wolf, 1977), choice responses (Böckenholt & Gaul, 1984; Croon, in press; De Soete, Carroll, & DeSarbo, 1986; Suppes & Zinnes, 1963; Zinnes & Griggs, 1974), similarity judgments (MacKay & Zinnes, 1981; Zinnes & MacKay, 1983), and recognition responses (Ashby & Townsend, 1986). The attraction of Hefner’s model is its conceptual simplicity. It is a very natural and powerful extension of the single-dimensional choice models of Thurstone (1927). It is powerful because the properties of the multivariate normal are well known, and therefore one can answer in detail basic questions concerning the goodness-of-fit of the model and the invariance of its parameters over different experimental conditions. Our experimental work with preference ratio judgments has just begun (MacKay, Ellis, & Zinnes, 1986; MacKay & Zinnes, 1986). In these experiments, subjects made preference ratio judgments concerning residences that differed with respect to environment, location, and economic characteristics. The Coombs-Hefner preference model discussed in this paper was applied to the data of these experiments and appeared to do well in explaining those data.
In the following sections we focus on the problem of obtaining the maximum likelihood estimates of the parameters of the Coombs-Hefner preference model, when the data consist of preference ratio judgments. In Section 2, a simple approximation of the likelihood functions is developed. In Section 3, a simple expression for the initial estimates of the parameters is worked out. In the final two sections, two simulations are described, the purpose of which is to provide some idea of the accuracy and feasibility of the maximum likelihood estimates, especially when there is considerable variability in the data.
2. The Preference Model
In the unfolding model of Coombs (1964), subjects and stimuli are both represented as points in an r-dimensional Euclidean space. The preferences of the subjects are assumed to be determined by the distances between the subject points, called “ideal points,” and the stimulus points. The smaller the distance d_{ij} between ideal point i and stimulus point j, the more desirable is stimulus S_j to subject P_i. To this deterministic model of Coombs we add the probabilistic assumptions of Hefner (1958). In particular, we let the r-dimensional random vectors X_i = (X_{i1}, ..., X_{ir}), i = 1, ..., m, be associated with the m ideal points and assume that they have an r-variate normal distribution with mean vector u_i = (u_{i1}, ..., u_{ir}) and covariance matrix σ_i² I_r. Similarly, for the stimulus points, the r-dimensional random vectors X_j = (X_{j1}, ..., X_{jr}), j = m+1, ..., m+n, are associated with the n stimulus points, and it is assumed that they also have an r-variate normal distribution with mean vector u_j = (u_{j1}, ..., u_{jr}) and covariance matrix σ_j² I_r. The notation is intended to indicate that the variances of the components of each stimulus point do not differ from dimension to dimension, but that on any given dimension the variances of the components of different stimuli may differ. Thus, within a single dimension, these assumptions are precisely those of a Thurstone (1927) Case 3 pair comparison model. The same is true for the variances of the ideal points. It would be desirable to formulate more general assumptions concerning the covariance matrices, but doing so would tend to increase the number of parameters that would have to be estimated. Under these assumptions, the interpoint distance d_{ij} is a random variable. On each trial, its value is determined by subject i sampling from the ith ideal and jth stimulus distributions and “calculating” the Euclidean distance between the two sample points.
Thus, in terms of the r-dimensional random vectors X_i and X_j, the distance d_{ij} is given by

d_{ij}² = (X_i − X_j)′(X_i − X_j).   (1)
In contrast, the true distance D_{ij} is not a random variable, but is defined in terms of the mean vectors u_i and u_j by

D_{ij}² = (u_i − u_j)′(u_i − u_j).   (2)
It may also be noted that the true distance D_ij does not correspond to the expected value of the distance d_ij and, in fact, need not be monotonically related to it (Zinnes & MacKay, 1983). It will be useful to define the joint variance σ²_ij by the equation

σ²_ij = σ²_i + σ²_j,   (3)

which can be conceptualized as the variance of the difference between the components X_ik and X_jk on each of the r dimensions. This term, which appears in many of the equations that follow, should not be confused with the variance of the distance d_ij. That variance, unfortunately, has a considerably more complex expression. To deal with preference ratio judgments, the experimental task of interest here, we use a direct adaptation of the decision rule of the Coombs unfolding model. It is assumed that subject i reports the preference ratio R_ijk when the ratio of the distances d_ij and d_ik equals R_ijk, that is, when

R_ijk = d_ij / d_ik.   (4)
Since the interpoint distances d_ij, i = 1, ..., m, and j = m+1, ..., m+n, are random variables, their values, and that of the ratio R_ijk, can be expected to change with replications. The decision rule given in (4) only asserts that the subject accurately reports the ratio as it is perceived on each trial. It will be helpful to make one more assumption. This assumption concerns the independence of the distances d_ij and d_ik when the subject judges the stimulus pair S_j and S_k. We shall assume that the subject randomly selects two independent samples from his ideal point distribution, one of which is used to determine the distance d_ij and the other the distance d_ik. Under these conditions, the two distances d_ij and d_ik will be independent random variables. Whether this assumption is plausible or reasonable would depend on the specific details of the experimental procedure. If the two stimuli to be judged are presented sequentially or in widely separated spatial positions, the subject would have a tendency to evaluate each of the stimuli
Analysis of Preference Ratio Judgments
independently. This might also happen when the stimuli are complex, requiring the subject to spend a significant amount of time considering each stimulus separately. In any event, we assume in what follows that the two-sample, independence case applies and therefore that the ratio judgment R_ijk is based on the ratio of two independent random variables. Whether our results can be generalized to the one-sample, dependent case remains to be seen.
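The two-sample decision rule just described is straightforward to simulate. The sketch below is ours, not the authors' program: the configuration, the uncertainty values, and the function names are illustrative assumptions. Each trial draws two independent realizations of the ideal point, one for each distance, so that d_ij and d_ik are independent random variables.

```python
import math
import random

def sample_point(mean, sigma, rng):
    # one realization of a point whose coordinates are N(mean_k, sigma^2)
    return [rng.gauss(m, sigma) for m in mean]

def preference_ratio(u_i, s_j, s_k, sig_i, sig_j, sig_k, rng):
    """One trial of the two-sample decision rule: two independent draws of
    the ideal point, one used for d_ij and the other for d_ik."""
    x_i1 = sample_point(u_i, sig_i, rng)  # draw used for d_ij
    x_i2 = sample_point(u_i, sig_i, rng)  # independent draw used for d_ik
    x_j = sample_point(s_j, sig_j, rng)
    x_k = sample_point(s_k, sig_k, rng)
    return math.dist(x_i1, x_j) / math.dist(x_i2, x_k)

rng = random.Random(1)
u_i, s_j, s_k = [0.0, 0.0], [1.0, 0.0], [0.0, 2.0]  # true D_ij / D_ik = 0.5
ratios = [preference_ratio(u_i, s_j, s_k, 0.05, 0.05, 0.05, rng)
          for _ in range(20000)]
median_r = sorted(ratios)[len(ratios) // 2]
```

With uncertainty values this small the noncentrality parameters are large, and the median of the simulated ratios settles close to the true ratio of .5; as the sigmas grow, the ratios spread out and the extreme values discussed in Section 5 begin to appear.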
3. The Likelihood Function

We consider first the problem of evaluating the probability density function of the ratio judgment R_ijk. This density function is needed because it forms the basis of the likelihood function that is to be maximized. Under the assumptions stated thus far, it follows that the "standardized" squared distance d²_ij/σ²_ij has the noncentral chi-square distribution χ'²(ν, λ_ij), where the degrees of freedom ν equals the dimensionality of the space r and the noncentrality parameter λ_ij equals

λ_ij = D²_ij / σ²_ij   (5)
(Hefner, 1958; Zinnes & MacKay, 1983). Because of this and the independence assumption stated previously, we can immediately conclude that the ratio of the standardized squared distances

(d²_ij / σ²_ij) / (d²_ik / σ²_ik)
has the doubly noncentral F distribution F''(ν_j, ν_k, λ_ij, λ_ik) (Bulgren, 1971; Suppes & Zinnes, 1963; Zinnes & Griggs, 1974). The two noncentrality parameters λ_ij and λ_ik of this distribution are defined in (5), as they are for the noncentral chi-square distribution, while the degrees of freedom ν_j and ν_k are both equal to the dimensionality of the space r. These results indicate that there is a close relationship between the probability density function of the ratio judgment R_ijk and the probability density function of the doubly noncentral F distribution. Specifically, letting g(R_ijk) be the desired density function of R_ijk, then

g(R_ijk) = 2 R_ijk (σ²_ik / σ²_ij) h''(R²_ijk σ²_ik / σ²_ij),   (6)
Zinnes & MacKay
where h''(·) is the density function of the doubly noncentral F distribution F''(ν_j, ν_k, λ_ij, λ_ik). Equation (6) shows that it will be sufficient to focus our attention on developing a procedure for evaluating the function h''(·) of the doubly noncentral F distribution, in order to obtain a simple procedure for evaluating the density function g(R_ijk). We consider next, therefore, the F'' distribution. The exact expression of the density function of the F'' distribution has been worked out (Bulgren, 1971; Kendall & Stuart, 1961, p. 252), but it is not expressible in closed form. It contains a doubly infinite series of terms, which for some values of the parameters, namely those in the tails of the distribution, converge extremely slowly. For practical applications, it is essential to find a simple, approximate expression of this density function. Two simple possible approximations immediately suggest themselves. One approach uses the central chi-square to approximate the noncentral chi-square distribution (Patnaik, 1949); this approach makes it possible to convert the F'' distribution to the central F distribution. The other approach uses a normal distribution to approximate the noncentral chi-square distribution. Although this latter approach has been used successfully to approximate the cumulative distribution function of the F'' distribution (Zinnes & Griggs, 1974), it did not seem to work as well for approximating the density function of this distribution. Consequently, our discussion here is confined to the former approach, based on using the central chi-square to approximate the noncentral chi-square distribution. From the Patnaik approximation, it follows that if s_j has the noncentral chi-square distribution χ'²(ν_j, λ_j), then s_j/ρ_j will have approximately the central chi-square distribution χ²(ν*_j), where the degrees of freedom ν*_j equal

ν*_j = (ν_j + λ_j)² / (ν_j + 2λ_j)   (7)

and the multiplicative factor ρ_j is given by

ρ_j = (ν_j + 2λ_j) / (ν_j + λ_j).   (8)
Thus, to obtain the central F approximation of the F'' distribution, we start with the distribution function of the F'' distribution

H''(f) = P[ (s_1/ν_1) / (s_2/ν_2) ≤ f ].   (9)

Multiplying both sides of the inequality in (9) by ρ_2 ν*_2 ν_1 / (ρ_1 ν*_1 ν_2), and making use of (7), (8) and the definition

a = ρ_2 ν*_2 ν_1 / (ρ_1 ν*_1 ν_2) = ν_1 (ν_2 + λ_2) / [ν_2 (ν_1 + λ_1)],   (10)

reduces (9) to

H''(f) = P[ ((s_1/ρ_1)/ν*_1) / ((s_2/ρ_2)/ν*_2) ≤ a f ].   (11)

Now we can make direct use of the Patnaik approximation. According to this approximation, the left-hand side of the inequality in (11) has approximately a central F distribution. Therefore, (11) can be written as follows

H''(f) ≈ H(a f),   (12)

where H(·) is the distribution function of the central F distribution F(ν*_1, ν*_2). The degrees of freedom ν*_1 and ν*_2, which in general will not be equal to each other and will have noninteger values, are given by (7). The final result is obtained by differentiating (12) with respect to f, which gives the approximation

h''(f) ≈ a h(a f).   (13)

This equation expresses h'', the density function of the F'' distribution, in terms of h, the density function of the central F distribution. It may be noted that the function h, which is the key element of (13), has a simple closed form expression, and therefore (13) does indeed provide a straightforward procedure for evaluating h'' for any values of its four arguments. To summarize, (13), which gives the approximation that is fundamental to evaluating the likelihood function to be maximized, replaces the density
Figure 1. The central F approximation of the doubly noncentral F distribution. The degrees of freedom ν_1 and ν_2 are both equal to 2. The solid lines are the exact values, the dashed lines the central F approximation. For curve A: λ_1 = 1, λ_2 = 30; for curve B: λ_1 = 1, λ_2 = 4; for curve C: λ_1 = 1, λ_2 = 1.
function of the F'' distribution, having equal degrees of freedom and integer values, with the density function of the central F distribution, having unequal degrees of freedom and noninteger values. Some idea of the accuracy of the approximation given in (13) is provided by Table 1 and Figure 1. Table 1 gives both the approximate and the exact values of the function h(f | ν_1, ν_2, λ_1, λ_2), where f = 1 and ν_1 = ν_2 = 2, 4, 8, for a number of different values of λ_1 and λ_2. The absolute and relative errors, given in columns 5 and 6 of the table, suggest that the approximation is quite good. The absolute errors do not exceed .02 and the relative errors do not exceed 6 percent. Furthermore, the larger relative errors seem to occur only for values in the tails of the distribution, where, according to the last column of the table, convergence of the infinite series in the exact expression tends to be slowest. Figure 1 offers additional support for the approximation given in (13). Unlike Table 1, this figure attempts to show how the accuracy of the
approximation affects the evaluation of the function g(R_ijk) over the entire range of values of the variable R_ijk = d_ij/d_ik. Three different distributions are plotted in this figure, for the three different values of λ_ik: 1, 4 and 30. To highlight any major weakness of the approximation, the degrees of freedom ν_1 and ν_2 were both set equal to 2 for all three cases. Distributions with larger degrees of freedom tend to be more symmetric and therefore tend to be easier to approximate.

Table 1. Exact and approximate values of the probability density function h(1 | ν_1, ν_2, λ_1, λ_2)

                                       Error^a
  λ_1    λ_2     Exact    Approx.   Absolute   Percent    K^b

  Degrees of freedom ν_1 = ν_2 = 2
   .8    1.0     .5184     .5311     -.0128     -2.46      44
   .1    1.0     .4920     .5009     -.0089     -1.80      28
  1.0    2.0     .5294     .5497     -.0203     -3.94      58
  1.0    3.0     .5158     .5384     -.0226     -4.39      76
  1.0    4.0     .4890     .5109     -.0218     -4.47      86
  1.0    8.0     .3352     .3427     -.0074     -2.22     130
  1.0   12.0     .1982     .1952      .0030      1.52     173

  Degrees of freedom ν_1 = ν_2 = 4
   .8    1.0     .7596     .7640     -.0044      -.57      52
   .1    1.0     .7403     .7436     -.0033      -.44      33
  1.0    2.0     .7609     .7697     -.0087     -1.15      70
  1.0    3.0     .7393     .7515     -.0122     -1.65      86
  1.0    4.0     .7037     .7180     -.0143     -2.04      97
  1.0    8.0     .5032     .5140     -.0108     -2.14     149
  1.0   12.0     .3138     .3149     -.0011      -.34     192
  1.0   20.0     .0978     .0917      .0061      6.23     288

  Degrees of freedom ν_1 = ν_2 = 8
   .8    1.0    1.0980    1.0992     -.0012      -.11      64
   .1    1.0    1.0843    1.0852     -.0009      -.09      42
  1.0    2.0    1.0937    1.0967     -.0030      -.27      89
  1.0    3.0    1.0684    1.0736     -.0051      -.48     107
  1.0    4.0    1.0283    1.0356     -.0072      -.71     121
  1.0    9.0     .7246     .7344     -.0098     -1.35     191
  1.0   15.0     .3815     .3819     -.0004      -.11     273
  1.0   25.0     .0973     .0913      .0059      6.09     415

^a The error equals the exact value minus the approximate value.
^b Indicates the number of terms summed to obtain the exact values of the density function given in the table.
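The accuracy figures reported in Table 1 can be checked numerically. The sketch below is our own minimal implementation, not the authors' program: it evaluates the exact doubly noncentral F density by the Poisson-mixture double series and the central F approximation of (13), using the identity ρ_j ν*_j = ν_j + λ_j to simplify the multiplier of (10).

```python
import math

def log_f_pdf(x, d1, d2):
    # log of the central F density with d1, d2 degrees of freedom
    return (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
            - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2)
            + (d1 / 2 - 1) * math.log(x)
            - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2))

def h_exact(f, v1, v2, lam1, lam2, terms=80):
    """Doubly noncentral F density via the exact double series: each
    noncentral chi-square is a Poisson mixture of central chi-squares,
    and conditionally the ratio is a scaled central F variable."""
    total = 0.0
    for i in range(terms):
        lp_i = -lam1 / 2 + i * math.log(lam1 / 2) - math.lgamma(i + 1)
        for j in range(terms):
            lp_j = -lam2 / 2 + j * math.log(lam2 / 2) - math.lgamma(j + 1)
            c = ((v1 + 2 * i) / v1) * (v2 / (v2 + 2 * j))  # conditional scale
            total += math.exp(lp_i + lp_j
                              + log_f_pdf(f / c, v1 + 2 * i, v2 + 2 * j)) / c
    return total

def h_patnaik(f, v1, v2, lam1, lam2):
    """Central F approximation (13): h''(f) is approximated by a h(a f)."""
    vs1 = (v1 + lam1) ** 2 / (v1 + 2 * lam1)   # degrees of freedom, (7)
    vs2 = (v2 + lam2) ** 2 / (v2 + 2 * lam2)
    # rho_j vs_j = v_j + lam_j, so the multiplier of (10) reduces to:
    a = (v1 * (v2 + lam2)) / (v2 * (v1 + lam1))
    return a * math.exp(log_f_pdf(a * f, vs1, vs2))
```

At f = 1, ν_1 = ν_2 = 2, λ_1 = 1, λ_2 = 4, the two functions differ by roughly 4 to 5 percent, in line with the relative errors reported above; note that h here is the density of the ratio of standardized squared distances, so the density of the ratio judgment itself still requires the change of variable in (6).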
From the discrepancy between the exact and approximate values in this figure (the difference between the solid and dashed lines), it is evident that the approximation has its largest absolute error at the middle of the distribution, especially when the distribution is highly skewed. In general, however, the dashed lines (the approximate values) follow the solid lines (the exact values) quite closely, even in the tails of the distribution. For the level of accuracy typical of judgmental data, it would appear that (13) provides a reasonable level of accuracy. This is particularly encouraging, and to some extent surprising, since the approximation used in (13) is quite simple.

4. Starting Values
Even though a simple approximation was developed in the previous section for the density function of the observation R_ijk, the likelihood function containing products of these density functions will still tend to be quite complicated. It is, therefore, not likely that a simple, closed-form solution exists that maximizes this function; consequently, it will be necessary to use iterative methods to obtain the maximum likelihood (ML) estimates of the unknown parameters. There are a number of standard iterative procedures that can be used (e.g., Chandler, 1969; IMSL, 1979). It does not seem to make a great deal of difference which one is selected, provided the iterative process starts with reasonably good parameter estimates. We consider next, therefore, procedures for obtaining good starting value (SV) estimates for both the coordinates and the uncertainty values of the stimulus and ideal points. Our concern here is to develop quick and simple procedures that can be expected to produce moderately accurate parameter values.

SV estimates of the coordinates. For the purpose of obtaining these initial estimates, we assume that the joint uncertainty value σ²_ij = σ²_i + σ²_j is small relative to the distance D_ij. Well-established metric, nonprobabilistic procedures can then be utilized. Although the accuracy of these SV estimates will depend very much on the validity of this assumption, it should be noted that these estimates are only the initial values of the iterative process. The final values, the ML estimates, do not require this assumption.
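The log transformation that drives these metric SV procedures can be illustrated with a small sketch (ours; the function name and the toy distances are illustrative assumptions). Taking logs turns a ratio of distances into a difference of log distances, and the least squares solution of the resulting linear system recovers the log distances up to an additive, subject-specific constant.

```python
import math

def ls_log_distances(log_ratios):
    """Least squares solution of log_ratios[j][k] = Dstar[j] - Dstar[k]
    over all ordered pairs; determined only up to an additive constant,
    fixed here by centering (each row mean equals Dstar[j] minus the
    grand mean of the Dstar values)."""
    n = len(log_ratios)
    return [sum(log_ratios[j][k] for k in range(n)) / n for j in range(n)]

# noiseless check: ratios built from known distances
D = [1.0, 2.0, 4.0, 8.0]
R_log = [[math.log(D[j] / D[k]) for k in range(4)] for j in range(4)]
Dstar = ls_log_distances(R_log)
```

With error-free ratios the recovered values match log D_j exactly up to the common shift; with noisy judgments the same formula gives the least squares fit.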
Metric analyses of the unfolding model generally start with a set of I scales. The I scale for a single subject P_i consists of the set of distances between ideal point i and each one of the stimulus points. To obtain these I scales in the present case, some preliminary analysis of the ratio judgments R_ijk is necessary. Define

R*_ijk = log R_ijk

and

D*_ij = log D_ij.

Then the problem of estimating the distances D_ij (j = m+1, ..., m+n) for subject P_i consists of solving the linear system of equations

R*_ijk = D*_ij - D*_ik,   (j, k = m+1, ..., m+n)   (14)

for the D*_ij. The least squares solution of (14) is

D̂*_ij = R*_ij· + D*_i·,   (15)

where we have let

R*_ij· = (1/n) Σ_k R*_ijk,   D*_i· = (1/n) Σ_j D*_ij,

and have taken D̂*_ij to be the least squares estimate of D*_ij. To be precise, it should be made clear that the solution given in (15) is only an approximation, and that this is true even when the ratio judgment R_ijk is averaged over an infinite number of replications. This follows from the fact that the expected value of the log of the ratio judgment R*_ijk does not equal the true log ratio log(D_ij/D_ik) and will only approach the true value in the limit, when the ratios D_ij/σ_ij and D_ik/σ_ik both increase without bound. This latter point is discussed more fully in the next section. Several different metric approaches can be used to analyze these I scales (Bechtel, 1976; Carroll, 1972; Schönemann, 1970). We have found it very effective to use Schönemann's procedure to solve simultaneously for the coordinates of the stimulus and ideal points, but then to discard the
solution for the ideal points. With the coordinates of the stimuli now treated as known, it is possible to set up, for each subject, a linear regression problem to solve for the coordinates of the ideal points and, incidentally, for the subject-specific constant D*_i·, which appears in (15). Both the unknown coordinates and the antilog of the unknown constant D*_i· turn out to be equal to the regression weights of the linear regression equation (see, for example, (26) in Zinnes & MacKay, 1983). It should be mentioned that this procedure for solving for the stimulus coordinates has had to depend on group data. In essence we have had to assume that the stimulus coordinates do not differ from subject to subject. It would have been desirable to solve for the stimulus coordinates separately for each subject, but, under the assumptions of the present model, this does not seem to be possible.

SV estimates of uncertainty values. To determine the SV estimates of the uncertainty values σ_i (i = 1, ..., m) and σ_j (j = m+1, ..., m+n), it will also facilitate matters to proceed as we did in the previous section. We assume that the uncertainty values are small compared to the interpoint distances, and we also make use of the log transformation. This transformation is especially useful here, because the expected value and variance of the log of a random variable having the central F distribution have, at least approximately, very simple expressions. In particular, if f has the central F distribution F(ν*_1, ν*_2), then

E(1/2 log f) ≈ (1/2)(1/ν*_2 - 1/ν*_1)   (18)

and

Var(1/2 log f) ≈ (1/2)(1/ν*_1 + 1/ν*_2)   (19)

when ν*_1 and ν*_2 are large (Kendall & Stuart, 1963, p. 379). These results are directly relevant here because, as we have seen in Section 3, the central F distribution is closely related to the distribution of the ratio judgment R_ijk. We begin, therefore, with an attempt to use (18) and (19) to obtain an approximate expression for the mean and variance of R*_ijk. From the definition of R_ijk, and letting

f_jk = (d²_ij / σ²_ij) / (d²_ik / σ²_ik),
we can write

R²_ijk = (σ²_ij / σ²_ik) f_jk.   (20)

The notation f_jk is appropriate here because, as noted earlier, under the assumptions of the model, f_jk has the doubly noncentral F distribution F''(ν_1, ν_2, λ_ij, λ_ik). Equation (13) now provides the motivation to define
f*_jk = (a_ik / a_ij) f_jk,   (21)

where, as in (10),

a_ij = ρ_ij ν*_ij / ν = (ν + λ_ij) / ν,   (22)

because from (13) we know that f*_jk will have approximately the central F distribution F(ν*_ij, ν*_ik), with the degrees-of-freedom parameters determined, as in (7), by

ν*_ij = (ν + λ_ij)² / (ν + 2λ_ij).   (23)

Substituting (21) in (20) gives

R²_ijk = (σ²_ij / σ²_ik)(a_ij / a_ik) f*_jk,   (24)
which expresses the square of the ratio judgment R²_ijk directly in terms of a variable having approximately a central F distribution. We are now in a position to apply the approximations (18) and (19), along with (24), to obtain the mean and variance of log R_ijk. It is appropriate to apply (18) and (19) in the present case because, under our present assumption concerning the size of σ_ij relative to D_ij, the noncentrality parameter λ_ij will be large. And, from (23), it is clear that the degrees of freedom ν*_ij will then also be large; in fact, in the limit

ν*_ij = λ_ij / 2.   (25)

This fulfils the conditions required by (18) and (19). Thus, applying the operator 1/2 log to both sides of (24), we obtain the expected value

E(R*_ijk) = 1/2 log[(σ²_ij a_ij) / (σ²_ik a_ik)] + E(1/2 log f*_jk),

which, from (18), (25) and the assumption that λ_ij is large, reduces approximately to

E(R*_ijk) ≈ log(D_ij / D_ik).   (26)
We can proceed similarly to obtain the variance of R*_ijk, but this time making use of (19) instead of (18). Equation (24) then becomes

Var(R*_ijk) = Var(1/2 log f*_jk) ≈ (1/2)(1/ν*_ij + 1/ν*_ik) ≈ σ²_ij/D²_ij + σ²_ik/D²_ik.   (27)

Equation (27) is the basic result we need. It suggests that an estimate of the joint variance σ²_ij can be obtained by solving a simple, linear system of equations, provided that we have estimates of the left-hand side of (27), namely the variance of R*_ijk, and estimates of the true distances D_ij. The latter estimates present no problem, since these interpoint distances can be calculated directly from the SV estimates of the coordinates determined in the previous section. For an estimate of Var(R*_ijk) we can make use of the fact that, under our present assumptions, the approximation given in (26) is still valid and therefore, in the limit,

E(R*_ijk) = log(D_ij / D_ik).
Consequently, we can use for an estimate of Var(R*_ijk)

v_ijk = (1/n_ijk) Σ [R*_ijk - log(D_ij/D_ik)]²,   (28)

where the summation is taken over the n_ijk replications of R_ijk and the distances D_ij and D_ik are, as before, calculated from the SV estimates of the coordinates. Keeping these estimates in mind, we return to the system of equations given in (27). Since these equations are linear in σ²_ij/D²_ij, the least squares solution is

s²_ij = D²_ij ( v_ij· - (1/2) v_i·· ),   (29)

where we are using s²_ij as the estimate of σ²_ij and

v_ij· = (1/n) Σ_k v_ijk,   (30)

v_i·· = (1/n) Σ_j v_ij·.   (31)
In order to arrive at estimates of the uncertainty values σ_i and σ_j for the subjects and the stimuli, the estimates of the joint variances s²_ij given in (29) can be carried one step further. To do this, however, requires distinguishing between two cases.

Case 1. Assume that the subject and stimulus uncertainty values are unique to each subject. Then the best that can be done with the estimates of the joint variances s²_ij is to set the subject uncertainty estimate s_i equal to some small arbitrary value and to solve for the stimulus uncertainty s_j using

s²_j = s²_ij - s²_i.   (32)

There is, however, the possibility that one of the uncertainty estimates might turn out to be negative. This can be avoided by letting the uncertainty parameter for subject P_i be defined by
s²_i = (1/2) min_j s²_ij.   (33)
This solution has the convenient property of equating the uncertainty estimate for subject P_i with the smallest stimulus uncertainty estimate for this subject. The estimates of the uncertainty parameters given in (32) and (33) have severe limitations. Because of their non-uniqueness properties, it is not meaningful to compare the subject and stimulus uncertainty values over different subjects. It would only be meaningful to compare the stimulus uncertainty values within a single subject.

Case 2. Assume that the stimulus uncertainty values do not differ for different subjects. Possible subject differences would then be solely reflected by differences of the subject uncertainty values. Thus, for this case there are exactly m subject uncertainty values and n stimulus uncertainty values to estimate. The relevant equations for doing this are

s²_ij = s²_i + s²_j,   i = 1, ..., m,   j = m+1, ..., m+n,   (34)

which is just a simple linear system of equations in the unknowns s²_i and s²_j. Consequently, the least-squares solution is

s²_i = s²_i· - (1/2) s²_··,   s²_j = s²_·j - (1/2) s²_··,   (35)

where

s²_i· = (1/n) Σ_j s²_ij,   s²_·j = (1/m) Σ_i s²_ij,   s²_·· = (1/m) Σ_i s²_i·.   (36)

This solution has the property of equating the average stimulus variance with the average subject variance; that is, letting

s²_I = (1/m) Σ_i s²_i,   s²_S = (1/n) Σ_j s²_j,

the solution in (35) yields s²_I = s²_S.
While this solution is convenient, it also has the undesirable property of allowing some variance terms to take on negative values. To avoid this, the minimum subject variance could be equated with the minimum stimulus variance. This is analogous to the approach taken in Case 1. It can be accomplished by adding and subtracting a constant to the variance estimates obtained from (35) and (36). Specifically, if we let

m_i = min_i s²_i,   i = 1, ..., m,   (41)

m_j = min_j s²_j,   j = m+1, ..., m+n,   (42)

then, if m_i is less than m_j, we can define the new estimates s'²_i and s'²_j in terms of the previous estimates, obtained from (35) and (36), by the equations

s'²_i = s²_i + (1/2) | m_i - m_j |,   (43)

s'²_j = s²_j - (1/2) | m_i - m_j |.   (44)
If the converse should be the case, namely that m_j is less than m_i, the same result can be achieved by interchanging the plus and minus signs in (43) and (44). The fact that the estimates of the uncertainty values are nonunique, as they were in Case 1, means that here too there are limitations as to which uncertainty values can be meaningfully compared. In the present case, it would be meaningful to compare the stimulus uncertainty values among themselves, and similarly to compare the subject uncertainty values among themselves. It would not, however, be meaningful to compare the stimulus uncertainty values with the subject uncertainty values. That is to say, within the framework of the probabilistic unfolding model we have assumed, it is not possible to discriminate between the variability due to the stimuli and that due to the ideal points. And this is true even in the present case, where we have assumed that the subjects do not differ with respect to the uncertainty values associated with the stimuli.

5. Simulation I
In the previous sections, we have been concerned with developing a simple procedure for obtaining ML estimates of the parameters of a probabilistic, multidimensional choice model, using pairwise ratio
judgments. This procedure has consisted primarily of obtaining a simple, although approximate, expression for the likelihood function that is to be maximized and of obtaining a simple, although approximate, expression for the estimates of the parameters that can be used as the initial values of an iterative process. To determine how effectively and accurately this procedure works, two simulation studies were performed.
Figure 2. The original and recovered configurations of Simulation 1, Series 4. Panel A shows the original configuration. The 12 stimulus points are labeled A through F and 1 through 6; the 12 ideal points are labeled O through Z. The 6 stimulus points on the inner hexagon had an uncertainty value of 1.2. The remaining 6 stimulus points and the 12 ideal points had an uncertainty value of .1. Panel B shows the configuration recovered from a nonmetric analysis; panel C the SV configuration; and panel D the ML configuration.
Simulation I contained 24 points, 12 of which were treated as stimuli and the remaining 12 as ideal points. The 12 stimulus points were located on the vertices of two hexagons, one of which was completely contained within the other. The 6 vertices of the inner hexagon lie on a unit circle, while those of the outer hexagon lie on a circle of radius 1.732. The ideal points were randomly located throughout both hexagons. This arrangement of both the stimuli and the ideal points is shown in panel A of Figure 2. Values of the uncertainty parameter were also assigned to each of the 24 points. The 6 stimuli on the outer hexagon and the ideal points were given the uncertainty value of .1. The 6 stimuli on the inner hexagon were assigned a series of larger values. In Series 1, each one of the six points on the inner hexagon was assigned an uncertainty value of .3. In the remaining three series, 2-4, these points were assigned the values .6, .9, and 1.2, respectively. Since the points on the inner hexagon lie on a unit circle, it can be seen that in Series 3 and 4 these points actually had substantial amounts of uncertainty. These large uncertainty values were selected for the points on the inner hexagon because such values tend to make it very difficult for the estimation procedure to recover the underlying stimulus configuration (Zinnes & MacKay, 1983). The simulated data in each of the four series consist of 12 sets of 66 pairwise ratio judgments of the 12 stimuli, one set for each of the 12 subjects. This corresponds to a complete set of data for each subject, when it is assumed that the subjects do not replicate the judgments, that is, do not judge each pair of stimuli more than once. Preference ratio judgments were constructed by randomly sampling from the bivariate normal distributions, whose variances were determined, in each of the four series, as indicated previously.
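The data-generating step can be sketched as follows (our code, not the authors' program; the ideal-point location and the random seed are arbitrary illustrative choices, and the clipping bounds of .1 and 10 are the ones discussed in the next paragraph):

```python
import math
import random

def hexagon(radius, phase=0.0):
    # vertices of a regular hexagon on a circle of the given radius
    return [(radius * math.cos(phase + k * math.pi / 3),
             radius * math.sin(phase + k * math.pi / 3)) for k in range(6)]

def simulate_subject(ideal, stimuli, sig_ideal, sig_stim, rng):
    """One complete, unreplicated set of pairwise ratio judgments for one
    subject, using the two-sample rule (a fresh ideal-point draw for each
    distance) and clipping the ratios to [.1, 10]."""
    def noisy_dist(u, su, v, sv):
        x = [rng.gauss(m, su) for m in u]
        y = [rng.gauss(m, sv) for m in v]
        return math.dist(x, y)
    ratios = []
    for j in range(len(stimuli)):
        for k in range(j + 1, len(stimuli)):
            r = (noisy_dist(ideal, sig_ideal, stimuli[j], sig_stim[j])
                 / noisy_dist(ideal, sig_ideal, stimuli[k], sig_stim[k]))
            ratios.append(min(10.0, max(0.1, r)))
    return ratios

rng = random.Random(7)
stimuli = hexagon(1.0) + hexagon(1.732)   # inner and outer hexagons
sig_stim = [1.2] * 6 + [0.1] * 6          # Series 4 uncertainty values
data = simulate_subject((0.3, 0.2), stimuli, 0.1, sig_stim, rng)
```

For 12 stimuli this yields the 66 pairwise judgments per subject mentioned above; repeating it for 12 randomly placed ideal points gives one complete simulated data set.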
In some cases, however, it turned out to be highly desirable to make a slight modification of some of the preference ratios calculated by this process. There were a few instances in which the values of the interpoint distances obtained from these random samples proved to be extremely small or extremely large, thus resulting in either a very small or a very large ratio. Since these extreme values had a strong biasing influence on the estimates of the uncertainty parameters, it was considered desirable to place upper and lower bounds on the numerical values of the preference ratios. Consequently, a lower bound of .1 and an upper bound of 10 were
arbitrarily imposed on these ratios. Calculated ratio values below the lower bound of .1 were set equal to .1, and those above 10 were set equal to 10. The effect of using the upper and lower bounds is explored in the following section. The sets of data generated in each of the four series were analyzed using three different estimation procedures: the SV and ML procedures discussed in the previous section, as well as KYST, a typical nonmetric (NM) procedure (Kruskal, Young, & Seery, 1973). The KYST analysis was performed by converting the data to I scales and using the standard options of that program that are relevant for analyzing I scales. This includes the options: "Split by rows", stress 2 (which gave better results than stress 1), "lower corner matrix", a value of STRMIN equal to .0001, and a starting configuration determined by the TORSCA option, which gave lower stress values than those obtained using the true parameter estimates. In all cases, the KYST analysis terminated normally, before reaching 200 iterations. It will be recalled that there are two types of parameters to estimate: the coordinates and the uncertainty values. Since the 12 stimuli and 12 ideal points are embedded in a 2-dimensional space, there are 48 + 3 or 51 parameters to estimate in all. However, the actual number of independent parameters is somewhat less than this, because of the uniqueness properties of these parameters. The preference ratio judgment is, in fact, invariant over translation, rotation and stretching of the coordinate axes. Because of this, the number of independent coordinates to estimate is actually equal to 47. Table 2 shows the degree to which the configuration of stimulus and ideal points was recovered by each of the estimation procedures.
Two different measures of recovery, R and D², are shown in this table: R is the correlation between corresponding interpoint distances in the true and estimated configurations; D² is the sum of squared differences between optimally aligned coordinates of the true and estimated configurations. The origins of the coordinate axes for both configurations were placed at the centroid. Both R and D² were calculated using the 12 stimulus points and the 12 ideal points. Table 2 makes it clear that the accuracy of the three estimation procedures, while quite good at low levels of uncertainty, deteriorates as the level of uncertainty increases. This is to be expected, because the higher
Table 2. Hexagon example: recovery of distances and coordinates

  σ         SV        ML      Nonmetric

  Correlation (R)^a
  .3       .960      .998      .953
  .6       .941      .994      .916
  .9       .865      .986      .659
 1.2       .785      .968      .579

  Squared differences (D²)^b
  .3       .352      .067     1.201
  .6      1.353      .153     2.041
  .9      3.630      .354     1.329
 1.2      6.875      .926     9.312

^a R is the correlation between corresponding interpoint distances of the true and estimated configurations. It includes the points of both stimuli and individuals.
^b D² is the sum of squared differences between optimally aligned coordinates of the true and estimated configurations. It includes the points of both stimuli and individuals.

levels of uncertainty would tend to produce ratio judgments having a higher degree of variability. It is therefore reasonable to expect the standard errors of the coordinate values to be a function of the uncertainty values. It is also evident from this table that the rate of deterioration differs for the three estimation methods. The accuracy of the ML estimates seems to decrease only slightly with increases in the level of uncertainty. The deterioration of the SV estimates is somewhat greater, and that of the NM (the nonmetric estimation procedure) greater still. These conclusions are consistent with both the R and D² statistics. In general, it appears that the ML estimates of the coordinates are actually quite good, even when the levels of uncertainty are substantial. These conclusions are also evident from the plots shown in Figure 2. This figure shows the configurations recovered by the three different estimation procedures for Series 4, the one containing the highest level of uncertainty. The plots in this figure show the locations of both the 12
stimulus and the 12 ideal points. For comparison purposes, Figure 2 also shows, in panel A, the location of the stimulus and ideal points in the true configuration. Except for stimulus point 2, located on the inner hexagon, it can be seen that the ML configuration is, as expected from Table 2, exceedingly accurate, even at this high level of uncertainty. The SV and the NM configurations are, as expected from Table 2, considerably less accurate at this high level of uncertainty. This is particularly true of the NM configuration, where the inner hexagon is quite highly distorted. The positions of stimulus points 1 and 2 are in fact reversed from their true positions, while stimulus points 1 and 5 actually coincide in the NM configuration. The ideal points in the three recovered configurations shown in Figure 2 have properties that are very similar to those of the stimulus points. The accuracy of the ideal points in the ML configuration is substantially better than that in the SV and NM configurations. In fact, the locations of the ideal points in the NM configuration are especially poor. The ideal points T, X, Y and Z, while located within the outer hexagon in the true configuration, have actually been placed well outside this configuration in the NM configuration. In addition, the ideal points U, Q, S, and O, while having very distinct positions in the true configuration, actually coincide in the NM configuration.

Table 3. Hexagon example: recovery of uncertainty values
True values        SV estimates       ML estimates
σ1      σ2         σ1      σ2         σ1      σ2
.3      .1         .368    .095       .303    .088
.6      .1         .441    .113       .606    .089
.9      .1         .500    .152       .884    .089
1.2     .1         .507    .293       1.123   .088

Note: σ1 and σ2 are the uncertainty values of the coordinates of the inner and outer hexagons, respectively.
Analysis of Preference Ratio Judgments

Table 3 gives some indication of how well the uncertainty values are recovered by two of the estimation procedures, the SV and the ML procedures. Estimates using nonmetric methods are not shown in this table because those methods are purely deterministic and therefore do not provide for an estimation of variances. The uncertainty values shown in this table are consistent with the coordinates that were obtained in the process of calculating the sum of squared differences D². In other words, the same multiplicative factor that was applied to the coordinates in the process of optimally aligning the estimated configuration with the true configuration was also applied to the joint variances. This is appropriate because the joint variances are only determined up to a multiplicative transformation. To arrive at the uncertainty values of the stimuli and the ideal points, the standardization discussed in the previous section was used: the minimum uncertainty value of the stimuli was set equal to the minimum uncertainty value of the ideal points. In the present case, this effectively means setting the uncertainty value of the ideal points equal to the uncertainty value of the stimulus points on the outer hexagon. Table 3 shows that the accuracy of the uncertainty estimates depends on the magnitude of the true value. The large uncertainty values are not estimated as well as the smaller values, although the ML estimates of the larger uncertainty values are actually quite good. In contrast, the SV estimates do deteriorate substantially at the higher levels of uncertainty. This is to be expected, since they were derived by assuming that the interpoint distances are large relative to the sizes of the joint variances.
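The two recovery statistics used above can be computed mechanically. The sketch below (Python with NumPy; an illustration, not the authors' program) computes R as the correlation between corresponding interpoint distances and D² as the sum of squared coordinate differences after alignment. "Optimal alignment" is taken here to mean centering plus an orthogonal Procrustes rotation, which is one common choice; the chapter's own alignment may differ in details such as scaling.

```python
import numpy as np

def recovery_stats(true_pts, est_pts):
    """Recovery statistics sketch: R is the correlation between
    corresponding interpoint distances of the true and estimated
    configurations; D2 is the sum of squared differences between
    the coordinates after centering and an orthogonal Procrustes
    rotation of the estimated configuration onto the true one."""
    X = np.asarray(true_pts, float)
    Y = np.asarray(est_pts, float)

    def dists(Z):
        # upper-triangle interpoint distances
        diff = Z[:, None, :] - Z[None, :, :]
        d = np.sqrt((diff ** 2).sum(-1))
        iu = np.triu_indices(len(Z), k=1)
        return d[iu]

    R = np.corrcoef(dists(X), dists(Y))[0, 1]

    # center both configurations, then rotate Y onto X
    Xc = X - X.mean(0)
    Yc = Y - Y.mean(0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    Yrot = Yc @ (U @ Vt).T
    D2 = ((Xc - Yrot) ** 2).sum()
    return R, D2
```

For a configuration that is merely rotated and translated, R is 1 and D² is 0; recovery errors show up as R below 1 and a positive D².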
6. Simulation II

In the previous simulation, lower and upper bounds were placed on the ratio judgments to avoid the effects of extreme values biasing the parameter estimates. To determine whether, in general, such limits should be used, an additional simulation study was performed, one in which the number of replications of the preference ratio judgments was varied. Four levels of replications were used: 1, 2, 4, and 8. The uncertainty values of the stimuli on the inner hexagon were set equal to .3. As in the previous simulation, the stimuli on the outer hexagon were assigned an uncertainty value of .1, as were the ideal points. The configuration of stimuli and ideal points used in this simulation was identical to the one used in the previous simulation.
The data from the simulation were analyzed using two different approaches. In the No Limits approach, the ratio judgments obtained from random samples of coordinates were not modified, even when under some conditions they produced extreme values. In the Fixed Limits approach, lower and upper bounds of .1 and 10 were imposed on the ratio judgments.

Table 4. Hexagon example: SV estimates of the uncertainty parameters for different numbers of replications

Replications    No limits(a)        Fixed limits(b)
                σ1      σ2          σ1      σ2
1               .252    .187        .368    .095
2               .238    .145        .338    .090
4               .300    .116        .321    .070
8               .293    .112        .334    .068

Note: The correct values of σ1 and σ2 are .3 and .1, respectively. (a) No upper limit was placed on the ratio judgments to be analyzed. (b) Lower and upper limits of the ratio judgments to be analyzed were set at .1 and 10, respectively.

The results for both approaches are shown in Table 4, for both the SV and ML estimation procedures. The values in this table show quite unmistakably that under the No Limits approach, the estimates of the uncertainty values become increasingly more accurate as the number of replications increases. This is not the case, however, for the Fixed Limits approach. Although the estimate of σ1 improved slightly with increases in replication, the estimate of σ2 becomes appreciably worse. Furthermore, although the Fixed Limits approach is slightly better than the No Limits approach at low levels of replication, it is clearly the case that the No Limits approach is superior at the higher levels of replication. Thus, a concern for possible biasing effects of extreme values of the preference ratio appears to be justified only when there are few if any replications of the preference ratio judgments.
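The two data-handling approaches are easy to state in code. In the sketch below (Python; an illustration, not the authors' program), a preference ratio judgment is simulated as the ratio of the distances from a sampled ideal point to two sampled stimulus points, with independent normal perturbations standing in for the uncertainty values — our reading of the model, stated here as an assumption. Under the Fixed Limits approach the resulting ratio is clipped to [.1, 10]; under the No Limits approach it is analyzed as is.

```python
import numpy as np

def ratio_judgment(ideal, s1, s2, sigma_ideal, sigma_s, rng):
    """One simulated preference ratio judgment: the ratio of the
    distances from a sampled ideal point to two sampled stimulus
    points (coordinates perturbed by independent normal error;
    the sigmas play the role of the uncertainty values)."""
    i = ideal + rng.normal(0.0, sigma_ideal, size=2)
    a = s1 + rng.normal(0.0, sigma_s, size=2)
    b = s2 + rng.normal(0.0, sigma_s, size=2)
    return np.linalg.norm(i - a) / np.linalg.norm(i - b)

def fixed_limits(r, lo=0.1, hi=10.0):
    """Fixed Limits approach: clip an extreme ratio judgment."""
    return min(max(r, lo), hi)
```

Under No Limits, a ratio arbitrarily close to 0 or arbitrarily large can occur whenever the sampled ideal point falls near one of the stimuli, which is exactly the biasing effect the limits were meant to suppress.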
These results seem very encouraging. Even though the data of these simulation studies are highly artificial, they seem to show that the maximum likelihood estimation procedure does amazingly well in recovering the location and the uncertainty values of the stimuli and the ideal points, even in the presence of a high degree of uncertainty. Of course, these results depend upon the validity of the underlying probabilistic model with which we are working. If the model were inappropriate in some situation, one could not expect the maximum likelihood estimation procedure, or any estimation procedure, to accurately estimate the unknown parameter values. It should also be recalled that the SV procedure was only intended to provide the starting values for the ML iterations. The fact that the SV estimates are as good as they are, given their relative simplicity, is also encouraging. Thus under normal conditions, when the data do not contain substantial amounts of variability, the ML iterations should converge fairly rapidly.
References

Anderson, N. H. (1982). Methods of information integration theory. New York: Academic Press.
Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154-179.
Bechtel, G. G. (1976). Multidimensional preference scaling. The Hague: Mouton.
Birnbaum, M. H. (1982). Controversies in psychological measurement. In B. Wegener (Ed.), Social attitudes and psychological measurement. Hillsdale, NJ: Erlbaum.
Böckenholt, I., & Gaul, W. (1984). A multidimensional analysis of consumer preference judgments related to print ads. In Methodological advances in marketing research in theory and practice. EMAC/ESOMAR Symposium, Copenhagen.
Bulgren, W. G. (1971). On representations of the doubly non-central F distribution. Journal of the American Statistical Association, 66, 184-186.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. B. Nerlove (Eds.),
Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. 1). New York: Seminar Press.
Chandler, J. P. (1969). STEPIT - Finds local minima of a smooth function of several parameters. Behavioral Science, 14, 81-82.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Croon, M. (in press). A comparison of statistical unfolding models. Psychometrika.
De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.
Eisler, H. (1982). On the nature of subjective scales. Scandinavian Journal of Psychology, 23, 161-171.
Hefner, R. A. (1958). Extensions of the law of comparative judgment to discriminable and multidimensional stimuli. Doctoral dissertation, University of Michigan.
IMSL (1979). IMSL library reference manual. New York: International Mathematical and Statistical Libraries, Inc.
Kendall, M. G., & Stuart, A. (1961). The advanced theory of statistics (Vol. 2). New York: Hafner.
Kendall, M. G., & Stuart, A. (1963). The advanced theory of statistics (Vol. 1, 2nd ed.). New York: Hafner.
Kruskal, J. B., Young, F. W., & Seery, J. B. (1973). How to use KYST, a very flexible program to do multidimensional scaling and unfolding. Bell Telephone Laboratories, Murray Hill, NJ.
MacKay, D. B., & Zinnes, J. L. (1981). Probabilistic scaling of spatial judgments. Geographical Analysis, 13, 21-37.
MacKay, D. B., & Zinnes, J. L. (1986). Probabilistic multidimensional scaling of spatial preferences. In R. Golledge & H. Timmermans (Eds.), Behavioral modeling: Approaches in geography and planning. New York: Croom Helm.
MacKay, D. B., Ellis, M., & Zinnes, J. L. (1986). Graphic and verbal presentation of stimuli: A probabilistic MDS analysis. Advances in Consumer Research, 13, 529-533.
MacKay, D. B., & Zinnes, J. L. (in press).
Probabilistic multidimensional scaling of residential preferences: An experimental evaluation. Cahiers de Géographie de Besançon.
Patnaik, P. B. (1949). The non-central chi-square and F distributions and their approximations. Biometrika, 36, 202-232.
Saaty, T. L. (1980). The analytic hierarchy process. New York: McGraw-Hill.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Schönemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika, 35, 349-366.
Sjöberg, L. (1967). Successive intervals scaling of paired comparisons. Psychometrika, 32, 297-308.
Suppes, P., & Zinnes, J. L. (1963). Basic measurement theory. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1). New York: Wiley.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
Zinnes, J. L., & Kurtz, R. (1968). Matching, discrimination, and payoffs. Journal of Mathematical Psychology, 5, 392-421.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic multidimensional scaling unfolding analysis. Psychometrika, 39, 327-350.
Zinnes, J. L., & Wolff, R. P. (1977). Single and multidimensional same-different judgments. Journal of Mathematical Psychology, 16, 30-50.
Zinnes, J. L., & MacKay, D. B. (1983). Probabilistic multidimensional scaling: Complete and incomplete data. Psychometrika, 48, 27-48.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
TESTING PROBABILISTIC CHOICE MODELS

Patrick M. Bossuyt
Erasmus University, Rotterdam, The Netherlands

Edward E. Roskam
University of Nijmegen, The Netherlands

A framework for the concepts “model”, “theory” and “data” within the context of probabilistic choice models is presented. What is commonly called a test of a model comes down to an assessment of the goodness-of-fit relation between a model of a theory and a model of data. A distinction is made between theories that lead to ordinal restrictions on choice probabilities (Type I), and theories that specify a functional relation between utilities and choice probabilities (Type II). The study of probabilistic choice behavior could benefit from an assessment of the goodness-of-fit of Type I theories with statistical methods usually reserved for Type II theories.
1. Introduction
Probabilistic choice theory, and with it, the use of probabilistic choice models, originated from the psychophysical laboratories. The primary aim was the measurement of subjective sensations in a way that could be theoretically defended. Gradually, the ideas became intertwined with notions from classic algebraic utility theory and axiomatic measurement theory. During the past decades we have seen a further development of probabilistic choice theory in psychology, largely separated from a
This research was made possible by grant 00-40-30 from the Dutch Foundation for the Advancement of Pure Science Z.W.O. This paper is a revised version of an article published in Communication & Cognition, 1987, 20, 5-16.
growing field of behavioral decision studies. On the other hand, some of the basic ideas have been taken up by economists, which led to the development of a research area known as (probabilistic) discrete-choice modeling. Despite this growing tradition in probabilistic choice models both in psychology and economics, anyone trying to get acquainted with the literature is hindered by a surprising lack of unanimity in the terminology, especially when it comes to the use of words such as “theory” or “model”. As a consequence, it will seldom be immediately clear what is meant when we hear that a “model is tested”. We were confronted with this difficulty in a research project within which we were to compare a large subclass of probabilistic choice theories and models known as “probabilistic unfolding”. In order to define what was actually at stake in this comparison, we were forced to construct a conceptual framework with a clear, distinct meaning for “model” and “test of a model”, and within which the relation between theory and observations could be adequately described and evaluated. A sketch of this framework is presented in this paper. We do not present an overview of probabilistic unfolding, for which the reader is referred to Croon (in press) and Bossuyt and Roskam (1985). The probabilistic choice theories mentioned in this paper only serve an exemplary purpose. They are discussed in their axiomatized form, with a variable set and a variable binary choice probability function, the familiar set-theoretical and logical constants, and the usual predicates and operations on the reals. Within the class of probabilistic choice theories, we will make a rather uncommon distinction between Type I and Type II theories. Any probabilistic choice theory assumes some form of regularity in the choices people make. In Type I theories these regularities are defined through ordinal or equality constraints on the choice probabilities.
In Type II theories the regularity resides in relations between the choice probabilities and a representation by means of a real-valued utility function, defined over the set of alternatives. Close to its meaning in logic, a possible realization of a theory will be a system of the appropriate set-theoretical structure: for a theory on probabilistic choice, this will be an ordered couple consisting of a set of alternatives and a set of choice probabilities (a choice probability function). A possible realization of a theory will be called a model of this theory if it
satisfies all its valid sentences: its axioms and all sentences that can be logically derived from them. Within our view, a model of a theory cannot be a model of the data, for the simple reason that probabilities are in general not part of the recorded data, which means that they cannot be derived from the observations through a coding or classification procedure. At most they can be estimated, but never observed. Corresponding to the idea of a possible realization of a theory, we therefore use, following Suppes (1962), the notion of a possible realization of the data. This is again a set-theoretical structure of the appropriate type, containing all the information needed to test the theory in question. A possible realization of the data will be called a model of the data if the information it contains is valid. Throughout this exposition, we will defend the thesis that the relation between assumptions on probabilistic choice and observations should more frequently be evaluated through Type I theories, and with a statistical approach similar to the one that has up till now been followed for Type II theories. The difference between the use of Type I theories and Type II theories for evaluating the theory-observations relation parallels an existing distinction between “scaling” and the evaluation of necessary and sufficient axioms for a homomorphic mapping of an algebraical relational system on a numerical relational system in axiomatic measurement theory (Krantz, Luce, Suppes, & Tversky, 1971, pp. 32-33).

2. Models of Theory

To ease the exposition, we will restrict ourselves to (forced) binary choices, or paired comparisons, and a single choosing entity, either a single subject, a group or a population. In the following we will refer to this entity as “the subject”. In a binary choice situation we have a nonempty set of alternatives from which option sets of two elements are constructed.
Out of each option set, the subject has to choose one, and only one, element, the no-choice option being eliminated. A probabilistic choice theory describes a choice from an option set probabilistically: this means, as the result of an independent Bernoulli trial, with a particular choice probability which is supposed to remain constant over repeated presentations of the same option set. A possible realization of a theory of probabilistic binary choice is then an ordered couple
<S, p> satisfying the following:

BCP.1 S is a nonempty set.
BCP.2 p is a real-valued function defined on S × S as follows: ∀ x, y ∈ S, x ≠ y:
  a. 0 ≤ p(x,y) ≤ 1
  b. p(x,y) + p(y,x) = 1
  c. p(x,x) = 1/2.

We will call such a couple <S, p> a binary choice probability (BCP) system, and read p(x,y) as “the probability that the alternative x is chosen out of an option set consisting of x and y only”. In addition to the fundamental probabilistic assumptions, all probabilistic choice theories assume some form of consistency or regularity in the choice probabilities. We will distinguish between two classes of theories, which we will introduce with some simple examples.
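Conditions BCP.1 and BCP.2 lend themselves to a mechanical check. The sketch below (Python; the dict-of-ordered-pairs representation is our illustration, not the chapter's notation) decides whether a candidate couple is a BCP system.

```python
def is_bcp_system(S, p, tol=1e-9):
    """Check BCP.1-BCP.2: S nonempty; for all distinct x, y in S,
    0 <= p(x,y) <= 1 and p(x,y) + p(y,x) = 1; p(x,x) = 1/2.
    p maps ordered pairs of alternatives to probabilities; a
    missing diagonal entry is treated as the required 1/2."""
    if not S:                                      # BCP.1
        return False
    for x in S:
        if abs(p.get((x, x), 0.5) - 0.5) > tol:    # BCP.2c
            return False
        for y in S:
            if x == y:
                continue
            pxy, pyx = p[(x, y)], p[(y, x)]
            if not (-tol <= pxy <= 1 + tol):       # BCP.2a
                return False
            if abs(pxy + pyx - 1) > tol:           # BCP.2b
                return False
    return True
```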
2.1 Theories of Probabilistic Choice - Type I

In a first class of theories the consistency is imposed through a number of constraints on the choice probabilities. A simple theory capturing this consistency assumption is the “weak stochastic transitivity” theory T1, defined through the following axiom A.1:

A.1 ∀ x, y, z ∈ S: p(x,y) ≥ p(y,x) & p(y,z) ≥ p(z,y) imply p(x,z) ≥ p(z,x).

Loosely interpreted, this axiom expresses the notion that if a subject is inclined to choose x out of {x,y} and y out of {y,z}, this subject will also be inclined to choose x out of {x,z}. In a way, weak stochastic transitivity is the simplest probabilistic analogue of the rational transitivity prescription in algebraic utility theory. A second theory, which we will call T2, can be defined through the “substitutability” axiom A.2:

A.2 ∀ x, y, z ∈ S: p(x,y) ≥ p(y,x) iff p(x,z) ≥ p(y,z).

This axiom states that whenever a subject is inclined to choose x rather than y out of {x,y}, the probability of choosing x out of an option set with a third element will exceed that of choosing y, and vice versa. To
complete our set of examples of Type I theories, we present T3. It is defined through axiom A.3, which states that the probability of the two intransitive choice cycles is the same in any subset of three alternatives:
A.3 ∀ x, y, z ∈ S: p(x,y)p(y,z)p(z,x) = p(x,z)p(z,y)p(y,x).

So far we have three examples of Type I theories of probabilistic choice. In each theory a distinct form of consistency is imposed on the choice probabilities, either through ordinal constraints (A.1, A.2) or through an equality (A.3). For any of the theories on probabilistic choice, a possible realization will be called a model of theory Ti if it satisfies the corresponding axiom A.i. If a BCP system (as defined through BCP.1, BCP.2) is given, testing whether or not it is a model of a particular theory then becomes a straightforward task: one simply checks the relevant axioms. As an example, consider the BCP system <E,p′>:

E = {e, f, g}   p′(e,f) = 3/5   p′(f,g) = 2/3   p′(e,g) = 7/10.
(1)
It is easy to see that this BCP system satisfies weak stochastic transitivity (A.1) and is therefore a model of T1. It satisfies substitutability (A.2) and is a model of T2, but it fails to satisfy A.3, so it fails to be a model of theory T3. The three theories we presented as an example are hierarchically related to one another. Theory T1 is a subtheory of T2: substitutability implies weak stochastic transitivity and, hence, any model of T2 will also be a model of T1. The same holds for T2 and T3: T2 is a subtheory of T3. As a consequence, if a BCP system fails to be a model of T2, it cannot be a model of T3. In the following subsection we turn to the Type II theories.
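Checking the axioms on a finite BCP system is indeed a straightforward task. The sketch below (Python; our illustration, with the probabilities stored in a dict of ordered pairs) tests A.1 through A.3 directly.

```python
from itertools import combinations, permutations

def geq(p, x, y):
    """p(x,y) >= p(y,x): a tendency to choose x over y."""
    return p[(x, y)] >= p[(y, x)]

def satisfies_A1(S, p):
    """Weak stochastic transitivity (theory T1)."""
    return all(geq(p, x, z) for x, y, z in permutations(S, 3)
               if geq(p, x, y) and geq(p, y, z))

def satisfies_A2(S, p):
    """Substitutability (theory T2)."""
    return all(geq(p, x, y) == (p[(x, z)] >= p[(y, z)])
               for x, y, z in permutations(S, 3))

def satisfies_A3(S, p, tol=1e-12):
    """Equal probability of the two intransitive cycles (theory T3)."""
    return all(abs(p[(x, y)] * p[(y, z)] * p[(z, x)]
                   - p[(x, z)] * p[(z, y)] * p[(y, x)]) <= tol
               for x, y, z in combinations(S, 3))
```

Run on the example <E,p′>, A.1 and A.2 hold while A.3 fails, in line with the text.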
2.2 Theories on Probabilistic Choice - Type II

Whereas in the Type I theories the consistency was imposed through constraints on the choice probabilities themselves, in Type II theories the regularity in choices is defined through a representation by means of a real-valued function defined over the set of alternatives S (or, alternatively, its product set). Usually the latter function is interpreted as a utility function. Here also we will present three simple examples.
The first example is the weak utility theory T4:

A.4 ∀ x, y ∈ S: p(x,y) ≥ p(y,x) iff u(x) ≥ u(y).

This theory states that there exists a utility function u defined over S, with a tendency to choose x rather than y if the utility of x, u(x), exceeds that of y. A second example is a Fechnerian theory T5:

A.5 For a real-valued function H: ∀ x, y ∈ S: p(x,y) = H[u(x) − u(y)].
As a third example we present the strict utility theory T6, with u taking positive values:

A.6 ∀ x, y ∈ S: p(x,y) = u(x) / [u(x) + u(y)].
How can we test whether or not a BCP system is a model of any of these Type II theories? If we succeed in finding an adequate representation, that is, a utility function u for which the choice probabilities satisfy the relevant axiom(s), we have demonstrated the model relation. Take again the BCP system <E,p′> (1). It is a model of the weak utility theory T4 (or, alternatively, “it is a weak utility model”) since the choice probabilities satisfy axiom A.4 with the following function u′ on E: u′(e) = 3, u′(f) = 2, u′(g) = 1. We were not able to find a similar function to satisfy either A.5 or A.6, so we cannot make a decision on the model relation between <E,p′> and the theories T5 and T6. Fortunately, there exists another way to test these model relations. Ever since the late fifties, a number of authors have been studying the equivalency relations between what we have labeled Type I theories and Type II theories (see for example Block & Marschak, 1960; Luce & Suppes, 1965). Simultaneously, the study of the formal foundations of measurement led to the formulation of sets of necessary and sufficient conditions for homomorphic mappings of certain algebraic relational systems to numerical relational systems (Krantz et al., 1971; Suppes & Zinnes, 1963). The study of probabilistic choice has also taken advantage of these results. Some examples might illustrate these relations. (We refer to the authors mentioned earlier for proofs of the results to follow.) If a BCP system <S,p> is a weak utility model, then it is also a model of T1, but not conversely. So axiom A.1 is a necessary, but not a sufficient
condition for a BCP system to be a model of the weak utility theory T4. If <S,p> is a model of T1, then it is also a model of T4 if S is finite. A second example: if a BCP system <S,p> satisfies A.6, and, hence, is a model of theory T6, then it will also satisfy A.3 and be a model of T3, but not conversely. The converse relation holds if the set S is finite, and ∀ x, y ∈ S: 0 < p(x,y) < 1.
A third example: a BCP system satisfying A.5 will also satisfy A.2, but no sufficient conditions are known for a BCP system satisfying A.2 to be a model of T5. These relations can help us a great deal in evaluating the model relation between a BCP system and a Type II theory. If the system fails to satisfy a necessary condition, the model relation has to be rejected; if it satisfies a set of sufficient conditions, it is a model of the corresponding theory. In other cases, no decision can be made. Take again the BCP system <E,p′>. It failed to satisfy A.3, so it cannot be a strict utility model. However, it satisfies A.2, so it still may be a model of a Fechnerian theory T5. A number of authors share the opinion that only Type II theories can be called probabilistic choice theories; a view that reveals itself when the Type I theories are discussed as “observable properties” (as in Luce & Suppes, 1965). We think that this distinction is unwarranted, and incorrect. Type I and Type II theories differ in the way the regularity in (probabilistic) choice is defined: in the first class of theories the constraints are defined on the choice probabilities themselves, whereas in the second class of theories they are related to a representation by a utility function. In a way, both classes of theories can be seen as defining “properties” on the choice probabilities, but neither the first set of axioms nor the second is “observable”. It will be obvious that the utility functions u are unobservable, but the same holds for the binary choice probabilities. Since we never observe choice probabilities, a BCP system cannot be a possible realization of empirical data. How a researcher interested in the relation between a probabilistic choice theory and a body of observations can proceed is discussed in the following section.
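The finiteness result relating T1 and T4 can be made concrete. In the sketch below (Python; one standard construction, stated here under the simplifying assumption that there are no ties among the choice probabilities), each alternative is scored by the number of alternatives it tends to beat, and A.4 is then verified directly.

```python
def weak_utility(S, p):
    """Candidate utility for a finite BCP system: u(x) counts the
    alternatives that x tends to be chosen over.  Under weak
    stochastic transitivity without ties this yields a weak
    utility representation in the sense of A.4 (a sketch, not a
    general algorithm for the tied case)."""
    return {x: sum(1 for y in S if y != x and p[(x, y)] >= p[(y, x)])
            for x in S}

def satisfies_A4(S, p, u):
    """A.4: p(x,y) >= p(y,x) iff u(x) >= u(y)."""
    return all((p[(x, y)] >= p[(y, x)]) == (u[x] >= u[y])
               for x in S for y in S if x != y)
```

On the example <E,p′> this produces an ordering equivalent to the function u′ given above.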
3. Models of Data

Corresponding to possible realizations of theory, we will use the notion of possible realizations of data as a help in evaluating the theory-data relation. As far as we know, Suppes (1962) has been the first to make this distinction. Due to the background of probabilistic choice modeling, our conception of “models of data” differs slightly from his. A possible realization (or valid interpretation) of the data will be a set-theoretical structure of the appropriate type, designed to incorporate all the information about the experiment which can be used in tests of the adequacy of the theory. In our view, “data” refers to “recorded data”: everything that can be obtained from empirical observations through a coding or classification procedure. The “data” as data are never observed: “behavior does not yield data by parthenogenesis” (Coombs, 1964). It may be obvious that not all observations will be coded. All details that appear inessential for the intended use will be omitted. Ultimately, what is to be coded will depend on the conceptual framework used. For the probabilistic choice theories introduced earlier, an ordered couple <S,C> will be a convenient realization of the data if it satisfies the following:

BCS.1 S is a nonempty set.
BCS.2 C is a set of indexed binary-valued functions c_l, defined on S × S as follows: either one of the following holds:
  a. c_l(x,y) = 1 & c_l(y,x) = 0
  b. c_l(x,y) = 0 & c_l(y,x) = 1
  c. c_l(x,y) = 0 & c_l(y,x) = 0.

If these conditions are fulfilled, we call <S,C> a binary choice (BC) system. We read c_l(x,y) = 1 as “the alternative x has been chosen out of the option set {x,y} on occasion l” and c_l(x,y) = 0 as “the alternative x has not been chosen out of the option set {x,y} on occasion l”. Condition BCS.2.c implies that the option set {x,y} has not been presented on occasion l.
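A BC system records choices, not probabilities, but choice proportions can be aggregated from it. In the sketch below (Python; each occasion is represented as a dict over ordered pairs, a hypothetical encoding of the functions c_l, not the chapter's notation), k(x,y) is the proportion of presentations of {x,y} on which x was chosen.

```python
def choice_proportions(S, C):
    """Aggregate a BC system <S, C> into binary choice proportions.
    Each occasion c in C maps ordered pairs to 0/1; a pair absent
    from c was not presented on that occasion (mirroring
    condition BCS.2.c)."""
    k = {}
    for x in S:
        for y in S:
            if x == y:
                continue
            presented = [c for c in C if (x, y) in c]
            if presented:
                k[(x, y)] = (sum(c[(x, y)] for c in presented)
                             / len(presented))
    return k
```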
A BC system will be called a model of the data if its components are valid, both in the sense of the empirical observations and the
fundamental probabilistic assumption. Here “validity” has to be interpreted as “truth preserving”. One side of this quality is easily understood: if a subject chooses a out of {a,b} on occasion h, and a BC system contains c_h(a,b) = 0, it obviously cannot be a model of the data. However, a complete evaluation of this validity relation is not easily made, since it involves checking a plenitude of assumptions related to the data collection procedure used, the experimental design, and what Suppes (1962) has called “ceteris paribus conditions”: disturbing environmental conditions, such as unwanted noise, bad lighting, and so on. To mention one, our basic probabilistic assumptions imply that every presentation of an option set of two alternatives can be treated as equivalent to any other presentation of this option set. This assumption will not be automatically met and will require a careful design of the choice environment. Though this subject certainly deserves more elaboration, we will not expand on it in this paper.

4. Theory Versus Data: Evaluating Goodness-of-Fit

Suppose we have a theory on probabilistic choice and a set of observations. We will be interested in the relation between observations and theory. This means that we are interested in the question whether our theoretical assumptions can be maintained in the light of the empirical observations made. This does not mean that we will try to evaluate whether or not the model of the data - the binary choice system - is a model of the theory. This cannot be done, for it would imply checking the relevant axioms: a senseless job, since these are expressed in terms of choice probabilities, and the model of the data only contains choices. For Type II theories the following strategy is usually adopted. Given a model of the data, one tries to construct a model of the theory corresponding maximally to it. Most frequently, the likelihood will be the correspondence criterion.
The likelihood function expresses the joint probability of a BC system, given the estimated BCP system. Because of the fundamental probabilistic assumption, each choice (i.e., each c_l(x,y)) is a realization of an independent Bernoulli trial, governed by the choice probability p(x,y). To give an example, to construct a strict utility model of theory T6, the utility function u is sought maximizing the likelihood, where the choice probabilities are calculated through A.6. To construct a
[Figure 1 is a diagram: Type I theories and Type II theories each lead to a model of theory, which stands in a goodness-of-fit relation to a model of data, which is in turn obtained from empirical observations.]

Figure 1. The theory-observations relation in probabilistic choice.
Fechnerian model of theory T5, a particular distribution function H is selected and the maximum likelihood utilities u are sought. The likelihood is calculated using A.5. Suppose all this has been done. At this stage, a number of authors present some statistical decision procedure, calling it a “test of the model”. No matter what this procedure consists of, this terminology itself cannot be correct. Obviously, the BCP system obtained through the maximum likelihood strategy will be a model of the theory, by construction. On the other hand, the BC system had to be a model of the data, for, if not, the whole strategy could not have been adopted. What will actually be done then is an evaluation of the correspondence relation between the model of the theory and the model of the data, a more common word for “correspondence” in this context being “goodness-of-fit” (Figure 1). If the maximum likelihood method has been used, the likelihood function is the indicated device for this purpose. Within the Neyman-Pearson approach to statistical testing, the generalized likelihood-ratio test can be used to test the null hypothesis that the model of the data constitutes a set of outcomes from a BCP system that is a model of the theory, versus the alternative hypothesis that it is not.
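Under the fundamental probabilistic assumption the likelihood is a product of independent Bernoulli terms, so the generalized likelihood-ratio statistic is easy to sketch. In the illustration below (Python; our sketch, not the authors' program, with the constant binomial coefficients omitted), counts[(x, y)] holds the number of times x was chosen out of {x, y} together with the number of presentations; the unconstrained maximum is attained at the raw choice proportions.

```python
from math import log

def log_likelihood(counts, p):
    """Binomial log-likelihood of a model of the data under a BCP
    system p (choice probabilities assumed strictly between 0 and
    1 so the logarithms are defined)."""
    ll = 0.0
    for (x, y), (nxy, n) in counts.items():
        ll += nxy * log(p[(x, y)]) + (n - nxy) * log(1.0 - p[(x, y)])
    return ll

def lr_statistic(counts, p_model):
    """Generalized likelihood-ratio statistic: twice the gap between
    the unconstrained maximum (the choice proportions) and the
    fitted model of the theory."""
    p_hat = {pair: nxy / n for pair, (nxy, n) in counts.items()}
    return 2.0 * (log_likelihood(counts, p_hat)
                  - log_likelihood(counts, p_model))
```

For example, with 6 choices of x out of 10 presentations of {x, y} and a fitted probability of 1/2, the statistic is about 0.40.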
It is worth reemphasizing that this is not a test of the model relation, but an evaluation of the statistical correspondence between a model of the theory and a model of the data in terms of likelihood. Unfortunately, this likelihood ratio test is not very useful when it comes to evaluating why a model of the theory did not correspond to the model of the data. A failure of the likelihood ratio test tends not to be particularly instructive: it tells us that the goodness-of-fit is in fact rather bad, without indicating why. However, there exists a second way of evaluating the relation between empirical observations and Type II theories. Suppose we were able to construct a model of a Type I theory, fulfilling the sufficient conditions for being a model of a Type II theory also, and showing an acceptable goodness-of-fit with the model of the data. In that case we know, without having constructed a representation with the utility function(s), that the BCP system obtained will also be a model of the Type II theory. We do not claim that this idea of evaluating the goodness-of-fit of a model of the data with a Type II theory through a model of a Type I theory is a new one. However, the evaluation of the correspondence relation between models of the data and models of Type I theories has usually been done in an unsatisfactory way. Luce and Suppes' (1965, p. 379) quotation still applies to a major practice in the field:

Lacking satisfactory statistical methods, authors often simply report, for example, the number of violations (...) and, on some intuitive basis, they conclude whether the failures are sufficiently numerous to reject the hypothesis. In other situations, tables or plots of data are reported and the reader is left pretty much on his own to reach a conclusion. Because the results are almost never clear cut, one is left with a distinct feeling of inconclusiveness.
Usually, the evaluation of the relation between observations and theory is not based on the goodness-of-fit relation between a model of the data and a model of the theory, estimated on the basis of the maximum likelihood principle. Instead, a system of choice proportions is constructed and the model relation with the theory is evaluated. In general, a failure of the relevant axiom does not lead to a rejection of the model relation since, due to statistical errors, we are likely to find a number of violations. However, as Luce and Suppes remarked, the determination of the acceptable number of violations is based on rather intuitive grounds, and so is the assessment of
the relation between theory and observations. To give an example, take the following BC system, ⟨F,C⟩, which we take to be a model of the data, with F = {d, e, f, g, h} and C a set of 10 functions, from which the system ⟨F,k⟩ of binary choice proportions can be obtained:
k(d,e) = 0.9, k(d,f) = 0.5, k(d,g) = 0.9, k(d,h) = 1.0, k(e,f) = 1.0, k(e,g) = 1.0, k(e,h) = 0.4, k(f,g) = 0.8, k(f,h) = 1.0, k(g,h) = 0.1.

One can easily check that the system ⟨F,k⟩ of binary choice proportions fails to be a model of the weak stochastic transitivity theory T1. There are three failures of A.1, due to the existence of an intransitive cycle in {e,f,h}. We may ask whether these three violations constitute enough evidence to reject the hypothesis that we can construct a model of T1 with an acceptable goodness-of-fit to the model of the data ⟨F,C⟩. We have no idea whether this is the case. Two of the intransitive triples can be made transitive simply by reversing one out of the ten choices out of the option set {d,f}, and out of {e,g}. However, we can proceed in much the same way as we would to evaluate the relation with a Type II theory. Taking the likelihood as the criterion, we can look for the “best fitting” model of T1. Using a branch and bound algorithm developed for this purpose (Bossuyt & Roskam, 1985), we found that the following BCP system ⟨F,p″⟩ was the maximum likelihood model of the weak stochastic transitivity theory:
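The three failures of A.1 in this example can be verified mechanically. The sketch below (Python, with the proportions from the text) counts ordered triples violating weak stochastic transitivity; reading the premises with strict inequality, so that the tied pair {d,f} raises no spurious violation, is an interpretive choice of ours, not something stated in the text.

```python
from itertools import permutations

# Binary choice proportions from the text; k(y, x) = 1 - k(x, y).
k = {('d','e'): 0.9, ('d','f'): 0.5, ('d','g'): 0.9, ('d','h'): 1.0,
     ('e','f'): 1.0, ('e','g'): 1.0, ('e','h'): 0.4, ('f','g'): 0.8,
     ('f','h'): 1.0, ('g','h'): 0.1}

def prop(x, y):
    return k[(x, y)] if (x, y) in k else 1.0 - k[(y, x)]

# Weak stochastic transitivity (A.1): if k(a,b) > 1/2 and k(b,c) > 1/2,
# then k(a,c) >= 1/2.  Collect the ordered triples that fail.
violations = [(a, b, c) for a, b, c in permutations('defgh', 3)
              if prop(a, b) > 0.5 and prop(b, c) > 0.5 and prop(a, c) < 0.5]
print(violations)  # three failures, all within the cycle {e, f, h}
```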
p″(d,e) = 0.5, p″(d,f) = 0.5, p″(d,g) = 0.9, p″(d,h) = 0.5, p″(e,f) = 0.5, p″(e,g) = 1.0, p″(e,h) = 0.5, p″(f,g) = 0.8, p″(f,h) = 0.5, p″(g,h) = 0.1.

To test the null hypothesis mentioned earlier, at level 0.05, we used the log likelihood ratio. It turned out to be λ = −26.676. From a series of simulations in which the behavior of the likelihood ratio under the null hypothesis was observed, we know that this value does not lie in the critical region ([−28.679, 0.0], n = 1000). Therefore, we do not reject the null hypothesis. Since ⟨F,p″⟩ is a model of T1 and since the set F is finite, the BCP system ⟨F,p″⟩ is also a weak utility model. A utility function u
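The reported value of λ cannot be reproduced here because the underlying choice counts are not given. Assuming, purely for illustration, ten observations per option pair (consistent with the proportions being multiples of 0.1), the log likelihood ratio between ⟨F,p″⟩ and the observed proportions would be computed as follows.

```python
import math

k = {('d','e'): 0.9, ('d','f'): 0.5, ('d','g'): 0.9, ('d','h'): 1.0,
     ('e','f'): 1.0, ('e','g'): 1.0, ('e','h'): 0.4, ('f','g'): 0.8,
     ('f','h'): 1.0, ('g','h'): 0.1}
p2 = {('d','e'): 0.5, ('d','f'): 0.5, ('d','g'): 0.9, ('d','h'): 0.5,
      ('e','f'): 0.5, ('e','g'): 1.0, ('e','h'): 0.5, ('f','g'): 0.8,
      ('f','h'): 0.5, ('g','h'): 0.1}
n = 10  # assumed number of observations per pair (not given in the text)

def ll(model):
    # Binomial log-likelihood of the observed counts under the model,
    # without the constant binomial coefficients.
    total = 0.0
    for pair, obs in k.items():
        c, p = round(obs * n), model[pair]
        if p in (0.0, 1.0):
            total += 0.0 if obs == p else float('-inf')
        else:
            total += c * math.log(p) + (n - c) * math.log(1 - p)
    return total

lam = ll(p2) - ll(k)  # negative: the restricted model never fits better
```

With the assumed counts the value differs from the reported λ, as expected; the computation, not the number, is the point of the sketch.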
for which ⟨F,p″⟩ satisfies A.4 is:
u(d) = 2, u(e) = 2, u(f) = 2, u(g) = 1, u(h) = 2.

We can proceed in a similar way for other Type I theories. Contrary to evaluating the goodness-of-fit relation for each model of a Type II theory, the subsequent goodness-of-fit evaluation for models of hierarchically related Type I theories gives us more information on which theoretical assumptions on probabilistic choice survived a confrontation with the data, and which did not. It is our conviction that the construction of appropriate BCP models based on the maximum likelihood principle offers both a more promising way of evaluating the relation between theoretical assumptions and empirical observations, and an approach to the representation of choice probabilities which is more in line with the principles of axiomatic measurement theory.
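That the utility function u given above is a weak utility representation can be checked directly against A.4 (the choice probability is at least 1/2 exactly when the utility of the chosen option is at least as large), here in a few lines of Python with the values from the text.

```python
# The maximum likelihood BCP system p'' and the utility function u from the text.
p = {('d','e'): 0.5, ('d','f'): 0.5, ('d','g'): 0.9, ('d','h'): 0.5,
     ('e','f'): 0.5, ('e','g'): 1.0, ('e','h'): 0.5, ('f','g'): 0.8,
     ('f','h'): 0.5, ('g','h'): 0.1}
u = {'d': 2, 'e': 2, 'f': 2, 'g': 1, 'h': 2}

def prob(x, y):
    # p''(y, x) = 1 - p''(x, y)
    return p[(x, y)] if (x, y) in p else 1.0 - p[(y, x)]

# A.4: p(a, b) >= 1/2  iff  u(a) >= u(b), for all distinct a, b.
ok = all((prob(a, b) >= 0.5) == (u[a] >= u[b])
         for a in u for b in u if a != b)
print(ok)  # → True
```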
References

Block, H. D., & Marschak, J. (1960). Random orderings and stochastic theories of response. In I. Olkin, S. Ghurye, W. Hoeffding, W. Madow, & H. Mann (Eds.), Contributions to probability and statistics. Stanford, CA: Stanford University Press.

Bossuyt, P. M., & Roskam, E. E. (1985). A nonparametric test of probabilistic unfolding models. Paper presented at the 4th European Meeting of the Psychometric Society, Cambridge.

Coombs, C. H. (1964). A theory of data. New York: Wiley.

Croon, M. (in press). A comparison of statistical unfolding models. Psychometrika.

Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement (Vol. 1). New York: Academic Press.

Luce, R. D., & Suppes, P. (1965). Preference, utility and subjective probability. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 3). New York: Wiley.

Suppes, P. (1962). Models of data. In E. Nagel, P. Suppes, & A. Tarski (Eds.), Logic, methodology and philosophy of science. Stanford, CA: Stanford University Press.

Suppes, P., & Zinnes, J. L. (1963). Basic measurement theory. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1). New York: Wiley.
ON THE AXIOMATIC FOUNDATIONS OF UNFOLDING: WITH APPLICATIONS TO POLITICAL PARTY PREFERENCES OF GERMAN VOTERS

Bernhard Orth
University of Hamburg, FR Germany

Sufficient conditions for the existence of a qualitative J-scale of the unfolding model are given in terms of an “unfolding structure”. This measurement structure is illustrated by a set of hypothetical data and then applied to an analysis of preference orderings of German political parties obtained from about 4,000 voters in 1969, 1972, and 1980. It turns out that these preference orderings cannot be unfolded appropriately in either one or two dimensions. On the level of aggregated orderings according to the most preferred party, however, there exist one-dimensional unfolding solutions (representing the parties) as well as structurally simple graph-theoretical representations of these groups of voters. The findings suggest the hypothesis that preferences for German parties are determined by both the political left-right dimension and the preferred party coalition.
Earlier drafts of portions of this paper benefited very much from discussions with David H. Krantz and Clyde H. Coombs. The analyzed data are taken from the studies “Bundestagswahl 1969” and “Bundestagswahl 1972” (stored by the Zentralarchiv für empirische Sozialforschung under ZA-Nr. 0426 and ZA-Nr. 0635-0637, respectively) and “ZUMA-Bus 1980” (Zentrum für Umfragen, Methoden und Analysen e.V.). The 1969 and 1972 data are given in Norpoth (1979) and the 1980 data in Pappi (1983). I wish to thank Franz Urban Pappi for pointing out and correcting some errors in the 1980 data. Last but not least I thank Gesine Müller for her assistance in the data analysis. This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 236-249.
1. Introduction

The measurement theoretical foundations of the method of unfolding (Coombs, 1950, 1952, 1953, 1964) are still poorly understood. Although some attempts have been made, e.g., by Suppes and Zinnes (1963), Ducamp and Falmagne (1969), and Krantz, Luce, Suppes, and Tversky (1971), necessary and/or sufficient conditions for the existence of either a qualitative or a quantitative J-scale have not yet been established. Suppes and Zinnes and Krantz et al. consider unfolding representations in terms of a single scale representing both objects (stimuli) and persons (or persons’ ideal points), whereas Ducamp and Falmagne propose a representation in terms of two different scales (one scale for the objects and the other one for the persons). Both types of representations can be said to characterize the unfolding model. But they do not correspond to the actual unfolding method, which aims at the construction of a (qualitative or quantitative) stimulus scale and which provides just ordering information with respect to persons or ideal points. The approach taken in this paper closely corresponds to the actual method of unfolding. It is based on a representation suggested by Orth (1976) that takes into account a stimulus scale only. This representation, however, is not stated in terms of the primitive notions (i.e., the individual I-scales) but in terms of a defined relation on the set of objects. As a consequence, the formulation of axioms becomes fairly complex and a little cumbersome. For this reason, and because of space limitations, this paper gives sufficient conditions for the existence of a qualitative J-scale only. These axioms are stated in terms of an unfolding structure in Section 2. The unfolding structure is illustrated by an example with hypothetical data (Section 3) and then applied to real data on preferences for political parties (Section 4). These data have been gathered from representative samples of German voters in the years 1969, 1972, and 1980.
The final Section 5 is devoted to a brief discussion of the findings.

2. An Unfolding Structure

Let A and P be sets and ≥ be a binary relation on A × P. The sets A and P are conceived of as sets of empirical objects and persons, respectively, and the relation ap ≥ bp is to be interpreted as “p (weakly) prefers b over a” or “the distance between a and p is larger than or equal to that
between b and p.” (Note that unfolding is not concerned with ap ≥ bq with p ≠ q; these cases will be excluded below.) A qualitative J-scale may be defined as a real-valued function Φ on A, unique up to strictly increasing transformations, such that, for all a, b ∈ A,
a ≥* b   iff   Φ(a) ≥ Φ(b)
where ≥* is uniquely determined by ≥. (Here, “uniquely determined” essentially means that ≥* is appropriately defined in terms of ≥; an exact definition will be given below.) This definition of a qualitative J-scale corresponds to the unfolding representation studied in this section. It turns out to be convenient to introduce a betweenness relation on the set A. Such a relation facilitates the formulation of sufficient conditions for the existence of a function Φ satisfying the representation above, as well as the construction of such a function. A betweenness relation can be defined in terms of ≥ as follows:

Definition 1. Let A, P, and ≥ be as above. For all a, b, c ∈ A, we say that b is between a and c, denoted a | b | c, iff

either a ≠ c and ap ≥ bp, ap ≥ cp, cq ≥ bq, and cq ≥ aq,

or a = b = c, for some p, q ∈ P.
According to this definition, an object b is between two objects a and c (on the J-scale to be constructed) whenever there are two persons (say, p and q) such that one of them prefers both b and c over a and the other one prefers both b and a over c. There are thus four combinations of two I-scales leading to a | b | c; these four cases are illustrated in Figure 1. Unfolding assumes that the individual preference orderings are single-peaked. In terms of Definition 1, this basic property of single-peakedness can be stated as follows:
If a | b | c, then either ap ≥ bp or cp ≥ bp (or both), for all a, b, c ∈ A and p ∈ P. Hence, whenever a | b | c, single-peakedness is violated if there is a person preferring both a and c over b. Definition 1 also allows n-ary betweenness relations (n > 3) to be defined quite naturally in terms of the ternary relation:
Figure 1. Four cases of two persons’ I-scales yielding the betweenness relation a | b | c according to Definition 1:

(i) ap ≥ bp ≥ cp and cq ≥ bq ≥ aq (p: cba, q: abc);
(ii) ap ≥ bp ≥ cp and cq ≥ aq ≥ bq (p: cba, q: bac);
(iii) ap ≥ cp ≥ bp and cq ≥ bq ≥ aq (p: bca, q: abc);
(iv) ap ≥ cp ≥ bp and cq ≥ aq ≥ bq (p: bca, q: bac).
a | b | c | d iff a | b | c, a | b | d, a | c | d, and b | c | d; and
a | b | c | d | e iff a | b | c | d, a | b | c | e, a | b | d | e, a | c | d | e, and b | c | d | e; and so on (for all a, b, c, d, e ∈ A). These definitions almost directly yield a method for constructing a qualitative J-scale. Axioms for the existence of a qualitative J-scale must put suitable restrictions on the primitives A and P, and especially on ≥, and they have to make sure that the betweenness relation will satisfy some important properties needed for deriving a complete and consistent ordering of the objects. The conditions given in the next definition serve this purpose.
Definition 2. Let A be a set with at least two elements, let P be a nonempty set, and let ≥ be a binary relation on A × P. The relational system ⟨A, P, ≥⟩ is an unfolding structure iff, for all a, b, c, d ∈ A and p, q ∈ P, the following four axioms hold:

1. Either ap ≥ bp or bp ≥ ap; and if ap ≥ bq, then p = q.

2. If ap ≥ bp and bp ≥ cp, then ap ≥ cp.

3. There exists an r ∈ P such that ar ≥ br.
4. If a | b | c and a | c | d, then either bp ≥ cp or dp ≥ cp.

According to Axiom 1, ≥ is conditionally connected. That is, ≥ holds only for pairs with a common element out of P, and for those pairs it is connected. Thus, this axiom merely specifies the preference orderings to be individual ones and, together with Axiom 2, which asserts transitivity of these orderings, it assumes that these I-scales are weak orders. Axiom 3 ensures that there are at least two different I-scales. Note that otherwise unfolding could not be done. (As shown in the next section, this version of Axiom 3 actually is somewhat stronger than necessary.) It can be said that Axioms 1 to 3 together just describe the kind of data typically used for unfolding. The crucial and empirically interesting assumption is Axiom 4. This is essentially a type of single-peakedness condition. Together with the other axioms, it implies the important property of transitivity of betweenness:

1. If a | b | c and a | c | d, then a | b | d and b | c | d; and

2. If a | b | c, b | c | d, and b ≠ c, then a | b | d and a | c | d

(for all a, b, c, d ∈ A). As in similar contexts (e.g., Orth, 1980), transitivity of betweenness gives rise to “unidimensionality” in the sense of a consistent ordering of the elements of the set A. Together with the following “technical” Condition C, Axioms 1 to 4 can be shown to be sufficient for the existence of a qualitative J-scale, as stated in the theorem below.
Condition C. A contains a finite or countable subset A′ such that there is b′ ∈ A′ with a | b′ | c, for all a, c ∈ A.
Theorem. Let ⟨A, P, ≥⟩ be an unfolding structure satisfying Condition C. Then there exists a real-valued function Φ on A, unique up to strictly increasing transformations, such that, for all a, b ∈ A,

a ≥* b   iff   Φ(a) ≥ Φ(b)

where ≥* is uniquely determined by ≥.
Remarks. By “uniquely determined” it is meant that ≥* is the only simple order on A with the property: if a ≥* b and b ≥* c, then a | b | c. The proof of the theorem is fairly simple and will be omitted because of space
limitations. Condition C is required because the unfolding structure (Definition 2) is not restricted to finite sets. If A is finite, however, Condition C can be dropped.

3. An Example with Hypothetical Data
This section gives a simple example with hypothetical data in order to illustrate how the unfolding structure can be applied empirically. Table 1 contains fictitious preference orderings of five objects a, b, c, d, and e from six persons p, q, r, s, t, and u. The first step is to determine the betweenness relation according to Definition 1. For every triple of objects, it must be checked whether a | b | c, b | a | c, or a | c | b holds. (Note that a | b | c holds iff c | b | a holds.) It may happen that two or even all of these cases apply; this would be due either to ties within the I-scales or to violations of Axiom 4 of the unfolding structure. The data from Table 1 yield the betweenness relation given in Table 2.

Table 1. Fictitious preferences of six persons p, q, r, s, t, and u for five objects a, b, c, d, and e each.

Person   I-scale   Preference ordering
p        aecbd     dp ≥ bp ≥ cp ≥ ep ≥ ap
q        cadeb     bq ≥ eq ≥ dq ≥ aq ≥ cq
r        cdaeb     br ≥ er ≥ ar ≥ dr ≥ cr
s        eacbd     ds ≥ bs ≥ cs ≥ as ≥ es
t        acedb     bt ≥ dt ≥ et ≥ ct ≥ at
u        ebacd     du ≥ cu ≥ au ≥ bu ≥ eu
Next, the axioms of the unfolding structure can be tested. The I-scales in Table 1 are connected and transitive; thus, Axioms 1 and 2 are satisfied. Axiom 3, however, turns out to be violated, because there is no person preferring d over c and no one preferring b over e. Nevertheless, it will be shown below that a qualitative J-scale does exist for the present data. Hence, this example shows that Axiom 3 is somewhat stronger than necessary. In order to test Axiom 4, one has to consider all those combinations of objects that satisfy the premise of this axiom. These cases are
called “possible tests”, and for each possible test one has to check whether the corresponding conclusion of the axiom is satisfied for all (or perhaps for how many) persons. The present example yields the possible tests and conclusions of Axiom 4 as given in Table 3. The conclusion in the first line of this table is satisfied because all persons prefer either c over a or c over d (or both). Similarly, all the other conclusions turn out to hold for all persons. Thus, Axiom 4 is perfectly satisfied. Now, suppose that the I-scales in Table 1 are sufficiently distinct (in spite of the violation of Axiom 3). We can then conclude from the theorem in Section 2 that there exists a perfect qualitative J-scale for the present data.

Table 2. Betweenness relation obtained from the data in Table 1.
This scale indeed exists, and it can be constructed as follows. According to Table 2, we have b | a | c, b | a | d, a | c | d, and b | c | d and hence b | a | c | d. Similarly, we obtain b | e | a | c, b | e | a | d, b | e | c | d, b | a | c | d, and e | a | c | d and thus the 5-ary betweenness relation b | e | a | c | d, which gives the ordering of the objects on the qualitative J-scale. In case of violations of Axiom 4 for some objects or persons, this information can be used for constructing a “dominant J-scale” (Coombs, 1964). An empirical application of the unfolding structure (or at least a determination of the betweenness relation) might be useful even if strong violations of Axiom 4 are to be expected. Such an example will be studied in the next section.

4. Applications to Political Party Preferences
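The construction just carried out can be mirrored computationally. The sketch below implements Definition 1 as reconstructed in Section 2 (an interpretive reading of the axioms, not the author's own program) and confirms, from the Table 1 I-scales, both a betweenness fact from Table 2 and the qualitative J-scale b, e, a, c, d.

```python
from itertools import combinations

# I-scales from Table 1, most preferred object first.
iscales = ['aecbd', 'cadeb', 'cdaeb', 'eacbd', 'acedb', 'ebacd']

def prefers(scale, x, y):
    # True iff the person with this I-scale (weakly) prefers x over y,
    # i.e. yp >= xp in the distance notation of Section 2.
    return scale.index(x) <= scale.index(y)

def between(a, b, c):
    # Definition 1: some person prefers both b and c over a, and some
    # (possibly different) person prefers both b and a over c.
    if a == c:
        return a == b
    return (any(prefers(s, b, a) and prefers(s, c, a) for s in iscales) and
            any(prefers(s, b, c) and prefers(s, a, c) for s in iscales))

def is_qualitative_jscale(order):
    # Every triple, read left to right, must satisfy betweenness.
    return all(between(x, y, z) for x, y, z in combinations(order, 3))

print(between('b', 'a', 'c'))          # → True, as in Table 2
print(between('a', 'b', 'c'))          # → False
print(is_qualitative_jscale('beacd'))  # → True: the J-scale derived above
```

By the symmetry of Definition 1 in a and c, the reversed ordering d, c, a, e, b passes the same check, reflecting the uniqueness of the J-scale up to reversal.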
In this section, the unfolding structure will be applied to preference orderings of five German political parties obtained from representative samples
Table 3. Possible tests and conclusions of Axiom 4 when tested with the data in Table 1. (Conclusions must be tested for all p ∈ P.)
of voters in studies on the occasion of elections for the Federal Diet in the years 1969, 1972, and 1980. The five German parties used in these studies are given in Table 4, together with their relative position on the political left-right dimension. The 1969 and 1980 I-scales were obtained by pair comparisons and the 1972 I-scales by a rank order method. The complete and transitive preference orderings from 1969 and 1972 are given in Norpoth (1979, p. 355, Table 2) and those from 1980 in Pappi (1983, p. 432, Table 3). Since the latter contain some minor errors, the corrected data (as well as the 1969 and 1972 data) are reproduced in Table 5. Norpoth (1979) did a multidimensional unfolding analysis (with the program MINIRSA; cf. Roskam, 1977) on the 1969 and 1972 data. He obtained one- and two-dimensional solutions and made an effort to interpret these solutions, although several of them quite obviously were degenerate. Pappi (1983) examined these data and those from 1980 more carefully. He argued that the preference orderings of the majority of the voters cannot be unfolded to any dimensional structure. He then studied more closely those voters whose preferences were compatible with a J-scale corresponding to the left-right dimension. The present section gives a reanalysis of the three sets of data in terms of the axiomatic approach outlined above.
Table 4. Five German political parties used in representative studies in 1969, 1972, and 1980. (The order from top to bottom corresponds to the parties’ position on the political left-right dimension as rated by experts as well as by voters; e.g., Klingemann, 1972.)

K   DKP       Deutsche Kommunistische Partei (German Communist Party) (1969: Aktion Demokratischer Fortschritt)
S   SPD       Sozialdemokratische Partei Deutschlands (Social Democratic Party)
F   FDP       Freie Demokratische Partei (Free Democratic Party)
C   CDU/CSU   Christliche Demokratische Union / Christlich-Soziale Union (Christian Democratic Union / Christian Social Union)
N   NPD       Nationaldemokratische Partei Deutschlands (National Democratic Party)
Table 5 contains a total of more than 4,000 preference orderings. In determining the betweenness relation according to Definition 1 (for each set of data), it is immediately seen that all three cases a | b | c, b | a | c, and a | c | b hold with respect to every triple (a,b,c) of the five parties K, S, F, C, and N. It follows (cf. Section 3) that Axiom 4 of the unfolding structure is violated. It is also easily seen that these violations are not accidental and thus cannot be attributed to chance. We therefore dispense with a detailed test of that axiom. For each set of data, there is clearly no (one-dimensional) qualitative J-scale. On the other hand, determining the betweenness relation also reveals clear differences with respect to the number of I-scales yielding either a | b | c or b | a | c or a | c | b for triples of parties. A closer look at the betweenness relation will show some systematic regularities and thereby provide some insights on why there is no qualitative J-scale. Table 6 gives the percentages of those persons whose preference orderings are not compatible with either a | b | c or b | a | c or a | c | b (for all triples of the five parties). For example, the table shows that many I-scales are
Table 5. I-scales of the five German parties given in Table 4, together with their frequencies in representative samples of voters in the years 1969 (N = 907), 1972 (N = 1785), and 1980 (N = 1316). (After Norpoth, 1979, and Pappi, 1983.)
compatible neither with S | K | N nor with K | N | S; but almost all I-scales are compatible with K | S | N. Similarly, K | F | N and K | C | N almost uniquely hold. A somewhat different pattern holds for triples containing K (and not N) as well as for those containing N (and not K). It is very clear that neither K nor N is between two of the other three parties. It is less clear, however, whether we have, for example, K | S | F or K | F | S. Nevertheless, these nine triples (with K and/or N) are best compatible with the 5-ary betweenness relation K | S | F | C | N and thus with a J-scale corresponding to the left-right dimension. However, it is the final triple (S,F,C) that severely violates this J-scale (as well as every other one). For every set of data, there is a substantial portion of I-scales compatible with neither S | F | C nor F | S | C nor S | C | F. Exactly these cases lead to substantial violations of Axiom 4. They result from the fact that every possible ordering of S, F, and C is contained in frequently observed I-scales. This has also been noted by Pappi (1983). Essentially the same result is obtained when some idiosyncratic I-scales are deleted from Table 5. The numbers in parentheses in Table 6 give the corresponding percentages after deletion of those I-scales which have not been observed at least two times in every year. This criterion excludes about 5% of the I-scales. It is interesting to see that now almost all the triples containing K and/or N are uniquely determined in terms of betweenness. These are exactly those triples corresponding to the left-right dimension. The problem with the triple (S,F,C), however, still remains. The results on betweenness also provide information about possible two-dimensional unfolding representations of the five parties. We know that S, F, and C must be represented as a “triangle”. We also know that this triangle is “between” K and N, which are at the extremes of such a configuration.
However, these two pieces of information cannot be combined in any satisfactory manner. According to Table 6, the triples with N (and without K) do not tell us whether, for example, N is closer to F or to S (i.e., whether S | F | N or F | S | N). The same holds for the other triples as well as for those with K (and without N). Thus, all possible locations of K and N relative to S, F, and C are equally well (or rather poorly) supported. To put it differently, there is no “rotation” of the “triangle” built by S, F, and C that can properly be fitted “between” K and N. These problems with a two-dimensional unfolding representation
arise from the fact that all 12 logically possible I-scales having S, F, and C (in any order) before K and N (in any order) were frequently observed. In terms of unfolding, the three vertices of the triangle with S, F, and C yield three boundary lines separating the plane into six isotone regions (corresponding to the six orderings of S, F, and C). The boundary line between K and N, however, can cross at most four of these regions, but not six. Hence, two-dimensional unfolding cannot account for all of these 12 I-scales at the same time. These results again hold for all three sets of data. Table 6, however, also reveals some interesting differences between the years 1969, 1972, and 1980, especially with respect to the triple (S,F,C). In 1969, the betweenness relation of this triple that is compatible with most I-scales is S | C | F, whereas in 1972 it is F | S | C and in 1980 it is S | F | C. Thus, in 1969 the largest difference is that between S and F, in 1972 it is that between F and C, and in 1980 it is that between S and C. A closer study of this type of information has been given elsewhere (Orth, 1986). It has been shown there that in 1969 F is relatively far away from both S and C, which are fairly close together; in 1972 F and S are much closer to each other; and in 1980 S and C are relatively far away from each other, and F is somewhat closer to C than before but still closest to S. It is to be noted that these changes correspond very well with the coalition governments in Germany between 1969 and 1980. Moreover, Pappi (1983) also showed some influences of party coalitions on party preferences. It is thus tempting to conjecture that the qualitative J-scale corresponding to the political left-right dimension was not found for these data from 1969, 1972, and 1980 because it was heavily distorted by preferences for political party coalitions.
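The region-counting argument has a simple one-dimensional analogue that can be checked computationally: for three stimuli on a line, the three pairwise midpoints cut the scale into at most four regions, so at most four of the six orderings of S, F, and C can arise as I-scales from any single quantitative J-scale. The stimulus positions below are illustrative, not estimates from the data.

```python
from itertools import combinations

def achievable_orderings(pos):
    # Distinct distance orderings (I-scales) obtainable by sweeping an
    # ideal point along a one-dimensional J-scale with the given stimulus
    # positions; region boundaries lie at the pairwise midpoints.
    mids = sorted((x + y) / 2 for x, y in combinations(pos.values(), 2))
    samples = ([mids[0] - 1.0] +
               [(x + y) / 2 for x, y in zip(mids, mids[1:])] +
               [mids[-1] + 1.0])
    return {tuple(sorted(pos, key=lambda s: abs(pos[s] - t))) for t in samples}

pos = {'S': 0.0, 'F': 1.0, 'C': 2.5}   # hypothetical left-right positions
orders = achievable_orderings(pos)
print(len(orders))                      # → 4: at most four of the six orderings
print(('S', 'C', 'F') in orders)        # → False for these positions
```

Since all six orderings of S, F, and C occur frequently in the data, no single quantitative J-scale, whatever the positions, can account for them, which is the one-dimensional counterpart of the impossibility argued above for the plane.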
Table 6. Percentages of persons whose preference orderings are not compatible with either a | b | c or b | a | c for all triples of the five parties K, S, F, C, and N. Percentages in parentheses are obtained after excluding some idiosyncratic I-scales (see text).
[Table 6: percentages, by year (1969, 1972, 1980), for the triples with K and N (K/S/N, K/F/N, K/C/N), the triples with K (K/S/F, K/S/C, K/F/C), the triples with N (S/F/N, S/C/N, F/C/N), and the triple (S,F,C).]
5. Discussion
An axiomatic approach to unfolding can facilitate a detailed analysis of structural aspects of a set of preference orderings. The concept of betweenness given in Definition 1 enables one to study not only the whole set of objects at once but also every triple or every subset of the preferential objects under study. In case of violations of the axioms, this very feature of betweenness might be of use to gain some insights into what could have led to those violations. This was illustrated by applying an unfolding structure to data on political party preferences from the years 1969, 1972, and 1980. The preference orderings were shown not to be unfoldable to a qualitative J-scale of the five parties. This is mainly due to one of the ten triples of parties. Furthermore, it was argued that there is also no appropriate two-dimensional unfolding representation for these data. A comparison of the three sets of data with respect to that particular triple of parties revealed differences which seem to be related to the governmental coalitions existing at the respective times. It should be noted that essentially the same differences have been found (Orth, 1986) with the data analyzed on an aggregated level, where group preference orders were built according to the most preferred party. These aggregated preferences yield almost perfect J-scales for the three years, differing with respect to the parties S, F, and C just as the betweenness relation does here. These findings suggest the hypothesis that political party preferences are determined by both the perceived position of the parties on the political left-right dimension (in terms of the distance from a person’s ideal party or own position on that dimension) and the preferred party coalition. Such an explanation would be consistent with Pappi’s (1983) finding that governmental coalitions have some impact on party preferences.
References

Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145-158.

Coombs, C. H. (1952). A theory of psychological scaling. Engineering Research Institute, University of Michigan, Ann Arbor.

Coombs, C. H. (1953). Theory and methods of social measurement. In
L. Festinger & D. Katz (Eds.), Research methods in the behavioral sciences (pp. 471-535). New York: Dryden.

Coombs, C. H. (1964). A theory of data. New York: Wiley.

Ducamp, A., & Falmagne, J. C. (1969). Composite measurement. Journal of Mathematical Psychology, 6, 359-390.

Klingemann, H. D. (1972). Testing the left-right continuum on a sample of German voters. Comparative Political Studies, 5, 93-106.

Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement (Vol. 1). New York: Academic Press.

Norpoth, H. (1979). Dimensionen des Parteienkonflikts und Präferenzordnungen der deutschen Wählerschaft: Eine Unfoldinganalyse. Zeitschrift für Sozialpsychologie, 10, 350-362.

Orth, B. (1976). An axiomatization of unfolding. Paper presented at the 7th European Mathematical Psychology Group Meeting, Stockholm.

Orth, B. (1980). On the foundations of multidimensional scaling: An alternative to the Beals, Krantz, and Tversky approach. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 54-69). Bern: Huber.

Orth, B. (1986). Grundlagen des Entfaltungsverfahrens und eine axiomatische Analyse von Präferenzen für politische Parteien. Unpublished manuscript, University of Hamburg.

Pappi, F. U. (1983). Die Links-Rechts-Dimension des deutschen Parteiensystems und die Parteipräferenz-Profile der Wählerschaft. In M. Kaase & H. D. Klingemann (Eds.), Wahlen und politisches System (pp. 422-441). Opladen: Westdeutscher Verlag.

Roskam, E. E. (1977). A survey of the Michigan-Israel-Netherlands Integrated Series. In J. C. Lingoes (Ed.), Geometric representations of relational data (pp. 289-312). Ann Arbor, MI: Mathesis Press.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
UNFOLDING AND CONSENSUS RANKING: A PRESTIGE LADDER FOR TECHNICAL OCCUPATIONS

Rian A. W. van Blokland-Vogelesang
Free University, Amsterdam, The Netherlands

The social prestige of occupations can be measured by letting people rank order occupations according to social prestige. This has been done by Goldberg (1976) for technical occupations. For all judges a "consensus ranking" can be determined: the mean or median ranking. The resulting consensus ranking for these data is a ranking of these occupations according to social prestige. In unfolding the individual rankings, a J scale is sought on which stimuli and individuals can be placed. This J scale forms a common reference frame for the evaluation of stimuli. To find qualitative and quantitative J scales for complete rankings, a computer program, "UNFOLD", has been developed. For the Goldberg data a "nested set" of quantitative J scales was found: a J scale for larger numbers of stimuli contains a smaller J scale as a proper subset. This indicates a stable and reliable solution. To explain departures from the perfect unfolding model, Feigin and Cohen's (1978) error model has been used.
1. Introduction

The unfolding model and technique are thoroughly discussed in Coombs (1964). The historical context in which the unfolding technique originated was the debate concerning "majority decisions": finding a consensus ranking for a number of stimuli in a group of individuals; see Coombs (1964, Ch. 18). Under the condition that all subjects' rankings are single peaked preference functions (SPF's) on a common quantitative J scale, the consensus ranking proved to be a pattern of the J scale: the ranking of the

* This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 250-257.
median individual on the J scale. This argument can be reversed: by folding back the median ranking for a group of subjects, (an approximation to) the best J scale can be found.

The unfolding model is a deterministic model. It has to be rejected if departures from the perfect model occur. For a variety of reasons, however, individuals mostly do not all produce preference rankings which are consistent with one underlying J scale. In addition, much research is precisely directed at finding an underlying frame of reference in a certain domain of investigation. Therefore, we need a criterion for the "best" J scale. The best J scale will be defined as that scale for which the total number of inversions from individuals' rankings is a minimum. The minimization of the total number of inversions is an often used criterion in nonparametric statistics. In the case of ranking data this criterion also follows from the Mallows (1957) and Feigin and Cohen (1978) models. To determine scale values for stimuli and subjects on the quantitative J scale, linear programming techniques have been used. To explain departures from the perfect unfolding model, the Feigin and Cohen (1978) model has been used. The results of the unfolding procedure will be illustrated on the Goldberg (1976) data on social prestige of technical occupations.

In the following, first the unidimensional unfolding model as outlined in Coombs (1964) will be shortly introduced (Section 2). Subsequently, Section 3 will concern consensus rankings, and Section 4 finding the best unfolding scale. In Section 5 the Feigin and Cohen (1978) model and its adjustment to the unfolding situation will be treated. In Section 6 the unfolding procedure will be illustrated on the Goldberg (1976) data. Section 7 contains a discussion.

2. Coombs' Unidimensional Unfolding Model

Coombs' (1964) unidimensional unfolding model was devised for the analysis of complete orderings of preference.
Suppose there are n individuals ranking k objects from most to least preferred. Each individual and each object may be represented on a single dimension, called the J scale (“Joint”). The points representing the individuals are called “ideal points”, each representing the best possible object from the point of view of the individual. Each individual’s preference ranking of objects is given by the rank order of the distances of the object points from the ideal
point, the nearest being most preferred. In the unidimensional unfolding model possible orders of preference (“admissible patterns”) correspond to intervals of the J scale. Other orders of preference do not correspond to intervals of the J scale and, hence, are called “inadmissible patterns”. For four stimuli A, B, C and D, two different J scales (“4-scales”) are possible, depending on the order of the midpoints ad and bc. The relative magnitude of the distances d(AB) and d(CD) depends on the order of the midpoints ad and bc (see Figure 1).
Figure 1. The two possible midpoint orders for 4-scale ABCD: ad precedes bc (top, d(AB) > d(CD)) and bc precedes ad (bottom, d(AB) < d(CD)). Each midpoint order yields seven intervals I1-I7; for bc preceding ad these correspond to the admissible patterns ABCD, BACD, BCAD, CBAD, CBDA, CDBA, DCBA.
So, without restrictions on the order relations between the midpoints, there are eight admissible patterns in 4-scale ABCD. This scale is called the qualitative J scale. With restrictions on the order relations of the midpoints, one of the intervals is excluded (only one order of ad and bc is possible); this scale is called a quantitative J scale. A qualitative J scale contains 2^(k-1) patterns, a quantitative J scale k(k-1)/2 + 1. The quantitative J scale can be represented by a unidimensional continuum, because of the fixed midpoint order; the qualitative J scale cannot.
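The folding construction behind these counts can be sketched in a few lines (illustrative Python; the stimulus positions below are hypothetical, chosen so that bc precedes ad as in the bottom panel of Figure 1):

```python
def admissible_patterns(positions):
    """Fold a quantitative J scale into its k(k-1)/2 + 1 admissible rankings.

    positions: dict mapping each stimulus label to its scale value.
    """
    items = sorted(positions, key=positions.get)
    # midpoints of all pairs of stimuli; their fixed order defines the scale
    mids = sorted({(positions[a] + positions[b]) / 2
                   for i, a in enumerate(items) for b in items[i + 1:]})
    # one ideal point inside every interval between consecutive midpoints
    probes = ([mids[0] - 1]
              + [(u + v) / 2 for u, v in zip(mids, mids[1:])]
              + [mids[-1] + 1])
    patterns = []
    for t in probes:
        # rank stimuli by distance from the ideal point, nearest first
        r = "".join(sorted(items, key=lambda s: abs(positions[s] - t)))
        if r not in patterns:
            patterns.append(r)
    return patterns

pats = admissible_patterns({"A": 0.0, "B": 1.0, "C": 3.0, "D": 6.0})
```

Sweeping the ideal point across the intervals between consecutive midpoints reproduces exactly the 4(4-1)/2 + 1 = 7 admissible patterns of this 4-scale, from ABCD at one end to DCBA at the other.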
3. Consensus Ranking

The existence of common J scales is, generally speaking, a consequence of cultural homogeneity (Coombs, 1964, p. 397). The unidimensional J scale represents a reference frame for the evaluation of objects.
Figure 2. Single peaked preference functions on the qualitative J scale (top). Black's case: the majority decision is the top choice of the median individual in the group (bottom).

The significance of the existence of a common J scale for a group of individuals may be best understood in the context of the historical tradition in which the unfolding model originated. In the fifties many a discussion centered around the problem of "consensus rankings": how to construct a social preference out of manifold individual preferences. Black (1948a, 1948b) proved that the majority decision ("consensus") for a set of
options is the top choice of the median individual, given that individuals' preference functions are single peaked (see Figure 2). Arrow (1951) proved that the consensus ranking is a folded J scale, on which the first option is the top choice of the median individual in the group. Goodman (1954) and Coombs (1954) proved that if individuals' preference rankings are generated from a common underlying quantitative J scale, the consensus ranking is the ranking of the median individual on the J scale (see Figure 3). These results are very strong, and one might ask whether they apply in the case where not all individuals' orderings are single peaked. In particular, one might be interested in the conditions under which the median ranking will be the consensus ranking for any set of rankings (not necessarily SPF's). This issue was answered by Kemeny (1959) and Kemeny and Snell (1972). They showed that the mean and median rankings are consensus rankings in general, that is, without the assumption that individuals' rankings be SPF's. This will hold if the number of inversions between rankings is used as a distance measure, and at least in the case of a substantial number of subjects.
Figure 3. Coombs and Goodman's case: the consensus ranking is a folded quantitative J scale and is the ranking of the median individual in the group.
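The inversion-distance consensus of Kemeny and Snell can be illustrated with a brute-force sketch (illustrative Python; the toy rankings are invented for the example, and enumerating all permutations is only feasible for small k):

```python
from itertools import permutations

def inversions(r1, r2):
    """Number of discordant pairs (inversions) between two rankings of the same items."""
    pos = {item: i for i, item in enumerate(r2)}
    return sum(1 for i in range(len(r1)) for j in range(i + 1, len(r1))
               if pos[r1[i]] > pos[r1[j]])

def median_ranking(rankings):
    """The consensus ranking: the ordering minimizing the total number of inversions."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda cand: sum(inversions(cand, r) for r in rankings))

group = [list("ABC"), list("ABC"), list("BAC")]
consensus = median_ranking(group)
```

Here two of three judges report ABC, so ABC minimizes the total inversion distance (total 1) and is the median ranking.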
A second question might be the following: under the assumption that individuals share a common frame of reference (individuals do not choose rankings at random), how sure can we be that the resulting consensus ordering is a folded J scale? By using a probability model for ranking data it can be proved that the median ranking is a folded J scale in general, at least for large numbers of subjects. Any ranking model for which probabilities of rankings decrease with increasing numbers of inversions from the median ranking can be used. The Feigin and Cohen (1978) model is an example of such a model.
Figure 4. In folding back the consensus ranking the J scale may be found (vertical axis: social preference; the peak marks the collective ideal point).
4. Finding the Best J Scale

Unidimensional unfolding is a technique for finding the latent dimension, the "J scale", on which the preference rankings are based. The data are complete orderings of preference of n individuals for a fixed set of k stimuli. If a stable frame of reference is underlying people's preferences, rankings will unfold into a common J scale.
The best J scale is defined as that scale for which the total number of inversions from subjects' rankings is a minimum. This can be explained briefly as follows. Each J scale has a certain number of admissible patterns ("I scales"). Every individual is supposed to have a pattern of preference in mind, the "latent pattern", which is identical to one of the admissible patterns of the J scale. In reporting his or her latent pattern of preference the subject may make errors. So, the "manifest" pattern of preference may be different from the latent pattern. For each admissible pattern of the J scale the number of inversions from an individual's manifest pattern of preference is assessed. That admissible pattern which has a minimum number of inversions from the individual's pattern is taken as the latent pattern for this individual. In this way, the number of inversions needed is minimized for each individual and, in general, for all individuals. The minimization of total numbers of inversions is an often used criterion in nonparametric statistics (cf. Lehmann, 1975).

To find best qualitative and quantitative J scales for complete rankings of preference, a number of procedures have been devised; see Van Blokland (1988, in press). The computer program UNFOLD has been written by Piet van Blokland on the basis of these procedures. It should be stressed that the minimization of the total number of inversions from subjects' rankings is the only criterion used to find the best J scale. No other criteria such as quasi independence (Davison, 1975), observed versus expected numbers of errors, and "uniqueness" of the found scale (Van Schuur, 1984) are used. There is no user interaction and there are no parameters which have to be set by the user.

4.1 The Benefits of UNFOLD

- Best qualitative and quantitative scales for subsets of 4 ≤ k ≤ 9 stimuli out of a maximum number of 15 stimuli.
- A separate analysis for each number of objects. Results for any number or subset of stimuli are never dependent on previous steps in the analysis.
- Scale values for objects, for midpoints between objects, for the patterns of the J scale and for individuals.
- Options to analyze specific qualitative or quantitative J scales.
- A test for goodness of fit based on a nonparametric error model for ranking data.

Best qualitative J scales can be determined for subsets of a maximum of 11 stimuli. Scale values and the test for goodness of fit are determined for quantitative J scales only.

All sets of data which have been analyzed to date produced bipolar scales. These scales could be interpreted in a substantively meaningful way. Moreover, the best quantitative J scales formed a "nested set" in that the items of a smaller scale were included in the larger scale in the same order. This was not deliberately sought but just followed from the data. Such a nested set of scales indicates that a stable continuum is underlying the data.

The algorithms used in the program are based on:

- backtracking and branch-and-bound methods;
- finding a good solution first;
- eliminating inferior scales as quickly as possible.

The computer program has four main parts:

1. finding the median ranking
2. finding the best qualitative J scale
3. finding the best quantitative J scale
4. assessing goodness of fit, scale values, etc.
To get a quick estimate of the total number of inversions needed, in a first step only those qualitative J scales which arise from unfolding the median ranking are considered. Not until a later stage are all possible J scales investigated. In the first step the total number of inversions needed is (under)estimated. For the 20 qualitative scales having the best underestimate of the total number of inversions the actual minimum number of inversions is assessed. Then all possible qualitative scales are investigated, the ten best ones are retained and printed.
The best qualitative J scales will probably include a very good quantitative J scale and are thus investigated first to assess the total number of inversions from subjects' preference patterns. Having determined a good quantitative J scale, most remaining scales can be skipped as candidates for the best quantitative J scale. The ten best quantitative J scales are retained and, on request, are printed. Scale values and the test for goodness of fit are printed too.
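The core step of Section 4, assigning each manifest ranking to the admissible pattern at minimum inversion distance, can be sketched as follows (illustrative Python; the admissible patterns are those of the hypothetical 4-scale ABCD with bc preceding ad):

```python
def inversions(r1, r2):
    """Number of discordant pairs between two rankings given as strings over the same items."""
    pos = {item: i for i, item in enumerate(r2)}
    return sum(1 for i in range(len(r1)) for j in range(i + 1, len(r1))
               if pos[r1[i]] > pos[r1[j]])

def latent_pattern(manifest, admissible):
    """Pick the admissible pattern at minimum inversion distance from the manifest ranking."""
    best = min(admissible, key=lambda p: inversions(manifest, p))
    return best, inversions(manifest, best)

# Admissible patterns of the 4-scale ABCD (case bc preceding ad, Figure 1 bottom)
PATTERNS = ["ABCD", "BACD", "BCAD", "CBAD", "CBDA", "CDBA", "DCBA"]
```

For example, the inadmissible pattern ABDC is one inversion away from ABCD, so ABCD is taken as its latent pattern; a pattern already on the scale, such as BACD, fits with zero inversions.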
4.2 Scale Values by Linear Programming

The procedure to assess scale values for stimuli starts from the order of the midpoints on the quantitative J scale. The distances between the successive midpoints are called the δi's. The distances between the midpoints must satisfy a number of equality constraints (see Van der Ven, 1977). The restrictions on the δ's can be represented by a system of linear equations which can be solved by using linear programming techniques, under the constraints that

1. δi > 0, i = 1, 2, ..., k(k-1)/2;
2. Σi δi is at a minimum.

This last constraint is imposed on the δ's to obtain a maximum distinction in metric relations on the J scale (see Van Blokland, in press; cf. Coombs, 1964, p. 101). To this end, SIMOPT (by E. Kalvelager, Free University, Amsterdam) was incorporated in UNFOLD.

The scale value of an individual is the midpoint of the admissible pattern which corresponds to his or her pattern of preference. An inadmissible pattern is assigned the scale value of that admissible pattern which has a minimum number of inversions from it. It may happen that several admissible patterns exist which have the same number of inversions from that inadmissible pattern. In that case the inadmissible pattern will get the scale value of the admissible pattern which is closer to the median ranking (i.e., which has fewer inversions from it). This seems reasonable, since a unimodal and symmetric function of social preference is assumed to exist on the J scale. This rule corresponds to the principle of "regression to the mean".
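The linear-programming step can be sketched with a modern solver in place of SIMOPT (illustrative Python using scipy; the single equality constraint and the lower bound eps are hypothetical stand-ins for the metric restrictions of Van der Ven, 1977):

```python
import numpy as np
from scipy.optimize import linprog

# Three inter-midpoint distances delta_1..delta_3; hypothetical metric constraint
# delta_1 + delta_2 = delta_3, each delta bounded away from zero by eps.
eps = 0.1
c = np.ones(3)                          # objective: minimize the sum of the deltas
A_eq = np.array([[1.0, 1.0, -1.0]])     # delta_1 + delta_2 - delta_3 = 0
b_eq = np.array([0.0])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(eps, None)] * 3, method="highs")
```

Here the minimizing solution is delta_1 = delta_2 = eps and delta_3 = 2*eps; in UNFOLD the analogous solution fixes the metric of the quantitative J scale.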
5. Feigin and Cohen's Model

A nonparametric family of distributions for ranking data has been derived by Mallows (1957). A particular case of this family is the Feigin and Cohen (1978) model. The model is based on the number of inversions between rankings. Suppose there are n subjects, each of whom ranks the same set of k stimuli according to some criterion. Assume that each subject ranks the objects independently of the other subjects. The ranking of a subject is a permutation ω of the numbers (1, ..., k). The distribution of a ranking ω in the Feigin and Cohen model depends on two parameters: ω0, a location parameter, and θ, a non-negative dispersion parameter (0 ≤ θ ≤ 1). The number of inversions between the ranking ω and the basic ordering ω0 is given by X(ω0,ω). The probability distribution of a ranking ω is
P_ω0,θ(ω) = (f(θ))^(-1) θ^X(ω0,ω),   0 ≤ θ ≤ 1,   (1)
where f(θ) = Σω θ^X(ω0,ω) is a normalizing constant. Consequently, the probability distribution of X = X(ω0,ω) is:
P_θ(X = x) = (f(θ))^(-1) a_x θ^x,   x = 0, 1, ..., k(k-1)/2,   (2)

where

k          = number of objects,
k(k-1)/2   = maximum possible number of inversions,
a_x        = number of possible orderings with x inversions from ω0,
f(θ)       = Σx a_x θ^x, a normalizing constant.
The model of Feigin and Cohen can be interpreted as stating that subjects have the same latent ordering ω0 in mind and make errors in reporting it. A low value of θ corresponds to subjects making few errors. The larger the value of θ, the more errors and the more improbable the existence of one underlying ordering will be. Both parameters ω0 and θ can be estimated by maximum likelihood methods. From (1) the likelihood given the sample of n subjects is

L(ω0, θ) = (f(θ))^(-n) θ^(Σi X(ω0,ωi))
(i = 1, ..., n), and hence the maximum likelihood estimate for ω0 is given by the value ω̂0 of ω0 for which Σi X(ω0,ωi) is minimal.

This means that ω̂0 is the median ranking (the consensus ranking of Section 3): that ordering which has a minimum number of inversions from all subjects' rankings. The maximum likelihood estimate of θ is found from the mean number of inversions X̄ from the median ranking: θ̂ is that value of θ for which X̄ = E(X | ω̂0, θ) (cf. Feigin & Cohen, 1978). So, the Feigin and Cohen model fits precisely in the framework of unfolding and consensus rankings.
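The numerical side of this estimate can be sketched as follows (illustrative Python; `inversion_counts`, `expected_x` and `estimate_theta` are hypothetical helpers, the last solving X̄ = E_θ(X) by bisection using the classical recursion for the inversion counts a_x):

```python
def inversion_counts(k):
    """a_x: number of permutations of k items with x inversions (Mahonian numbers)."""
    a = [1]
    for i in range(2, k + 1):
        # multiply the generating polynomial by (1 + q + ... + q^(i-1))
        new = [0] * (len(a) + i - 1)
        for x, c in enumerate(a):
            for j in range(i):
                new[x + j] += c
        a = new
    return a

def expected_x(theta, k):
    """E_theta(X) under the Feigin-Cohen model, eqn. (2)."""
    if theta == 0:
        return 0.0
    a = inversion_counts(k)
    num = sum(x * c * theta**x for x, c in enumerate(a))
    den = sum(c * theta**x for x, c in enumerate(a))
    return num / den

def estimate_theta(xbar, k, tol=1e-10):
    """Solve xbar = E_theta(X) for theta by bisection (E is increasing in theta)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_x(mid, k) < xbar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For k = 3 and θ = .5 this reproduces E_θ(X) = .905, the value tabulated in Appendix I, and feeding that mean back in recovers θ̂ ≈ .5.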
5.1 Adjustment of the F&C Model to the Unfolding Situation

By adjusting the Feigin and Cohen model to the unfolding situation, a probabilistic aspect is added to the deterministic unfolding model. The number of errors observed can be compared to the number of errors expected, to arrive at a measure for goodness of fit of the unfolding model to the data. The application of Feigin & Cohen's model in the unfolding situation involves distinguishing k(k-1)/2 + 1 latent classes, "latent rankings", one for each admissible pattern of the quantitative J scale. Analogously to the Feigin & Cohen situation, the most likely quantitative J scale is that ordered set of k(k-1)/2 + 1 admissible patterns for which the total number of inversions from individuals' rankings is minimal.

The model assumptions for the application of Feigin & Cohen's model in the unfolding situation can be formulated as follows:

1. The quantitative J scale is known or has been estimated.
2. Each individual has a latent pattern of preference identical to one of the k(k-1)/2 + 1 admissible patterns of the J scale.
3. The ranking actually given by the judge has, according to Feigin & Cohen's model, this person's latent pattern as basic ordering. When judging objects according to their latent pattern of preference, people can make mistakes.
Observing an admissible pattern does not mean that the subject has made no errors. For example, a subject may have reported pattern BCAD, which is an admissible pattern of J scale ABCD. This person may have had pattern BCAD in mind and made no errors on it. But she might also have had pattern BACD in mind and made one inversion on it. With

X = number of inversions made by the subject when stating her pattern of preference,
Y = minimum number of inversions the researcher has to apply to an inadmissible pattern in order to change it into an admissible one,

the relation between X (latent number of inversions) and Y (manifest number of inversions) can be investigated. For inference about θ from Y we have determined P(Y=y | X=x), the probability of needing y inversions to fit a subject's pattern into the J scale when the subject has made x inversions on a certain admissible pattern of the J scale. P(Y=y | X=x) is independent of θ. In assessing P(Y=y | X=x), all admissible patterns of the J scale are assumed to have equal probabilities. This seems a rather unrealistic assumption, but appears to work out very well in practice. The approach using varying probabilities for the admissible patterns of the quantitative J scale has been treated in Van Blokland et al. (1987). From P_θ(X=x) and P(Y=y | X=x), P_θ(Y=y) can be determined:

P_θ(Y = y) = Σ_{x=y}^{max(x)} P(Y=y | X=x) P_θ(X=x),   (x ≥ y).   (3)
With Ȳ known, θ can be estimated analogously to the Feigin and Cohen situation: θ̂ is that value of θ for which Ȳ equals E_θ(Y). Instead of θ̂ it is recommended to use E_θ(τ), the mean of the distribution of Kendall's (1975) τ. E_θ(τ) is unbiasedly estimated from the sample by τ̄ (Feigin & Cohen, 1978). The advantage of using τ̄, rather than θ̂, is that it is a well-known estimate of the concordance between the subjects and their underlying rankings, and 0 ≤ τ̄ ≤ 1. The number of inversions X and τ are linearly related:

τ = 1 - 2 (k(k-1)/2)^(-1) X(ω0,ω),

and so are their mean values X̄ and τ̄. Hence, τ̄ is a linear transformation of a sufficient statistic for θ. Once θ̂ is known, τ̂ = E_θ̂(τ) can be found
from the X-distribution given the estimated value of θ. In Appendix I, E_θ(X) and E_θ(τ) are given for selected values of θ and k.
5.2 Goodness of Fit of the Unfolding Model to the Data

For the test of goodness of fit of the unfolding model to the data, preference patterns with the same number of inversions Y from the J scale are grouped into the same category (cf. Feigin & Cohen, 1978). The Pearson X² test for goodness of fit has a number of degrees of freedom which is one less than the number of categories into which the data are lumped. For each extra numerical parameter estimated, one more degree of freedom should be subtracted. Since ω0 is not a numerical parameter but an ordering, we need not subtract an extra degree of freedom if ω0 is estimated. The χ² approach in this situation is then an approximate procedure. This point needs some further research; however, this is beyond the scope of this paper. The observed frequencies obs_y are the frequencies of the y-values in the data; the expected frequencies exp_y can be assessed via n P_θ̂(Y=y). As a test statistic Pearson's X² is used:

X² = Σ_y (obs_y - exp_y)² / exp_y,   (4)

which can be referred to a χ²-distribution with df = max(y) - 1 degrees of freedom. The higher values of Y may have to be grouped because of small expected frequencies.
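The grouping-and-X² computation can be sketched directly from the Goldberg* figures reported in Table 2 (illustrative Python; `cut` marks where the sparse tail categories y ≥ cut are collapsed, here y ≥ 4 as in the df = 3 analysis):

```python
def pearson_x2(obs, exp, cut):
    """Pearson X^2 after collapsing the sparse tail categories y >= cut into one."""
    o = list(obs[:cut]) + [sum(obs[cut:])]
    e = list(exp[:cut]) + [sum(exp[cut:])]
    return sum((oi - ei) ** 2 / ei for oi, ei in zip(o, e))

# Goldberg* data, k = 7: observed and expected frequencies of Y (Table 2)
obs = [54, 49, 23, 9, 3, 2, 0, 0, 0, 0, 0]
exp = [55.01, 46.31, 24.46, 9.73, 3.25, .93, .24, .05, .01, 0, 0]
x2 = pearson_x2(obs, exp, cut=4)
```

This reproduces X² ≈ .37, matching the tabulated value up to rounding of the expected frequencies.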
6. The Goldberg Data on Social Prestige of Technical Occupations

In this section the results of the unfolding procedure are illustrated using the Goldberg data on social prestige of technical occupations. In Goldberg (1976) the relevance of cosmopolitan/local orientations to professional values and behavior is discussed. Professionals are said to have a value system and behavioral patterns different from those of other occupational groups. Associated with the characteristics of professionals are the concepts of cosmopolitanism, defined as an orientation to an outer reference group, and localism, defined as an orientation to the inner reference group. According to Goldberg, in the literature cosmopolitanism has been confused with professionalism; also, a doubtful bipolar concept of
professionals as either cosmopolitan or local has been introduced. In his article he presents theoretical arguments as well as empirical evidence which point to the conclusion that an orientation which combines both cosmopolitan and local reference groups (“cosmo-local”) may be compatible with the values and behavior considered important to professionalism.
6.1 The Goldberg Data

The Goldberg data on the social prestige of technical occupations come from a sociological survey of graduates of the Faculty of Industrial and Management Engineering (Technion, Haifa, Israel) and are referred to in Goldberg (1976), Feigin and Cohen (1978) and Cohen and Mallows (1980), who also present the data. People were asked to rank ten occupations according to the degree of social prestige associated with each one. There were 143 complete responses. The Goldberg questionnaire was in Hebrew. The translation of the ten occupations is according to A. Cohen (personal communication). The occupations are:

A. A faculty member in an academic institution ("FAC")
B. Mechanical Engineer ("MECH")
C. Operations Researcher ("O.R.")
D. Technician ("TECH")
E. Manager in a staff position in an industrial enterprise (e.g., dealing with safety, human resources, time and motion study, etc.) ("STAFF")
F. Owner of a plant with more than 100 workers ("OWN")
G. Supervisor ("SUP")
H. Industrial Engineer ("IND")
I. Manager of a production department with more than 100 workers ("MAN")
J. An applied scientist ("APPL")
6.2 Results of the Unfolding Procedure

In unfolding the Goldberg data, consistently for each k best quantitative J scales are found which are subsets of the largest scale DBHCJAFIEG for all ten items; see Figure 5. This result is obtained after removing three outliers: subjects with 35, 38 and 39 inversions from the median ranking,
whereas remaining X values are in the range 0 to 18 (see Feigin & Cohen, 1978, p. 211). The Goldberg data excluding these three outliers are called the Goldberg* data, to distinguish them from the full set of data. For both sets of data the results for the best quantitative J scales (4 ≤ k ≤ 9) are given in Table 1. In some cases only the best qualitative J scale is given.

Table 1
Total numbers of inversions (Σy) and frequencies of perfect fit (y = 0) for 4 ≤ k ≤ 10 stimuli from the Goldberg data (n = 143, total set) and the Goldberg* data (n = 140, without outliers). For each value of k the three best quantitative J scales are given. If they are the same, only one is presented. Orderings with an asterisk (*) indicate that only the best qualitative J scale has been determined.

         Goldberg* data                 Goldberg data
k    Ordering        Σy    y=0      Ordering        Σy    y=0
4    DHFG             2    138      DHFG              9    138
     DBAG             3    137      DBAG              9    137
     DBFG             4    136      DBFG              9    137
5    DBHFG           29    115      DBHFG            38    115
     DBAFG           30    111      DBAFG            39    111
     DBHAG           30    112      DBHAG            40    112
6    DBHAFG          71     88      DBHAFG           85     88
7    DBHAFEG        143     54      DBHAFEG         169     54
8    DBHAFIEG       273     27      GDBHJAFE
9    DBHJAFIEG      428     12      DBHJAFIEG*
10   DBHCJAFIEG*    486     16      GBHJAFCIED*
                                    DBCHJAFIEG*
From Table 1 it is clear that a stable continuum underlies the Goldberg* data. On removing three outliers from the data, some disturbance is eliminated, which causes slightly different results from the Goldberg data for higher values of k. Hence, further results and the underlying continuum will be discussed for the Goldberg* data only. Best J scales for 4 ≤ k ≤ 10 are given below in Figure 5, starting with k = 4. For each larger k one item is added to the already existing quantitative (k-1)-scale. For 4 ≤ k ≤ 7 the Goldberg data show results analogous to the Goldberg* data, only the number of inversions is higher due to the three
k = 4:  Tech - Ind - Own - Sup
k = 5:  Tech - Mech - Ind - Own - Sup
k = 6:  Tech - Mech - Ind - Fac - Own - Sup
k = 7:  Tech - Mech - Ind - Fac - Own - Staff - Sup
k = 8:  Tech - Mech - Ind - Fac - Own - Man - Staff - Sup
k = 9:  Tech - Mech - Ind - Appl - Fac - Own - Man - Staff - Sup
k = 10: Tech - Mech - Ind - O.R. - Appl - Fac - Own - Man - Staff - Sup

(Technical occupations on the left, managerial occupations on the right; letter codes as in Section 6.1: Tech (D), Mech (B), O.R. (C), Ind (H), Appl (J), Fac (A), Own (F), Man (I), Staff (E), Sup (G).)

Figure 5. Quantitative J scales for the Goldberg* data for 4 ≤ k ≤ 9. For k = 10 only the best qualitative J scale has been determined.
outliers. For k = 8 the best quantitative J scale is GDBHJAFE, in which the least prestigious item G has moved to the other end of the scale. This is because of the error introduced by the outlying rankings, against which the unfolding procedure is clearly not robust (but, in this case, only for as large a k as k = 8).

In the unfolding of data, restrictions on possible locations of items on the scale follow from subjects' rankings. If one object is placed in the lowest ranks by most of the people, it should get an extreme position on the J scale. If all rankings fit the J scale perfectly, the position of this least preferred item is firmly determined. However, in the situation of increasing error much information (restrictions on the positions of the items) is lost. Remaining restrictions are such that the position of the extreme item should be far from other items; hence it could be on either end of the scale. Thus, the restrictions are not enough to firmly fix the item on one or the other end of the scale, and the item can "flip over" to the other end of the scale.

The scale found for the Goldberg* data is clearly a bipolar continuum, ranging from purely technical professions to purely managerial professions. At both ends of the scale the professions with the least prestige can be found: "Technician" as the technical profession with the least prestige and "Supervisor" as the managerial profession with the least prestige. In the center of the scale are the high status professions: "Faculty member", "Owner of a big plant" and "Applied scientist"
being the top status professions, in this order. If Goldberg’s (1976) conjecture is true, the cosmo-local oriented professionals are to be found here. The pure “cosmopolitans” (technical professions) or the pure “locals” (managerial professions) would be located near the ends of the scale.
6.3 Goodness of Fit of the Unfolding Model to the Data

To test the goodness of fit of Feigin and Cohen's model to the unfolded data, Pearson's X² statistic (eqn. (4)) will be used. Only the case k = 7 for the data with and without outliers will be considered here. The cases k = 4, 5, 6 gave comparable results. For k > 7 no tables for P_θ(Y=y) are as yet available. They will be available soon.

Table 2
Goodness of fit for the Goldberg data with and without outliers, case k = 7.

Goldberg* data (Ȳ = 1.02, θ̂ = .21, τ̂ = .86):

y       0      1      2     3     4    5    6    7    8    9   10
obs    54     49     23     9     3    2    0    0    0    0    0   (n = 140)
exp  55.01  46.31  24.46  9.73  3.25  .93  .24  .05  .01  .00  .00
X² = .37

Goldberg data (Ȳ = 1.18, θ̂ = .23, τ̂ = .84):

y       0      1      2      3     4     5    6    7    8    9   10
obs    54     49     23      9     3     2    0    0    2    0    1   (n = 143)
exp  49.17  46.67  27.82  12.50  4.72  1.53  .44  .11  .02  .00  .00
X² = 2.61
Case k = 7. For the Goldberg* data X² = .37, df = 3, .90 < p < .95. For the Goldberg data X² = 2.61, df = 3, .25 < p < .50. If we had treated the values Y = 4 and Y ≥ 5 as separate categories, we would have found
X² = 6.95, df = 4, .10 < p < .25. Hence, the Feigin and Cohen model fits the errors in the data for k = 7 very well (see Table 2).

In sum, the Goldberg data without the three outliers unfold into a quantitative J scale for nine items. The best qualitative J scale for ten items is consistent with this scale. The median ranking is a folded J scale and represents the prestige hierarchy of these technical occupations. The J scale is bipolar and can be interpreted as going from "working with techniques" to "working with people" (see Figure 6). The Goldberg data including the outliers do not give the same results. An extreme item flips over to the other end of the scale, because of the increased level of error due to these outliers.
Figure 6. The unfolding scale for the Goldberg* data (vertical axis: social prestige). The median ranking is a folded J scale and represents the prestige ladder for these technical occupations.
For all k that have been investigated, deviations from the perfect unfolding model can be explained by Feigin and Cohen’s model.
7. Discussion
In unfolding a set of data we are searching for an underlying J scale. This J scale can be seen as a reference frame for the evaluation of stimuli. If people's rankings all unfold into the same quantitative J scale, the consensus ranking is the ranking of the median individual on the J scale. Two points come up for discussion now.

The first point concerns the stability of an unfolding solution for increasing numbers of items. If the J scale is a reference frame in some domain of research, it should not vary with the number of stimuli in the analysis. That is to say: we are looking for a J scale which grows with increasing numbers of stimuli. Also, for each k the same items must be on the J scale in a constant order. If this is the case we can conclude that an underlying reference frame has been established. However, as we have seen from the two Goldberg data sets, extreme rankings or, in general, the level of error in the data may cause stimuli not to have firm positions on the J scale. If this is the case, stimuli may start flipping over from one end of the scale to the other.

A related problem is that of "irrelevant stimuli". The majority decision and the consensus ranking in general are not independent of irrelevant alternatives. More specifically: by introducing irrelevant stimuli in the data, the median ranking (and the J scale) may be different for increasing k. For some k an item may crop up in the J scale and disappear again with the next larger k. If a stable continuum seems to arise from the analysis, it seems wise to ignore a J scale which contains that particular stimulus and to take a next best J scale for that value of k which is consistent with the whole set of J scales.

In the previous section it was shown that the Goldberg (1976) data (excluding three outlying rankings) unfold into a nested set of J scales.
Also, errors from the perfect unfolding model could be explained by Feigin and Cohen’s (1978) error model (at least for k ≤ 7). Since the occupations were ranked according to the degree of social prestige associated with each of them, the consensus ranking is interpreted as the prestige ladder for these technical occupations.
van Blokland-Vogelesang
Appendix 1. E_θ(X) (first row) and E_θ(T) (second row) for selected values of θ (0, .05, ..., 1.00) and k (3, ..., 10).
References

Arrow, K. J. (1951). Social choice and individual values. New York: Wiley.
Black, D. (1948a). On the rationale of group decision making. Journal of Political Economy, 56, 23-34.
Black, D. (1948b). The decisions of a committee using a special majority. Econometrica, 16, 245-261.
Cohen, A., & Mallows, C. L. (1980). Analysis of ranking data. Bell Laboratories Memorandum, Murray Hill, NJ.
Coombs, C. H. (1954). Social choice and strength of preference. In R. M. Thrall, C. H. Coombs, & R. L. Davis (Eds.), Decision processes. New York: Wiley.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Davison, M. L. (1979). Testing a unidimensional, qualitative unfolding model for attitudinal or developmental data. Psychometrika, 44, 179-194.
Feigin, P. D., & Cohen, A. (1978). On a model for concordance between judges. Journal of the Royal Statistical Society, B, 40, 203-213.
Goldberg, A. I. (1976). The relevance of cosmopolitan/local orientations to professional values and behaviour. Sociology of Work and Occupations, 3, 331-356.
Goodman, L. A. (1954). On methods of amalgamation. In R. M. Thrall, C. H. Coombs, & R. L. Davis (Eds.), Decision processes. New York: Wiley.
Kemeny, J. G. (1959). Mathematics without numbers. Daedalus, 88, 577-591.
Kemeny, J. G., & Snell, J. L. (1972). Preference rankings, an axiomatic approach. Cambridge, MA: MIT Press.
Kendall, M. G. (1975). Rank correlation methods. London: Griffin.
Lehmann, E. L. (1975). Nonparametrics. New York: McGraw-Hill.
Mallows, C. L. (1957). Non-null ranking models I. Biometrika, 44, 114-130.
van Blokland-Vogelesang, R. A. W., Verbeek, A., & Eilers, P. (1987a). Iterative estimation of pattern and error parameters in a probabilistic unfolding model. In E. E. Roskam & R. Suck (Eds.), Progress in mathematical psychology I. Amsterdam: Elsevier (North-Holland).
van Blokland-Vogelesang, R. A. W. (1988). UNFOLD: A computer program for the unfolding of complete rankings of preference in one dimension. Free University, Amsterdam.
van Blokland-Vogelesang, R. A. W. (in press). Midpoint sequences, intransitive J scales and scale values in unidimensional unfolding. In E. E. Roskam & E. Degreef (Eds.), Progress in mathematical psychology II. Amsterdam: Elsevier (North-Holland).
Van der Ven, A. H. G. S. (1977). Inleiding in de schaaltheorie [Introduction to scaling theory]. Deventer: Van Loghum Slaterus.
Van Schuur, W. H. (1984). Structure in political beliefs. Doctoral thesis, University of Groningen, The Netherlands.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
259
UNFOLDING THE GERMAN POLITICAL PARTIES: A DESCRIPTION AND APPLICATION OF MULTIPLE UNIDIMENSIONAL UNFOLDING

Wijbrandt H. van Schuur
University of Groningen, The Netherlands

This paper discusses a number of problems with existing unfolding models and proposes a strategy of analysis to overcome these problems. This strategy assumes dichotomous or dichotomized data, and derives unfoldability criteria from information about ordered triples of stimuli. A unidimensional unfolding scale conforming to these criteria can be found for a maximal subset of stimuli. This procedure can be applied to full or partial rank orders of preference, which are dichotomized to “pick k/n” data, and to Likert-type rating scales, which are dichotomized to “pick any/n” data. This procedure is applicable to large data sets, such as survey data. As an example, the procedure is applied to preferences for five German political parties in electoral surveys in 1969, 1972, and 1980. A dominant left-right unfolding dimension is found, and violations of this representation are discussed.

1. Introduction
Coombs’ unfolding model, first presented in 1950, is regarded by many methodologists in the social sciences as an appealing model for the analysis of preferences. Introductions to unfolding appear in many textbooks on scaling, and computer programs for unfolding analysis continue to be developed. Despite favorable attention, however, reports of successful applications of unfolding are rare, and unfolding programs have only very recently found their way into some general-purpose statistical packages.

This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 258-273.
There are two major reasons for the relative neglect of the unfolding model by applied social researchers. One is that most techniques for unfolding operate on full rank orders of preference, which ties unfolding to a relatively unpopular form of data collection. The other is that until now we have not been able to satisfactorily unfold imperfect data (i.e., data that do not conform perfectly to the unfolding model). In this paper I propose a new strategy for unidimensional unfolding that solves both these problems. This strategy is implemented in a computer program called MUDFOLD, for Multiple UniDimensional unFOLDing. To exemplify the technique, I present the unfolding analysis of preferences for five German political parties by German voters in 1969, 1972, and 1980.

2. Background: Problems With Existing Unfolding Models

2.1 Unfolding Analysis of Different Data Types
The tradition of using full rank orders of preference in unfolding analyses has obscured the fact that other types of data may be represented along an unfolding dimension as well. In particular, data obtained from five-point Likert-type rating scales may conform to the unfolding model. Researchers who do not realize that rating data may fit the unfolding model often subject their data to factor analysis. The use of factor analysis with data that can be unfolded gives rise to a problem, however: an extra, artificial factor will be introduced, in addition to the number of factors (i.e., dimensions) necessary for an unfolding representation (Coombs & Kao, 1960; Ross & Cliff, 1964). Such factor analysis results may then lead to interpretations of the data different from those that an unfolding analysis would suggest. However, researchers who try to unfold rating data with the currently available unfolding models often get degenerate results because their data contain many ties. The unfolding model presented below is capable of analyzing rating data without the problem of degeneracy. An essential aspect of this model is that data are dichotomized, e.g., into “preferred” and “not preferred” response alternatives. Dichotomization allows rating data and a large number of other data types, including full and partial rank orders, to be used in unfolding analysis. It is a desirable technique also for additional reasons, as will be discussed shortly.
2.2 Unfolding Analysis of Imperfect Data

When a data set is not perfectly unfoldable, its imperfections can be attributed either to random noise or to systematic deviations from the unidimensional unfolding model. Random noise can be handled by using a stochastic rather than a deterministic model. Stochastic models have been discussed by Sixtl (1973), Zinnes and Griggs (1974), Bechtel (1976), Dijkstra et al. (1980), and Jansen (1983), among others. All of these models are unsatisfactory in certain ways. Some are designed to be used only in a confirmatory way, i.e., to test whether a known order of all stimuli can be interpreted as a J-scale. Others require repeated questioning of subjects to obtain estimates of the probability with which they prefer one stimulus over another. Still others depend on assumptions that are probably incorrect; especially problematic, for example, is the assumption that if subjects are given a choice between two stimuli that lie close together on the J-scale but far away from their ideal point, they will almost deterministically prefer the one closer to their ideal point. In fact, they will probably prefer both to approximately the same (low) degree. Despite these difficulties with stochastic unfolding models, the strategy of relaxing the criterion for perfect representation to allow stochastic representation seems advantageous. I return to this when I propose a new strategy for unidimensional unfolding.

Systematic deviations from the unidimensional unfolding model can be explained in at least four ways. According to one interpretation, respondents begin the task of picking preferred items by using the most salient common criterion, but in the course of evaluating stimuli that are less preferred, they bring other, more idiosyncratic criteria into play.
According to a second explanation, the preference judgment process is multidimensional rather than unidimensional, i.e., two or more criteria for preference play an independent but simultaneous role in all preference judgments for all stimuli. Thirdly, the set of stimuli may not be homogeneous with respect to the latent unfolding dimension; that is, one or more of the stimuli are indicators of a different latent trait. Finally, the set of subjects may not be homogeneous with respect to the latent unfolding dimension: they may either use different dimensions, or they may perceive the stimuli differently on the same dimension.
Let us look at these problems in more detail and consider some possible strategies for dealing with them.
2.2.1 Analyzing Dichotomous Data: “pick any/n” and “pick k/n” Analysis

The unfolding model assumes that successively chosen stimuli are decreasingly good substitutes for the subject’s ideal stimulus according to the criterion used for selection. However, in the course of giving a full rank order of preference for n stimuli, a subject may begin to use criteria for choosing that are different from the criterion with which he or she started out. Coombs (1964) talked about the “portfolio” model in this connection, and Tversky (1972) and Tversky and Sattath (1979) suggested an “Elimination by Aspects” (EBA) model, in which different criteria for preference are hierarchically ordered. To deal with this problem in such a way that we can still find the dominant criterion that is used by all subjects, we should restrict ourselves to distinguishing only the first few most preferred stimuli from the remaining ones; otherwise we risk introducing idiosyncratic noise. Distinguishing the first few most preferred stimuli from the remaining ones can be done by dichotomizing the preference responses of each subject (see Leik & Matthews, 1968; Coombs & Smith, 1973; and Davison, 1980, among others). This is accomplished by assigning the code “1” to each subject’s most preferred stimuli, and “0” to the remaining stimuli. The cutoff point between preferred and non-preferred stimuli depends on the type of data. In the case of Likert-scale items one or more response categories (e.g., “strongly in favor”) can be considered as the “preferred response” and the remaining categories as “non-preferred responses”. Since a subject can give the “preferred response” to any number of Likert-scale items, he can pick any of the n stimuli as “most preferred”. In the case of full or partial rank orders of preference, however, the researcher generally has to decide which k (k ≥ 2) most preferred stimuli will be distinguished from the remaining ones.
The unfolding analysis of dichotomous data has been called “parallelogram analysis” by Coombs (1964): a data matrix of subjects and stimuli, ordered according to their positions on a perfect unidimensional unfolding scale, shows a parallelogram pattern of “1”s running from top left to bottom right.
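As an illustration of this dichotomization, the following sketch (with invented rank orders, not data from this chapter) codes each subject's k most preferred stimuli as "1" and prints the resulting matrix; for perfect unidimensional data sorted by ideal point, the "1"s form the parallelogram band just described.

```python
# Hypothetical full rank orders over five stimuli A-E, from subjects whose
# ideal points move from left to right along a perfect J scale.
rank_orders = ["ABCDE", "BACDE", "BCADE", "CBDAE", "DCEBA", "EDCBA"]
stimuli = "ABCDE"
k = 2  # dichotomize to "pick 2/5": code the two most preferred stimuli as 1

# A stimulus gets a 1 if its rank (position in the order string) is below k.
matrix = [[1 if order.index(s) < k else 0 for s in stimuli]
          for order in rank_orders]
for row in matrix:
    print("".join(map(str, row)))
```

Printed in this subject order, the rows run 11000, 11000, 01100, 01100, 00110, 00011: a parallelogram of "1"s from top left to bottom right.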
Using dichotomous data has both advantages and disadvantages. An advantage is that a large number of different data types (including full and partial rank orders of preference, Likert-type rating scales, and roll call data) can all be subjected to such an analysis; all that is needed is that the most preferred stimuli can be distinguished from the others. A disadvantage is that, in contrast to the unfolding analysis of full rank orders of preference, the unfolding analysis of dichotomous data only leads to a qualitative J-scale. This means that no metric information about the relative distances between the stimuli is available and therefore that subjects cannot be discriminated as well as in a quantitative J-scale. However, Davison (1979) has argued convincingly that it is in any event unlikely that a single quantitative J-scale can be found for a large group of subjects in practical applications of unfolding analysis, because subjects often use different subjective metrics.
2.2.2 Multidimensional Unfolding

Multidimensional unfolding models assume that subjects do not use a single criterion in making their preference choices, but rather use two, three, or even more independent criteria simultaneously. Multidimensional models have been proposed by Bennett and Hays (1960), Roskam (1968), Schönemann (1970), Carroll (1972), Gold (1973), Kruskal, Young, and Seery (1973), and Heiser (1981), among others. Multidimensional unfolding models are appealing in part because there are various ways of combining the different dimensions, for example, the vector model and the weighted distance model (e.g., Carroll, 1972). However, disadvantages are that technical problems of degeneracy and the representation of I-scales as points in essentially open isotonic regions are more likely to arise in multidimensional than in unidimensional unfolding. Also problematic is the assumption that different criteria for preference (i.e., more than one dimension) are used simultaneously and independently, and that they are relevant for each stimulus and each subject. Proponents of multidimensional unfolding insist that reality is multidimensional: e.g., a chair has a color, a weight, and a size; a person has an age, a sex, and a preference for certain goods; and a political party may be large, religious, and right wing. Still, subjects often do not evaluate items on the basis of all possible attributes at once. Often they compare them with respect to one attribute only, e.g., sizes of chairs, ages of
subjects, and ideological positions of political parties. There may be instances in which a multidimensional model is indeed the best one. But the relative merit of multidimensional versus unidimensional unfolding models in particular situations should be determined empirically.
2.2.3 Selecting a Maximal Subset: of Stimuli or of Subjects?

It is an established practice in (multidimensional) unfolding analysis to assign stress values to subjects. This practice reflects the assumption that difficulty in finding a representation can be explained by reference to subjects who apparently used criteria other than the overall dominant one(s), or who perhaps even behaved randomly. A possible procedure for reducing imperfection in one’s data is thus to delete subjects whose stress values are too high. However, high stress values may arise because one or more stimuli do not belong in the same universe of content along with the other stimuli, and therefore cannot be adequately incorporated into the same representation. For unfolding to apply, subjects should differ in their preferences for the stimuli, but they should agree about the cognitive aspects of the stimuli: whether gentlemen prefer blondes or brunettes is a different matter from establishing whether Marilyn is blond or brown. If there is disagreement among the subjects about the characteristics of a stimulus, differences in preference will be difficult to represent; such a stimulus can better be deleted from an unfolding scale. There are reasons for preferring the deletion of stimuli to the deletion of subjects from an unfolding scale. Subjects are often selected as representatives of a larger population. Deleting subjects therefore lowers the likelihood that the result will generalize successfully from a sample to a population. Stimuli, in contrast, are rarely a random sample from a population of stimuli, but rather are intended to serve as the best and most prototypical indicators of a latent trait. In other words, we are often less interested in the actual stimuli than in their potential for allowing us to measure subjects along a latent trait. This means that the deletion of stimuli can generally be defended more easily than the deletion of subjects.
Regardless of whether stimuli or subjects are deleted, an explanation for their nonscalability is called for. For stimuli this is especially true in the case that they constitute an entire population, such as all political
parties of a country. The nonscalability of certain stimuli in one dimension may mean that a less parsimonious spatial (multidimensional), or a discrete (cluster, or tree) representation is needed instead of a unidimensional unfolding representation. Alternatively, different well-specified groups of subjects may consistently use different criteria in judging a set of stimuli. Explaining why certain subjects are difficult to represent on an unfolding dimension or in an unfolding space is generally even more difficult than explaining why certain stimuli do not fit. Such explanations are virtually nonexistent in the applied unfolding literature.
3. A Proposed Strategy for Unidimensional Unfolding

The unidimensional unfolding model proposed in this paper is based on a combination of three of the strategies discussed above for dealing with data that are not perfectly unfoldable. It allows for a stochastic representation of a maximal subset of stimuli and all subjects in one dimension, using only the highest preference judgments of each subject. Subjects’ most preferred stimuli are distinguished from the remaining ones in a dichotomous way. The approach used here to find an unfolding scale is a form of hierarchical cluster analysis. The optimal smallest unfolding scale is found first, and this is then extended with additional stimuli for as long as the stimuli jointly continue to satisfy the criteria for an unfolding scale. If no more stimuli can be added to the p-stimulus unfolding scale, the procedure begins again by selecting the optimal smallest unfolding scale among the remaining n - p stimuli. The process by which more than one maximal subset of unidimensionally unfoldable stimuli can be found in a given pool of stimuli is called “multiple scaling”.
3.1 The Concept of “Error”

We generally do not know in advance which stimuli are representable in an unfolding scale, much less the order in which they are representable. The smallest unfolding scale consists of three stimuli, since it takes at least three stimuli to falsify a proposed proximity relation. For the unfolding scale of the ordered triple ABC, the response pattern in which A and C are preferred but B is not is defined as the “error pattern” for that triple of stimuli. Since part of our analysis is to establish the order in which the stimuli form an unfolding scale, we must consider all three
permutations in which each of the three stimuli is the middle one: BAC, ABC, and ACB (a reflection of the scale is an admissible transformation). For the triple consisting of the stimuli A, B, and C in this order, the response pattern 101 would be the error pattern for an unfolding scale ABC, 110 for the scale ACB, and 011 for the scale BAC. The amount of error in an individual response pattern to more than three stimuli in a proposed scale order is defined as the number of proximity relations in that pattern that violate the unfolding model, i.e., the number of triples that contain the error pattern. For example, the pattern (ABCD, 0101) contains one error, namely in the triple BCD; the pattern (ABCD, 1011) contains two errors, in the triples ABC and ABD; the pattern (ABCDEFG, 1110111) contains nine errors, in the triples ADE, ADF, ADG, BDE, BDF, BDG, CDE, CDF, and CDG; and the pattern (ABCDEFG, 1011111) contains five errors, in the triples ABC, ABD, ABE, ABF, and ABG. The amount of error in a data set being evaluated for its fit with a candidate unfolding scale is defined as the sum of errors over the response patterns of all subjects. This figure can be calculated by summing the number of errors in each triple of stimuli first over all subjects, and then over all triples of stimuli. The number of errors in a data set can be calculated for each candidate unfolding scale, i.e., for each set of three or more stimuli in each of their possible permutations.
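This error count can be sketched in a few lines of Python; the worked examples from the text serve as a check. The positions of the pattern string are taken to be in the proposed scale order.

```python
from itertools import combinations

def count_errors(pattern: str) -> int:
    """Count unfolding errors in a dichotomous response pattern whose
    positions follow the proposed scale order: an error is any triple of
    positions i < j < k with responses 1, 0, 1."""
    bits = [int(c) for c in pattern]
    return sum(1
               for i, j, k in combinations(range(len(bits)), 3)
               if bits[i] == 1 and bits[j] == 0 and bits[k] == 1)

# The worked examples from the text:
print(count_errors("0101"))     # 1 error (triple BCD)
print(count_errors("1011"))     # 2 errors (ABC and ABD)
print(count_errors("1110111"))  # 9 errors
print(count_errors("1011111"))  # 5 errors
```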
3.2 Calculating the Expected Number of Errors

The number of errors found in a candidate unfolding scale must be compared with the number of errors that would be expected under statistical independence, i.e., under the assumption that a subject’s preferences for the stimuli are completely unrelated. In this “null model” it is assumed that subjects do not differ systematically from each other in their probability of giving a positive preference response to the stimuli. When subjects are free to select as many stimuli as they wish as their most preferred (the “pick any/n” situation), the expected frequency with which a given set of stimuli is preferred is the product of the relative frequencies with which each of the stimuli is preferred times the number of subjects:
Exp.Freq.(ijk, 101) = p_i · (1 - p_j) · p_k · N,
where p_i is the relative frequency with which stimulus i is “picked” and N is the number of subjects. In the case of “pick k/n” data, calculating the expected number of errors under statistical independence is a two-step procedure. This is first explained for “pick 3/n” data, and then generalized.
a. The expected frequency of the “111”-response patterns is first determined by applying the n-way quasi-independence model (e.g., Bishop, Fienberg, & Holland, 1975).
b. From the expected frequency of the “111”-response to each triple, the expected frequency of other response patterns (e.g., 011, 101, or 110) is deduced.
Ad a. In a data matrix in which each of the N subjects picks three of the n stimuli as most preferred, we can find the relative frequency p_i with which each stimulus is picked. Under the statistical independence model these p_i's arise from the addition of the expected frequency of triples (i, j, k) for all combinations of j and k with a fixed i. The expected frequency of triple i, j, k (i.e., a_ijk) is the product of the item parameters f_i, f_j, and f_k times a general scaling factor f, without interaction effects:

a_ijk = f · f_i · f_j · f_k.
The values for f and each f_i are found iteratively. This procedure, first described by Davison (1980), was developed by Van Schuur and Molenaar (1982). The details of this procedure are given in Van Schuur (1984). Ad b. Once the expected frequency of the “111”-pattern of all triples is known, the expected frequency of the other response patterns can be found, assuming that each subject picked exactly three stimuli as most preferred. For example: consider the situation in which subjects pick three out of five stimuli A, B, C, D, and E. For the unfolding scale ABC, the error pattern is the one in which stimuli A and C are picked, but B is not. Since exactly three stimuli were picked, either D or E must have been picked in addition to A and C. We can therefore calculate the expected frequency across all subjects of the error response pattern for the triple (ABC, 101) by summing the frequencies of the expected “111”-patterns for
the triples ACD and ACE. In general:

Exp.Freq.(ijk, 101) = f · f_i · f_k · Σ_{s ≠ i,j,k} f_s.
This procedure can easily be generalized to the “pick k/n” case, where k = 2 or where k > 3. First we find the expected frequency of each k-tuple, ranging between 1 and (n choose k). Second, we calculate the expected frequency of the error response pattern of an unfolding scale of three stimuli, as follows:

Exp.Freq.(ijk, 101) = f · f_i · f_k · Q,

where Q is the sum over all (n-3 choose k-2) (k-2)-tuples of the products of their f_s's, where s is not equal to i, j, or k.

3.3 A Coefficient of Scalability

Once we know for a triple of stimuli in a particular permutation both the frequency of the error response observed, Obs.Freq.(ijk, 101), and the frequency expected under statistical independence, Exp.Freq.(ijk, 101), a coefficient of scalability can be defined analogous to Loevinger’s H (cf. Mokken, 1971, who uses Loevinger’s H for multiple unidimensional cumulative scale analysis):

H(ijk) = 1 - Obs.Freq.(ijk, 101) / Exp.Freq.(ijk, 101).
For each triple of stimuli (i, j, and k), three coefficients of scalability can be found: H(jik), H(ijk), and H(ikj). Perfect scalability is defined as H = 1; this means that no error is observed. When H = 0 the amount of error observed is equal to the amount of error expected under statistical independence. The scalability of a (candidate) unfolding scale of more than three stimuli can also be evaluated. In this case we simply calculate the sum of the error responses to all relevant triples of the scale for both the observed and expected error frequency, and then compare them, using the coefficient of scalability H:
H = 1 - [ Σ Obs.Freq.(ijk, 101) ] / [ Σ Exp.Freq.(ijk, 101) ],

where both sums run over all (p choose 3) triples of the p-stimulus scale.
The scalability of individual stimuli in the scale can also be evaluated. This is done for each stimulus separately by adding up the frequencies of the error patterns, observed and expected, in only those triples that contain the stimulus, and then comparing these frequencies by using the coefficient of scalability H.
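A minimal sketch of the triple-wise coefficient, using the "pick any/n" independence null model for the expected error frequency and invented responses (the function name h_triple is ours, not MUDFOLD's):

```python
def h_triple(responses, i, j, k):
    """H(ijk) = 1 - Obs.Freq.(ijk, 101) / Exp.Freq.(ijk, 101), with the
    expected error frequency taken from the "pick any/n" null model:
    p_i * (1 - p_j) * p_k * N (statistical independence)."""
    n_subj = len(responses)
    # Relative frequency with which each stimulus is picked.
    p = [sum(r[s] for r in responses) / n_subj
         for s in range(len(responses[0]))]
    # Observed count of the 101 error pattern for scale order i-j-k.
    obs = sum(1 for r in responses if r[i] and not r[j] and r[k])
    exp = p[i] * (1 - p[j]) * p[k] * n_subj
    return 1 - obs / exp

# Hypothetical "pick any/3" responses with no 101 error for the order (0, 1, 2):
data = [(1, 1, 0), (1, 1, 0), (0, 1, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)]
print(h_triple(data, 0, 1, 2))  # → 1.0 (no observed error)
```

Adding a subject with the error pattern (1, 0, 1) would push H below 1, and H would reach 0 when errors occur as often as independence predicts.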
3.4 The Search Procedure for an Unfolding Scale

After obtaining all the information needed for calculating the coefficients of scalability of each triple of stimuli in each of its three essentially different permutations (i.e., Obs.Freq.(ijk, 101), Exp.Freq.(ijk, 101), and H(ijk)), we can begin to construct an unfolding scale. This is done in two steps. First the best elementary scale (the “best triple”) is found, and second, new stimuli are added one by one to the existing scale. The best elementary scale is defined as the triple of stimuli that conforms best to the following criteria:
1. Its scalability value should be positive in only one of its three permutations. This guarantees that the best triple has a unique order of representation.
2. Its scalability value should be higher than some user-specified lower boundary. This helps to ensure that the scale will be interpretable in a substantively relevant way. In practical applications a lower boundary of H > 0.30 is suggested as a rule of thumb; this value is modeled on Mokken’s (1971) approach to cumulative scaling.
If more than one triple satisfies the first two criteria, we select the triple with the highest absolute frequency of the sum of the perfect patterns that contain at least two of the three stimuli. Each triple has eight response patterns, one of which (101) is the error pattern, and four of which (000, 100, 010, and 001) are not very informative about preferences for sets of stimuli. A high frequency of occurrence of the patterns 111, 011, and 110 guarantees the
representability of the largest group of respondents for the elementary scale. Once the best elementary scale is found, each of the remaining n - 3 stimuli is investigated to see whether it might make the best fourth stimulus. The best fourth stimulus (e.g., D) may be added to the best triple (e.g., ABC) in any of four positions: DABC, ADBC, ABDC, or ABCD. These places are denoted as place 1 to place 4. The best fourth - or, more generally, p + 1'th - stimulus has to meet the following criteria to be included in a p-stimulus unfolding scale:
1. All p(p-1)/2 new triples that include the p + 1'th stimulus and two stimuli from the existing p-stimulus scale have to have a positive H(ijk)-value. This guarantees that all stimuli are homogeneous with respect to the latent dimension.
2. The p + 1'th stimulus should be uniquely representable, i.e., it can be positioned in only one of the p + 1 possible places in the p-stimulus scale. This helps to ensure the later usefulness and interpretability of the order of the stimuli in the scale.
3. The H_i-value of the p + 1'th stimulus, as well as the H-value of the scale as a whole, have to be higher than some user-specified lower boundary (see the second criterion for the best triple). Actually, adding a stimulus to a scale may even increase the H-value of the scale as a whole, depending on the scalability quality of the triples that are added to the scale.
4. If more than one stimulus conforms to the criteria mentioned above, that stimulus will be selected that leads to the highest scalability for the scale as a whole.
This procedure of extending a scale with an additional stimulus is repeated as long as the criteria mentioned above are satisfied. When no further stimulus conforms to the criteria, the p-stimulus scale is taken as a maximal subset of scalable stimuli. This maximal subset can then be further evaluated as an unfolding scale with additional goodness-of-fit criteria.
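The two-step search can be sketched as a greedy procedure. This is a deliberate simplification with invented data, not the MUDFOLD implementation: the uniqueness-of-placement and positive-triple checks of criteria 1 and 2 are collapsed into simply taking the candidate with the highest scale H, and the "pick any/n" null model supplies the expected error frequencies.

```python
from itertools import combinations, permutations

LOWER_BOUND = 0.30  # rule-of-thumb user-specified lower boundary for H

def scale_h(responses, order):
    """H for a candidate scale `order`: 1 - (summed observed 101 errors) /
    (summed expected 101 errors under "pick any/n" independence)."""
    n_subj = len(responses)
    p = [sum(r[s] for r in responses) / n_subj
         for s in range(len(responses[0]))]
    obs = exp = 0.0
    for i, j, k in combinations(order, 3):  # j is the scale-middle stimulus
        obs += sum(1 for r in responses if r[i] and not r[j] and r[k])
        exp += p[i] * (1 - p[j]) * p[k] * n_subj
    return 1 - obs / exp if exp > 0 else float("-inf")

def greedy_unfolding_scale(responses, n_stimuli):
    # Step 1: the best elementary scale (highest-H triple permutation).
    best = max((perm
                for triple in combinations(range(n_stimuli), 3)
                for perm in permutations(triple)),
               key=lambda order: scale_h(responses, order))
    if scale_h(responses, best) <= LOWER_BOUND:
        return []
    scale = list(best)
    remaining = [s for s in range(n_stimuli) if s not in scale]
    # Step 2: extend one stimulus at a time while H stays above the boundary.
    while remaining:
        h, pos, s = max((scale_h(responses, scale[:q] + [s] + scale[q:]), q, s)
                        for s in remaining for q in range(len(scale) + 1))
        if h <= LOWER_BOUND:
            break
        scale.insert(pos, s)
        remaining.remove(s)
    return scale

# Hypothetical perfect "pick 2/4" data in scale order 0-1-2-3:
data = [(1, 1, 0, 0)] * 3 + [(0, 1, 1, 0)] * 3 + [(0, 0, 1, 1)] * 3
print(greedy_unfolding_scale(data, 4))  # → [0, 1, 2, 3]
```

For these error-free data the procedure recovers the generating order (up to the admissible reflection).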
3.5 Maximizing Perfection or Minimizing Error?

The search procedure for finding an unfolding scale is based on identifying a maximal subset of stimuli that contains the smallest proportion of errors in its triples. An alternative procedure might be to find a maximal subset of stimuli that contains the largest proportion of perfect patterns among all of its patterns (e.g., Davison, 1980). If we had applied this procedure we would have been interested in the extent to which the number of perfect patterns found exceeds the frequency of perfect patterns to be expected under statistical independence. We should not accept a set of stimuli as a scale if the observed frequency of perfect patterns is no more than can be explained by assuming statistically independent responses. Observed and expected frequencies of perfect patterns can also be compared by applying Loevinger’s coefficient of homogeneity. For a “pick 3/n” analysis this becomes:

H = 1 - Obs.Freq.(ijk, 111) / Exp.Freq.(ijk, 111),
where Obs.Freq.(ijk, 111) and Exp.Freq.(ijk, 111) are counted and calculated, respectively, in the same way as the error response patterns. Perfect response patterns - especially the "111"-responses to adjacent stimuli - should occur more frequently than expected under statistical independence, and should have a negative H-value, whereas imperfect patterns, that is, "111"-responses to non-adjacent stimuli, should occur less often than expected under statistical independence and should have a positive H-value.
There are at least two problems in using the frequency of perfect patterns and the H(ijk, 111)-coefficients to find an unfoldable subset of stimuli. One problem is the difficulty of finding a "best" or even unique ordering of stimuli. Whereas the unique ordering of a "best" triple of stimuli follows from the (non)occurrence of errors in the three permutations, no unique ordering of the stimuli is implied in the H(ijk, 111)-coefficients. A more important problem is that with this procedure the evaluation of a set of stimuli as an unfolding scale cannot be based on the error patterns of all triples, but only on the set of (perfect) patterns of adjacent triples. In the "pick k/n" case this involves only the evaluation of n - k + 1 patterns, whereas in the procedure I am advocating all triples are considered. Evaluating the frequencies of the perfect patterns will therefore only be used heuristically at the end of the scaling procedure, to help in considering other possible start sets for the search procedure described above and in evaluating a hypothesized unfolding scale.

3.6 The Dominance Matrix and the Adjacency Matrix

The use of the coefficient of scalability as a test for the goodness-of-fit of a candidate unfolding scale can be criticized on the grounds that the coefficient is not specifically tuned to the unfolding model: a good fit can be obtained for data that conform either to the unfolding model or to Guttman's cumulative scaling model. Although this criticism is justified for the "pick any/n" case, its force can be reduced by subjecting the dominance matrix and the adjacency matrix of the unfoldable stimuli, in their scale order, to visual inspection.
The dominance matrix is a square asymmetric matrix whose cells (i, j) display the percentage of subjects who prefer stimulus i but not stimulus j. When the stimuli are ordered in their scale order along the J-scale, then for each stimulus i the percentage p_ij should decrease from the first column toward the diagonal and increase from the diagonal toward the last column. The adjacency matrix is a lower triangle whose cells (i, j) show the percentage of subjects who "picked" both i and j. When the stimuli are ordered in their scale order along the J-scale, then for each stimulus i the percentage p_ij should increase from the first column toward the diagonal and decrease from the diagonal toward the last row. The procedure for detecting stimuli that disturb the expected pattern of characteristic monotonicity is analogous to the procedure Mokken (1971) used in multiple unidimensional cumulative scaling. Table 1 shows the dominance matrix and the adjacency matrix for a perfect unidimensional unfolding data set. Note that in the dominance matrix no column-wise monotonicity is expected.
If stimuli form a cumulative scale rather than an unfolding scale, the monotonicity patterns of the dominance matrix of stimuli will differ from those just described, in that they will not reverse around the diagonal. An important difference between the use of the coefficients of scalability and the use of the dominance and adjacency matrices must be
Table 1. Dominance Matrix and Adjacency Matrix for a Perfect Four-Stimulus Unfolding Scale.

Data matrix
A B C D   Frequency
1 0 0 0       p
0 1 0 0       q
0 0 1 0       r
0 0 0 1       s
1 1 0 0       t
0 1 1 0       u
0 0 1 1       v
1 1 1 0       w
0 1 1 1       x

Dominance matrix
        A          B       C      D
A       -          p       p+t    p+t+w
B     q+u+x        -       q+t    q+t+u+w
C     r+u+v+x     r+v      -      r+u+w
D     s+v+x       s+v      s      -

Adjacency matrix
        A        B        C
B      t+w
C      w       u+w+x
D      0         x       v+x
mentioned here. The coefficients of scalability reflect the relative number of errors, whereas the matrices reflect the absolute number of errors. Dijkstra et al. (1980) have already shown that the characteristic monotonicity requirement is not a sufficient condition for a set of stimuli to be interpreted as an unfolding scale. They give a counterexample in which a perfect characteristically monotone dominance matrix was derived from I-scales that did not belong to the same J-scale. Looking at the pattern of absolute frequencies only, and disregarding the information from the H-coefficients, may therefore lead to unjustified acceptance of an unfolding
scale.
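The two matrices, and the row-wise monotonicity check on the dominance matrix, can be sketched as follows (a minimal version, assuming a subjects-by-stimuli 0/1 pick matrix whose columns are already in hypothesized scale order; the function names are mine, not MUDFOLD's):

```python
import numpy as np

def dominance_adjacency(data):
    """Dominance and adjacency matrices from a 0/1 'pick' matrix.

    dominance[i, j] = % of subjects who picked stimulus i but not j;
    adjacency[i, j] = % of subjects who picked both i and j.
    """
    picked = data.astype(bool)
    dominance = 100.0 * (picked[:, :, None] & ~picked[:, None, :]).mean(axis=0)
    adjacency = 100.0 * (picked[:, :, None] & picked[:, None, :]).mean(axis=0)
    return dominance, adjacency

def row_unfolds(row, i):
    """Characteristic monotonicity for row i of the dominance matrix:
    values decrease toward the diagonal and increase after it."""
    left, right = row[:i], row[i + 1:]
    return (all(left[t] >= left[t + 1] for t in range(len(left) - 1)) and
            all(right[t] <= right[t + 1] for t in range(len(right) - 1)))
```

Applied to subjects generated from the nine perfect patterns of Table 1, each taken once, every row of the dominance matrix passes the check.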
3.7 Scale Values

Once an unfolding scale of a maximal subset of stimuli has been found, scale values for stimuli and subjects can be determined. The scale value of a stimulus is defined as its rank number in the unfolding scale. The scale value of a subject is defined as the mean of the scale values of the stimuli the subject "picked" as most preferred. Subjects who did not pick any stimulus from the scale cannot be given a scale value, and have to be treated as missing data.
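These definitions translate directly into code (a minimal sketch; the function name is mine):

```python
def subject_scale_value(picks, scale_order):
    """Scale value of a subject: the mean rank (1-based) of the picked
    stimuli that occur in the unfolding scale.  Returns None - missing
    data - when the subject picked no stimulus from the scale."""
    ranks = [pos + 1 for pos, s in enumerate(scale_order) if s in picks]
    return sum(ranks) / len(ranks) if ranks else None
```

With the scale E - B - C - A, for example, a subject who picked B and C gets (2 + 3) / 2 = 2.5.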
4. An Application to Preference for German Political Parties

Transitive rank orders of preference were derived from pairwise preference comparisons for five German parties by German voters in 1969 (N = 907) and in 1980 (N = 1316). Full rank orders of preferences given in 1972 were obtained directly from a random sample of 1785 German voters (the data are published in Pappi (1983); Norpoth (1979a, 1979b) also discusses the 1969 and 1972 data). The MUDFOLD model will be applied to a "pick 2/5" and a "pick 3/5" analysis of these three data sets. The parties are denoted by the capital letters A-E as follows: A: CDU/CSU (Christian democrats); B: SPD (social democrats); C: FDP (liberals); D: NPD (neo-national socialists); E: DKP (communists).
The scalability values of each triple of all stimuli in each permutation, as well as the dominance and adjacency matrices, are given in the Appendix in Table 2 through Table 7. For each permutation of a triple i,j,k (e.g., jik, ijk, and ikj) the observed and expected frequency of the error patterns are given (i.e., the patterns ijk,011, ijk,101, and ijk,110, which are the error patterns of the scales jik, ijk, and ikj, respectively), as well as their appropriate H-value. Expected frequencies are rounded to the nearest integer. On the basis of this information an unfolding scale of a maximum subset of stimuli is constructed. In the "pick 3/5" analyses, the observed and expected frequencies of the "ijk,111"-patterns are also given, together with the matching H-value. The dominance matrix contains the percentage of subjects who "pick" the row party but not the column party among the most preferred. The adjacency matrix contains the percentage of subjects who "pick" both the row party and the column
party among the most preferred.
4.1 A "Pick 2/5" Analysis of the 1969 Data

Five of the ten triples have a positive and high enough coefficient of scalability (i.e., > 0.30) in only one of the three possible permutations, that is, they are "unique" triples:
BAD : SPD - CDU - NPD (H = 0.71)
EBA : DKP - SPD - CDU (H = 0.80)
CAD : FDP - CDU - NPD (H = 0.66)
ADE : CDU - NPD - DKP (H = 0.80)
BED : SPD - DKP - NPD (H = 0.71)
However, it is impossible to construct a scale of more than three stimuli. The three major parties (ABC, or: CDU, SPD, and FDP) have a negative H-value in all three permutations, which means that they cannot be represented together in one unidimensional unfolding scale. Moreover, the scalability value of a larger scale containing all three major parties would be very low. Finally, the position of the DKP (stimulus E) - either to the left of the SPD (stimulus B), or close to the NPD (stimulus D) - cannot be uniquely determined. The five three-item scales can be interpreted either in terms of a left-right dimension (the first three), or in terms of a government-opposition dimension (the last two). On the basis of the dominance and adjacency matrices an unfolding order of the stimuli SPD - CDU - FDP - NPD - DKP might have been expected. However, the "unique" triple CAD (FDP - CDU - NPD), which we have already seen has an acceptably high coefficient of scalability, violates this order. The fact that the triple with the three major parties cannot be unfolded suggests that the German voters did not all use the same criterion for preference for the five political parties.
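The screening for "unique" triples can be sketched as follows: of the three reflection-distinct orderings of a triple, exactly one must exceed the scalability bound (the function name and data layout are mine):

```python
def unique_ordering(h_by_order, h_min=0.30):
    """Return the single ordering of a triple whose H-value exceeds h_min,
    or None when no ordering, or more than one, qualifies.

    h_by_order: dict mapping each of the three reflection-distinct
    orderings of a triple (tuples of stimulus labels) to its H-value.
    """
    winners = [order for order, h in h_by_order.items() if h > h_min]
    return winners[0] if len(winners) == 1 else None
```

For the 1969 triple {A, B, D}, for example, only the ordering BAD (H = 0.71) would qualify.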
4.2 A "Pick 3/5" Analysis of the 1969 Data
The best candidate starting triple is the only "unique" ordered triple, BCE, that has an H-value larger than 0.30, namely 0.49. Since the triple BCD has negative H-values in all three permutations, the scale BCE cannot be extended with stimulus D. If we follow the strict procedure, the best triple cannot be extended with stimulus A either, since A can be represented in two places in the scale: position 1 (giving scale ABCE), or
position 2 (giving scale BACE). Moreover, in both positions the scalability value of stimulus A, H(A), falls slightly below the user-specified lower boundary of 0.30. And even if we are willing to accept stimulus A in the unfolding scale, it is difficult to choose between these two positions on the basis of the monotonicity patterns in the dominance and adjacency matrices. However, if we relax the criterion of unique representability to allow stimulus A to be represented in the position that gives the highest overall H-value, then it will be represented in scale BACE (SPD - CDU - FDP - DKP). This scale can probably best be interpreted in terms of a "government-opposition" dimension: the "Great Coalition" between SPD and CDU governed the Federal Republic until 1969. The scale is rather weak, however.

Scale ABCE      pi     Hi
A: CDU         0.98   0.28
B: SPD         0.97   0.32
C: FDP         0.94   0.32
E: DKP         0.02   0.34
                    H = 0.32

Scale BACE      pi     Hi
B: SPD         0.97   0.35
A: CDU         0.98   0.29
C: FDP         0.94   0.32
E: DKP         0.02   0.37
                    H = 0.33
where pi is the proportion of subjects who "pick" stimulus i as most preferred, and Hi is the coefficient of scalability for item i.
4.3 A "Pick 2/5" Analysis of the 1972 Data

The best triple among the "unique" triples ADE, CBE, and BED is CBE (or reflected as EBC: DKP - SPD - FDP): its frequency of admissible patterns (011 and 110) is highest, and its H-value is 1.00. Unfortunately, as in the analysis of the 1969 data, this triple cannot be extended to form a scale that comprises all three major parties (CDU, SPD, and FDP). This is because each of the three pairs that can be made of these three parties (CDU + SPD; CDU + FDP; and SPD + FDP) is mentioned approximately as often as would be expected under statistical independence. There are 1730 respondents, or 97%, who "picked" two of the three major parties as most preferred.
The representation of the two small parties DKP (stimulus E) and NPD (stimulus D) is also problematic. The unique triples ADE and BED suggest that D and E are relatively close together, whereas information
from BAD and CAD suggests that D is relatively close to A (CDU), and information from ABE suggests that E is relatively close to B (SPD), which is in accordance with the suggestion from other analyses that D and E are the end points of the scale. The two other criteria for evaluating data as an unfolding scale (the occurrence of perfect patterns and the characteristic monotonicity patterns of the dominance and adjacency matrices) do not suggest the same solution: the four pairs of parties that are mentioned together more often than expected under statistical independence (AD, DE, BE, and BC) might lead us to expect a scale ADEBC (i.e., CDU - NPD - DKP - SPD - FDP). However, the dominance and adjacency matrices suggest an unfolding scale ECBAD (DKP - FDP - SPD - CDU - NPD), in which the only deviations from the characteristic monotonicity patterns involve the item pairs BE (SPD-DKP) and CE (FDP-DKP). In fact, for the three major parties this last scale conforms to the order that Norpoth (1979a, 1979b) suggested on the basis of his own analyses: FDP - SPD - CDU, which he interpreted in terms of a "religious-secular" dimension. Still, this scale has an H-value of only 0.08, which makes the null hypothesis of statistical independence very plausible. The position of the two smaller parties, DKP and NPD, is based on the responses of a small number of subjects and is therefore highly unstable: of the 1785 subjects only 22 mentioned the DKP, 35 mentioned the NPD, and 2 mentioned both the DKP and the NPD among their two most preferred parties.
4.4 A "Pick 3/5" Analysis of the 1972 Data

The best elementary unfolding scale is the triple ACE, since the sum of the patterns 011, 110, and 111 is higher than for the other "unique" triples (the ordered triples ABE, ADE, BED, and CED). Its H-value is 0.38. (In this example the triple ABE could also have been considered: the sum of its patterns 011, 110, and 111 is only marginally less, and its H(ijk)-value is larger (0.65). The search procedure would lead to the same final conclusion, however.)
This best triple cannot be extended with stimulus D, since there is at least one negative scalability value in the triples ACD, ADE, and CDE for each of the four possible places. Stimulus B can be represented in more than one position: position 2 (scale ABCE) and position 3 (scale ACBE). The position that gives the highest overall H-value is position 3, giving as
a final scale: CDU - FDP - SPD - DKP. The two perfect response patterns of this scale (ABC and BCE) are the two most preferred patterns, and they occur more often than expected under statistical independence, so these results do not violate the unfolding interpretation. This order of the stimuli conforms to the reflected order of the parties on the ideological left-right continuum. Since it is customary to represent political parties from left to right, I have reversed the order to EBCA in the final scale, as well as in the dominance matrix and adjacency matrix.

Scale EBCA      pi     Hi
E: DKP         0.05   0.63
B: SPD         0.98   0.43
C: FDP         0.98   0.37
A: CDU         0.93   0.32
                    H = 0.42

Scale ECBA      pi     Hi
E: DKP         0.05   0.54
C: FDP         0.98   0.29
B: SPD         0.98   0.36
A: CDU         0.93   0.30
                    H = 0.36
Let us return for a moment to the nonrepresented party, the NPD (stimulus D). The triples incorporating stimulus D that have the highest scalability values are BAD, CAD, and ADE. The triples BAD and CAD are mentioned relatively frequently (24 and 30 times, respectively), more frequently than would be expected under statistical independence. This suggests that the NPD should be represented to the right of the CDU. The scale in this case would be EBCAD. However, triple ADE also has a high scalability value (H = 0.76), and these three stimuli are also mentioned together more often than expected under statistical independence. In fact, all triples including both stimuli D and E (NPD and DKP) occur more often than expected under statistical independence. This relatively frequent co-occurrence of the NPD and DKP - which are at opposite ends of the ideological left-right continuum - suggests that at least some subjects used another dimension in establishing their preference order (e.g., a "protest", "anti-system", or "government-opposition" dimension).

4.5 A "Pick 2/5" Analysis of the 1980 Data
Among the unique triples (ACB, ADE, BED, CAD, and CBE), triple CBE is the best one, according to the criteria given in Section 3.4. It can be extended with stimulus D in the fourth place, giving scale CBED. The best triple
cannot be extended with stimulus A: although all H(ijk)-values for the scale ACBE are positive, the scalability value of the scale as a whole drops below 0.30.
The representation of stimulus D (NPD) next to E (DKP), rather than at the other end of the scale next to the FDP, depends on a single person who mentions stimuli D and E together. The expected number of subjects who would mention D and E together under statistical independence is 0.089. Because of this one subject, the values for H(EBD) and H(ECD) become negative, which precludes the scale EBCD according to our criteria. Moreover, the scalability of scale DEBC is higher than that of the scale EBCD.

Scale EBCD      pi     Hi
E: DKP         0.01   0.46
B: SPD         0.69   0.67
C: FDP         0.77   0.67
D: NPD         0.01   0.52
                    H = 0.61

Scale DEBC      pi     Hi
D: NPD         0.01   0.96
E: DKP         0.01   0.83
B: SPD         0.69   0.83
C: FDP         0.77   0.89
                    H = 0.88
4.6 A “Pick 3/5” Analysis of the 1980 Data
The best triple has to be sought among the three unique triples with an H(ijk)-value of over 0.30: ABE, ACE, and BED. Triple ACE has the highest sum of the frequencies of the 011, 110, and 111 patterns, and is therefore chosen as the best triple (H = 0.40). Stimulus D cannot be added to the scale: according to the H-values of triple CDE, E should be represented between C and D, but due to the negative H-value of the ordered triple AED this representation is not possible. Stimulus B can be represented in two places: position 2 (scale ABCE) and position 3 (scale ACBE). Representation of stimulus B in position 3 gives the highest overall H-value, with no violations of the characteristic monotonicity patterns of the dominance matrix and the adjacency matrix. This scale (in reflected order: DKP - SPD - FDP - CDU) can be interpreted in terms of a left-right dimension.
Scale EBCA      pi     Hi
E: DKP         0.07   0.81
B: SPD         0.97   0.74
C: FDP         0.99   0.64
A: CDU         0.92   0.58
                    H = 0.70

Scale ECBA      pi     Hi
E: DKP         0.07   0.75
C: FDP         0.99   0.21
B: SPD         0.97   0.39
A: CDU         0.92   0.35
                    H = 0.39
5. Discussion
Applications of MUDFOLD, a computer program for the unidimensional unfolding analysis of dichotomous data, to preferences for five political parties by West German voters in 1969, 1972, and 1980 lead to unfolding scales for four of the five parties. It is not possible to represent all five German parties in a unidimensional unfolding scale. The difficulty in unfolding the preference rankings of the five German parties has already been pointed out by Norpoth (1979a, 1979b) and Pappi (1983). The detailed information obtained through MUDFOLD analyses suggests two major reasons for this difficulty: first, that there is very little structure in preferences for the three major parties, and second, that the two smallest parties can be represented in two conflicting ways.
In the three years for which data have been analyzed, the number of subjects who mentioned one of the three pairs of the three major parties together as most preferred in the "pick 2/5" analysis, or who mentioned all three parties together as most preferred in the "pick 3/5" analysis, hardly deviates from the number that would be expected under statistical independence. Two possible explanations may be given for this finding, both of which are compatible with an unfolding representation. According to the first, the three parties are very close together on the unfolding scale, and are therefore difficult for subjects to distinguish. Second, subjects may differ in their interpretation of the position of the three major parties along the underlying dimension. For instance, for some people the FDP may be representable to the right of the CDU, whereas for others the FDP should be placed between SPD and CDU, or even to the left of the SPD. Such cognitive differences would make the unidimensional representation of differences in preferences impossible. Klingemann (1972) and Pappi (1980), among others, present some evidence supporting this phenomenon.
The conflicting possible representations of the DKP and NPD as either close together or each at opposite ends of the unfolding scale are found in all three data sets. This also suggests that different subjects may base their preference judgments on different criteria. However, only a small number of subjects mentioned these parties together among their two or three most preferred ones, and it is difficult to make valid inferences on the basis of a comparison between small numbers of observed and expected errors. An alternative explanation for the results for these two parties is that some respondents or some coders may have inadvertently reversed the appropriate pairwise preference judgments or the preference I-scales. The least-preferred parties would then have been interpreted as the most preferred, and vice versa. This reversed order is more in agreement with the dominant unfolding interpretation. However, there is no way to validate this suggestion on the basis of the published data.
Despite the difficulty of representing all parties along one unidimensional unfolding scale, we still find some easily interpreted structure in subsets of the data. In 1969 the preference effects of the "Great Coalition" are clearly visible in a "government-opposition" dimension. Most of the additional structure found among unfoldable triples or four-tuples of parties can be interpreted in terms of the "left-right" dimension, which Klingemann (1972) also identified as important on the basis of other evidence.
These results do not conform to the interpretation given by Norpoth (1979a, 1979b) for the same data. Norpoth analyzed these data by constructing an unfolding scale for a maximal subset of subjects rather than a maximal subset of stimuli, and concluded that the three major parties would form the best unfolding scale in the order FDP - SPD - CDU, which he interpreted as a "religious-nonreligious" dimension. However, he did not find this interpretation very plausible: "... the overwhelming share [of subjects] claimed by this dimension strains credulity. Religious issues have rarely if ever topped the priority list of the public in recent years" (1979b, p. 729). By insisting on keeping all stimuli in the scale, which forced him to throw out at least 20% of his subjects without any substantive explanation, he found it impossible to obtain the left-right results that he also had expected on the basis of Klingemann's previous studies.
The major reason for the difference between Norpoth's findings and those presented here lies in Norpoth's emphasis on absolute numbers of errors, compared to my emphasis on the number of errors relative to the number of errors that would be expected under the null hypothesis, i.e., the hypothesis that subjects' responses are statistically independent of each other. It is true that the permutation FDP-SPD-CDU, which Norpoth accepts as an unfolding scale among the three major parties, is the order that leads to the smallest absolute number of errors. However, this number of errors does not differ significantly from the number of errors that would be expected under the null hypothesis of statistical independence.
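The difference can be made concrete. In the relative approach, the error count of an ordered triple (i, j, k) - picking the two outer stimuli but not the middle one - is judged against its expectation under statistical independence instead of being taken at face value. A minimal sketch (my function name, assuming a subjects-by-stimuli 0/1 pick matrix):

```python
import numpy as np

def h_error_pattern(data, i, j, k):
    """Scalability H of the ordered triple (i, j, k), based on its error
    pattern '101': picking the outer stimuli i and k but not the middle
    stimulus j.  The expected error frequency under statistical
    independence is N * p_i * (1 - p_j) * p_k, with p the marginal pick
    proportions.  H = 1 - Obs/Exp, so H > 0 means fewer errors than the
    null hypothesis of independent responses predicts."""
    n = data.shape[0]
    p = data.mean(axis=0)
    obs = np.sum((data[:, i] == 1) & (data[:, j] == 0) & (data[:, k] == 1))
    exp = n * p[i] * (1 - p[j]) * p[k]
    return 1.0 - obs / exp
```

An order with a small absolute number of errors can still have H near zero when its expected error count under independence is equally small, which is the situation described above for FDP-SPD-CDU.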
Appendix. Detailed Information on MUDFOLD Analyses

Table 2. 1969 Data, Pick 2/5, N = 907

Error patterns (for each triple: observed and expected frequency of the 011, 101, and 110 patterns, with the H-values of the scales jik, ijk, and ikj, respectively):

        011: Obs   Exp   H(jik)   101: Obs   Exp   H(ijk)   110: Obs   Exp   H(ikj)
ABC         143    141     -.01       163    162     -.01       561    558     -.01
ABD           3     10      .71        15     12     -.29       561    558     -.01
ABE          11      9     -.29         2     10      .80       561    558     -.01
ACD           1      3      .66        15     12     -.29       163    162     -.01
ACE           2      2      .20         2     10      .80       163    162     -.01
ADE           6     .2   -32.54         2     10      .80        15     12     -.29
BCD           1      3      .66         3     10      .71       143    142     -.01
BCE           2      2     -.20        11      9     -.29       143    142     -.01
BDE           6     .2   -32.54        11      9     -.29         3     10      .71
CDE           6     .2   -32.54         2      2      .20         1      3      .66

[The dominance and adjacency matrices for SPD, CDU, FDP, NPD, and DKP follow in the original; their percentage entries are not reliably legible in this copy.]
Table 3. 1969 Data, Pick 3/5, N = 907

[For each triple ABC-CDE the original tabulates the observed and expected frequencies of the error patterns, with the H-values of the three permutations (jik, ijk, ikj), and of the 111 patterns with the corresponding H(111)-values, followed by the dominance and adjacency matrices for SPD, CDU, FDP, NPD, and DKP; the individual entries are not reliably legible in this copy.]
Table 4. 1972 Data, Pick 2/5, N = 1785

[Same structure as Table 2, with the dominance and adjacency matrices for DKP, FDP, SPD, CDU, and NPD; the individual entries are not reliably legible in this copy.]
Table 5. 1972 Data, Pick 3/5, N = 1785

[Same structure as Table 3, with the dominance and adjacency matrices for DKP, SPD, FDP, CDU, and NPD; the individual entries are not reliably legible in this copy.]
Table 6. 1980 Data, Pick 2/5, N = 1316

[Same structure as Table 2, with the dominance and adjacency matrices for NPD, DKP, SPD, FDP, and CDU; the individual entries are not reliably legible in this copy.]
Table 7. 1980 Data, Pick 3/5, N = 1316

[Same structure as Table 3, with the dominance and adjacency matrices for DKP, SPD, FDP, CDU, and NPD; the individual entries are not reliably legible in this copy.]
References

Bechtel, G. G. (1976). Multidimensional preference scaling. The Hague: Mouton.
Bennett, J. F., & Hays, W. L. (1960). Multidimensional unfolding: Determining the dimensionality of ranked preference data. Psychometrika, 25, 27-43.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. 1, pp. 105-155). New York: Seminar Press.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 148-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., & Kao, R. C. (1960). On a connection between factor analysis and multidimensional scaling. Psychometrika, 25, 219-231.
Coombs, C. H., & Smith, J. E. K. (1973). On the detection of structure in attitudes and developmental processes. Psychological Review, 80, 337-351.
Davison, M. L. (1979). Testing a unidimensional, qualitative unfolding model for attitudinal or developmental data. Psychometrika, 44, 179-194.
Davison, M. L. (1980). A psychological scaling model for testing order hypotheses. British Journal of Mathematical and Statistical Psychology, 33, 123-141.
Dijkstra, L., van der Eijk, C., Molenaar, I. W., van Schuur, W. H., Stokman, F. N., & Verhelst, N. (1980). A discussion on stochastic unfolding. Methoden en Data Nieuwsbrief, 5, 158-175.
Gold, E. M. (1973). Metric unfolding: Data requirements for unique solutions and clarification of Schonemann's algorithm. Psychometrika, 38, 441-448.
Heiser, W. J. (1981). Unfolding analysis of proximity data. Leiden: University of Leiden.
Jansen, P. G. W. (1983). Rasch analysis of attitudinal data. Nijmegen: Catholic University/The Hague: Rijks Psychologische Dienst.
Klingemann, H. D. (1972). Testing the left-right continuum in a sample of German voters. Comparative Political Studies, 5, 93-106.
Kruskal, J. B., Young, F. W., & Seery, J. B. (1973). How to use KYST. Murray Hill, NJ: Bell Laboratories.
Leik, R. K., & Matthews, M. (1968). A scale for developmental processes. American Sociological Review, 33, 62-75.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.
Norpoth, H. (1979a). Dimensionen des Parteikonflikts und Praferenzordnungen der deutschen Wahlerschaft: Eine Unfoldinganalyse. Zeitschrift fur Sozialpsychologie, 10, 350-362.
Norpoth, H. (1979b). The parties come to order! Dimensions of preferential choice in the West German electorate, 1961-1976. American Political Science Review, 73, 724-736.
Pappi, F. U. (1983). Die Links-Rechts Dimension des deutschen Parteiensystems und die Parteipraferenz-Profile der Wahlerschaft. In M. Kaase & H. D. Klingemann (Eds.), Wahlen und politisches System: Analysen aus Anlass der Bundestagswahl 1980 (pp. 422-441). Opladen: Westdeutscher Verlag.
Roskam, E. E. (1968). Metric analysis of ordinal data. Voorschoten: VAM.
Ross, J., & Cliff, N. (1964). A generalization of the interpoint distance model. Psychometrika, 29, 167-176.
Schonemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika, 35, 349-366.
Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235-248.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
Tversky, A., & Sattath, S. (1979). Preference trees. Psychological Review, 86, 542-573.
van Schuur, W. H., & Molenaar, I. W. (1982). MUDFOLD: Multiple stochastic unidimensional unfolding. In H. Caussinus, P. Ettinger, & R. Tomassone (Eds.), COMPSTAT 1982 (Part I, pp. 419-426). Vienna: Physica-Verlag.
van Schuur, W. H. (1984). Structure in political beliefs: A new model for stochastic unfolding with application to European party activists. Amsterdam: CT Press.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
PROBABILISTIC MULTIDIMENSIONAL SCALING MODELS FOR ANALYZING CONSUMER CHOICE BEHAVIOR

Wayne S. DeSarbo
University of Michigan
Geert De Soete
University of Ghent, Belgium
Kamel Jedidi
University of Pennsylvania

We review the development of two new stochastic multidimensional scaling (MDS) methodologies that operate on paired comparisons choice data and render a spatial representation of subjects and stimuli. In the probabilistic vector MDS model, subjects are represented as vectors and stimuli as points in a T-dimensional space, where the scalar products or projections of the stimulus points onto the subject vectors provide information about the utility of the stimuli to the subjects. In the probabilistic unfolding MDS model, subjects are represented as ideal points and stimuli as points in a T-dimensional space, where the Euclidean distance between the stimulus points and the subject ideal points provides information as to the respective utility of the stimuli to the subjects. To illustrate the versatility of the two models, a marketing application measuring consumer choice for fourteen actual brands of over-the-counter analgesics, utilizing optional reparameterizations, is described. Finally, other applications are identified.
The second author is supported as "Bevoegdverklaard Navorser" of the Belgian "Nationaal Fonds voor Wetenschappelijk Onderzoek". This paper is a revised version of an article published in Communication & Cognition, 1987, 20, 17-43.
1. Introduction
The method of paired comparisons involves presenting a subject two stimuli at a time. The subject is then required to choose one of the two presented stimuli (cf. e.g., David, 1963; Thurstone, 1927). Since this paper is concerned with understanding consumer behavior, we will be using the terminology of consumers (for subjects) and products/brands (for stimuli). The method of paired comparisons can be gainfully applied in consumer behavior research whenever it is not possible or feasible to make continuous measurements of the utilities of a set of products or brands. With J products, each of the I consumers typically makes J(J − 1)/2 judgments. However, if this number is too large, incomplete designs may be utilized (cf. Bock & Jones, 1968; Box, Hunter, & Hunter, 1978) in order to reduce the number of judgments a consumer must make. Since consumers are often inconsistent when making judgments, probabilistic models are needed for analyzing such paired comparisons data. To display the structure in paired comparisons data, several models have been presented in the psychometric literature which represent the consumers and the products in a joint uni- or multidimensional space. A number of unidimensional scaling procedures have been proposed to obtain scale values for products from such (aggregated) paired comparisons data (for a survey, see Bock & Jones, 1968; Torgerson, 1958). More recently, multidimensional scaling models have been devised to account for the multidimensional nature of the products. Here, two general classes of models have been typically utilized to represent such preference/choice data: vector and unfolding models. A vector or scalar products multidimensional scaling model (Slater, 1960; Tucker, 1960) represents the consumers as vectors and the products as points in a T-dimensional space. Figure 1 represents a hypothetical two-dimensional portrayal of such a representation where there are two consumers (represented by two vectors I and II) and five products (represented by the letters A-E). Here, the utility or preference order for a given consumer is assumed to be given by the orthogonal projection of the products onto the vector representing that consumer. For example, for consumer I, product B has the highest utility, then E, then A, then D, and finally C. For consumer II, the order of utility (from highest to lowest) is A, B, C, D, and E. The goal of the analysis here is to estimate the "optimal" vector directions and product
coordinates in a prescribed dimensionality. An intuitively unattractive property of the vector model is that it assumes preference or utility to change monotonically with all dimensions. That is, it assumes that if a certain amount of a thing is good, more must be even better. (The iso-utility contours therefore are parallel straight lines perpendicular to a consumer's vector.) According to Carroll (1980), this is not an accurate representation for most quantities or attributes in the real world (perhaps with the exception of money, happiness, and health).
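As a concrete numerical sketch of this projection logic, the preference order falls out of a single matrix product. The coordinates below are invented placeholders, not the configuration of Figure 1:

```python
import numpy as np

# Hypothetical coordinates (NOT the chapter's data): two consumer
# vectors (I, II) and five products (A-E) in T = 2 dimensions.
consumers = np.array([[0.8, 0.6],    # consumer I
                      [0.6, -0.8]])  # consumer II
products = np.array([[0.5, 1.2],    # A
                     [1.0, 1.0],    # B
                     [0.2, -0.5],   # C
                     [-0.4, 0.1],   # D
                     [0.9, 0.3]])   # E

# In the vector model, a product's utility for a consumer is the scalar
# product of its coordinates with the consumer's vector (its projection).
utilities = consumers @ products.T            # shape (2, 5)

labels = np.array(list("ABCDE"))
for i, u in enumerate(utilities):
    order = labels[np.argsort(-u)]            # highest utility first
    print(f"consumer {'I' * (i + 1)}: {' > '.join(order)}")
```

Sorting each row of the scalar-product matrix yields each consumer's full preference order, which is exactly the information the model fits to the paired comparisons.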
Figure 1. Two-dimensional illustration of the vector model (taken from Carroll & DeSarbo, 1985).
There has been some work on analyzing paired comparisons via such vector or scalar products models. Bechtel, Tucker, and Chang (1971) have developed a scalar products model for examining
graded paired comparisons responses (i.e., where consumers indicate which of two products is preferred and to what extent). Cooper and Nakanishi (1983) have devised two logit models (vector and ideal point) for the external analysis of paired comparisons data. Carroll (1980) has proposed the wandering vector model for the analysis of such paired comparisons data. According to this vector model, it is assumed that each consumer can be represented by a vector and that individual consumers will prefer the brand from a pair having the largest projection on that vector. The direction cosines of this vector specify the relative weights the consumer attaches to the underlying dimensions. The wandering vector model assumes that a consumer's vector wanders or fluctuates from a central vector in such a way that the distribution of the vector termini is multivariate normal. De Soete and Carroll (1983, 1986) have developed a maximum likelihood method for fitting this model and have proposed various extensions of the original model to accommodate additional sources of error as well as graded paired comparisons. Unfortunately, the De Soete and Carroll (1983, 1986) model requires replicated paired comparisons per subject (or group of subjects) to estimate more than one vector. This turns out to be a rather difficult data collection task in consumer behavior research. Without such replications, a group of subjects must be considered as replications of each other. Assuming considerable heterogeneity within the group of subjects, the centroid vector for the group may be estimated with considerably high variances on the terminus. In addition, no provision is available to explore individual differences (with replications) as a function of specified subject differences (such as demographic characteristics). DeSarbo, Oliver, and De Soete (1986) propose an alternative probabilistic vector MDS model which operates on paired comparisons.
This model can estimate separate subject vectors without requiring within-subject replications. A variety of possible model specifications are provided where vectors and/or stimuli can be reparameterized as a function of specified background variables. We will describe its model structure as well as its program options, and provide a marketing application. The other major type of psychometric model used to represent such preference/choice data is the unfolding model (Coombs, 1964). We will discuss only the simple unfolding model of Coombs (1964). In the simple unfolding model, both consumers and products are represented as
Figure 2. Two-dimensional illustration of the simple ideal point model (taken from Carroll & DeSarbo, 1985).

points in a T-dimensional space. The points for the consumers represent ideal points, or optimal sets of dimension values. The farther a given product point is from a consumer's ideal point, the less utility that product has for the consumer. This notion of relative distance implies a Euclidean metric on the space, which implies that, in T = 2 dimensions, iso-utility contours are families of concentric circles centered at a consumer's ideal point. Carroll (1980) demonstrates that the vector model is a special case of this unfolding model where the ideal point goes off to infinity. Figure 2 illustrates a hypothetical two-dimensional space from an unfolding perspective. Here there are three consumers represented by ideal points labeled I, II, and III, and five products labeled A-E. The figure specifies the preference/utility order for each consumer as a function of distance
away from the respective ideal point. The objective in unfolding analysis is to estimate the "optimal" set of ideal points and product coordinates in a prescribed dimensionality. Although several unidimensional stochastic unfolding models have been proposed in the literature (Bechtel, 1968, 1976; Coombs, Greenberg, & Zinnes, 1961; Sixtl, 1973; Zinnes & Griggs, 1974), only three multidimensional unfolding models have been developed to accommodate paired comparisons data. The first one by Schönemann and Wang (1972) and Wang, Schönemann, and Rusk (1975) is based on the well-known Bradley-Terry-Luce model and consequently assumes strong stochastic transitivity. In the multidimensional unfolding model proposed by Zinnes and Griggs (1974), it is assumed that the coordinates of both the consumer and the product points are independently normally distributed with a common variance. Zinnes and Griggs (1974) assume that for each element of the product pair, a consumer independently samples a point from his or her ideal point distribution. In the Zinnes-Griggs model, the probability that consumer i prefers product j to k is defined by
P(δ_ijk = 1) = P[F″(T, T, d*_ik²/(2σ²), d*_ij²/(2σ²)) > 1],

where F″(ν₁, ν₂, λ₁, λ₂) denotes the doubly non-central F distribution with degrees of freedom ν₁ and ν₂ and noncentrality parameters λ₁ and λ₂, and d*_ij (respectively d*_ik) the Euclidean distance between the mean point of consumer i and the mean point of product j (respectively k). More recently, De Soete, Carroll, and DeSarbo (1986) and De Soete and Carroll (1986) have proposed the wandering ideal point model for the analysis of such paired comparisons data as an unfolding analogue of the wandering vector model. According to this model, it is assumed that each consumer can be represented by an ideal point and that he or she will prefer that product from a pair which has the smallest Euclidean distance from that ideal point. This model assumes that a consumer's ideal point wanders or fluctuates from a central ideal point in such a way that the distribution of the ideal point coordinates is multivariate normal. De Soete, Carroll, and DeSarbo (1986) have developed a maximum likelihood method for fitting this model and show that it is the only existing probabilistic multidimensional unfolding model requiring only moderate stochastic transitivity.
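The doubly non-central F distribution is not available in SciPy, but the Zinnes-Griggs choice probability can be approximated by simulating the sampling story directly (consumer and product points drawn from spherical normals with a common variance). The configuration below is hypothetical, and `zg_choice_prob` is our name, not the authors':

```python
import numpy as np

rng = np.random.default_rng(0)

def zg_choice_prob(ideal, prod_j, prod_k, sigma=1.0, draws=200_000):
    """Monte Carlo estimate of P(consumer prefers j to k) under the
    Zinnes-Griggs story: the consumer's point and each product's point
    are sampled from spherical normals (common variance sigma**2) around
    their mean coordinates; j is chosen when the sampled distance to j
    is smaller than the sampled distance to k."""
    T = len(ideal)
    # an independent consumer draw for each element of the pair
    ci = rng.normal(ideal, sigma, (draws, T))
    ck = rng.normal(ideal, sigma, (draws, T))
    pj = rng.normal(prod_j, sigma, (draws, T))
    pk = rng.normal(prod_k, sigma, (draws, T))
    d2j = ((ci - pj) ** 2).sum(axis=1)
    d2k = ((ck - pk) ** 2).sum(axis=1)
    return (d2j < d2k).mean()

# Hypothetical T = 2 configuration: j lies closer to the ideal point,
# so the choice probability should be noticeably above one half.
p = zg_choice_prob(ideal=[0.0, 0.0], prod_j=[0.5, 0.0], prod_k=[2.0, 0.0])
print(round(p, 3))
```

With equal mean distances the estimate hovers around 0.5, as the model's symmetry requires; the simulation is only a sketch of the probability the doubly non-central F expression computes analytically.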
Unfortunately, as in the case of the wandering vector model, the De Soete, Carroll, and DeSarbo (1986) model also requires replications of paired comparison matrices per consumer to estimate more than one ideal point. Again, this turns out to be a rather difficult task in terms of data collection. Without such replications, only one centroid ideal point can be estimated for a sample of I consumers. Assuming considerable heterogeneity in the sample, the single centroid ideal point may be estimated with considerably high variances. In addition, no provision is currently available to explore individual differences (with replications) as a function of specified consumer differences (such as demographic characteristics), or to have similar reparameterizations on products (vis-à-vis attributes or features). DeSarbo, De Soete, and Eliashberg (1987) propose an alternative probabilistic MDS unfolding model which also operates on paired comparisons. This model can estimate separate consumer ideal points without requiring within-consumer replications. A variety of possible model specifications are provided where ideal points and/or product coordinates can be reparameterized as a function of specified background variables, which aids in the understanding of consumer choice behavior. We will describe its model structure as well as its program options, and provide a marketing example.
2. Methodologies
2.1 Research Objectives

As stated, the objective of this paper is to review the two probabilistic MDS models proposed by DeSarbo, Oliver, and De Soete (1986) and DeSarbo, De Soete, and Eliashberg (1987) for representing paired comparison judgments so that consumers and products can be displayed in a joint space, thus permitting inferences concerning the nature of the consumer choice under investigation. In doing so, two sub-objectives will be addressed. The first concerns the ability to investigate the nature of individual (consumer) differences in preference/choice and its measurement, while the second involves modeling the effect of specific product features on the measurement of preference/choice. The discussion section will suggest further potential applications to the investigation of still other latent constructs.
2.2 Notation
Let

i = 1, . . . , I consumers,
j, k = 1, . . . , J brands/products,
t = 1, . . . , T dimensions,
l = 1, . . . , L brand features,
n = 1, . . . , N consumer variables,

δ_ijk = 1 if consumer i finds product j more satisfying than k, and 0 else,
H_jl = the l-th feature/attribute value for the j-th brand,
Y_in = the n-th background variable value for the i-th consumer,
a_it = the t-th coordinate for consumer i,
b_jt = the t-th coordinate for brand j,
α_nt = the impact coefficient of the n-th consumer variable on the t-th dimension,
γ_lt = the impact coefficient of the l-th brand variable on the t-th dimension.

2.3 The Vector Model

DeSarbo, Oliver, and De Soete (1986) define a latent consumer preference or utility construct:

V_ij = U_ij + e_ij,   (1)

where

V_ij = the (latent) utility of brand j to consumer i,
U_ij = Σ_{t=1}^T a_it b_jt,
e_ij = error.

Here, U_ij refers to a "true" utility or latent preference score for consumer i concerning brand j. It is modeled as equal to the scalar product of the brand coordinates (b_jt) and the consumer vector (a_it). The order of utility or preference for a given consumer is thus assumed to be given by the projection of the brands onto the vector representing that consumer. As is characteristic for a vector MDS model, it also assumes that utility or
preference changes monotonically with all dimensions. Assume now that:

e_ij ~ N(0, σ_i²)   (2a)

(where σ_i² is the variance parameter for the i-th consumer),

Cov(e_ij, e_ik) = 0, ∀ i, j ≠ k,   (2b)

Cov(e_ij, e_i'k) = 0, ∀ i ≠ i', j, k.   (2c)
Suppose that consumer i is presented two brands j and k and is asked to select the one that is "more preferred". Then

P(δ_ijk = 1) = P(V_ij > V_ik) = Φ(z_ijk),   (3)

where

z_ijk = (U_ij − U_ik) / (√2 σ_i) = Σ_{t=1}^T a_it (b_jt − b_kt) / (√2 σ_i),   (4)

and Φ(·) denotes the standard normal distribution function.
The general form of the likelihood function, assuming independence over subscripts i, j, and k, is

L = Π_{i=1}^I Π_{j<k} Φ(·)^δ_ijk [1 − Φ(·)]^(1 − δ_ijk),   (5)

where:

Φ(·) = P(δ_ijk = 1).

Taking logs, one obtains the log likelihood function:

ln L = Σ_{i=1}^I Σ_{j<k} [δ_ijk ln(Φ(·)) + (1 − δ_ijk) ln(1 − Φ(·))].   (6)
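The log likelihood in (6) can be sketched directly, assuming the probit form P(δ_ijk = 1) = Φ((U_ij − U_ik)/(√2 σ_i)); the function name and array conventions below are ours, not the program's:

```python
import numpy as np
from scipy.stats import norm

def vector_model_loglik(A, B, delta, sigma=None):
    """Log likelihood (6) of the probabilistic vector MDS model.
    A: (I, T) consumer vectors; B: (J, T) brand coordinates;
    delta: (I, J, J) binary array, delta[i, j, k] = 1 if consumer i
    chose brand j over brand k (only cells with j < k are used);
    sigma: (I,) error standard deviations (default: all one)."""
    I, T = A.shape
    J = B.shape[0]
    sigma = np.ones(I) if sigma is None else np.asarray(sigma)
    U = A @ B.T                                  # latent utilities U[i, j]
    lnL = 0.0
    for i in range(I):
        for j in range(J):
            for k in range(j + 1, J):
                # P(delta_ijk = 1) = Phi((U_ij - U_ik) / (sqrt(2) sigma_i))
                p = norm.cdf((U[i, j] - U[i, k]) / (np.sqrt(2.0) * sigma[i]))
                lnL += delta[i, j, k] * np.log(p) \
                     + (1 - delta[i, j, k]) * np.log1p(-p)
    return lnL

# Tiny synthetic check: one consumer, two brands one unit apart on the
# consumer's vector gives choice probability Phi(1/sqrt 2).
A = np.array([[1.0]])
B = np.array([[1.0], [0.0]])
delta = np.zeros((1, 2, 2)); delta[0, 0, 1] = 1
print(vector_model_loglik(A, B, delta))   # = ln Phi(1/sqrt 2), about -0.274
```

Maximizing this function over the entries of A and B (for example with a gradient-based optimizer) is the estimation task described next; the chapter's own programs use conjugate gradients.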
DeSarbo, Oliver, and De Soete (1986) developed a procedure for estimating A = ((a_it)) and B = ((b_jt)), given Δ = ((δ_ijk)) and T, by maximizing (6). Unlike Carroll's (1980) original formulation of the wandering vector model, the model here posits no explicit distribution on the consumer vectors. Rather, it is a type of random utility model (Thurstone, 1927) where a normally distributed error term is added to a latent construct which is derived from the vector model.

2.4 The Unfolding Model
DeSarbo, De Soete, and Eliashberg (1987) define a similar latent consumer "dispreference" (inversely related to preference) or disutility construct:

V_ij = U_ij + e_ij,   (7)

where

V_ij = the (latent) disutility of stimulus j to consumer i,
U_ij = Σ_{t=1}^T (a_it − b_jt)²,
e_ij = error.

It is assumed that the three assumptions expressed in (2) concerning e_ij also hold here. Suppose that consumer i is presented two brands j and k and is asked to select the one that is "more satisfying". Then:

P(δ_ijk = 1) = P(V_ij < V_ik) = Φ[(U_ik − U_ij) / (√2 σ_i)].   (8)
Similarly,

P(δ_ijk = 0) = 1 − Φ[(U_ik − U_ij) / (√2 σ_i)].

The general form of the likelihood function, assuming independence over subscripts i, j, and k, is analogous to expression (5). Taking logs, one obtains the same log likelihood function as in expression (6). DeSarbo, De Soete, and Eliashberg (1987) similarly use maximum likelihood procedures to estimate A = ((a_it)), B = ((b_jt)), and (σ_i) given Δ = ((δ_ijk)) and T. Contrary to De Soete et al.'s (1986) original formulation of the wandering ideal point model, the model here posits no explicit distribution on the consumer's ideal point. Rather, it too is a type of random utility model (Thurstone, 1927) where an error term is added to a latent construct derived from the unfolding model. Both the wandering vector and the wandering ideal point models, again, require replications of paired comparisons data for each consumer in order to estimate more than a single vector/ideal point, since the consumer vector/ideal point is modeled as being explicitly normally distributed. However, both the wandering vector and ideal point models have the advantage of implying only moderate stochastic transitivity, whereas the present two models, like Thurstone's
(1927) Law of Comparative Judgment Case V, imply strong stochastic transitivity.

2.5 Program Options
These two probabilistic choice models can accommodate a number of different model specifications and options. One can perform either an internal analysis (where the user estimates both brand points and vectors/ideal points) or an external analysis (where the user can fix one or more sets of coordinates throughout the analysis). The user can also select among a number of methods for generating starting estimates, including a user-defined option. Also, since the scalar products model is invariant under nonsingular linear transformations, and Euclidean distances are invariant under orthogonal rotations, options are provided to rotate either A or B to principal axes for possible enhancement of interpretation. Perhaps the most valuable program options concern the possibility of reparameterizing consumer vector/ideal point coordinates and/or stimulus coordinates as functions of prespecified background features and attributes. That is, one may reparameterize consumer vectors/ideal points via:

a_it = Σ_{n=1}^N Y_in α_nt,   (11)

and/or brand coordinates via:

b_jt = Σ_{l=1}^L H_jl γ_lt.   (12)

As in CANDELINC (Carroll, Pruzansky, & Kruskal, 1980), Three-Way Multivariate Conjoint Analysis (DeSarbo, Carroll, Lehmann, & O'Shaughnessy, 1982), and GENFOLD2 (DeSarbo & Rao, 1984, 1986), one can use these reparameterization options to examine what impact such features/attributes have on the derived solution. This can often aid in interpreting the resulting solution. Note that, because of potential problems associated with placing such restrictions on consumer vectors as discussed in Carroll et al. (1980) and DeSarbo et al. (1982), an option exists to estimate a multiplicative stretching/shrinking parameter, τ_i, on the right-hand side of expression (11) for the vector model. These τ_i parameters are multiplied by the a_it coordinates (the vectors) and act as stretching or shrinking factors for the vectors.
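The two reparameterizations can be sketched in a few lines. All arrays below are random placeholders; the sizes I = 30, J = 14, L = 7, T = 2 match the chapter's illustration, while N = 3 is invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, T, L, N = 30, 14, 2, 7, 3   # N is a hypothetical number of consumer variables

Y = rng.normal(size=(I, N))       # consumer background variables
H = rng.normal(size=(J, L))       # brand features (standardized in the application)
alpha = rng.normal(size=(N, T))   # impact of consumer variables on the dimensions
gamma = rng.normal(size=(L, T))   # impact of brand features on the dimensions
tau = rng.uniform(0.5, 1.5, I)    # stretching/shrinking factors for the vectors
tau[0] = 1.0                      # one tau fixed at 1 to set the overall scale

A = tau[:, None] * (Y @ alpha)    # consumer vectors via (11), stretched by tau
B = H @ gamma                     # brand coordinates via (12)

# Far fewer free parameters than estimating A and B directly:
free = N * T + L * T + (I - 1)
direct = I * T + J * T
print(free, "<", direct)          # prints: 49 < 88
```

The point of the device is visible in the last two lines: the coordinates become linear functions of observed variables, so interpretation and parsimony improve at the same time.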
As mentioned, these reparameterizations can aid in the interpretation of the derived dimensions (cf. Bentler & Weeks, 1978; Bloxom, 1978; de Leeuw & Heiser, 1980), and can replace the post-analysis property-fitting methods often used in an attempt to interpret the results. In addition, as shall be discussed, the imposition of these sets of reparameterizations can provide an effective tool for understanding the nature of preference or choice. It should be noted that when a linear function replaces a product or subject coordinate, the number of background variables in the linear function cannot exceed the number of entities that exist for those variables. For example, if J brands have L attributes, J ≥ L since one can only identify at most JT coordinates (excluding rotational indeterminacy). Similarly, if I consumers have N background variables, I ≥ N since one can only identify at most IT coordinates (excluding rotational indeterminacy). Thus, in most applications, such reparameterizations actually improve the degrees of freedom of the model by reducing the number of parameters to be estimated. In all cases, the degrees of freedom of any of the models are equal to or greater than those in competing joint space models, as shown below.
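The parameter-count bookkeeping can be made mechanical. The helpers below encode two rows of Table 1 (the unconstrained vector model and the brand-reparameterized one); the function names are ours:

```python
def df_vector(I, J, T):
    # Unconstrained vector model (Table 1): T(I + J) - T^2 - T,
    # adjusting for invariance under nonsingular linear transformations
    # (T^2) and translation of the brand points (T).
    return T * (I + J) - T**2 - T

def df_vector_reparam(I, L, T):
    # Brand coordinates reparameterized via the feature matrix, b = H @ gamma
    # (Table 1): T(I + L) - T^2; the translation adjustment no longer applies.
    return T * (I + L) - T**2

I, J, L, T = 30, 14, 7, 2          # sizes of the chapter's illustration
assert J >= L                      # at most J*T brand coordinates are identifiable
print(df_vector(I, J, T))          # 82 effective parameters unconstrained
print(df_vector_reparam(I, L, T))  # 70 with B = H @ gamma: a genuine reduction
```

With J = 14 brands and only L = 7 features, the reparameterization both respects the identifiability constraint J ≥ L and cuts the effective parameter count, which is exactly the degrees-of-freedom improvement claimed above.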
2.6 Degrees of Freedom

One typically collects I · J(J − 1)/2 independent paired comparison responses in an application. Defining the degrees of freedom of the model as the effective number of parameters, one can calculate the degrees of freedom for the various models accommodated by the vector model methodology. These are shown in Table 1, where it is assumed one is interested in estimating τ_i where appropriate. Note that in all these vector models an adjustment of T² is required due to the well-known invariance of such bilinear models to nonsingular transformations (Kruskal, 1978). Also, in vector models where b_jt is not reparameterized, one can add a constant vector c to all brand points b_j and not affect the choice probabilities in equation (3). This necessitates a subtraction of T degrees of freedom. Finally, in estimating τ_i, one can set the overall scale by fixing one τ_i = 1. The degrees of freedom in the unfolding models are shown in Table 2, where it is assumed one is interested in estimating σ_i for all consumers i.
Table 1. Degrees of freedom for the various vector models

Model                            Effective number of model parameters
a_it, b_jt unconstrained         T(I + J) − T² − T
b_jt reparameterized via (12)    T(I + L) − T²
a_it reparameterized via (11)    T(N + J) + (I − 1) − T² − T
Table 2. Degrees of freedom for the various unfolding models

Model                            Effective number of model parameters
a_it, b_jt unconstrained         T(I + J) + (I − 1) − T(T − 1)/2 − T
b_jt reparameterized via (12)    T(I + L) + (I − 1) − T(T − 1)/2
a_it reparameterized via (11)    T(N + J) + (I − 1) − T(T − 1)/2 − T
both via (11) and (12)           T(N + L) + (I − 1) − T(T − 1)/2
Note that in all these unfolding models, an adjustment of T(T − 1)/2 is required due to the well-known invariance of such unfolding models to orthogonal rotations. Also, in unfolding models where b_jt is not reparameterized, one can add a constant vector c to all brand points b_j and not affect the choice probabilities in equation (8). This necessitates a subtraction of T from the degrees of freedom. Finally, in estimating σ_i, one can set the overall scale by fixing one σ_i = 1.

2.7 The Algorithms
Maximum likelihood estimates of the set of specified parameters can be obtained by maximizing ln L (or minimizing −ln L) in expression (6) for each of the models. The method of conjugate gradients (Fletcher & Reeves, 1964) is utilized to solve these nonlinear, unconstrained optimization problems. Automatic restarts have been implemented to enhance its convergence properties. This conjugate gradient method is particularly useful for optimizing functions of several parameters since it does not require the storage of any matrices (as is necessary in quasi-Newton and second derivative methods). A number of goodness-of-fit measures are computed for these models:

1. The log likelihood function: ln L.

2. A deviance measure (McCullagh & Nelder, 1983; Nelder & Wedderburn, 1972):

D = −2 Σ_{i=1}^I Σ_{j<k} [δ_ijk ln(p̂_ijk) + (1 − δ_ijk) ln(1 − p̂_ijk)],   (13)

where p̂_ijk is the estimated probability that consumer i finds brand j more preferable than brand k, as expressed in equations (3) and (8) for the two models. Note that one can test nested models within each model type (vector or unfolding) via the difference between respective deviance measures. This difference is (asymptotically) χ² distributed, with the difference in model degrees of freedom providing the appropriate χ² test degrees of freedom. This test is appropriate in testing dimensionality as well as the various models described in Tables 1 and 2 because of the nested terms. Recall, however, that this is an asymptotic test. One obvious problem with this approach concerns the presence of incidental parameters in the likelihood function (i.e., parameters whose number varies with the order of Δ, such as the a_it's) as there are no within-subject replications. In such cases, maximum likelihood estimators may not be consistent. However, preliminary Monte Carlo analyses by DeSarbo, Oliver, and De Soete (1986) and DeSarbo, De Soete, and Eliashberg (1986) suggest that the asymptotic likelihood ratio test might still be useful in this situation.
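The deviance (13) and the nested-model test can be sketched briefly. The worked numbers take the T = 1 vs. T = 2 deviances from Table 4 and the degrees of freedom from Table 1's brand-reparameterized row, T(I + L) − T², with I = 30 and L = 7; the helper names are ours:

```python
import numpy as np
from scipy.stats import chi2

def deviance(delta, p_hat):
    """Deviance (13): -2 times the log likelihood of the fitted choice
    probabilities p_hat against the observed choices delta (both flat
    arrays over all (i, j < k) cells)."""
    delta, p_hat = np.asarray(delta, float), np.asarray(p_hat, float)
    return -2.0 * np.sum(delta * np.log(p_hat)
                         + (1 - delta) * np.log1p(-p_hat))

def nested_test(D_small, D_big, df_small, df_big):
    """Asymptotic chi-square test of a smaller (nested) model against a
    bigger one: the drop in deviance is referred to a chi-square with
    the difference in model degrees of freedom."""
    drop = D_small - D_big
    df = df_big - df_small
    return drop, df, chi2.sf(drop, df)

# Vector model, T = 1 vs. T = 2 (Table 4); df = T(30 + 7) - T^2.
drop, df, pval = nested_test(2843.64, 2525.46, df_small=36, df_big=70)
print(round(drop, 2), df, pval < 0.01)   # prints: 318.18 34 True
```

The huge drop in deviance relative to a χ² with 34 degrees of freedom is why the two-dimensional solution is retained; the same machinery applies to the unfolding results in Table 6.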
3. The proportion of correct predictions in Δ. Here, the proportion of times the solution correctly predicts δ_ijk is calculated for the total sample as well as for each subject.

3. An Illustration
3.1 Study Design

A sample of I = 30 undergraduate business students at the University of Pennsylvania was asked to take part in a small study designed to measure preferences for various brands of existing over-the-counter (OTC) analgesic pain relievers. These respondents were initially questioned as to the brand(s) they currently use (as well as frequency of use) and their personal motivations for choosing such brand(s) (e.g., ingredients, price, availability, etc.). They were then presented fourteen existing OTC analgesic brands: Advil (A), Anacin (B), Anacin-3 (C), Ascriptin (D), Bayer (E), Bufferin (F), Cope (G), CVS Buffered Aspirin (a generic) (H), Datril (I), Excedrin (J), Nuprin (K), Panadol (L), Tylenol (M), and Vanquish (N). The letters in parentheses are plotting codes that will be used further on. Initially, they were presented colored photographs of each brand and its packaging, together with price per 100 tablets, ingredients, package claims, and manufacturer (cf. DeSarbo & Carroll, 1985). Table 3 presents summaries of selected portions of the descriptions for each of the brands. Each subject/consumer was requested to read this information and could return to it at any time during the experiment if he/she so wished. After a period of time, they were asked to make paired comparison preference judgments for all possible 91 pairs of brands. They were told that they had to choose one from each pair (i.e., no ties were allowed). The presented pairs were randomized for each subject.
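The data collection step, all J(J − 1)/2 = 91 unordered pairs presented in a randomized order, can be sketched with the standard library; the seed and the left/right randomization within each pair are our additions:

```python
import random
from itertools import combinations

brands = list("ABCDEFGHIJKLMN")          # plotting codes for the 14 brands
pairs = list(combinations(brands, 2))    # all J(J - 1)/2 = 91 unordered pairs

rng = random.Random(7)                   # seed is arbitrary
presentation = pairs.copy()
rng.shuffle(presentation)                # randomized order for one subject
# Also randomize which brand appears on the left within each pair:
presentation = [p if rng.random() < 0.5 else p[::-1] for p in presentation]

print(len(presentation), presentation[0])
```

Each subject would receive an independently shuffled list, and each forced choice fills one cell δ_ijk of the 30 × 14 × 14 data array.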
Table 3. OTC analgesic brand descriptions

Brand      Mg. of   Mg. of          Mg. of     Mg. of    Mg. of buffered  Price in      Max.
           aspirin  acetaminophen   ibuprofen  caffeine  compounds        U.S. dollars  dosage
Advil        0        0             200          0          0             6.99           6
Anacin     400        0               0         32          0             3.91          10
Anacin-3     0      500               0          0          0             5.29           8
Ascriptin  325        0               0          0        150             3.29          12
Bayer      325        0               0          0          0             2.69          12
Bufferin   324        0               0          0        100             3.89          10
Cope       421        0               0         32         75             5.31           8
CVS        500        0               0          0        100             1.99          12
Datril       0      500               0          0          0             5.15           8
Excedrin   250      250               0         65          0             4.99           8
Nuprin       0        0             200          0          0             1.59           6
Panadol      0      500               0          0          0             4.99           8
Tylenol      0      325               0          0          0             3.69           8
Vanquish   227      194               0         33         15             4.99          12
3.2 Analysis

We conducted an analysis of Δ in T = 1, 2, and 3 dimensions for both models with σ_i = 1, ∀ i, and the reparameterization option via B = Hγ, where H is defined via the feature variables (standardized to zero mean and unit variance) presented in Table 3. This specification was preferred since H contains features that consumers stated (in a pretest) were important in their choice of a specific OTC analgesic brand. All consumers were encouraged to read this information contained on the color photographs of the brand and packaging prior to the paired comparison task.

Vector model results. To determine how well, if at all, preference could be defined in terms of the hypothesized manifest variables, an analysis of the 30 × 14 × 14 Δ array was conducted in T = 1, 2, and 3 dimensions with the probabilistic vector model using the reparameterization option for the b_jt's as a function of the feature matrix presented in Table 3. Reparameterization options for a_it were not utilized since the students were fairly homogeneous with respect to demographics (except for sex). Table 4 presents the results for the three analyses. If one assumes that the asymptotic χ² test is valid, it is apparent that the two-dimensional solution is the most appropriate for the present data. Even if one were to ignore such a test, the differences in the proportion of correct predictions between the three solutions clearly indicate that the two-
Table 4. Vector model results

Dimensionality   ln L        D         Difference of        Proportion of
                                       deviance measures    correct predictions
T = 1           −1421.82    2843.64                         .532
T = 2           −1262.73    2525.46    318.18*              .781
T = 3           −1255.89    2511.78     13.68               .803

* p ≤ .01
dimensional solution is the most appropriate one for this application. In order to assist in interpreting the two dimensions, Table 5 presents the correlations between each of the feature variables (H) and the brand coordinates (B). Here, the first dimension is dominated by buffered ingredients and the aspirin vs. acetaminophen contrast. The second dimension is most highly associated with price, maximum dosage, and the aspirin vs. ibuprofen contrast. (Note that these correlations will of course vary according to the type of rotation used for interpreting A and/or B.)

Table 5. Correlations between design variables (H) and derived stimulus coordinates (B) for the vector model
Feature variable    Dimension I    Dimension II
1                      .436          −.910
2                     −.588           .397
3                      .249           .719
4                      .317          −.284
5                      .637          −.614
6                      .248           .869
7                      .118          −.878
Figure 3 presents the two-dimensional joint space where the γ_lt and the a_it coordinates are both represented as vectors (normalized to equal length for convenience) and the brands as points (A-N). The numbers 1-7 represent the γ_lt coordinates for the columns/variables of the features matrix H, while subject vectors are represented by arrows. Figure 3 aptly shows how brands in the second and third quadrants are most preferred given the preponderance of subject vectors in these quadrants. Brands C, I, L, and M, all acetaminophen based brands located in quadrant 2, are highly preferred. The two ibuprofen brands, A and K, are located together in quadrant 1. There seems to be some preference for the simpler types of aspirin like E, B, and H. However, those brands with complex aspirin compounds (e.g., with buffered ingredients), located in quadrant 4, are away from the major concentration of subject vectors. It is interesting to note that the first dimension separates the simple vs. complex ingredient brands, while the second dimension separates the aspirin brands vs. the acetaminophen and ibuprofen brands.

Unfolding model results. Table 6 presents the statistical summary results for the probabilistic unfolding model for T = 1, 2, and 3 dimensions and the same reparameterization option for the brand coordinates. Here, too, T = 2 dimensions appear to render a parsimonious solution to the structure in the paired comparisons data. In comparison with the vector model results in Table 4, the unfolding model performs slightly better, as should be expected given that the vector model can be formulated as a special case of the unfolding model. Table 7 presents the correlations between the features matrix (H) and the brand coordinates (B). The first dimension is positively related to aspirin ingredients with caffeine and high maximum dosages, as contrasted with the more expensive ibuprofen brands. The second dimension correlates highly with buffered ingredients and high maximum dosages. Figure 4 presents the resulting joint space of ideal points (*), brands (A-N), and feature vectors (1-7). As shown, dimension one separates the ibuprofen brands (A, K) from the heavy aspirin brands with caffeine (B, J, and N). Note how the acetaminophen brands cluster just to the left of the origin. Dimension 2 separates brands with buffered ingredients (D, F, H, N) from those with none. Note the distribution of ideal points throughout the space indicating somewhat diverse preferences, although most ideal points are near the acetaminophen brands as presented in the vector model
Figure 3. Two-dimensional solution for the vector model.
Table 6. Unfolding model results

Dimensionality   ln L        D         Difference of        Proportion of
                                       deviance measures    correct predictions
T = 1           −1395.11    2790.22                         .682
T = 2           −1203.65    2407.30    382.92*              .831
T = 3           −1188.43    2376.86     30.44               .846

* p ≤ .01
Table 7. Correlations between design variables (H) and derived stimulus coordinates (B) for the unfolding model

Feature variable    Dimension I    Dimension II
1                      .583           .298
2                     −.023          −.151
3                     −.761          −.340
4                      .519          −.380
5                      .098          −.882
6                     −.716          −.477
7                      .711          −.608
results.
Figure 4. Two-dimensional solution for the unfolding model.
To compare the two MDS procedures, a canonical correlation analysis was performed on the two sets of brand coordinates as an approximate configuration matching technique, given the specific types of rotational indeterminacies in each of the two solutions. Here the canonical correlations were λ₁ = 0.919 and λ₂ = 0.359, indicating that at least one of the dimensions extracted from each analysis appears to be similar. Such partially divergent findings are not uncommon in MDS given the different utility models underlying each procedure.

4. Discussion
Two new MDS methodologies for the spatial analysis of paired comparisons data have been presented and contrasted with existing methodologies in terms of model structure, stochastic assumptions, input requirements, and model specification options. The models, their assumptions, and the variety of different reparameterization options available for various analyses have been described. A marketing application of the methodologies to a measurement problem in consumer preference was described in some detail for OTC analgesics, where seven hypothesized attribute determinants were measured via a features matrix. Two analyses were performed in which brand coordinates were directly reparameterized in terms of the features matrix. The procedures each produced a two-dimensional joint space with at least one common brand dimension. These methodologies should prove equally viable for various other applications where paired comparisons are collected. They can aid in similar measurement problems concerning latent, unobservable constructs such as utility, similarity, risk, intention/attitude, etc. With the various reparameterization options for the a_is and b_jt parameters, additional flexibility is provided for investigating determinants of both individual differences (e.g., demographic information) and stimulus differences. Such reparameterization options would also be valuable in utilizing these methodologies for an external type of preference MDS analysis generally referred to as conjoint analysis. Here, a design matrix is presented defining object stimulus profiles, and a dominance judgment such as preference or intention to buy is asked in paired comparison form. The methodology then derives the contribution of each (orthogonal) object design variable to the resulting derived dimensions. This has proven to be of substantial interest to the marketing profession for product design applications (see DeSarbo et al., 1982).
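The coordinate reparameterization described above can be sketched in a few lines. In the sketch below the features matrix H, the weight matrix C, and the sizes (14 brands, 7 attributes, 2 dimensions) are illustrative assumptions, not the paper's data; it only shows the mechanics of constraining coordinates to B = HC and then computing Table-7-style feature/dimension correlations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binary features matrix H (14 brands x 7 attributes) and an
# assumed weight matrix C (7 x 2); neither comes from the paper's data.
H = (np.arange(14)[:, None] % np.arange(2, 9)[None, :] == 0).astype(float)
C = rng.normal(size=(7, 2))

# Reparameterization: brand coordinates constrained to be linear in the
# features, B = H C, so each dimension is a weighted sum of attributes.
B = H @ C

# Table-7-style diagnostics: correlate every feature variable with every
# derived dimension.
corr = np.array([[np.corrcoef(H[:, f], B[:, t])[0, 1] for t in range(2)]
                 for f in range(7)])
print(corr.shape)
```

Large correlations of a feature column with a derived dimension then indicate which attributes drive that dimension, exactly the reading given to Table 7.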
References

Bechtel, G. G. (1968). Folded and unfolded scaling from preferential paired comparisons. Journal of Mathematical Psychology, 5, 333-357.
Bechtel, G. G. (1976). Multidimensional preference scaling. The Hague: Mouton.
Bechtel, G. G., Tucker, L. R., & Chang, W. (1971). A scalar product model for the multidimensional scaling of choice. Psychometrika, 36, 369-388.
Bentler, P. M., & Weeks, D. G. (1978). Restricted multidimensional scaling models. Journal of Mathematical Psychology, 17, 138-151.
Bloxom, B. (1978). Constrained multidimensional scaling in N spaces. Psychometrika, 43, 397-408.
Bock, R. D., & Jones, L. V. (1968). The measurement and prediction of judgment and choice. San Francisco: Holden-Day.
Box, G. E. P., Hunter, W. G., & Hunter, J. S. (1978). Statistics for experimenters. New York: Wiley.
Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance data). In E. D. Lantermann & H. Feger (Eds.), Similarity and choice. Bern: Hans Huber.
Carroll, J. D., & DeSarbo, W. S. (1985). Two-way spatial models for modeling individual differences in preference. In E. C. Hirschman & M. B. Holbrook (Eds.), Advances in consumer research (Vol. 12). Association for Consumer Research.
Carroll, J. D., Pruzansky, S., & Kruskal, J. B. (1980). CANDELINC: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika, 45, 3-24.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., Greenberg, M., & Zinnes, J. L. (1961). A double law of comparative judgment for the analysis of preferential choice and similarities data. Psychometrika, 26, 165-171.
Cooper, L. G., & Nakanishi, M. (1983). Two logit models for external analysis of preference. Psychometrika, 48, 607-620.
David, H. A. (1963). The method of paired comparisons. New York: Hafner.
De Leeuw, J., & Heiser, W. (1980). Multidimensional scaling with restrictions on the configuration. In P. R. Krishnaiah (Ed.), Multivariate analysis V. Amsterdam: North-Holland.
DeSarbo, W. S., Carroll, J. D., Lehmann, D. R., & O'Shaughnessy, J. (1982). Three-way multivariate conjoint analysis. Marketing Science, 1, 323-350.
DeSarbo, W. S., De Soete, G., & Eliashberg, J. (1987). A new stochastic multidimensional unfolding model for the investigation of paired comparisons in consumer preference/choice data. Journal of Economic Psychology, 8, 357-384.
DeSarbo, W. S., Oliver, R. L., & De Soete, G. (1986). A probabilistic multidimensional scaling vector model. Applied Psychological Measurement, 10, 79-98.
DeSarbo, W. S., & Rao, V. R. (1984). GENFOLD2: A set of models and algorithms for the GENeral unFOLDing analysis of preference/dominance data. Journal of Classification, 1, 147-186.
DeSarbo, W. S., & Rao, V. R. (1986). A constrained unfolding model for product positioning. Marketing Science, 5, 1-19.
De Soete, G., & Carroll, J. D. (1983). A maximum likelihood method for fitting the wandering vector model. Psychometrika, 48, 553-566.
De Soete, G., & Carroll, J. D. (1986). Probabilistic multidimensional choice models for representing paired comparisons data. In E. Diday, Y. Escoufier, L. Lebart, J. Pages, Y. Schektman, & R. Tomassone (Eds.), Data analysis and informatics IV. Amsterdam: North-Holland.
De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.
Fletcher, R., & Reeves, C. M. (1964). Function minimization by conjugate gradients. Computer Journal, 7, 149-154.
Kruskal, J. B. (1978). Factor analysis and principal components. I. Bilinear models. In W. H. Kruskal & J. M. Tanur (Eds.), International encyclopedia of statistics (Vol. 1). New York: The Free Press.
McCullagh, P., & Nelder, J. A. (1983). Generalized linear models. New York: Chapman & Hall.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, A, 135, 370-384.
Schonemann, P. H., & Wang, M.-M. (1972). An individual difference model for the multidimensional analysis of preference data. Psychometrika, 37, 275-305.
Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235-248.
Slater, P. (1960). The analysis of personal preference. British Journal of Statistical Psychology, 13, 119-135.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.
Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications. New York: Wiley.
Wang, M.-M., Schonemann, P. H., & Rusk, J. G. (1975). A conjugate gradient algorithm for the multidimensional analysis of preference data. Multivariate Behavioral Research, 10, 45-99.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
PROBABILISTIC CHOICE BEHAVIOR MODELS AND THEIR COMBINATION WITH ADDITIONAL TOOLS NEEDED FOR APPLICATIONS TO MARKETING

Wolfgang Gaul
University of Karlsruhe, FR Germany

The paper emphasizes that, for applications of choice behavior models based on the Thurstonian scaling approach, additional tools for interpreting the results obtained can improve the readiness to adopt such techniques. Among others, multidimensional generalizations, a combination of external and internal analyses, and the use of stochastic ideal points and stochastic preference directions are discussed. Examples within the framework of the recently developed factorial model, probabilistic ideal point model, and probabilistic vector model approaches are given. For marketing applications the need for interpretation-supporting tools within these probabilistic choice behavior models is demonstrated.
1. Introduction

In this paper the discussion of probabilistic choice behavior models is restricted to the recently developed so-called generalized Thurstonian scaling approaches originating from generalizations of the LCJ (Law of Comparative Judgment, Thurstone, 1927). Examples are the factorial model (Takane, 1980), the wandering vector model (De Soete & Carroll, 1983), and the wandering ideal point model (Böckenholt & Gaul, 1984b, 1986; De Soete, Carroll, & DeSarbo, 1986). The underlying procedure for gathering choice behavior data is the method of paired comparisons. A bibliography concerning references on the paired comparisons literature up
This research has been supported by the Deutsche Forschungsgemeinschaft. This paper is a revised version of an article published in Communication & Cognition, 1987, 20, 77-92.
to the middle of the seventies is given in Davidson and Farquhar (1976); for further details about the historical development of the ideal point model and vector model approaches see, e.g., Böckenholt and Gaul (1986). Thus, no attempt at a chronological review of the approaches described here will be made. Instead, it will be shown how the interpretation of the results obtained by different models is affected by the consideration of multidimensional generalizations, a combination of external and internal analyses, the use of stochastic ideal points and stochastic preference directions, or segmentation aspects which some of these models allow for.

2. Some Paired Comparisons Based Choice Behavior Models
The Law of Comparative Judgment is the basis for the following generalizations. Thurstone (1927) associated with every choice object o_l from a set of L interesting objects a discriminal process of the form

U_l = u_l + ε_l,   l = 1, ..., L   (1)

where u_l is the (unknown) utility scale value of object o_l and the stochastic vector ε' = (ε_1, ..., ε_l, ..., ε_L) describes random error which obeys a multivariate normal distribution according to

ε ~ N(0, Σ) with Σ = (σ_jk), σ_ll = σ_l².   (2)

Σ denotes the covariance matrix of ε and σ_l² the variance of ε_l. If it is assumed that within a paired comparisons experiment object o_j is preferred to object o_k whenever the difference of the corresponding discriminal processes

U_jk = U_j − U_k   (3)

is positive then, using standard statistical knowledge, the probability of preferring object o_j to object o_k is given by

p_jk = Φ((u_j − u_k) / d_jk)   (4)

where Φ denotes the standard normal distribution function and

d_jk² = σ_j² + σ_k² − 2σ_jk = σ_j² + σ_k² − 2 r_jk σ_j σ_k   (5)
describes the variance of U_jk. Following Gulliksen (1958) the d_jk are called comparatal dispersions whereas the σ_l are called discriminal dispersions; the r_jk are correlation coefficients for the corresponding discriminal processes. If S is the total number of subjects judging the pairs of objects, the observed choice proportions p̂_jk = n_jk / S can be used to estimate p_jk, where n_jk is the number of subjects who have preferred object o_j to object o_k. One gets L(L − 1)/2 equations of the form

u_j − u_k = Φ⁻¹(p̂_jk) d_jk   (6)
where Φ⁻¹ is the inverse of the standard normal distribution function. Equation (6) should be solved with respect to the unknown parameters of the Thurstone model. However, as the number (L + 4)(L − 1)/2 of unknown u_l-, σ_l- and r_jk-parameters exceeds the number of equations, further restrictions such as

LCJ case V (r_jk = 0, σ_l = σ > 0, i.e., all comparatal dispersions are equal),

LCJ case III (r_jk = 0, i.e., all discriminal processes have zero correlations),

have to be imposed to ensure the solution of (6), unless sufficient estimates for the d_jk are available. Of course, many discussions have been published on whether or not the LCJ case III, case V, etc., assumptions reflect reality well enough. These arguments will not be repeated here; instead, the following recently developed generalizations of the Thurstonian approach by Takane (1980), De Soete and Carroll (1983), De Soete et al. (1986), and Böckenholt and Gaul (1984b, 1986) will be described, and it will be shown how, e.g., the use of stochastic ideal points and stochastic preference directions can help with the interpretation of the results obtained. The classical LCJ allows a one-dimensional scaling of the objects via the corresponding u_l-values. A link between multidimensional scaling techniques, which work on measures of (dis-)similarity, and the classical one-dimensional LCJ can be given by the comparatal dispersions (5). This fact has been emphasized, e.g., by Sjöberg (1975, 1980), who carried out various investigations to find empirical evidence for the conjecture that r_jk is in agreement with the rated similarity between the objects o_j and o_k. Indeed, if similarity and, hence, correlation between discriminal processes increase, comparatal dispersion decreases according to equation (5). Additionally, comparatal dispersions satisfy distance properties, as was pointed out by Halff (1976). Takane (1980) proposed what is called the factorial model of comparative judgment. Assume that with an L × M matrix X the covariance matrix Σ allows the following decomposition
Σ = XX'

then, instead of the σ_jk (or r_jk), the coefficients of the X matrix are now the unknown parameters. Given this reparametrization the comparatal dispersions can be rewritten in the form

d_jk² = Σ_m x_jm² + Σ_m x_km² − 2 Σ_m x_jm x_km = Σ_m (x_jm − x_km)²,   m = 1, ..., M   (7)

and may be interpreted as distances between the objects o_j and o_k in an M-dimensional space in which object o_l is represented by the l-th row x_l' = (x_l1, ..., x_lm, ..., x_lM) of the matrix X. If X is an L × L diagonal matrix one gets the known LCJ case III assumptions. The factorial model is an interesting multidimensional generalization of the LCJ (see also Figure 1 and the arguments given there), but it does not allow a simultaneous representation of the subjects and objects in a joint space. To incorporate subjects explicitly in further model developments, assume that a subject (or a homogeneous group of subjects) s (s = 1, ..., S), who is asked to select from a pair of objects that object o_l which is of the greatest utility to him, decides according to his utility imaginations U_sl, where U_sl is described by a random variable. Then, with the utility-difference
U_sjk = U_sj − U_sk   (8)

the probability of preferring o_j to o_k is given by

p_sjk = P(U_sjk > 0).   (9)

Up to now the description follows the discussion which we already know from Thurstone's LCJ. The only thing which remains to be done is to
specify the utility terms in an appropriate manner. Let x_l' = (x_l1, ..., x_lM) denote the point in the joint M-dimensional space describing object o_l and let

R_s = (R_s1, ..., R_sM) with R_s ~ N(e_s, I)   (10)

denote the random point or vector taking into account the inter- and intra-individual irregularities within the paired comparisons choice behavior of subject s, where I describes the identity matrix (of course, the approach allows for R_s* ~ N(e_s*, c_s I), c_s > 0, via the transformation R_s = (1/√c_s) R_s* with e_s = (1/√c_s) e_s*, in analogy with De Soete and Carroll, 1983). In the PVM (Probabilistic Vector Model) (De Soete & Carroll, 1983) R_s describes a stochastic preference vector for which it is assumed that subject s samples a vector realization r_s from the multivariate normal distribution each time a pair of objects o_j, o_k is presented and prefers o_j to o_k whenever the relation

x_j'r_s > x_k'r_s   (11)

holds, which corresponds to the fact that x_j has a greater projection on the sampled preference vector direction than x_k. In a similar manner, in the PIPM (Probabilistic Ideal Point Model) (Böckenholt & Gaul, 1984b, 1986; De Soete et al., 1986) R_s describes a stochastic ideal point where, again, it is assumed that subject s selects an ideal point realization r_s from the multivariate normal distribution each time a pair of objects o_j, o_k is presented and chooses that object which is closest to the actually sampled ideal point realization, i.e., prefers o_j to o_k whenever the relation

(x_j − r_s)'W_s(x_j − r_s) < (x_k − r_s)'W_s(x_k − r_s)   (12)

holds, where W_s is an M × M diagonal matrix of weights. With

U_sl = x_l'R_s   (13a)

or

U_sl = −(x_l − R_s)'W_s(x_l − R_s)   (13b)

as corresponding utility values, the probability of preferring object o_j to object o_k is given by
p_sjk = Φ((u_sj − u_sk) / d_sjk)   (14a)

with

u_sl = x_l'e_s and d_sjk² = (x_j − x_k)'(x_j − x_k) (see also (7) in the factorial model)

for the PVM, or

u_sl = −(x_l − e_s)'W_s(x_l − e_s) and d_sjk² = 4(x_j − x_k)'W_s²(x_j − x_k)

for the PIPM, where the d_sjk² are, again, the corresponding variance terms of U_sjk. Of course, the dimensionality M will have to be chosen in such a way that a "best" fit to the data under consideration is possible, but variation within the data not accounted for by the chosen dimensionality can be taken into consideration via

U_sl^(ε) = U_sl + ε_sl

where ε_sl describes additional random disturbances with variance σ². With U_sjk^(ε) = U_sj^(ε) − U_sk^(ε) one then gets

p_sjk = Φ((u_sj − u_sk) / (d_sjk² + 2σ²)^(1/2)).   (14b)

Now, notice that in the classical Thurstonian LCJ one was interested in the u_l-, σ_l- and r_jk-parameters to solve a one-dimensional scaling problem. Here, the coordinates of the e_s- and x_l-vectors, together with the main diagonal of W_s (in the case of the PIPM) and the σ-parameter if one uses (14b), are needed to solve a multidimensional scaling problem in which the different dimensions together with the expected ideal points or expected preference directions may provide the researcher with additional
support for the interpretation of choice behavior phenomena. The question is whether these attempts to improve interpretation possibilities are sufficient, or whether further interpretation-supporting tools are needed within the probabilistic choice behavior approaches just described, as will be demonstrated in the following examples.
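As a numerical illustration of the PVM just described (with made-up coordinates x_j, x_k and expected preference vector e_s, not data from any of the cited studies), the closed-form choice probability Φ((x_j − x_k)'e_s / ||x_j − x_k||) can be checked against a direct simulation of the sampled preference vectors:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Made-up object coordinates and expected preference vector e_s.
xj = np.array([1.0, 0.5])
xk = np.array([-0.3, 0.8])
es = np.array([0.6, 0.2])

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Closed form: U_sj - U_sk = (x_j - x_k)' R_s with R_s ~ N(e_s, I), so the
# choice probability is Phi((x_j - x_k)' e_s / ||x_j - x_k||).
diff = xj - xk
p_closed = Phi(diff @ es / np.linalg.norm(diff))

# Direct simulation of the PVM choice rule: sample preference vectors and
# count how often x_j has the larger projection.
R = rng.normal(size=(200_000, 2)) + es
p_sim = float(np.mean(R @ xj > R @ xk))

print(round(p_closed, 3), round(p_sim, 3))  # the two agree closely
```

The same two-line comparison works for the PIPM after replacing the projection rule by the weighted-distance rule (12).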
3. Examples

Three examples known from the literature will be used to describe how solutions obtained by the different probabilistic choice behavior models can be evaluated and interpreted, and how interpretation possibilities can be improved.
3.1 Rumelhart and Greeno (1971) Study

The first example is based on the Rumelhart and Greeno (1971) data. This example is not typical for the marketing area, but it has already been used by several other researchers (Böckenholt & Gaul, 1984b; De Soete, 1983; De Soete et al., 1986; Takane, 1980), so that an easy illustration and comparison of known results is possible. In the Rumelhart and Greeno (1971) study 234 college students had to judge the names of well-known personalities with respect to the question "With whom would you prefer to spend an hour of conversation?" on a paired comparisons basis. Three groups of personalities were selected for this experiment. The first group consisted of three politicians (Harold Wilson [HW], Charles De Gaulle [CD], Lyndon B. Johnson [LJ]), the second group of three athletes (Johnny Unitas [JU], Carl Yastrzemski [CY], A. J. Foyt [AF]), and the third group of three movie stars (Brigitte Bardot [BB], Elizabeth Taylor [ET], Sophia Loren [SL]). Table 1 contains a summary of selected analyses of the Rumelhart and Greeno (1971) data obtained by our computer routines. Similar although not as comprehensive results can be found in De Soete (1983), De Soete et al. (1986), and Takane (1980). Already a cursory glance at Table 1 shows the following:

- The LCJ case V model is not appropriate.
- The LCJ case III results have a bad fit.
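Checks like the two above rest on fitting the LCJ variants of Section 2. Under case V (all comparatal dispersions equal, say d_jk = 1) the scale values follow directly from equation (6) as row means of the probit-transformed choice proportions. A minimal sketch with hypothetical proportions for three objects (not the Rumelhart and Greeno data):

```python
import numpy as np
from statistics import NormalDist

Phi_inv = NormalDist().inv_cdf  # probit (inverse standard normal cdf)

# Hypothetical choice proportions: P[j, k] = share of subjects preferring
# object j to object k (made-up numbers, rows/columns sum pairwise to 1).
P = np.array([[0.50, 0.76, 0.88],
              [0.24, 0.50, 0.69],
              [0.12, 0.31, 0.50]])

# Case V version of equation (6): u_j - u_k = Phi^{-1}(p_jk). The
# least-squares solution (origin fixed by sum(u) = 0) is the row mean of
# the probit-transformed matrix.
Z = np.vectorize(Phi_inv)(P)
u = Z.mean(axis=1)

print(np.round(u, 3))  # scale values ordered u[0] > u[1] > u[2]
```

Whether such a one-dimensional scaling is adequate is then judged by the likelihood-based fit statistics reported in Table 1.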
Table 1. Summary of selected analyses on the Rumelhart and Greeno (1971) data (see Böckenholt & Gaul, 1984b).

Model (dimensionality, σ²)                                ln L       eff. no. of params   χ²      d.f.   p-value   AIC (−10000)
Null model                                                -5310.65   36                   --      --     --        693.30
LCJ case V                                                -5351.76   8                    82.22   28     <0.001    719.52
LCJ case III                                              -5327.24   16                   33.18   20     0.032     686.48
Wandering ideal point model (w_m = 1, all m):
  1 dim, add. σ²                                          -5339.14   9                    56.98   27     <0.001    696.28
  2 dims                                                  -5315.69   16                   10.08   20     0.967     663.38
  2 dims, add. σ²                                         -5315.38   17                   9.46    19     0.965     664.76
  3 dims                                                  -5312.32   23                   3.34    13     0.996     670.64
  3 dims, add. σ²                                         -5312.02   24                   2.74    12     0.997     672.04
Weighted wandering ideal point model (w_11/w_12 = 1.23):
  2 dims                                                  -5315.55   19                   9.80    19     0.957     669.10
  2 dims, add. σ²                                         -5315.12   20                   8.94    18     0.961     670.24
Wandering vector model, 2 dims                            -5317.68   16                   14.06   20     0.827     667.36
Factorial model, 2 dims                                   -5312.99   22                   4.68    14     0.989     669.98

Note. "add. σ²" = additional σ² parameter; χ², d.f., and p-value refer to tests against the null model.
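The test statistics and AIC values in Table 1 can be reproduced from the reported log-likelihoods and parameter counts; as a sketch, this is done below for the two-dimensional wandering ideal point row (χ² is the likelihood-ratio statistic against the saturated null model, and the table reports AIC − 10000):

```python
# Table 1 values for the wandering ideal point model (two dimensions,
# w_m = 1) and the saturated null model.
lnL_null, k_null = -5310.65, 36
lnL_wip,  k_wip  = -5315.69, 16

# Likelihood-ratio test against the null model.
chi2 = 2.0 * (lnL_null - lnL_wip)
df = k_null - k_wip

# AIC = -2 ln L + 2k; Table 1 lists AIC - 10000.
aic = -2.0 * lnL_wip + 2.0 * k_wip

print(round(chi2, 2), df, round(aic - 10000, 2))  # 10.08 20 663.38
```

The same arithmetic applied to every row recovers the remaining χ², d.f., and AIC entries of Tables 1, 4, and 5.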
- Multidimensional versions of the generalized Thurstonian scaling approaches increase the fit significantly.

A comparison of the factorial and the wandering (probabilistic, stochastic) vector model was already done by De Soete (1983), who showed that in the underlying example the wandering vector model outperforms the factorial model in terms of the AIC, which can also be checked in Table 1. In Böckenholt and Gaul (1984b) (see Table 1) and in De Soete et al. (1986) the wandering ideal point model was shown to have the best AIC value of all considered model versions. Notice that incorporation of the additional random disturbances parameter σ² could not add much to the interpretation of the data. These arguments, however, establish only part of the mathematical justification for this class of probabilistic choice behavior models. For interpretation purposes the advantages of the simultaneous representation of expected ideal points or expected preference directions together with the choice objects become clear from a look at Figures 1, 2, and 3.

Figure 1. Two-dimensional solution of the Rumelhart and Greeno data according to the factorial model.

In Figure 1 one recognizes that the reparametrization of the factorial model allows a representation of the data in a two-dimensional space which, according to Table 1 and in comparison to the poor results of the LCJ case III and LCJ case V solutions, has a very good fit. The object points cluster in the way they should and allow one to recognize separable groups of politicians, athletes, and movie stars, where the latter two groups seem to be more similar to each other than to the remaining group of politicians, as could be expected. However, an obvious question within a choice behavior experiment, "Which persons were preferred most?", cannot be answered from Figure 1. The projections of the object points on the expected preference vector in Figure 2, as well as the distances of the object points from the expected ideal point in Figure 3, allow an answer to this question.
Figure 2. Two-dimensional solution of the Rumelhart and Greeno data according to the wandering vector model.

The students in the sample put a conversation with the politician Lyndon B. Johnson at the head of the ranking, whereas the athlete Carl Yastrzemski takes the last position. Of course, once such a ranking is known, another obvious question would be "Why?". Up to now nothing is known about the dimensions in which the solutions are presented. Are there connections between the dimensions of the joint space and salient aspects of the choice behavior approaches, or is the dimensionality just something needed for fitting the mathematical model? One of the many attempts to use the dimensions of the chosen space for interpreting the solutions obtained is contained in Heiser and de Leeuw (1981), who reanalyzed data originally collected by Sjöberg (1967) from studies by Ekman (1962). In the underlying paired comparisons experiment offenses had to be judged
with respect to "immorality", and in the two-dimensional space directions concerning reckless vs. intentional causes and a gradation of the damage caused by the offenses could be distinguished. While these interpretations are derived without collecting additional information about the objects (and subjects), additional tools that can be combined with the evaluation possibilities described so far should be of interest.
Figure 3. Two-dimensional solution of the Rumelhart and Greeno data according to the wandering ideal point model (with the expected ideal point marked).
3.2 Kaas (1977) Study

The second data set is taken from Kaas (1977), who collected paired comparisons data for 10 stimuli consisting of seven hair spray brands and three amounts of money (see Table 2). One hundred customers of a supermarket were asked to judge each brand-brand combination and each brand-money combination with respect to the question "Which stimulus
possesses a higher worth?" Concerning the money-money combinations it was assumed that a higher amount of money would be preferred; see Table 3 for the aggregated paired comparisons data.

Table 2. Ten choice objects from Kaas (1977).

Brands: 1 Elidor, 2 Gard, 3 Poly, 4 Pretty hair, 5 Riar, 6 Shamtu, 7 Taft.
Amounts of money: 8 = 2.00 DM, 9 = 2.50 DM, 10 = 3.00 DM.
Table 3. Aggregated paired comparisons matrix for the choice objects in Table 2. 0 72 46 29 37 56
64 31 37 52
28 0 18 21 23 34 28 14
19 41
54 52 0 34 41 64 66 44 43 67
71 79 66 0 53 72 68 44 59 68
63 77 59 47 0 63 71 43 57 69
44 66 36 28 37 0 48 31 35 50
36 72 34 32 29 52 0 29 35 47
69 86 56 56 57 69 71 0 100
100
63 81 57 41
43 65 65 0 0 100
48 59 33 32 31 50 53 0 0 0
Already a cursory glance at Table 4 shows the following:

- The LCJ case V model is not appropriate.
- The LCJ case III results already have a non-rejectable fit.
- All other one-dimensional model versions and also the two-dimensional wandering vector model (for which a comparison with the two-dimensional factorial model is interesting) have a bad fit.
Thus, the attempt to incorporate price as the dominant dimension to support the interpretation of the paired comparisons choice behavior data within one-dimensional Thurstonian scaling models, as was done in the original study, was not fully successful. Again, the additional random disturbances parameter σ² could not increase the fit of the models significantly and was omitted (except for the one-dimensional wandering ideal point and wandering vector model approaches) in Table 4.

Table 4. Summary of selected analyses on the Kaas (1977) data.
Model (dimensionality, σ²)                      ln L       eff. no. of params   χ²      d.f.   p-value   AIC (−5000)
Null model                                      -2653.68   45                   --      --     --        397.36
LCJ case V                                      -2773.30   9                    239.24  36     <0.001    564.60
LCJ case III                                    -2669.02   18                   30.68   27     0.284     374.04
Wandering ideal point model:
  1 dim, add. σ²                                -2755.44   10                   203.52  35     <0.001    530.88
  2 dims                                        -2664.87   18                   22.38   27     0.717     365.74
  3 dims                                        -2659.26   26                   11.16   19     0.918     370.52
Weighted wandering ideal point model, 2 dims    -2662.08   21                   15.80   24     0.895     366.16
Wandering vector model:
  1 dim, add. σ²                                -2772.57   10                   237.78  35     <0.001    565.14
  2 dims                                        -2674.22   18                   41.08   27     0.040     384.44
  3 dims                                        -2659.19   26                   11.02   19     0.923     370.38
Factorial model, 2 dims                         -2659.73   25                   12.10   20     0.912     369.46

Note. "add. σ²" = additional σ² parameter; tests are against the null model.
However, the Kaas (1977) study demonstrates one of the attempts to use interpretation-supporting tools within probabilistic choice behavior models. Figure 4 shows the version of the wandering ideal point model for which the value of the AIC measure indicated the best fit. The graphical representation reveals a perceived similarity of Poly [3], Pretty hair [4], and Riar [5], and of Elidor [1], Shamtu [6], and Taft [7]. Additionally, the amounts of money [8, 9, 10] form a separate cluster, as they should. Concerning the given data one might wish to incorporate three isopreference price lines (circles, ellipsoids, ...) which could help with the interpretation of different price levels and/or ideal prices for the group of products under study. Here, the position of Gard [2] suggests a lower bound of 3.00 DM for a possible price of this brand, whereas a lower price bound of 2.00 DM and an upper price bound of 2.50 DM is indicated for, e.g., Pretty hair [4] and Riar [5].

Figure 4. Two-dimensional solution of the Kaas data according to the wandering ideal point model.

This price argumentation also seems to reveal that a low price segment (consisting of Poly [3], Pretty hair [4], and Riar [5]) and a high price segment (consisting of Elidor [1], Gard [2], Shamtu [6], and Taft [7], where Gard [2] has a somewhat more exceptional position) can be separated via the wandering ideal point approach. Of course, if price were the only dominant dimension of the perceptual map a one-dimensional approach would be sufficient, but the results of the one-dimensional models in Table 4 do not support this hypothesis. Thus, in a further stage it is meaningful to interpret the dimensions or interesting directions in the yielded multidimensional configuration. For the Kaas (1977) data, unfortunately, no
additional external stimulus data were at our disposal. Thus, no further interpretations will be given here.
3.3 B’kkenholt and Gaul (1984a) Study The third example is taken from Bijckenholt and Gaul (1984a). Paired comparisons data from 36 subjects (15 participants of continued education at the Chamber of Industry and Commerce of Karlsruhe and 21 students of an introductory course of marketing at Karlsruhe University) who had to judge 10 cognac print ads were collected with respect to the question “Which advertisement do you prefer?”. A short break was made before starting the final part of the experiment in which the subjects were asked to respond to the print ads - this time shown one by one - for a second time. In this second part of the experiment rating scales concerning dimensions - which had been pretested in Gaul (1984) - like credibility, sympathy, extravagance, preciousness, insignificance, etc., had to be filled in. Table 5 contains a summary of selected analyses of the Bijckenholt and Gaul (1984a) data obtained by our computer routines. Already a cursory glance shows the following: the LCJ case V and case I11 results have already a non-rejectable fit. Thus, is it really necessary to look for more sophisticated models? Indeed, the one-dimensional utility scale values (properly normalized) of different models do not show dramatical changes as can be seen from Figure 5. Now, to answer the question “Why?” it is time to remember that a distinction is useful between internal analysis of paired comparisons data in which only the pairwise choice data are taken for further evaluation, and external analysis where a priori given object dimensions are used to explain the pairwise preference judgments in terms of these object dimensions and that object dimensions such as credibility, sympathy, extravagance, preciousness, insignificance, etc., are available from the second part of the experiment. Thus, besides external analyses based on Cooper and Nakanishi (1983) a factor analysis was conducted revealing that two factors which were easy
332
Gaul
Table 5. Summary of selected analyses on the Bkkenholt and Gaul (1984a) data.
Test against null model
Model specification dimensiondity Null model LCJ Case V LCJ Case III Factorial model Wandering vector model
a2
2
InL
effective no. of parameters
-1024.89 -1043.72 -1036.56 -1028.53
x’
d.f.
p-value
AIC (-2000)
45 9 18 25
37.66 23.34 7.28
36 27 20
0.393 0.667 0.995
139.78 105.44 109.12 107.06
1
add.”
-1043.54
10
37.30
35
0.364
107.08
2 2 3 3
add. 0 add. 0
-1031.41 -1033.51 -1027.42 -1027.83
19 18 27 26
13.02 11.24 5.06 5.88
26 27 18 19
0.984 0.925 0.999 0.998
100.80 103.02 108.84 107.66
’ Additional oz parameter to interpret and accounted for 91% of the variance could be extracted, see Table 6. The first factor accounts for 58% and indicates a more emotional-erotic dimension while the second factor presents an image dimension which is characterized by attributes such as extravagance and preciousness. Of course, it should be of interest to perform a multidimensional generalized Thurstonian scaling approach and check whether such a model will reveal similar results. In Figure 6 the solution of a twodimensional probabilistic vector model is shown together with the directions of the object dimensions derived from the external data of the second part of the experiment via simple regression. The expected preference vector direction is situated in the angle between the first and second factor directions and in close agreement with the first factor direction. Thus, it seems that within the used dimensions the dominant dimension has been found. The projection of the object points on the expected preference vector reproduces the one-dimensional solution of Figure 5 but the combination of external and internal analyses together with appropriate multidimensional versions of generalized Thurstonian scaling approaches gives a better understanding of the choice behavior phenomena of the underlying experiment. Within the used object dimensions attributes like “sympathetic”, “credible” and “stimulating” seem to make up such a
Probabilistic Choice Models for Marketing
333
dominant dimension that even one-dimensional approaches give an appropriate fit. For further explanation the reader is referred to Böckenholt and Gaul (1984a).

Figure 5. Utility scale values. (In the original figure the ten objects are placed on a single utility scale running from least preferred to most preferred.)
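The test statistics in Table 5 follow directly from the reported log-likelihoods; the following sketch (standard likelihood-ratio and AIC formulas, not code from the chapter) reproduces the LCJ Case V row:

```python
def fit_statistics(lnL, k, lnL_null=-1024.89, k_null=45):
    """Likelihood-ratio test of a restricted model against the null
    (saturated) paired-comparisons model, plus the AIC reported
    minus 2000 as in Table 5."""
    chi2 = 2.0 * (lnL_null - lnL)        # -2 * (lnL_model - lnL_null)
    df = k_null - k                      # difference in effective parameters
    aic = -2.0 * lnL + 2.0 * k - 2000.0  # AIC = -2 lnL + 2k, shifted by -2000
    return chi2, df, aic

# LCJ Case V row: lnL = -1043.72 with 9 effective parameters
chi2, df, aic = fit_statistics(-1043.72, 9)
# chi2 = 37.66, df = 36, aic = 105.44, matching Table 5
```

The same function reproduces every other row of the table from its lnL and effective parameter count; the p-values are the corresponding upper-tail χ² probabilities at the given degrees of freedom.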
Table 6. Factor analysis results.

Dimension            Factor 1   Factor 2
sympathetic            0.986      0.050
credible               0.969      0.003
extravagant           -0.163      0.911
stimulating            0.967      0.012
precious               0.092      0.922
Variance explained
by each factor         2.88       1.68
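The "variance explained" row of Table 6 is the column sum of squared loadings, and the 91% figure quoted in the text is the total over the five attributes; a quick check:

```python
# Loadings (factor 1, factor 2) from Table 6.
loadings = {
    "sympathetic": (0.986, 0.050),
    "credible":    (0.969, 0.003),
    "extravagant": (-0.163, 0.911),
    "stimulating": (0.967, 0.012),
    "precious":    (0.092, 0.922),
}

var_factor1 = sum(a ** 2 for a, _ in loadings.values())    # 2.88
var_factor2 = sum(b ** 2 for _, b in loadings.values())    # 1.68
total_share = (var_factor1 + var_factor2) / len(loadings)  # 0.91
```

Factor 1 alone accounts for 2.88/5 ≈ 58% of the variance, as stated in the text.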
Up to now no segmentation aspects (with respect to subjects) have been discussed, although the ideal point and vector model approaches allow such a procedure. Of course, the obvious segmentation into the two groups of students and participants of courses at the Chamber of Industry and Commerce was analyzed, but it did not show results worth reporting here. In general, it should be of interest to analyze
334
Gaul
probabilistic ideal point and probabilistic vector model approaches with several expected ideal points and expected preference vector directions. Here, different expected ideal points and expected preference vectors could correspond to clusters of subjects, and it would be of interest to see how the fixed positions of the object points in the joint space have to be interpreted with respect to the possibly different viewpoints of the single clusters. A first description of such segmentation aspects within the mentioned generalized Thurstonian scaling approaches is given in Gaul and Böckenholt (1986).
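The projection of the object points onto an expected preference vector, used above to recover the one-dimensional solution from the two-dimensional configuration, is just an inner product with the normalized direction. A minimal sketch with hypothetical coordinates (the actual Figure 6 coordinates are not reproduced here):

```python
import math

def project_onto_direction(points, direction):
    """Project 2-D object points onto a preference-vector direction;
    the resulting scalars order the objects as on a utility scale."""
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    return [x * ux + y * uy for x, y in points]

# Hypothetical object coordinates and preference direction, for illustration.
objects = [(1.0, 0.5), (-0.5, 1.2), (0.3, -0.8)]
utilities = project_onto_direction(objects, (2.0, 1.0))
# the first object projects highest on this direction
```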
Figure 6. Two-dimensional solution of the Böckenholt and Gaul (1984a) data according to the wandering vector model with additional explanatory object dimensions. (The original figure shows the numbered object points, labeled with brand names such as Bisquit, Courvoisier, Hennessy, Remy and Martell, together with the mean preference vector.)
4. Conclusions
Some advantages of recently developed probabilistic choice behavior models have been illustrated. Of course, ideal point and vector models are well-known in the area of choice behavior. One advantage of stochastic versions of these models, which are described here in the framework of generalized Thurstonian scaling, is their ability to allow statistical estimation and testing and to provide goodness-of-fit criteria. Sometimes, however, additional external information is needed to interpret the solutions obtained. Whereas an internal analysis uses the paired comparisons data only, an external analysis is based on further object attributes by which the choice behavior phenomena can be explained. Marketing examples indicate how internal and external analyses can be successfully combined. Efforts to incorporate segmentation aspects in these models are under consideration.
References

Böckenholt, I., & Gaul, W. (1984a). A multidimensional analysis of consumer preference judgments related to print ads. In Methodological advances in marketing research in theory and practice. Copenhagen: EMAC/ESOMAR.
Böckenholt, I., & Gaul, W. (1984b). A wandering ideal point approach to the analysis of choice behavior. Discussion paper 68, Institute of Decision Theory and Operations Research, University of Karlsruhe (TH).
Böckenholt, I., & Gaul, W. (1986). Analysis of choice behavior via probabilistic ideal point and vector models. Applied Stochastic Models and Data Analysis, 2, 209-226.
Cooper, L. G., & Nakanishi, M. (1984). Two logit models for external analysis of preferences. Psychometrika, 48, 607-620.
Davidson, R. R., & Farquhar, P. H. (1976). A bibliography on the method of paired comparisons. Biometrics, 32, 241-252.
De Soete, G. (1983). On the relation between two generalized cases of Thurstone's Law of Comparative Judgment. Mathématiques et
Sciences humaines, 81, 41-51.
De Soete, G., & Carroll, J. D. (1983). A maximum likelihood method for fitting the wandering vector model. Psychometrika, 48, 553-566.
De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.
Ekman, G. (1962). Measurement of moral judgment: A comparison of scaling methods. Perceptual and Motor Skills, 15, 3-9.
Gaul, W. (1978). Zur Methode der paarweisen Vergleiche und ihrer Anwendung im Marketingbereich. Methods of Operations Research, 35, 123-139.
Gaul, W. (1984). Datenanalyse auf der Basis von Ordinalurteilen. Studien zur Klassifikation, 15, 142-152.
Gaul, W., & Böckenholt, I. (1986). Generalized Thurstonian scaling of advertising messages. Discussion paper, Institute of Decision Theory and Operations Research, University of Karlsruhe (TH).
Gulliksen, H. (1958). Comparatal dispersion, a measure of accuracy of judgment. Psychometrika, 23, 137-150.
Halff, H. M. (1976). Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology, 14, 244-246.
Heiser, W. J., & De Leeuw, J. (1981). Multidimensional mapping of preference data. Mathématiques et Sciences humaines, 73, 39-96.
Kaas, K. P. (1977). Empirische Preisabsatzfunktionen bei Konsumgütern. Berlin: Springer.
Rumelhart, D. L., & Greeno, J. G. (1971). Similarity between stimuli: An experimental test of the Luce and Restle choice models. Journal of Mathematical Psychology, 8, 370-381.
Sjöberg, L. (1967). Successive intervals scaling of paired comparisons. Psychometrika, 32, 297-308.
Sjöberg, L. (1975). Uncertainty of comparative judgments and multidimensional structure. Multivariate Behavioral Research, 11, 207-218.
Sjöberg, L. (1980). Similarity and correlation. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice. Bern: Hans Huber.
Takane, Y. (1980). Maximum likelihood estimation in the generalized case of Thurstone's model of comparative judgment. Japanese Psychological Research, 22, 188-196.
Thurstone, L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
AUTHOR INDEX
A Akaike, H., 133, 135 Anderson, N. H., 178, 203 Arabie, P., 58, 73 Arbuckle, L., 145, 157 Aronson, E., 30 Arrow, J. K., 241, 257 Ashby, F. G., 180, 203
B Bargmann, R. E., 141, 157 Barlow, R. E., 9, 28 Barnes, S. H., 111, 119 Bartholomew, D. J., 9, 28 Beaver, R. J., 101, 119 Bechtel, G. G., 78, 80, 95-96, 124, 135, 147, 151, 157, 179, 189, 203, 261, 288, 293, 296, 313 Becker, G. M., 125, 135 Bennett, J. F., 59, 72, 263, 288 Bentler, P. M., 63, 72, 110, 119, 148, 150, 157, 303, 313 Beyle, H. C., 4, 28 Birnbaum, M. H., 179, 203 Bishop, Y. M. M., 267, 288 Black, D., 240, 257 Block, H. D., 100, 119, 212, 219 Bloxom, B., 63, 73, 141, 145, 151, 157, 303, 313 Bock, R. D., 141, 145, 148, 151,
157-158, 292, 313 Böckenholt, I., 180, 204, 317-319, 321, 323-324, 331, 333-336 Bonett, D. G., 110, 119 Borg, I., 11, 27-28, 30, 73 Bossuyt, P. M., 77, 80, 89, 95-97, 207-208, 218-219 Box, G. E. P., 292, 313 Bradley, R. A., 101-105, 120, 124, 135, 139, 145, 158 Bremner, J. M., 9, 28 Browne, M. W., 59, 74, 148, 153, 158 Brunk, H. D., 9, 28 Bulgren, W. G., 183-184, 204 Bush, R. R., 120, 205, 219-220
C Capozza, D., 125, 137 Carroll, J. D., 4, 10-11, 26, 28, 30, 58-60, 62, 73-74, 123-125, 131, 136, 141, 144-145, 158, 161-163, 166-167, 171, 174-176, 180, 189, 204, 263, 288, 293-297, 300, 302, 306, 313-314, 317, 319, 321, 336 Caussinus, H., 289 Chandler, J. P., 188, 204 Chang, J. J., 163, 175 Chang, W., 147, 157, 293, 313
Chave, E. J., 3, 5, 31 Chernikova, N. V., 93, 97 Christoffersson, A., 149, 152, 158-159 Clark, L. A., 73, 166, 175 Cliff, N., 260, 289 Cohen, A., 237-238, 242, 246-251, 255, 257 Coombs, C. H., 1-2, 6, 8, 14, 28, 33, 38, 42, 45, 47, 51-52, 54-55, 58-59, 73, 77-78, 80, 84, 95, 97, 116, 119-120, 123-125, 136, 162, 175, 180-181, 204, 214, 219, 222, 227, 234-235, 237-238, 240-241, 245, 251, 260, 262, 288, 294, 296, 313 Cooper, L. G., 294, 313, 331, 335 Coxon, A., 116 Cronbach, L. J., 27-28 Croon, M., 78, 80, 95, 97, 99, 180, 204, 208, 219 Cunningham, J. P., 164, 175
D David, H. A., 292, 313 Davidon, W. C., 69, 73 Davidson, J. A., 59, 73 Davidson, R. R., 318, 335 Davis, R. L., 257 Davison, M. L., 60, 73, 243, 257, 262-263, 267, 271, 288 Debreu, G., 143, 158 Degreef, E., 176, 258 DeGroot, M. H., 125, 135 De Leeuw, J., 8, 19, 27-29, 59, 63, 73, 75, 141, 144, 158, 303, 314, 326, 336
Dempster, A. P., 108, 120 DeSarbo, W. S., 57-58, 60, 62-63, 68-69, 73-74, 78, 97, 123-124, 136, 145, 158, 161-162, 166-167, 175-176, 180, 204, 291, 294, 296-298, 300-302, 306, 313-314 De Soete, G., 123-125, 128, 131-132, 134-136, 141, 144-145, 149, 158, 161-162, 165-167, 171, 175-176, 180, 204, 291, 294, 296-298, 300-301, 306, 314, 317, 319, 321, 323-324, 335-336 Diday, E., 136, 176, 314 Dijkstra, L., 83, 97, 261, 273, 288 Ducamp, A., 222, 235 Dykstra, R. L., 89, 97
E Edwards, A. L., 4, 28 Efron, B., 27-28 Eilers, P., 257 Eisler, H., 178, 204 Ekman, G., 326, 336 Eliashberg, J., 161, 175, 297, 300-301, 306, 314 Ellis, M., 180, 205 Enslein, K., 30 Escouffier, Y., 176, 314 Ettinger, P., 289
F Falmagne, J. C., 222, 235 Farquhar, P. H., 318, 335 Feger, H., 33, 73, 137, 158, 160, 175, 235, 313, 336
Feigin, P. D., 237-238, 242, 246251, 255, 257 Ferguson, L. W., 4, 28 Festinger, L., 235 Fienberg, S. E., 101, 105, 120, 267, 288 Fishbein, M., 98 Fletcher, R., 69, 74, 305, 314 Forlano, G., 4, 30 Furnas, G. W., 162, 166-167, 176
G Galanter, E., 120, 205, 219-220 Ghurye, S., 119-120, 219 Gill, P. E., 71, 74 Gleser, G. C., 27-28 Gold, E. M., 263, 288 Gold, M., 21, 23, 28 Goldberg, A. I., 237-238, 249-250, 253, 257 Golledge, R., 205 Goodman, L. A., 241, 257 Green, P. E., 58, 74 Greenacre, M. J., 59, 74 Greenberg, M., 78, 95, 97, 124, 136, 296, 313 Greenberg, M. G., 78, 92, 94, 97 Greeno, J. G., 125, 134, 136, 143, 151, 159, 323 Griggs, R. A., 78, 95, 98, 124, 137, 161, 176, 180, 183-184, 205, 261, 290, 296, 315 Gulliksen, H., 75, 315, 319, 336 Guttman, L., 5-6, 27, 29, 148, 158
H Halff, H. M., 125, 136, 142-143, 158, 170, 176, 320, 336 Hays, W. L., 59, 72, 263, 288 Hefner, R. A., 180-181, 183, 204 Heiser, W. J., 3, 8-9, 12, 14, 17-19, 21, 26-29, 59-60, 63, 73-74, 141, 144, 158, 263, 288, 303, 314, 326, 336 Henry, N. W., 101, 120 Himmelblau, D. M., 71, 74 Hinckley, E. D., 4, 29 Hoeffding, W., 119-120 Hoffding, W., 219 Hoffman, D. L., 78, 97 Hohle, R. H., 145, 158 Holbrook, M. B., 313 Holland, P. W., 267, 288 Horst, P., 29 Hovland, C. I., 4, 29-30 Hunter, J. S., 292, 313 Hunter, W. G., 292, 313
I Indow, T., 146, 159 Inglehart, R., 111, 113, 120
J Jansen, P. G. W., 80, 95, 97, 261, 288 Jech, T., 100 Jedidi, K., 161, 175, 291 Johnson, J., 63, 75 Johnson, S. C., 163, 176 Jones, L. V., 141, 145, 148, 151, 158, 292, 313
Jöreskog, K. G., 110, 120, 141, 146, 148, 150, 152, 154, 159-160
K Kaas, K. P., 327, 329-330, 336 Kaase, M., 235, 289 Kao, R. C., 260, 288 Katz, D., 235 Kemeny, J. G., 241, 257 Kendall, M. G., 184, 190, 204, 248, 257 Klingemann, H. D., 235, 280-281, 289 Krantz, D. H., 125, 136, 143, 159, 209, 212, 219, 222 Krishnaiah, P. R., 30, 73-74, 135, 314 Kruskal, J. B., 9-11, 29-30, 59, 62, 73-74, 198, 204, 263, 289, 302-303, 313-314 Kruskal, W. H., 314 Kurtz, R., 180, 205
L Laird, N. M., 108, 120 Lantermann, E. D., 73, 137, 158, 160, 175, 235, 313, 336 Larntz, K., 101, 105, 120 Lazarsfeld, P. F., 101, 120 Lebart, L., 176, 314 Lehmann, D., 62, 74, 302, 314 Lehmann, E. L., 243, 257 Leik, R. K., 262, 289 Levine, M. V., 8, 30 Likert, R., 5, 30 Lindzey, G., 30
Lingoes, J. C., 28, 30, 59, 63, 73, 75, 235 Looman, C. W. N., 27, 31 Luce, R. D., 100, 104, 120, 124, 136, 145-146, 159, 162, 176, 205, 209, 212-213, 217, 219-220, 222
M MacKay, D. B., 177, 180, 182-183, 190, 197, 204-205 Madow, W., 119-120, 219 Mallows, C. L., 238, 246, 250, 257 Mann, H., 119-120, 219 Marschak, J., 100, 119, 125, 135, 212, 219 Mattenklott, A., 106, 120 Matthews, M., 262, 289 McArdle, J. J., 154, 159 McCullagh, P., 305, 314 McDonald, R. P., 148, 154, 159 McFadden, D., 162, 176 McGuire, W. J., 7, 30 Messick, S., 75, 315 Meulman, J., 17, 26-29 Mieschke, K. J., 106, 120 Mokken, R. J., 268-269, 272, 289 Molenaar, I. W., 83, 97, 267, 288-289 Muller, M. W., 17, 30 Murray, W., 71, 74 Muthén, B., 149, 152, 159
N Nagel, E., 219 Nakanishi, M., 294, 313, 331, 335
Nanda, H., 27-28 Nebergall, R. E., 31 Nelder, J. A., 305, 314 Nerlove, S., 28, 73, 75, 174, 204, 288 Noma, E., 63, 75 Norpoth, H., 221, 228, 235, 274, 277, 280-281, 289 Nugent, J. H., 145, 157
O Oliver, R. L., 161, 175, 294, 297-298, 300, 306, 314 Olkin, I., 119-120, 219 Orth, B., 221-222, 225, 232, 234-235 O’Shaughnessy, J., 62, 74, 302, 314
P Pages, J., 176, 314 Pappi, F. U., 221, 228, 232, 234-235, 274, 280, 289 Patnaik, P. B., 184, 205 Pendergrass, R. N., 101-102, 104-105, 120 Pintner, R., 4, 30 Plackett, R. L., 101, 120 Powell, M. J. D., 69, 74 Pruzansky, S., 62, 73, 166, 175, 302, 313
R Rajaratnam, N., 27-28 Ralston, A., 30 Ramaswamy, V., 167, 175 Ramsay, J. O., 78, 95, 97, 133,
136 Rao, V. R., 57-58, 60, 63, 68-69, 74, 302, 314 Redner, R. A., 119-120 Reeves, C. M., 305, 314 Restle, F., 143, 145, 159 Robertson, T., 89, 97 Romney, A. K., 28, 73, 75, 174, 204, 288 Roskam, E. E., 10, 30, 45, 59, 75, 77, 80, 93, 95-97, 116, 207-208, 218-219, 228, 235, 257-258, 263, 289 Ross, J., 260, 289 Rubin, D. B., 108, 120 Rumelhart, D. L., 125, 134, 136, 143, 151, 159, 323 Rusk, J. G., 124, 161, 176, 296, 315 Russo, J. E., 125, 137, 143, 160
S Saaty, T. L., 179, 205 Sattath, S., 137, 151, 160, 262, 289 Scheffé, H., 179, 205 Schektman, Y., 176, 314 Schönemann, P. H., 59, 75, 78, 95, 97, 124, 136, 161, 176, 189, 205, 263, 289, 296, 315 Seery, J. B., 10, 30, 59, 74, 198, 204, 263, 289 Sehr, J., 106, 120 Shepard, R. N., 28, 73, 75, 174, 204, 288 Sherif, C. W., 4-5, 19-20, 31, 96, 98
Sherif, M., 4, 29-31, 96, 98 Shocker, A. D., 60, 75 Sixtl, F., 78, 80, 95-96, 98, 124, 136, 261, 289, 296, 315 Sjöberg, L., 125, 137, 143, 151, 159-160, 179, 205, 319, 326, 336 Slater, P., 292, 315 Smith, J. E. K., 262, 288 Snell, J. L., 241 Sörbom, D., 148, 150, 154, 159-160 Spence, I., 59-60, 75 Srinivasan, V., 60, 75 Stephenson, W., 27, 31 Stokman, F. N., 83, 97 Stouffer, S. A., 29 Strauss, D., 145, 160 Stuart, A., 184, 190, 204 Suck, R., 257 Suppes, P., 100, 120, 180, 205, 209, 212-215, 217, 222
T Takane, Y., 59, 75, 139, 141, 144-145, 151, 160, 317, 319-320, 323, 336 Tanur, J. M., 314 Tarski, A., 219 Ter Braak, C. J. F., 27, 31 Terry, M. E., 120, 124, 135, 145, 158 Thrall, R. M., 257 Thurstone, L. L., 1-3, 5, 29, 31, 140-141, 144, 160, 167-168, 176, 180-181, 205, 292, 300-302, 315, 317-318, 337 Timmerman, H., 205 Tomassone, R., 289, 314 Torgerson, W. S., 5, 15, 31, 59, 75, 292, 315 Townsend, J. T., 180, 203 Tucker, L. R., 58, 75, 147, 157, 292-293, 313, 315 Tversky, A., 125, 137, 143, 145, 151, 160, 209, 219, 222, 262, 289
V Van Blokland-Vogelesang, R. A. W., 237, 243, 245, 248, 257-258 Van Buggenhaut, J., 176 Van der Eijk, C., 83, 97, 288 Van der Ven, A. H. G. S., 183, 219, 245, 258 Van Schuur, W. H., 83, 97, 243, 258-259, 267, 288-289 Verbeek, A., 257 Verhelst, N., 83, 97
W Walker, H. F., 119-120 Wang, M.-M., 78, 95, 97, 124, 136, 161, 176, 296, 315 Wedderburn, R. W. M., 305, 314 Weeks, D. G., 63, 72, 303, 313 Wegener, E., 204 Wilf, H. S., 30 Wold, H., 160 Wolff, R. P., 180, 205 Wright, F. T., 89, 97 Wright, M. H., 71, 74
Y Yanai, H., 152, 160 Yellott, J. I., 102-103, 120-121 Young, F. W., 10, 17, 30-31, 59, 74-75, 198, 204, 263, 289
Z Zermelo, E., 100, 121 Zinnes, J. L., 78, 95, 97-98, 124, 134-137, 161, 176-177, 180, 182-184, 190, 197, 204-205, 212, 219, 222, 261, 290, 296, 313, 315
SUBJECT INDEX
A abundant data, 100 ACOVS, 139, 141 additive tree, 164 adjacency matrix, 272 admissible pseudo-distances, 16 AIC statistic, 134 algebraical relational system, 209 algorithm, alternating least-squares, 57, 63 algorithm, branch-and-bound, 90 algorithm, EM, 108 algorithm, generalized Fisher scoring, 132 alternating least-squares algorithm, 57, 63 AMOMS, 150 analgesic pain reliever, over-the-counter, 306 analysis, covariance structure, 110 analysis, external, 4, 59, 294, 302, 317, 331 analysis, factor, 131, 150, 260 analysis, hierarchical cluster, 265 analysis, internal, 10, 59, 302, 317, 331 analysis, latent class, 101 analysis, latent structure, 99, 101 analysis, loglinear, 139 analysis, multivariate conjoint, 62, 302 analysis of covariance structures, 139, 141 analysis of moment structures, 150 analysis, parallelogram, 262 analysis, path, 150 approach, nonmetric, 9 approach to ties, primary, 25 approach, unconditional, 12 approximation, Patnaik, 184 assimilation effect, 4 asymptotically distribution free method, 153 attitude scaling, 3 attitude scaling, Thurstonian, 3 axiom, Luce’s choice, 104 axiomatic foundations of unfolding, 221 axiomatic measurement theory, 207
B backtracking, 244 Bernoulli trial, 81, 209 betweenness relation, 223 binary choice, 209 bivariate normal distribution, 197 bootstrap, 27 boundary, 34 boundary, orientation of a, 47
boundary triangle, 47 branch-and-bound algorithm, 90 branch-and-bound method, 244
C CANDELINC, 62, 302 category points, 6 central chi-square, 184 central F distribution, 184 characteristic monotonicity, 83, 272 chi-square, central, 184 chi-square distribution, noncentral, 183 choice axiom, Luce’s, 104 choice, binary, 209 choice model, individual, 100 choice model, stochastic multidimensional spatial, 162 choice theory, probabilistic, 207 choices, replicated, 178 class analysis, latent, 101 class, latent, 247 class model, latent, 99 cluster analysis, hierarchical, 265 coefficient of scalability, 268 comparative judgment, factorial model of, 320 comparison judgment, pair, 139 comparisons data, paired, 77-78, 123, 161, 291 comparisons, graded paired, 294 comparisons, method of paired, 317 comparisons, paired, 209 configuration, recovered, 200 conjoint analysis, multivariate, 62,
302 conjugate gradient method, 71, 305 consensus ranking, 237 consumer preference, latent, 298 contingency table, 34 contour, iso-utility, 293, 295 contrast effect, 4 COSAN, 148 covariance structure analysis, 110 covariance structures, analysis of, 139, 141 cross validation, 3, 27 cumulative scaling, multiple unidimensional, 272 curve, single-peaked, 6 curve, unimodal, 6
D data, abundant, 100 data, dichotomized, 259 data, dichotomous, 259 data, models of, 214 data, paired comparisons, 77-78, 123, 161, 291 data, pick any/n, 259 data, pick k/n, 259 data, possible realization of empirical, 213 data, possible realization of the, 209 decision, majority, 237 decomposition rule, 52 degeneracy, 54, 260 degenerate solution, 10, 63 degeneration, 3, 6, 11 descent method, steepest, 71
design, incomplete, 292 deviance measure, 305 dichotomized data, 259 dichotomous data, 259 direction, stochastic preference, 317 discriminal dispersion, 319 discriminal process, 144, 318 dispersion, discriminal, 319 distance distribution, 15 distance model, 99 distance model, weighted, 263 distribution, bivariate normal, 197 distribution, central F, 184 distribution, distance, 15 distribution, double-exponential, 103 distribution, doubly noncentral F, 124, 183, 296 distribution free method, asymptotically, 153 distribution, multivariate normal, 125, 177, 180 distribution, noncentral chi-square, 183 distribution, normal, 184 distribution, smooth, 18 dominance matrix, 272 dominant J-scale, 227 double-exponential distribution, 103 doubly noncentral F distribution, 124, 183, 296
E effect, assimilation, 4 effect, contrast, 4
effect, similarity, 143 election, presidential, 19 EM algorithm, 108 empirical data, possible realization of, 213 EQS, 148 equations, linear structural, 150 error-in-variable regression, 154 estimate, initial, 180 estimate, maximum likelihood, 86, 101, 106, 132, 177, 180, 247, 305 exponential, negative, 8 external analysis, 4, 59, 294, 302, 317, 331 F distribution, central, 184 F distribution, doubly noncentral, 124, 183, 296
F facet theory, 27 factor analysis, 131, 150, 260 factorial model, 144 factorial model of comparative judgment, 320 Fechnerian theory, 212 Fisher scoring algorithm, generalized, 132 Fisherian information matrix, 133 foundations of unfolding, axiomatic, 221 function, penalty, 166 function, real-valued utility, 208 function, single-peaked, 8 function, single-peaked preference, 237
G general unfolding model, 59 generalizability theory, 27 generalized Fisher scoring algorithm, 132 GENFOLD, 58 GENFOLD2, 57, 302 graded paired comparisons, 294 gradient method, conjugate, 71, 305 GSTUN, 166
H Hefner model, 180 hierarchical cluster analysis, 265 hierarchical tree, 163 higher order metric scale, 38 homomorphic mapping, 209
I ideal point model, probabilistic, 321 ideal point model, stochastic, 317 ideal point model, wandering, 123, 125, 149, 161-162, 296, 327 incomplete design, 292 index, uncomparability, 142 individual choice model, 100 inequality, ultrametric, 163 information matrix, Fisherian, 133 initial estimate, 180 internal analysis, 10, 59, 302, 317, 331 invariant, order, 3, 10, 17 inverse, Moore-Penrose, 133, 173 I-scale, 223, 243, 263
isochrest, 8 isotonic region, 33 isotonic regression, 9 iso-utility contour, 293, 295
J jackknife, 27 joint probabilistic midpoint unfolding theory, 84 J-scale, 237, 261 J-scale, dominant, 227 J-scale, qualitative, 45, 222, 237, 239, 263 J-scale, quantitative, 45, 222, 237, 239, 263 judgment, factorial model of comparative, 320 judgment, numerical, 178 judgment, pair comparison, 139 judgment, preference ratio, 177-178 judgment, same-different, 180 judgment school, social, 4 judgment, similarity, 180
K KYST, 10, 198
L ladder for technical occupations, prestige, 237 latent class, 247 latent class analysis, 101 latent class model, 99 latent consumer preference, 298 latent structure analysis, 99, 101 least-squares algorithm,
alternating, 57, 63 length tree, path, 164 likelihood estimate, maximum, 86, 101, 106, 132, 177, 180, 247, 305 likelihood seriation, maximum, 78 Likert-type rating scale, 260 linear programming, 238, 245 linear structural equations, 150 LISCOMP, 149 LISREL, 148 log linear model, 105 logistic regression, 27 loglinear analysis, 139 Luce’s choice axiom, 104
M majority decision, 237 mapping, homomorphic, 209 mapping, reverse, 58, 68 matrix, adjacency, 272 matrix, dominance, 272 matrix, Fisherian information, 133 maximum likelihood estimate, 86, 101, 106, 132, 177, 180, 247, 305 maximum likelihood seriation, 78 MDS model, probabilistic unfolding, 291 MDS model, probabilistic vector, 291 mean normalized stress, 9 measure, deviance, 305 measure of recovery, 198 measurement theory, axiomatic, 207 method, asymptotically distribution
free, 153 method, branch-and-bound, 244 method, conjugate gradient, 71, 305 method of paired comparisons, 317 method, quasi-Newton, 71 method, steepest descent, 71 metric scale, higher order, 38 midpoint, 34 midpoint monotonicity, 82 midpoint unfolding theory, joint probabilistic, 84 MINIRSA, 10, 45, 115, 228 moderate stochastic transitivity, 125, 143, 168, 296 moderate utility model, 123, 125, 170 moment structures, analysis of, 150 monotonic regression, 9 monotonicity, characteristic, 83, 272 monotonicity, midpoint, 82 monotonicity restriction, 9 Moore-Penrose inverse, 133, 173 MUDFOLD, 260 multidimensional spatial choice model, stochastic, 162 multidimensional unfolding model, 57 multidimensional unfolding model, probabilistic, 123-124 multiple scaling, 265 multiple unidimensional cumulative scaling, 272 multiple unidimensional unfolding,
259 multiple-judgment sampling, 139-140 multivariate conjoint analysis, 62, 302 multivariate normal distribution, 125, 177, 180
N negative exponential, 8 Newton-Raphson procedure, 101 noncentral chi-square distribution, 183 noncentral F distribution, doubly, 124, 183, 296 noncentrality parameter, 183 nonmetric approach, 9 nonscalability, 264 normal distribution, 184 normal distribution, bivariate, 197 normal distribution, multivariate, 125, 177, 180 normalization, 9 normalized stress, mean, 9 numerical judgment, 178 numerical relational system, 209
O occupations, prestige ladder for technical, 237 of covariance structures, analysis, 139, 141 order invariant, 3, 10, 17 order metric scale, higher, 38 order, weak, 225 orientation of a boundary, 47 orientations, value, 111
over-the-counter analgesic pain reliever, 306
P pain reliever, 65 pain reliever, over-the-counter analgesic, 306 pair comparison judgment, 139 paired comparisons, 209 paired comparisons data, 77-78, 123, 161, 291 paired comparisons, graded, 294 paired comparisons, method of, 317 parallelogram analysis, 262 parameter, noncentrality, 183 party preferences, political, 221 path analysis, 150 path length tree, 164 Patnaik approximation, 184 penalty function, 166 Pendergrass-Bradley model, 102 pick any/n data, 259 pick k/n data, 259 point model, probabilistic ideal, 321 point model, stochastic ideal, 317 point model, wandering ideal, 123, 125, 149, 161-162, 296, 327 point, statement, 6 points, category, 6 political party preferences, 221 possible realization of a theory, 208 possible realization of empirical data, 213 possible realization of the data,
209 preference direction, stochastic, 317 preference function, single-peaked, 237 preference, latent consumer, 298 preference ratio judgment, 177-178 preference strength, 7 preferences, political party, 221 PREFMAP, 59 PREFMAP2, 59 presidential election, 19 prestige ladder for technical occupations, 237 prestige, social, 237 primary approach to ties, 25 probabilistic choice theory, 207 probabilistic ideal point model, 321 probabilistic midpoint unfolding theory, joint, 84 probabilistic multidimensional unfolding model, 123-124 probabilistic unfolding, 80, 207 probabilistic unfolding MDS model, 291 probabilistic unidimensional unfolding, 77 probabilistic vector MDS model, 291 probabilistic vector model, 321 procedure, Newton-Raphson, 101 procedure, unfolding, 33 process, discriminal, 144, 318 programming, linear, 238, 245 property-fitting, 303 pseudo-distances, 9
pseudo-distances, admissible, 16
Q Q-methodology, 27 qualitative J-scale, 45, 222, 237, 239, 263 qualitative solution, 34 quantitative J-scale, 45, 222, 237, 239, 263 quantitative solution, 34 quasi-independent model, 267 quasi-Newton method, 71
R random sample, 180 random utility model, 102, 162 ranking, 99 ranking, consensus, 237 ranking model, strict utility, 102 rating scale, Likert-type, 260 ratio judgment, preference, 177-178 raw stress, 9 realization of a theory, possible, 208 realization of empirical data, possible, 213 realization of the data, possible, 209 real-valued utility function, 208 recognition response, 180 recovered configuration, 200 recovery, measure of, 198 region, isotonic, 33 regression, error-in-variable, 154 regression, isotonic, 9 regression, logistic, 27
regression, monotonic, 9 relation, betweenness, 223 relational system, algebraical, 209 relational system, numerical, 209 reliever, over-the-counter analgesic pain, 306 reliever, pain, 65 replicated choices, 178 reproducibility, 5 resampling, 3, 27 response model, unimodal, 3 response, recognition, 180 response strength, 7 restriction, monotonicity, 9 restriction, smoothness, 3 reverse mapping, 58, 68 rule, decomposition, 52
S same-different judgment, 180 sample, random, 180 sampling, multiple-judgment, 139-140 sampling scheme, 83 scalability, 2i scalability, coefficient of, 268 scalability, simple, 143 scalar products model, 292 scale, higher order metric, 38 scale, Likert-type rating, 260 scaling, attitude, 3 scaling, multiple, 265 scaling, multiple unidimensional cumulative, 272 scaling, Thurstonian, 95, 317 scaling, Thurstonian attitude, 3 scheme, sampling, 83
school, social judgment, 4 scoring algorithm, generalized Fisher, 132 seriation, 77 seriation, maximum likelihood, 78 shifted single-peakedness, 8 similarity effect, 143 similarity judgment, 180 SIMOPT, 245 simple scalability, 143 simple unfolding model, 59 single-dipped, 20 single-peaked curve, 6 single-peaked function, 8 single-peaked preference function, 237 single-peakedness, 223 single-peakedness, shifted, 8 smooth distribution, 18 smooth succession, 6, 18 smooth transformation, 18 smoothness restriction, 3 social judgment school, 4 social prestige, 237 solution, degenerate, 10, 63 solution, qualitative, 34 solution, quantitative, 34 spatial choice model, stochastic multidimensional, 162 SSTUN, 166 stability, 27 starting value, 188 statement point, 6 statistic, AIC, 134 steepest descent method, 71 stochastic ideal point model, 317 stochastic multidimensional spatial
choice model, 162 stochastic preference direction, 317 stochastic transitivity, moderate, 125, 143, 168, 296 stochastic transitivity, strong, 124, 143, 168, 296 stochastic transitivity, weak, 210 stochastic tree unfolding, 161 stochastic tree unfolding model, 162 stochastic unfolding model, 261 strength, preference, 7 strength, response, 7 stress, mean normalized, 9 stress, raw, 9 strict utility ranking model, 102 strict utility theory, 212 strong stochastic transitivity, 124, 143, 168, 296 structural equations, linear, 150 structure analysis, covariance, 110 structure analysis, latent, 99, 101 structure, unfolding, 221, 224 structures, analysis of covariance, 139, 141 structures, analysis of moment, 150 substitutability, 210 succession, smooth, 6, 18 system, algebraical relational, 209 system, numerical relational, 209
T table, contingency, 34 technical occupations, prestige ladder for, 237 theory, axiomatic measurement,
207 theory, facet, 27 theory, Fechnerian, 212 theory, generalizability, 27 theory, joint probabilistic midpoint unfolding, 84 theory, possible realization of a, 208 theory, probabilistic choice, 207 theory, strict utility, 212 theory, unfolding, 77 theory, utility, 207 theory, weak utility, 212 Thurstonian attitude scaling, 3 Thurstonian scaling, 95, 317 ties, primary approach to, 25 TORSCA, 198 transformation, smooth, 18 transitivity, moderate stochastic, 125, 143, 168, 296 transitivity, strong stochastic, 124, 143, 168, 296 transitivity, weak stochastic, 210 tree, additive, 164 tree, hierarchical, 163 tree, path length, 164 tree, ultrametric, 163 tree unfolding model, stochastic, 162 tree unfolding, stochastic, 161 trial, Bernoulli, 81, 209 triangle, boundary, 47
U ultrametric inequality, 163 ultrametric tree, 163 uncomparability index, 142
unconditional approach, 12 UNFOLD, 237, 243 unfolding, axiomatic foundations of, 221 unfolding MDS model, probabilistic, 291 unfolding method, 222 unfolding model, 58, 123, 182, 222, 237, 259, 294 unfolding model, general, 59 unfolding model, multidimensional, 57 unfolding model, probabilistic multidimensional, 123-124 unfolding model, simple, 59 unfolding model, stochastic, 261 unfolding model, stochastic tree, 162 unfolding model, weighted, 59 unfolding, multiple unidimensional, 259 unfolding, probabilistic, 80, 207 unfolding, probabilistic unidimensional, 77 unfolding procedure, 33 unfolding, stochastic tree, 161 unfolding structure, 221, 224 unfolding theory, 77 unfolding theory, joint probabilistic midpoint, 84 unfolding, unidimensional, 238 unidimensional cumulative scaling, multiple, 272 unidimensional unfolding, 238 unidimensional unfolding, multiple, 259 unidimensional unfolding,
probabilistic, 77 unimodal curve, 6 unimodal response model, 3 utility function, real-valued, 208 utility model, moderate, 123, 125, 170 utility model, random, 102, 162 utility ranking model, strict, 102 utility theory, 207 utility theory, strict, 212 utility theory, weak, 212
V validation, cross, 3, 27 value orientations, 111 value, starting, 188 variance-component models, 150 vector MDS model, probabilistic, 291 vector model, 58, 99, 263, 292 vector model, probabilistic, 321 vector model, wandering, 125, 144, 161-162, 294, 324
W wandering ideal point model, 123, 125, 149, 161-162, 296, 327 wandering vector model, 125, 144, 161-162, 294, 324 weak order, 225 weak stochastic transitivity, 210 weak utility theory, 212 weighted distance model, 263 weighted unfolding model, 59