Sociedad de Estadística e Investigación Operativa, Test (2002) Vol. 11, No. 1, pp. 143-165
A biplot method for multivariate normal populations with unequal covariance matrices
Miquel Calvo*, Angel Villarroya and Josep M. Oller
Department of Statistics, University of Barcelona, Spain.
Abstract
Some previous ideas about non-linear biplots are extended to achieve a joint representation of multivariate normal populations and any parametric function, without assumptions about the covariance matrices. Usual restrictions on the covariance matrices (such as homogeneity) are avoided. Variables are represented as curves corresponding to the directions of maximum means variation. To demonstrate the versatility of the method, the representation of variances and covariances has been developed as an example of further possible interesting parametric functions. This method is illustrated with two different data sets, and these results are compared with those obtained using two other distances for the normal multivariate case: the Mahalanobis distance (assuming a common covariance matrix for all populations) and Rao's distance, assuming a common eigenvector structure for all the covariance matrices.
Key Words: Multivariate normal distribution, nonlinear biplots, Siegel distance, Rao distance.
AMS subject classification: 62H99, 52-07, 62-09.

1 Introduction
The biplot method is a widely used plotting technique in applied multivariate data analysis. A biplot enables k multivariate samples to be plotted together with the set of coordinate axes corresponding to the original variables, projecting the two classes of objects into a low-dimensional Euclidean space. This double representation, usually done in $\mathbb{R}^2$, has led to improved data interpretation in applied studies and, sometimes, may be complemented with other analyses based on hierarchical methods, as in Capdevila and Arcas (1995).

This work is supported by DGICYT grant (Spain), BFM2000-0801 and also 1999SGR00059.
* Correspondence to: Miquel Calvo Llorca, Departament d'Estadística, Universitat de Barcelona, Avgda. Diagonal 645, 08028 Barcelona, Spain. E-mail: calvo@bio.ub.es
Received: February 2000; Accepted: December 2001
Gower and Harding (1988) generalized the classic biplot method by including embeddable metrics in a Euclidean space, and proposed the extension of this idea to any kind of metric. This last technique is known as the non-linear biplot. In the same paper, Gower and Harding also proposed to extend the biplot to cover more structured sample data; see also Gower (1993) and Cuadras et al. (1997) for further details and related topics.

The canonical discriminant analysis (CDA) is a classic representation method introduced by Rao (1948). It enables the samples from p different populations, each of them associated to a multivariate normal model, to be plotted in a low-dimensional space. The underlying metric is the one induced by the Mahalanobis distance. This implies an important additional assumption: a common covariance matrix for the p populations is required. In most applied situations this hypothesis of homogeneity of covariance matrices is not satisfied, e.g., Fisher's Iris data.

The need for a more general distance for the multivariate normal model has been raised in several papers. More recently, Krzanowski (1996) proposed the Rao distance for multivariate normal densities, but his technique requires a common structure of the eigenvectors in the covariance matrices; see also the comprehensive paper of Burbea (1986). Unfortunately, it is not possible to extend this result to the full family, i.e., without any condition on the covariance structure, because the explicit form of the Rao distance has not yet been obtained for all cases; see, for instance, Calvo and Oller (1991).

In this paper we extend some results previously obtained by Calvo et al. (1998). First, we looked for a graphical representation of multivariate normal populations, without the assumption of covariance matrix homogeneity.
The n-variate normal populations $N_n(\mu, \Sigma)$ are identified with symmetric $(n+1) \times (n+1)$ positive definite matrices, and the Mahalanobis distance is then replaced by the Siegel distance, see Calvo and Oller (1990). We have not used the Rao distance between multivariate normal distributions since, until now, it has not been obtained explicitly, as we pointed out above. Furthermore, three important properties of the Siegel distance are the reasons that we have preferred it over other more usual general distances, such as the Hellinger or Bhattacharyya distances. These three properties are: a) the Siegel distance is not upper bounded, as the other two are; b) it is invariant under affine transformations over the random variables; and c) it is a quite sharp lower bound of the Rao distance. See Calvo and Oller (1990) and Subsection 2.2 for more details. Once the interdistance population matrix is computed, the samples are represented in a low-dimensional
space, following standard Principal Coordinates Analysis (PCA). The newest aspect of our proposed method for the non-linear biplot is how the representation of the variables is obtained. The Siegel distance does not allow the set of axes to be plotted in a simple way, as is done in the standard biplot method (e.g. Gower and Harding (1988)). In Section 2 we suggest the use of the gradient of the random variables mean in the Siegel space, where the populations are embedded. By integrating the gradient, a bundle of curves is obtained. Each curve, associated to one of the original variables, provides information on the direction of the maximum variation of the corresponding mean value. Once the bundle is computed, the variable representation is obtained by using the same projection as for the populations, based on PCA. Therefore, we can obtain a set of coordinate axes, analogous to the non-linear biplot axes, choosing a nominated point as the origin of the n variable curves. This representation, based on the first moment $\mu$, can be extended to any smooth function of $\mu$ and $\Sigma$, and in particular to higher order moments. We illustrate here its potential usefulness by representing the variances and the covariances in this way. The technical details are discussed in the following sections.
2 Representation of populations and variables

2.1 The embedding in the Siegel group
Let us assume that the populations $\Omega_1, \dots, \Omega_p$ have associated the multivariate normal model $N_n(\mu, \Sigma)$. For each population, its density function is univocally determined by the proper parametric representation. From now on, we represent $\Omega_i$ by $(\mu_i, \Sigma_i)$. Let us consider the set of the symmetric positive-definite matrices, $P_{n+1}$, and the differential metric defined as:

$$ds^2 = \frac{1}{2}\operatorname{tr}\{(\Psi^{-1} d\Psi)^2\}. \tag{2.1}$$
With this metric, $P_{n+1}$ becomes a Riemannian manifold usually known as the Siegel group, see Siegel (1964). The multivariate normal embedding in the Siegel group is proved in Calvo and Oller (1990), but we prefer to summarise here some results. Any $\Psi \in P_{n+1}$ can be expressed as:

$$\Psi = \begin{pmatrix} \Sigma + \beta\mu\mu^t & \beta\mu \\ \beta\mu^t & \beta \end{pmatrix}, \qquad \beta \in \mathbb{R}^+,\ \mu \in \mathbb{R}^n,\ \Sigma \in P_n,$$
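and the differential metric in (2.1) can also be expressed as:

$$ds^2 = \frac{1}{2}\left(\frac{d\beta}{\beta}\right)^2 + \beta\, d\mu^t \Sigma^{-1} d\mu + \frac{1}{2}\operatorname{tr}\{(\Sigma^{-1} d\Sigma)^2\}. \tag{2.2}$$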
The basic idea in Calvo and Oller (1990) is to associate each multivariate normal density to a symmetric positive-definite matrix by means of the following map:

$$(\mu_i, \Sigma_i) \mapsto f(\mu_i, \Sigma_i) = \begin{pmatrix} \Sigma_i + \mu_i\mu_i^t & \mu_i \\ \mu_i^t & 1 \end{pmatrix}. \tag{2.3}$$
If $\Theta$ is the parametric space of the multivariate normal model, the image set $f(\Theta)$ has an induced metric in $P_{n+1}$ equivalent to the metric induced in $\Theta$ by the Fisher information matrix, i.e., the $ds^2$ element has the form (set $\beta = 1$ in (2.2)):

$$ds^2 = d\mu^t \Sigma^{-1} d\mu + \frac{1}{2}\operatorname{tr}\{(\Sigma^{-1} d\Sigma)^2\}. \tag{2.4}$$

As proved in Calvo and Oller (1990), $f(\Theta)$ is a non-geodesic submanifold of $P_{n+1}$, isometric to $\Theta$ with the information metric. An important consequence is that the Siegel distance supplies a lower bound of the Rao distance.

A further property of interest is related to affine transformations. If the random vector of variables $X$ is transformed by the rule

$$X \to QX + q, \qquad \text{with } Q \in GL_n \text{ and } q \in \mathbb{R}^n,$$

the density corresponding to $\Omega_i$ will now be represented by

$$\tilde\Psi_i = f(Q\mu_i + q,\ Q\Sigma_i Q^t) = \begin{pmatrix} Q & q \\ 0 & 1 \end{pmatrix} \Psi_i \begin{pmatrix} Q & q \\ 0 & 1 \end{pmatrix}^t, \qquad \Psi_i = f(\mu_i, \Sigma_i).$$

Because of the invariance of the Siegel distance under congruence transformations $\Psi \to P\Psi P^t$ with $P \in GL_{n+1}$, it follows immediately that

$$d_S(\tilde\Psi_i, \tilde\Psi_j) = d_S(\Psi_i, \Psi_j). \tag{2.5}$$

In other words, the Siegel distance remains unchanged under affine transformations of the variables.

The Siegel distance between the populations $\Omega_i$ and $\Omega_j$ is defined as the Riemannian distance between the two matrices in the Siegel group where the populations are embedded. This distance is given by:
$$d_S(\Psi_i, \Psi_j) = \frac{1}{\sqrt{2}}\,\bigl\|\log\bigl(\Psi_i^{-1/2}\,\Psi_j\,\Psi_i^{-1/2}\bigr)\bigr\| = \Bigl(\frac{1}{2}\sum_{k=1}^{n+1}\log^2\lambda_k\Bigr)^{1/2}, \tag{2.6}$$

where $\|A\| = \{\operatorname{tr}(AA^t)\}^{1/2}$ stands for the matrix norm, and $\lambda_k$ are the eigenvalues of $\Psi_i^{-1/2}\,\Psi_j\,\Psi_i^{-1/2}$ (or also of $\Psi_i^{-1}\Psi_j$). Let us remark again that, from (2.5), the distance (2.6) is invariant under affine changes, in particular under scale and/or translation changes of the random variables.

In applied situations, the parameters $(\mu_i, \Sigma_i)$ are unknown, so they are replaced by their maximum likelihood estimators $(\bar{x}_i, S_i)$ to represent $\Omega_i$, giving:

$$\hat\Psi_i = \begin{pmatrix} S_i + \bar{x}_i\bar{x}_i^t & \bar{x}_i \\ \bar{x}_i^t & 1 \end{pmatrix}, \qquad i = 1, \dots, p. \tag{2.7}$$

2.2 Some relationships between Siegel distance and Rao distance
Although a closed form of the Rao distance between two arbitrary multivariate normal distributions has not been obtained yet, by extending some previous results (see Calvo and Oller (1991)) it is possible to obtain explicit expressions for this distance in certain cases. This allows us to compare the Rao distance with the Siegel distance in these cases.

First of all, let us obtain the Rao distance for two points of the form $(\mu_1, \Sigma)$ and $(\mu_2, \alpha\Sigma)$, where $\alpha \in \mathbb{R}^+$. Observe that this case is not included in Krzanowski (1996), since now we are computing the Rao distance in the whole manifold of all multivariate normal distributions, instead of the submanifold obtained by considering only covariance matrices with the same eigenvectors.

Starting from formula (14) in Calvo and Oller (1991), applied to the present case and with the same notation, we let $X$ be an $n \times n$ matrix, $\delta$ an $n \times 1$ vector with

$$\delta = \Sigma^{-1/2}(\mu_2 - \mu_1),$$

and $G$ an $n \times n$ symmetric matrix [the defining expressions of $X$ and $G$ are not legible in the source]. It is possible to express

[left-hand side not legible in the source] $= T^t (C + H)\, T,$

where $T$ is an $n \times n$ orthogonal matrix and $H$ is an $n \times n$ skew symmetric matrix. Taking into account Theorem 3.1 of the above-mentioned paper and that, in the present case, a matrix built from $T$, $\cosh(G/2)$ and $\sinh(G/2)$ is orthogonal [expression not legible in the source], it results that $\cosh^2(G\rho/2)$ can be written in terms of $T^t (C + H)(C + H)^t T$ [equation not legible in the source] and, since $\operatorname{tr}(G^2) = 2$, we can express the Rao distance, $\rho$, as

[equation not legible in the source]

Moreover, since for any square matrix $W$

[inequality not legible in the source]

where $\lambda_i(\cdot)$ stands for the i-th eigenvalue of the corresponding matrix, we have

[equation not legible in the source]

and thus, taking into account that argcosh(.) is a monotone increasing function, it results that

[inequality (2.9) not legible in the source]
But, in the present case, from Section 4.1 in Calvo and Oller (1991) and by observing that $X$ is symmetric, we have geodesics which join the points $(\mu_1, \Sigma)$ and $(\mu_2, \alpha\Sigma)$ with $H = 0$, and therefore the equality in (2.9) is attained, obtaining, finally, that the Rao distance between $(\mu_1, \Sigma)$ and $(\mu_2, \alpha\Sigma)$ is given by:

$$\rho^2 = 2\,\operatorname{argcosh}^2\!\left(\frac{1 + \alpha}{2\sqrt{\alpha}} + \frac{\delta^t\delta}{4\sqrt{\alpha}}\right) + \frac{n - 1}{2}\,\log^2\alpha, \tag{2.10}$$

where $\delta = \Sigma^{-1/2}(\mu_2 - \mu_1)$.
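On the other hand, the Siegel distance applied to the same case, after some straightforward computations, is given by

$$d_S^2 = \operatorname{argcosh}^2\!\left(\frac{1 + \alpha + \delta^t\delta}{2\sqrt{\alpha}}\right) + \frac{2n - 1}{4}\,\log^2\alpha. \tag{2.11}$$

Observe that a series expansion, in powers of $\varepsilon = \sqrt{\alpha} - 1$ and $\|\delta\| = \sqrt{\delta^t\delta}$, of the difference between (2.10) and (2.11) is given by

[expansion (2.12) not legible in the source; only its remainder terms, of the form $o(\cdot)$ in $\|\delta\|$ and $\varepsilon$, can be made out]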
which shows the similar behaviour of both distances when $\|\delta\|$ is small. Moreover, it is straightforward to check that $\rho = d_S$ when $\delta = 0$. Observe that if we use a Mahalanobis type dissimilarity, such as

$$d_M^2 = (\mu_2 - \mu_1)^t \Bigl(\tfrac{1}{2}\bigl(\Sigma_1^{-1} + \Sigma_2^{-1}\bigr)\Bigr)(\mu_2 - \mu_1), \tag{2.13}$$

and we compare it with the Rao distance, in the same case as before, we obtain the following series expansion

[expansion (2.14) not legible in the source]

which shows a less adequate behaviour of this Mahalanobis type dissimilarity, compared to the Siegel distance, as an approximation of the Rao distance.
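The comparison in this subsection can be explored numerically. The sketch below is illustrative only: `rao_sq` and `siegel_sq` code the closed forms for the special case $(\mu_1, \Sigma)$, $(\mu_2, \alpha\Sigma)$ as they are read here for (2.10) and (2.11), which is an assumption of this example, and all function names are hypothetical. `siegel_sq_numeric` recomputes the Siegel distance directly from the eigenvalue definition (2.6), after reducing to $\mu_1 = 0$, $\Sigma = I$ by affine invariance, so that $\delta = \mu_2$.

```python
import numpy as np

def rao_sq(delta, alpha):
    """Squared Rao distance for (mu1, Sigma), (mu2, alpha*Sigma); assumed reading of (2.10)."""
    delta = np.asarray(delta, dtype=float)
    n, dd = delta.size, float(delta @ delta)
    arg = max((2.0 + 2.0 * alpha + dd) / (4.0 * np.sqrt(alpha)), 1.0)  # guard rounding below 1
    return 2.0 * np.arccosh(arg) ** 2 + 0.5 * (n - 1) * np.log(alpha) ** 2

def siegel_sq(delta, alpha):
    """Squared Siegel distance for the same pair; assumed reading of (2.11)."""
    delta = np.asarray(delta, dtype=float)
    n, dd = delta.size, float(delta @ delta)
    arg = max((1.0 + alpha + dd) / (2.0 * np.sqrt(alpha)), 1.0)
    return np.arccosh(arg) ** 2 + 0.25 * (2 * n - 1) * np.log(alpha) ** 2

def siegel_sq_numeric(delta, alpha):
    """Same quantity from definition (2.6): here Psi_1 is the identity matrix."""
    delta = np.asarray(delta, dtype=float).reshape(-1, 1)
    n = delta.shape[0]
    psi2 = np.block([[alpha * np.eye(n) + delta @ delta.T, delta],
                     [delta.T, np.ones((1, 1))]])
    lam = np.linalg.eigvalsh(psi2)  # eigenvalues of Psi_1^{-1} Psi_2 with Psi_1 = I
    return 0.5 * float(np.sum(np.log(lam) ** 2))
```

Under these assumptions the two Siegel computations agree, the two distances coincide when $\delta = 0$, and the Siegel value stays below the Rao value, in line with the lower-bound property stated in Section 2.1.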
2.3 Representation of the populations
As introduced in Calvo et al. (1998), the method we propose for the representation of the set of populations starts by computing the interdistance matrix $D$ between the p populations based on (2.6):

$$D = (d_{ij}) = \bigl(d_S(\Psi_i, \Psi_j)\bigr), \qquad i, j \in \{1, \dots, p\}. \tag{2.15}$$
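A minimal sketch of this first step is given below, assuming NumPy and plug-in estimates of the means and covariance matrices. The function names `embed`, `siegel_distance` and `interdistance_matrix` are hypothetical and chosen for this illustration; the embedding follows (2.3) and the distance follows the eigenvalue form of (2.6).

```python
import numpy as np

def embed(mean, cov):
    """Map (mu, Sigma) to the (n+1) x (n+1) positive-definite matrix of (2.3)."""
    mean = np.asarray(mean, dtype=float).reshape(-1, 1)
    cov = np.asarray(cov, dtype=float)
    return np.block([[cov + mean @ mean.T, mean],
                     [mean.T, np.ones((1, 1))]])

def siegel_distance(psi_i, psi_j):
    """Siegel distance (2.6): sqrt(1/2 * sum of squared log-eigenvalues)."""
    chol = np.linalg.cholesky(psi_i)               # psi_i = L L^t
    half = np.linalg.solve(chol, psi_j)            # L^{-1} psi_j
    whitened = np.linalg.solve(chol, half.T)       # L^{-1} psi_j L^{-t}, symmetric
    lam = np.linalg.eigvalsh(whitened)             # same spectrum as psi_i^{-1} psi_j
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))

def interdistance_matrix(means, covs):
    """Matrix D = (d_S(Psi_i, Psi_j)) of (2.15) for p populations."""
    psis = [embed(m, s) for m, s in zip(means, covs)]
    p = len(psis)
    dist = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            dist[i, j] = dist[j, i] = siegel_distance(psis[i], psis[j])
    return dist
```

The Cholesky whitening is a design choice of this sketch: it keeps the eigenproblem symmetric, so the eigenvalues are returned as real numbers.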
The next step follows the classic Principal Coordinate Analysis or Metric Scaling to reduce the dimension and represent the p populations in the 2 or 3 first principal axes. To do so, compute the $p \times p$ matrix $T = (t_{ij})$ defined by:

$$t_{ij} = \frac{1}{2}\Bigl(-d_{ij}^2 + \frac{1}{p}\sum_{h=1}^{p} d_{ih}^2 + \frac{1}{p}\sum_{h=1}^{p} d_{hj}^2 - \frac{1}{p^2}\sum_{h,l=1}^{p} d_{hl}^2\Bigr), \tag{2.16}$$

and then diagonalize $T$, $T = P\Lambda P^t$, with $PP^t = I$ and $\Lambda$ diagonal. After this, compute the $p \times p$ matrix $W = P\Lambda^{1/2}$.
The principal coordinates are the columns of $W$, and with a choice of columns k and l the populations can be represented on a plane taking the principal coordinates k and l. In the most common cases, the selection will be k = 1 and l = 2, i.e., the first and second principal coordinates.

One drawback of this procedure is that the Siegel distance is not Euclidean, since the Riemannian sectional curvatures are not zero. Therefore, in general, we cannot expect to include the p population points in a Euclidean space with the Siegel distances between them conserved. In spite of this, we can argue that, in many applied situations, the negative eigenvalues of the $\Lambda$ matrix will be (in absolute value) quantitatively less important than the positive ones. Hence the relative distortion produced, if the negative eigenvalues are just ignored, should be tolerable. For the applied situations in which negative eigenvalues are relatively important, Principal Coordinate Analysis fails to represent the populations, and Nonmetric Multidimensional Scaling (MDS) techniques will be required. This discussion is postponed till the end of the next section.
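The metric scaling step can be sketched as follows. This is an illustrative NumPy version, not the authors' code; the function name `principal_coordinates` and the relative tolerance used to discard non-positive eigenvalues are assumptions of this example.

```python
import numpy as np

def principal_coordinates(dist, rel_tol=1e-10):
    """Classical metric scaling of a p x p distance matrix.

    Returns the coordinates W = P * Lambda^{1/2} restricted to the positive
    eigenvalues, together with all eigenvalues of T in decreasing order, so
    that the weight of any negative eigenvalues can be inspected.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    row = d2.mean(axis=1, keepdims=True)
    col = d2.mean(axis=0, keepdims=True)
    t = 0.5 * (-d2 + row + col - d2.mean())      # matrix T of (2.16)
    lam, vec = np.linalg.eigh(t)
    order = np.argsort(lam)[::-1]                # decreasing eigenvalues
    lam, vec = lam[order], vec[:, order]
    keep = lam > rel_tol * lam.max()             # negative or null axes are dropped
    coords = vec[:, keep] * np.sqrt(lam[keep])
    return coords, lam
```

A quick way to judge the distortion mentioned above is to compare the sum of the absolute values of the negative entries of the returned eigenvalues with the sum of the positive ones.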
2.4 Representation of the variables
Once the points (individuals or populations) are projected, the classic biplot method associates each variable with a coordinate axis. See Gower and Harding (1988) for the construction details. These coordinate axes allow the user to interpret, according to the random variables, the relative proximity of the statistical objects represented. If we examine these objects in the full original space (that is, before the PCA projection) and make a trajectory parallel to axis i, this straight line corresponds to the maximum variation of variable i per unit of distance. Along this axis the expectation of the random variable has greater variation than is obtained along any other direction. In a linear biplot, once the axes are projected in a low-dimensional space, a move parallel to axis i is also equivalent to moving in the direction of maximum variation for variable i. To obtain the same result in the non-linear biplot, where the axes are not straight lines, the move must now be parallel to the curved axis.

However, in the classic biplot there is little doubt about where to place the origin of the coordinate axes, because it is quite natural to centre the variables and assign the origin to the centroid $0 = (0)_n$. Note that the barycentre (the centre of mass) of all the represented points is also the point 0. According to Calvo et al. (1998), in the Siegel space, our selected frame for the representation of normal densities, there is not a clear candidate for being the coordinate origin. Moreover, it is not possible to trace the axes as it is in the biplot method, in which it makes sense to define pseudo-samples of the form $\alpha_r e_r$, where $e_r$ is the r-th vector of the canonical basis and $\alpha_r > 0$, because the vector components are the individual values of the variables. But, in the Siegel case, the components are the parameters corresponding to a density, and the components of the covariance matrix cannot be arbitrarily set to zero.

In Calvo et al. (1998) we proposed that the underlying idea of the biplot, i.e., to trace each coordinate axis following the maximum variation direction of the variable mean, should be preserved. The mean is the most natural parameter to be considered because we are plotting normal populations. Note that such maximum variation depends on the origin point selected. In more technical language, the gradient of the mean values $E(X_j)$ must be computed. By integration of these gradients, the directions of maximum variation are solved and appear as a bundle of curves.
We have selected, as the default origin, the populations' barycentre. This point may be the best for examining the pertinent space area. The local curvature of the Siegel space at this point seems to be an adequate compromise between all the local curvatures of the sampled populations. Alternatively, it may be possible to plot other bundles of curves, e.g. with origins at the sampled populations, if one wishes to reflect the particular curvature of each observed area. Some additional details can be found in the appendix; however, they are not strictly required in order to understand the rest of the discussion.

Let us consider $\Psi_0 \in P_{n+1}$, an arbitrary point selected as the origin. The maximum variation curve for the expected mean value of the random variable j is:

$$\Psi_j(t) = \begin{pmatrix} \Sigma_0 + \mu_j(t)\mu_j(t)^t & \mu_j(t) \\ \mu_j(t)^t & 1 \end{pmatrix}, \qquad \mu_j(t) = \mu_0 + t\,\Sigma_0 e_j, \tag{2.17}$$

where

$$\Psi_0 = \begin{pmatrix} \Sigma_0 + \mu_0\mu_0^t & \mu_0 \\ \mu_0^t & 1 \end{pmatrix}.$$

As noted before, $e_j$ denotes the j-th vector of the canonical basis, and t is the curve parameter. The square of the Siegel distance between $\Psi_0$ and $\Psi_j(t)$ is:

$$d_S^2\bigl(\Psi_0, \Psi_j(t)\bigr) = \operatorname{argcosh}^2\!\left(\frac{\sigma_{jj}\,t^2}{2} + 1\right), \qquad \Sigma_0 = (\sigma_{ij}). \tag{2.18}$$
It is possible to invert this expression, obtaining:

$$t = \left(\frac{2\cosh\bigl(d_S(\Psi_0, \Psi_j(t))\bigr) - 2}{\sigma_{jj}}\right)^{1/2}. \tag{2.19}$$
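The relations between the curve parameter, the Siegel distance and the mean increment can be checked numerically. The sketch below assumes that the curve for variable j keeps $\Sigma_0$ fixed and moves the mean as $\mu_0 + t\,\Sigma_0 e_j$ (an assumption of this example about the form of (2.17)); all function names are hypothetical.

```python
import numpy as np

def embed(mean, cov):
    """Embedding (2.3) of a normal population into the Siegel group."""
    mean = np.asarray(mean, dtype=float).reshape(-1, 1)
    cov = np.asarray(cov, dtype=float)
    return np.block([[cov + mean @ mean.T, mean],
                     [mean.T, np.ones((1, 1))]])

def siegel_distance(psi_i, psi_j):
    """Siegel distance (2.6) from the eigenvalues of psi_i^{-1} psi_j."""
    chol = np.linalg.cholesky(psi_i)
    whitened = np.linalg.solve(chol, np.linalg.solve(chol, psi_j).T)
    lam = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))

def mean_curve_point(mu0, sigma0, j, t):
    """Point of the curve of variable j, ASSUMING mu(t) = mu0 + t * Sigma0 e_j."""
    mu0 = np.asarray(mu0, dtype=float)
    sigma0 = np.asarray(sigma0, dtype=float)
    return embed(mu0 + t * sigma0[:, j], sigma0)

def t_from_distance(dist, sigma_jj):
    """Inverse relation (2.19): parameter t reached at Siegel distance dist."""
    return np.sqrt((2.0 * np.cosh(dist) - 2.0) / sigma_jj)

def t_from_mean_increment(m, sigma_jj):
    """Relation (2.20): parameter t that raises E(X_j) by m units."""
    return m / sigma_jj
```

For instance, marks at unit increments of the mean of variable j would be placed at `t_from_mean_increment(m, sigma0[j, j])` for m = 1, 2, and so on.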
If the aim is to establish a set of coordinate mark points in the plot, an interesting applied question is how the curve parameter t must change in order to increase $E(X_j)$ by m units starting at $\Psi_0$. From expression (2.17) we obtain:

$$t = \frac{m}{\sigma_{jj}}. \tag{2.20}$$

With these basic results, we can describe the algorithm to represent the variables in a low-dimensional space, with origin at $\Psi_0$. The steps are:

1) Compute a barycentre of the populations, i.e., a point $\Psi_0$ that satisfies the condition:

$$\sum_{i=1}^{p} d_S^2(\Psi_0, \Psi_i) \le \sum_{i=1}^{p} d_S^2(\Psi, \Psi_i), \qquad \forall\, \Psi \in P_{n+1}. \tag{2.21}$$

Note that numerical methods are required to obtain this point. We have here employed the standard Newton method applied to the equations obtained by setting the corresponding partial derivatives equal to zero.
2) For each random variable $X_j$, compute several points over the curve $\Psi_j(t)$, taking for t, in equation (2.17), successive values in a set $T_j$. Enough points in the space should be obtained in order to guarantee the quality of the final plot. Let us denote the collection of points by $V_j = \{\Psi_j(t)\}_{t \in T_j}$.

3) For every variable j, the collection $V_j$ must be projected in the same space as the populations, as described in Section 2.3. Following Gower and Harding (1988), compute for every point in $V_j$ the vector $d = (d_i)_{p \times 1}$, where $d_i = d_{i\cdot}^2 - d_{i,p+1}^2$, $d_{i,p+1}$ is the distance between $\Psi_j(t)$ and the population i, and

$$d_{i\cdot}^2 = \frac{1}{p}\sum_{j=1}^{p} d_S^2(\Psi_i, \Psi_j) - \frac{1}{2p^2}\sum_{j=1}^{p}\sum_{l=1}^{p} d_S^2(\Psi_j, \Psi_l). \tag{2.22}$$

Note that $d_{i\cdot}^2$ is called the proximity function in Cuadras et al. (1997). The point $\Psi_j(t)$ is associated to:

$$y_j = \frac{1}{2}\,\Lambda^{-1/2} P^t d, \tag{2.23}$$
where, as before, $\Lambda$ and $P$ are the matrices obtained when $T$ is diagonalized.

4) Taking the k and l components of $y_j$, the point $\Psi_j(t)$ can be represented in the plane associated with the k and l principal coordinates.

5) The last step is to join the consecutive points plotted in the representation plane with straight-line segments. If the shape of the curve so obtained is not smooth enough, repeat steps 2) to 5) with a denser set of values of t.

Let us resume now the discussion at the end of Section 2.3, for the case in which the negative eigenvalues of $T$ are relevant and the populations must be represented using Non-metric MDS techniques instead of PCA. The above algorithm to represent the variables must be slightly modified if a Non-metric MDS is used. A detailed study of this situation is out of the scope of the present paper, but we propose, as a provisional idea, to replace step 3) in the following way. If $L$ is the subspace where the populations are projected with a Non-metric MDS, compute the point $y_j$ in $L$ which minimizes the distance to every desired $\Psi_j(t)$. This definition of $y_j$ replaces the definition in step 3. Step 4 should be replaced by step 5.
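Steps 1) and 3) of the algorithm above can be sketched as follows, assuming NumPy. The barycentre is computed here with a fixed-point iteration commonly used for the Riemannian mean of positive-definite matrices; this is a substitute chosen for brevity, not the Newton method mentioned in the text. The projection implements the add-a-point formula (2.23). All function names are hypothetical.

```python
import numpy as np

def _sym_apply(mat, fun):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    lam, vec = np.linalg.eigh(mat)
    return (vec * fun(lam)) @ vec.T

def barycentre(psis, tol=1e-12, max_iter=500):
    """Minimiser of the sum of squared Siegel distances (2.21), by fixed-point iteration."""
    x = np.mean(psis, axis=0)                      # arithmetic mean as starting point
    for _ in range(max_iter):
        root = _sym_apply(x, np.sqrt)
        inv_root = _sym_apply(x, lambda lam: 1.0 / np.sqrt(lam))
        logs = []
        for psi in psis:
            m = inv_root @ psi @ inv_root
            logs.append(_sym_apply(0.5 * (m + m.T), np.log))
        step = np.mean(logs, axis=0)               # vanishes at the barycentre
        x = root @ _sym_apply(step, np.exp) @ root
        if np.linalg.norm(step) < tol:
            break
    return x

def metric_scaling(dist, n_axes=2):
    """Principal coordinates on the n_axes leading axes (assumed to have positive eigenvalues)."""
    d2 = np.asarray(dist, dtype=float) ** 2
    t = 0.5 * (-d2 + d2.mean(axis=1, keepdims=True)
               + d2.mean(axis=0, keepdims=True) - d2.mean())
    lam, vec = np.linalg.eigh(t)
    keep = np.argsort(lam)[::-1][:n_axes]
    lam, vec = lam[keep], vec[:, keep]
    return vec * np.sqrt(lam), lam, vec, np.diag(t).copy()

def add_point(d_new, lam, vec, t_diag):
    """Formula (2.23): coordinates of a new point from its distances to the p populations."""
    d = t_diag - np.asarray(d_new, dtype=float) ** 2   # d_i = d_i.^2 - d_{i,p+1}^2
    return 0.5 * (vec.T @ d) / np.sqrt(lam)
```

In use, `d_new` would hold the Siegel distances from one curve point $\Psi_j(t)$ to the p populations, and the returned vectors for successive values of t would be joined by segments as in step 5).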
2.5 Representation of variances and covariances
We argued in Section 2.4 that the study of the gradient of the expectation is the most natural way to represent populations and variables together. But, in fact, there is no reason to restrict the view of the populations to the first moment variation. A researcher could represent maximum variation curves of any other real valued smooth function of $\mu$ and $\Sigma$: moments of any order, mixed moments, as well as linear functions of the random variables. The basic algorithm shown in Section 2.4 can easily be adapted to these other potentially relevant situations. We illustrate this idea with the representation of the variances and covariances. The expression for the maximum variation curve for the covariance of the random variables i and j is (see the appendix for more details):

$$\Psi_{ij}(t) = \begin{pmatrix} \bigl(\Sigma_0^{-1} - (E_{ij} + E_{ji})\,t\bigr)^{-1} + \mu_0\mu_0^t & \mu_0 \\ \mu_0^t & 1 \end{pmatrix}. \tag{2.24}$$

As in Section 2.4, $\Psi_0 \in P_{n+1}$ is the arbitrary point selected to be the origin. The matrix $E_{ij}$ stands for $e_i e_j^t$. In equation (2.24) the parameter t cannot be assigned arbitrarily, because $\Psi_{ij}(t)$ must be positive-definite. If $\Sigma_0 = (\sigma_{ij})$, it can be proved that t must verify:

$$\frac{1}{\sigma_{ij} - (\sigma_{ii}\sigma_{jj})^{1/2}} < t < \frac{1}{\sigma_{ij} + (\sigma_{ii}\sigma_{jj})^{1/2}} \quad \text{if } i \neq j, \qquad t < \frac{1}{2\sigma_{ii}} \quad \text{if } i = j. \tag{2.25}$$
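The admissible range of t can be illustrated with a short NumPy sketch. It assumes that the covariance block along the curve is $(\Sigma_0^{-1} - (E_{ij} + E_{ji})\,t)^{-1}$ and that the bounds are those of (2.25) as read here; the function names `t_range` and `covariance_curve` are hypothetical.

```python
import numpy as np

def t_range(sigma0, i, j):
    """Open interval of t for which the covariance block stays positive-definite."""
    s = np.asarray(sigma0, dtype=float)
    if i == j:
        return -np.inf, 1.0 / (2.0 * s[i, i])
    root = np.sqrt(s[i, i] * s[j, j])
    return 1.0 / (s[i, j] - root), 1.0 / (s[i, j] + root)

def covariance_curve(sigma0, i, j, t):
    """Covariance matrix at parameter t along the curve of the (i, j) covariance."""
    s = np.asarray(sigma0, dtype=float)
    e = np.zeros_like(s)
    e[i, j] += 1.0          # E_ij
    e[j, i] += 1.0          # E_ji (the diagonal entry becomes 2 when i == j)
    return np.linalg.inv(np.linalg.inv(s) - t * e)
```

Values of t taken from a grid inside `t_range(sigma0, i, j)` then give the collection of points used to draw the corresponding curve.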
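After some straightforward computations, the following expression for the inverse in the upper left block of equation (2.24) is obtained:

$$\bigl(\Sigma_0^{-1} - (E_{ij} + E_{ji})\,t\bigr)^{-1} = \Sigma_0 + \frac{t\,\Sigma_0\bigl[(1 - \sigma_{ij}t)(E_{ij} + E_{ji}) + t\,\sigma_{jj}E_{ii} + t\,\sigma_{ii}E_{jj}\bigr]\Sigma_0}{(1 - \sigma_{ij}t)^2 - t^2\sigma_{ii}\sigma_{jj}}.$$

In component form, defining $\bigl(\Sigma_0^{-1} - (E_{ij} + E_{ji})\,t\bigr)^{-1} = (k_{pq})$, the expression for $k_{pq}$ is:

$$k_{pq} = \sigma_{pq} + t\,\frac{\sigma_{pi}\bigl((1 - \sigma_{ij}t)\,\sigma_{jq} + t\,\sigma_{jj}\sigma_{iq}\bigr) + \sigma_{pj}\bigl((1 - \sigma_{ij}t)\,\sigma_{iq} + t\,\sigma_{ii}\sigma_{jq}\bigr)}{(1 - \sigma_{ij}t)^2 - t^2\sigma_{ii}\sigma_{jj}}. \tag{2.26}$$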
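The closed form of the covariance block can be checked against a direct numerical inversion. In the sketch below, the expression coded in `covariance_block_closed_form` is the rank-two update of $\Sigma_0$ as reconstructed for this illustration, so it should be taken as an assumption; both function names are hypothetical.

```python
import numpy as np

def covariance_block_closed_form(sigma0, i, j, t):
    """Closed form of (Sigma0^{-1} - (E_ij + E_ji) t)^{-1}, written with columns of Sigma0."""
    s = np.asarray(sigma0, dtype=float)
    c = (1.0 - t * s[i, j]) ** 2 - t ** 2 * s[i, i] * s[j, j]
    s_i, s_j = s[:, [i]], s[:, [j]]                # columns i and j of Sigma0
    correction = (t * s[j, j] * (s_i @ s_i.T)
                  + (1.0 - t * s[i, j]) * (s_i @ s_j.T + s_j @ s_i.T)
                  + t * s[i, i] * (s_j @ s_j.T))
    return s + t * correction / c

def covariance_block_direct(sigma0, i, j, t):
    """Same matrix obtained by inverting Sigma0^{-1} - (E_ij + E_ji) t numerically."""
    s = np.asarray(sigma0, dtype=float)
    e = np.zeros_like(s)
    e[i, j] += 1.0
    e[j, i] += 1.0
    return np.linalg.inv(np.linalg.inv(s) - t * e)
```

The closed form avoids a matrix inversion for every value of t, which may be convenient when many points of a curve are needed.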
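The square of the Siegel distance between $\Psi_{ij}(t)$ and $\Psi_0$ is given by:

$$d_S^2\bigl(\Psi_0, \Psi_{ij}(t)\bigr) = \frac{1}{2}\Bigl[\log^2\bigl(1 - t\,(\sigma_{ij} + (\sigma_{ii}\sigma_{jj})^{1/2})\bigr) + \log^2\bigl(1 - t\,(\sigma_{ij} - (\sigma_{ii}\sigma_{jj})^{1/2})\bigr)\Bigr] \ \text{if } i \neq j, \qquad d_S^2\bigl(\Psi_0, \Psi_{ii}(t)\bigr) = \frac{1}{2}\log^2(1 - 2t\,\sigma_{ii}). \tag{2.27}$$

For the special case of the variances, this expression can be inverted:

$$t = \frac{1 - \exp\bigl(\mp\sqrt{2}\; d_S(\Psi_0, \Psi_{ii}(t))\bigr)}{2\,\sigma_{ii}}, \tag{2.28}$$

with the upper sign for $t > 0$ and the lower sign for $t < 0$.

To represent a set of coordinate axes associated with the covariances in a low-dimensional space, with origin at $\Psi_0$, change the collection of points in the algorithm in Section 2.4 to $V_{ij} = \{\Psi_{ij}(t)\}_{t \in T_{ij}}$.

3 Examples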
Example 3.1 (Voles populations). In order to illustrate our method, we shall use data from four populations of voles described in Airoldi and Hoffmann (1984) and studied later by Airoldi and Flury (1988). Krzanowski (1996) also used these data in his paper about the Rao distance for normal populations with common principal components. Since the distance used here is also closely related to the Rao distance for the normal case, with no assumptions on the matrix of covariances, it may be useful to compare both results. Four populations are considered: males and females from Microtus californicus and males and females from M. ochrogaster. The sample sizes are 173, 141, 88 and 76, respectively. The three variables studied are Length, Width and Height of skulls (in logarithms × 100). The observed means and covariance matrices are reported in Tables 1 and 2. In Table 3 we show the interdistances obtained with three distances applied to the multivariate normal model and closely related to the Rao distance (under different assumptions). Mahalanobis distance is the distance
Species           Sex       Length   Width    Height
M. ochrogaster    Females   323.2    265.7    231.7
                  Males     324.7    266.6    231.3
M. californicus   Females   326.3    269.7    235.1
                  Males     328.8    271.9    237.1

Table 1: Sample mean values of the three variables.
Sex       Species           Variable   Length    Width     Height
Females   M. ochrogaster    Length      88.66     79.11     41.32
                            Width                 80.57     38.81
                            Height                          23.97
          M. californicus   Length      86.08     81.66     40.24
                            Width                 85.54     42.08
                            Height                          26.66
Males     M. ochrogaster    Length      65.40     60.23     24.69
                            Width                 62.27     23.47
                            Height                          16.33
          M. californicus   Length     112.01    106.64     52.97
                            Width                108.13     54.75
                            Height                          33.86

Table 2: Sample covariance matrices (upper triangles).

used in classic canonical discriminant analysis (CDA) and coincides with the Rao distance between normal populations with a common covariance matrix. Krzanowski distance is the Rao distance when common principal components in the covariance matrices are assumed. Finally, Siegel distance, the distance chosen in the present method, has the advantage that no special assumption on the matrix of covariances is needed. In this example, the equality of covariance matrices is not reasonable, but homogeneity of principal components can be assumed (see Krzanowski (1996)), which could explain the difference between the Krzanowski and Mahalanobis distances in Table 3. However, the Krzanowski and Siegel distances are quite similar, as could be expected. An interesting aspect is that the Mahalanobis distance between females and males of M. californicus is a bit higher than between both sexes of M. ochrogaster, in close agreement with the mean values (Table 1). However, this relationship is inverted with the other two distances due to the values of the matrices of covariances (see Table 2). Results of the proposed method are shown in Figure 1, where populations and variables are jointly represented. As can be seen, each population
Mahalanobis distance
          Ca(m)     Ca(f)     Oc(m)
Ca(f)     0.4491
Oc(m)     1.4782    1.1251
Oc(f)     1.1794    0.7901    0.4221

Krzanowski distance
          Ca(m)     Ca(f)     Oc(m)
Ca(f)     0.5390
Oc(m)     1.5454    1.1590
Oc(f)     1.4048    0.9810    0.6246

Siegel distance
          Ca(m)     Ca(f)     Oc(m)
Ca(f)     0.5447
Oc(m)     1.5796    1.2553
Oc(f)     1.3711    0.9754    0.7637

Table 3: Interdistance matrices for the four populations: Ca(m) (males of M. californicus), Ca(f) (females of M. californicus), Oc(m) (males of M. ochrogaster) and Oc(f) (females of M. ochrogaster).
Figure 1: Representation of the four populations and the three variables according to the method based on Siegel distance. Unity increments of the variable means are marked with crosses.
is allocated to one of the four quadrants of the plane, with clear population and sex differences. So, M. californicus (Ca in the figure) populations are placed on the right half-plane, and females of both species in the upper half-plane. The similarity between original distances and those observed in Figure 1 can be measured through the cophenetic correlation
Figure 2: Joint representation of populations, variances and length-height covariance. The intersection point is a barycentre, as in Figure 1.
for the interdistances, which is 0.998, and by the percentage of explained variability, which is 96.87%. Moreover, no negative eigenvalues were found. The arrows represent the variables, with unity increments marked as crosses. A barycentre of the populations was chosen as the origin (intersection point) of their representation and is slightly displaced from the origin of the graph (the centroid of the population coordinates), as happens with other nonlinear biplots (Gower and Harding (1988)). Note the nonlinearity of the trajectories, which is a more marked effect at the extremes of the representation than near the origin, where minor distortions should be expected. As can be seen, the three variables increase to the right, i.e., approximately towards the males of M. californicus, in close agreement with the higher values of this population for all of them (see Table 1). However, it should not be forgotten that covariance matrices are also taken into account with our method. This is a more realistic approach, although, in some cases, it could seem less intuitive. In this sense, if we apply PCA to the Mahalanobis matrix, the graph obtained is quite similar, but the two sexes of M. californicus appear, approximately, as close as those of M. ochrogaster
Figure 3: Joint representation of populations, mean length (solid lines) and length variance (dotted lines) passing through populations and ba
indicates that the considered variance increases to the top right corner and the mean length increases to the bottom right corner of Figure 3. □

Example 3.2 (Egyptian skulls). A classical example in multivariate analysis is the one reported by Thomson and Randall-Maciver (1905), concerning four measurements of 150 male Egyptian skulls from five time periods: E (Early Predynastic), L (Late Predynastic), T (12th-13th dynasty), P (Ptolemaic) and R (Roman). The four variables measured were: MB (maximum breadth of skull), BH (basibregmatic height of skull), BL (basialveolar length of skull) and NH (nasal height of skull). A graphical display of these data obtained by CDA is shown in Figure 4 (lower-case letters). The display obtained by the method described in this paper is shown in the same figure (upper-case letters). In both cases the first axis (horizontal) can be interpreted as time, from most recent on the left to most distant on the right. However, leaving aside the magnitudes of the distances, there is a clear difference between the two displays: the relative situation of the three most recent populations T, P and R. In the CDA display the three populations lie, approximately, in a straight line, with the Ptolemaic population between the other two. However, with our method the three populations form a triangle in which the side connecting P and R is practically perpendicular to the first axis.

Figure 4: Graphical display of the five Egyptian populations obtained by two methods: CDA (in lower case letters) and Biplot (upper case letters). The origin coincides with both centroids.

Of course, the discrepancy must be due to the different metrics assumed by both methods, clearly reflected in the interdistance matrices shown in
Table 4.

Table 4: Interdistance matrices for the five populations of the example: Mahalanobis distances in the upper part and Siegel distances in the lower part.

        E         L         T         P         R
E       0         0.30075   0.94759   1.36860   1.66246
L       0.94308   0         0.85124   1.25979   1.49661
T       1.36731   1.22836   0         0.66442   0.97895
P       1.71955   1.59887   1.15124   0         0.48781
R       1.78993   1.59732   1.15836   1.18607   0

Under the CDA metric the population P is clearly closer to populations T, L and E than population R is, while both populations (P and R) are quite close to each other (only populations E and L are less distant). However, under the Siegel metric both populations, P and R, are approximately equidistant from the other three and quite far from each other, in good agreement with the situation in Figure 4.

The question is: which method (display) should be chosen for this example? Although CDA seems to give a display that is easy to interpret, we must bear in mind that it is based on the Mahalanobis distance, and homogeneity of the covariance matrices is thus assumed. However, Bartlett's test gives a p-value = $8.94 \times 10^{-11}$, so the homogeneity hypothesis does not hold and CDA should be discarded. Moreover, the position of populations P and R in the biplot does not seem so strange if we notice that they are quite close in time, compared with the rest. The time effect should therefore not be very important (they are very close on the first axis) and other factors (migrations, etc.) could explain their differences (the second axis could be related to such factors).

After representing all means, variances and covariances (not shown), and for the sake of simplicity, we have selected, to illustrate the parametric function representations, those shown in Figure 5: the mean values of the variables BL and MB (quite parallel to the first axis and reflecting time) and the variances of the MB and NH variables (more related to the second axis), which could explain, in part, the position of populations T, P and R. Indeed, the interpretation of the final plot with our method is not as straightforward as with CDA, since not only the
means of the four variables, but also the 10 combinations of the different variance-covariances are taken into account to compute the interdistances (2.6). However, it is more realistic than supposing a common covariance matrix.
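The way means and variance-covariances enter the interdistances jointly can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that (2.6), which is not reproduced in this section, is the Siegel distance of Calvo and Oller (1990), that a normal population with mean $\mu$ and covariance $\Sigma$ is embedded as the $(n+1) \times (n+1)$ matrix with blocks $\Sigma + \mu\mu'$, $\mu$, $\mu'$ and $1$, and that the distance is $\sqrt{\tfrac{1}{2}\sum_k \ln^2 \lambda_k}$, with $\lambda_k$ the eigenvalues of $\Psi_1^{-1}\Psi_2$. The function names `embed` and `siegel_distance` are hypothetical.

```python
import numpy as np


def embed(mean, cov):
    """Embed N(mean, cov) as an (n+1) x (n+1) symmetric positive definite matrix.

    Assumed block form: [[cov + mean mean', mean], [mean', 1]].
    """
    mean = np.asarray(mean, dtype=float).reshape(-1, 1)
    cov = np.asarray(cov, dtype=float)
    top = np.hstack([cov + mean @ mean.T, mean])
    bottom = np.hstack([mean.T, np.ones((1, 1))])
    return np.vstack([top, bottom])


def siegel_distance(mean1, cov1, mean2, cov2):
    """Assumed Siegel distance: sqrt(0.5 * sum(log(lambda_k)^2))."""
    psi1, psi2 = embed(mean1, cov1), embed(mean2, cov2)
    # Eigenvalues of psi1^{-1} psi2 equal those of the symmetric matrix
    # L^{-1} psi2 L^{-T}, where psi1 = L L' is the Cholesky factorization.
    chol = np.linalg.cholesky(psi1)
    half = np.linalg.solve(chol, psi2)        # L^{-1} psi2
    sym = np.linalg.solve(chol, half.T).T     # L^{-1} psi2 L^{-T}
    eigvals = np.linalg.eigvalsh((sym + sym.T) / 2.0)
    return float(np.sqrt(0.5 * np.sum(np.log(eigvals) ** 2)))


# Two bivariate populations with different means and covariance matrices
# (illustrative values, not taken from the skull data).
d_ab = siegel_distance([0.0, 0.0], [[1.0, 0.2], [0.2, 1.0]],
                       [1.0, 0.5], [[2.0, 0.0], [0.0, 0.5]])
print(d_ab)
```

Unlike the Mahalanobis distance, no pooled covariance matrix is needed: each population contributes its own covariance matrix through the embedding.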
Figure 5: Biplot representation of the five populations together with the mean values of the variables MB and BL and the variances of MB and NH. The origin of the plot is the centroid of the populations. □
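The comparison of the two metrics made in this example can be checked directly from the values printed in Table 4. The sketch below only rearranges those published values (Mahalanobis distances above the diagonal, Siegel distances below it); the helper names `split_table`, `ranked_pairs` and `max_gap` are hypothetical.

```python
# Table 4 as printed: Mahalanobis distances above the diagonal,
# Siegel distances below it.
LABELS = ["E", "L", "T", "P", "R"]
TABLE4 = [
    [0.0,     0.30075, 0.94759, 1.36860, 1.66246],
    [0.94308, 0.0,     0.85124, 1.25979, 1.49661],
    [1.36731, 1.22836, 0.0,     0.66442, 0.97895],
    [1.71955, 1.59887, 1.15124, 0.0,     0.48781],
    [1.78993, 1.59732, 1.15836, 1.18607, 0.0],
]


def split_table(table):
    """Return two symmetric matrices: (upper-part metric, lower-part metric)."""
    n = len(table)
    upper = [[0.0] * n for _ in range(n)]
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            upper[i][j] = upper[j][i] = table[i][j]
            lower[i][j] = lower[j][i] = table[j][i]
    return upper, lower


def ranked_pairs(dist, labels):
    """Names of all population pairs, sorted by increasing distance."""
    n = len(labels)
    pairs = [(dist[i][j], labels[i] + labels[j])
             for i in range(n) for j in range(i + 1, n)]
    return [name for _, name in sorted(pairs)]


def max_gap(dist, a, b, others):
    """Largest difference between the distances of a and b to the other populations."""
    return max(abs(dist[a][k] - dist[b][k]) for k in others)


mahalanobis, siegel = split_table(TABLE4)
P, R = LABELS.index("P"), LABELS.index("R")
others = [LABELS.index(name) for name in ("E", "L", "T")]

maha_rank = ranked_pairs(mahalanobis, LABELS)
siegel_rank = ranked_pairs(siegel, LABELS)
print("Mahalanobis ranking:", maha_rank)
print("Siegel ranking:     ", siegel_rank)
print("P/R gap to E, L, T (Mahalanobis):", max_gap(mahalanobis, P, R, others))
print("P/R gap to E, L, T (Siegel):     ", max_gap(siegel, P, R, others))
```

Under the Mahalanobis distance the pair P, R is the second closest after E, L, whereas under the Siegel distance P and R are nearly equidistant from E, L and T and are farther from each other than either is from T.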
4 Conclusions
In this paper a new method for the simultaneous representation of multivariate normal populations and any parametric function is described. It can be seen as an extension of CDA using some non-linear biplot ideas, with the advantage of not requiring any assumption about the covariance matrix structure. The limitation of the Mahalanobis distance, which requires a common covariance matrix, is thus avoided. In our method the variables are represented by their maximum variation curves and then projected into the same space as the populations. This is also the way in which classical and non-linear biplots represent variables, although usually they are not described in those terms. Additionally, our method allows the simultaneous representation of any parametric function.
We have illustrated this point by the representation of the populations jointly with some variances and covariances, but other possibly interesting plots, e.g., of the correlation coefficient, could easily be deduced following the same procedure. It must be pointed out that the interpretation of the resulting plots may not be trivial, depending on the parametric functions chosen for plotting and the particular example considered. In this sense, let us notice that when the homogeneity of the covariance matrices does not hold, the use of CDA can produce artifact plots and misleading interpretations. And although the method described in this paper will usually result in higher dimensional parametric spaces (manifolds) than CDA, it is clear that simplicity is not necessarily synonymous with reality.

In any case, the final step always consists of the projection of the statistical objects (populations, variables or any parametric function) placed in the original space into a two dimensional space. For this reason we believe that the current static 2D representations do not take advantage of all the graphical possibilities. In our opinion, a more convenient way to explore and summarize the data can probably only be attained by developing adequate software that allows user interaction in several ways. Our ideal software should include not only the graphical capabilities currently usual in programs for multivariate data representation (such as rotations and zooms of the projected data) but also the possibility of changing the selected principal coordinates on demand, with several combinations of 2 or, even better, 3 axes. It should also implement the ability to change the plot origin, allowing the user to select the particular vision of one population, or to travel in the space of the data. We expect that this dynamic point of view will help the user to visualize the distortion produced by the curvature of the manifold.
As an important third goal, our ideal software should include the representation of the usual parametric functions, mixing them at user demand. With these three value-added capabilities, the applied researcher would be able to profit from the perspectives opened by the current work.
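The idea of choosing the displayed principal coordinates on demand can be sketched as follows. This is only an illustration under stated assumptions: it applies classical principal coordinate analysis to the Siegel interdistances of Table 4, which need not coincide with the projection actually used to draw Figures 4 and 5, and the function name `principal_coordinates` and its `axes` parameter are hypothetical.

```python
import numpy as np

LABELS = ["E", "L", "T", "P", "R"]
# Siegel distances: lower part of Table 4, written as a symmetric matrix.
SIEGEL = np.array([
    [0.0,     0.94308, 1.36731, 1.71955, 1.78993],
    [0.94308, 0.0,     1.22836, 1.59887, 1.59732],
    [1.36731, 1.22836, 0.0,     1.15124, 1.15836],
    [1.71955, 1.59887, 1.15124, 0.0,     1.18607],
    [1.78993, 1.59732, 1.15836, 1.18607, 0.0],
])


def principal_coordinates(dist, axes=(0, 1)):
    """Classical principal coordinates of a distance matrix.

    `axes` selects which principal coordinates to return, so a viewer
    could switch between, e.g., (0, 1), (0, 2) or (0, 1, 2) on demand.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    chosen = list(axes)
    # Negative eigenvalues (non-Euclidean part of the distances) are clipped.
    scale = np.sqrt(np.clip(eigvals[chosen], 0.0, None))
    return eigvecs[:, chosen] * scale, eigvals


coords, eigvals = principal_coordinates(SIEGEL, axes=(0, 1))
for label, point in zip(LABELS, coords):
    print(label, point)
```

Because the Gram matrix is double centred, the origin of the resulting plot is the centroid of the populations; moving the origin to a chosen population would amount to subtracting that population's row from `coords`.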
Appendix

The geodesics of the Siegel group starting at $\Psi_0 \in P_{n+1}$ are:
[Geodesic equation not recoverable from the source; only the factor $\Psi_0^{1/2}$ is legible.]

where $\Psi_0^{1/2}$ is the symmetric square root of $\Psi_0$. The symmetry of $\Psi = (\psi_{ij})$ enables the lower triangle of the matrix to be dispensed with. Taking into account this convention, the inverse of the metric tensor is derived as:
$$\langle d\psi_{pq}, d\psi_{rs} \rangle = \psi_{pr}\psi_{qs} + \psi_{ps}\psi_{qr}, \quad \text{with } p \le q \text{ and } r \le s. \tag{4.1}$$
With all these relationships, to obtain the maximum variation curves of the mean $E(X_i)$, differentiate $E(X_i)$ with respect to $\Psi$. With the aid of the inverse of the metric tensor in (4.1), and after some computations, the following expression of the gradient of $E(X_i)$ is obtained:
[Gradient expression garbled in the source; the legible fragments are $d/dt$, $e_i' \Sigma$ and $0$.]
Assuming that the desired origin is at $\Psi_0$, integrate the last expression to obtain (2.17). Remember that $\Sigma_0$ is constant along the curve and, also, that the $(n+1, n+1)$ component of the curve is always 1. This restriction appears because the curve only contains matrices in $P_{n+1}$ corresponding to the embedding of some multivariate normal density. In the same way, to obtain the corresponding curves for the covariances, differentiate $\mathrm{cov}(X_i, X_j) = \sigma_{ij}$ with respect to $\Psi$. The following expression for the gradient is obtained:
$$\frac{d\Psi}{dt} = \begin{pmatrix} \Sigma E_{ij} \Sigma + \Sigma E_{ji} \Sigma & 0 \\ 0 & 0 \end{pmatrix} \tag{4.2}$$
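The step from the inverse metric tensor to the gradient of a covariance can be checked numerically. The sketch below is an illustration under assumptions rather than the authors' derivation: it takes the inverse metric tensor to be $\psi_{pr}\psi_{qs} + \psi_{ps}\psi_{qr}$ over upper-triangle coordinates, as (4.1) appears to read, applies it to the covariance block $\Sigma$ only, and assumes that $E_{ij}$ denotes the matrix $e_i e_j'$. All function names are hypothetical.

```python
def inverse_metric(sigma, p, q, r, s):
    """Assumed inverse metric tensor entry: s_pr * s_qs + s_ps * s_qr."""
    return sigma[p][r] * sigma[q][s] + sigma[p][s] * sigma[q][r]


def gradient_componentwise(sigma, i, j):
    """Gradient of sigma_ij (i <= j): only the (i, j) coordinate has a nonzero
    partial derivative, so entry (p, q) is the inverse metric at (pq), (ij)."""
    n = len(sigma)
    return [[inverse_metric(sigma, p, q, i, j) for q in range(n)]
            for p in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[p][k] * b[k][q] for k in range(len(b)))
             for q in range(len(b[0]))] for p in range(len(a))]


def unit_matrix(n, i, j):
    """E_ij = e_i e_j' (assumed meaning): 1 at position (i, j), 0 elsewhere."""
    return [[1.0 if (p == i and q == j) else 0.0 for q in range(n)]
            for p in range(n)]


def gradient_matrix_form(sigma, i, j):
    """Sigma E_ij Sigma + Sigma E_ji Sigma."""
    n = len(sigma)
    first = matmul(matmul(sigma, unit_matrix(n, i, j)), sigma)
    second = matmul(matmul(sigma, unit_matrix(n, j, i)), sigma)
    return [[first[p][q] + second[p][q] for q in range(n)] for p in range(n)]


def max_abs_difference(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


# Illustrative symmetric positive definite covariance matrix.
SIGMA = [[4.0, 1.0, 0.5],
         [1.0, 3.0, 0.2],
         [0.5, 0.2, 2.0]]

worst = max(max_abs_difference(gradient_componentwise(SIGMA, i, j),
                               gradient_matrix_form(SIGMA, i, j))
            for i in range(3) for j in range(i, 3))
print("largest discrepancy:", worst)
```

Under these assumptions both computations agree entry by entry, which is why the gradient of $\sigma_{ij}$ can be written compactly as a matrix product in the covariance block, with zeros elsewhere because the mean does not vary along the curve.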
Assuming that the desired origin is at $\Psi_0$, integration of this gives (2.24).

References
AIROLDI, J. P. and FLURY, B. K. (1988). An application of common principal component analysis to cranial morphometry of Microtus californicus and M. ochrogaster. Journal of Zoology, London, 216:21-36.

AIROLDI, J. P. and HOFFMANN, R. S. (1984). Age variation in voles (Microtus californicus, M. ochrogaster) and its significance for systematic studies. Occasional Papers Museum Natural History Kansas, 111:1-45.
BURBEA, J. (1986). Informative geometry of probability spaces. Expositiones Mathematicae, 4:347-378.

CALVO, M. and OLLER, J. M. (1990). A distance between multivariate normal distributions based in an embedding into the Siegel group. Journal of Multivariate Analysis, 35(2):223-242.

CALVO, M. and OLLER, J. M. (1991). An explicit solution of information geodesic equations for the multivariate normal model. Statistics and Decisions, 9:119-138.

CALVO, M., VILLARROYA, A., and OLLER, J. M. (1998). Non-linear biplots for multivariate normal grouped populations. In Advances in Data Science and Classification, A. Rizzi, M. Vichi and H. H. Bock, eds., pp. 343-348. Springer.

CAPDEVILA, C. and ARCAS, A. (1995). Pirámides indexadas y disimilaridades piramidales. Qüestiió, 19:131-151.

CUADRAS, C., FORTIANA, J., and OLIVA, F. (1997). The proximity of an individual to a population with applications in discriminant analysis. Journal of Classification, 14:117-136.

GOWER, J. C. (1993). Recent advances in biplot methodology. In Multivariate Analysis: Future Directions 2, C. M. Cuadras and C. R. Rao, eds. North-Holland, Amsterdam.

GOWER, J. C. and HARDING, S. A. (1988). Non-linear biplots. Biometrika, 75(3):445-455.

KRZANOWSKI, W. J. (1996). Rao's distance between normal populations that have common principal components. Biometrics, 52:1467-1471.

RAO, C. R. (1948). The utilisation of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society B, 10:159-193.

SIEGEL, C. L. (1964). Symplectic Geometry. Academic Press, New York.

THOMSON, A. and RANDALL-MACIVER, R. (1905). Ancient Races of the Thebaid. Oxford University Press.