This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
d approache w h IC Improve on the approximate meth d d 'b d s .. d I' 0 s esen e here. WI' consider IOglStiC rna e s for bmary data in Section 9 O.3'an d P Olsson . regressIOn m d lJ; for count data in Section 9.4, . 0 es o
0
•
,
1
o
0
9.2
-
-
Estimation for generalized linear mixed models
In the random effects GLM we assume: (1) the conditional distribution of Yij given U i follows a distribut.ion from the exponential family with density !(Yij IU;;(3); (2) given Vi, the repeated measurements, Yil,.' Yn ., are independent; (3) the Vi are independent and identically distributed with density function !(V i ; G), 0
,
Let V = (V I , , , , , V m). In the subsection below on conditional likelihood , we will treat the random effects as if they were fixed parameters to be removed from the problem, so that we need not rely on the third assumption above, In the subsection on maximum likelihood estimation, we will treat Vasa set of unobserved variables which we then integrate out of the likelihood, adopting the assumption that the random effects distribution is Gaussian with mean zero and variance matrix G. 9.2.1
Conditional likelihood
In this subsection, we review conditional maximum likelihood estimation for /3. McCullagh and NeIder (1989, Section 7,2) present a more general treatment for GLMs. The main idea is to treat the random effect.s,. Ui, 'mate (3 using the condItIOnal as a set of nuisance parameters an d t 0 es t 1 likelihood of the data given the sufficient statistics for the U~. Treating V as fixed, the likelihood function for (3 and U IS (9,2,1)
RANDOM EFFECTS MODELS ESTIMATION FOR. GENERA
172
. . rfy th discussion we restrict attention to where Oij := Oij(f3, U)- To ~:Ju~lagh :nd Neider: 1989, p. 32) for which canonical link functIons ( h I'k l'hood above can be written as - , f3 + d' .U, Then tel e I
Oij
-
xii
tJ
t
eXP{f3'LXiiYii+ , . t,J
LU~LdijYij
- '2;7/J(Bi j )},
J
i
(9.2.2)
',J
the sufficient statistics for 13 and U i are Li,j XijYii and Lj dijYij E d·y· is sufficient for Ui for fixed 13, respec lve Y", I 'f. 'l~h'Jod is proportional to the conditional distribution The cand ItlOua Iike I a . . of the data given the sufficient statistics for the U i , The contnbutlOn from
H
ence~, I and
LIZED LINEAR MIXED MODELS
implies that we can learn abo t '" " u one mdlvldual' ffi . mg the vanability in coeffic'le t s coe clents by understandWh , " n s across the po I f vanablhty, we should rely on the 0 ula . pu a lOn, .en there is little P those for an individual Whe th P, tIon average coeffiCIents to estimate . n ere 18 sub t f l ' . more heavily on the data from each' d' . s a~ la :anatlOn, we must rely ficients. This idea was illustrat d ,mSlvl~Ual m estImating their own coef. e III ectlOn 55 wh b" to estimate an individual subJ'ect' CD4 .., ere Our 0 Jectlve was The I1'keh'h ood function for thes k trajectory. to include both 13 and the element un nGo~n parameter 6, which is defined f , IS so
= II
f II
t=1
J=1
m
L(o; y)
subject i has the form
f ( Yi I LdijYij j
= bi jf3) =
f(Yi;f3,Ui) f (", d .. .. - b··f3 U) LJ j 1J Y'J " , , f (Lj XijYij = ai, Lj dijYij = bi ;(3, U i ) f(LjdijYij=bi;{3,Ui)
=
(9.2,3) For a discrete GLM this expression can be written as
nj
b" and
R;2
II m
i=1
'" wRit
L: R
i2
exp(,8, ai)
exp (,81 ",ni ) Wj=l XijYij
.
(9,2.4)
For simple cases such as the random ' hood is reasonably eas t " mtercept model, the conditionallikeli, y a maximIze (B I d D llltercept models for b' res ow an ay, 1980). Random ' mary and co t d detall below. un ata are considered in more 9.2.2
ni
L L Xij{Yij -llij(U
i )}
= 0,
(9,2.6)
i=l j=l
= ai and IS the set of values for y. such that L' d. 'y" = b,i' T~e conditional likelihood for f3 given the data for all . J ,'J'd'J SImplifies to m III d IVI uaIs 1JYl3
(9.2.5)
, To fin~ the maxim,um likelihood estimate, we can solve the score equatIons obtamed by settmg to zero the derivative with respect to 0 of the log likelihood, If we imagine that the 'complete' data for an individual comprise (Yi' U i ) and if we restrict attention for the moment to canonical link functions, then the complete data score function for f3 has a particularly simple form
S ,8(01 Y, U) =
whe~e R.il : the set of pO,ssible values for Yi such that Lj XijYij
2: 3
!(Yij IU i ;,8)!(U i ;G) dU j •
This is just the marginal distribution of Y bt . ed b . . . 0 am y mtegratmg the joint d ·Istn'b utlOn of Y and U with respect to U I . . . ' n some special cases such as t£h e GaussJan lInear model (Chapters 4- 6) ' the 'Inte , graI above has a cIosed or~, but for ~ost non-Gaussian models, numerical methods are required for Its evaluatIOn.
m
E Ri2 exp (f3' Lj XijYij + U~bi) ,
173
Maximum likelihood estimation
Here, we will treat the U. abIes fr 1 as a sample of ind d epen ent unobservable variam a random effects d' t 'b' IS n utIon , Qual't ' I atlvely, this assumption
where llij(U i ) = E(Yij lUi) = h-I(X~jf3 + d~jUi)' The observed data score equations are obtained by taking the expectation of the complete data equations with respect to the conditional distribution of the unobserved random effects given the data, That is, we define the observed data score functions, S{3(O J y), as the expectations of the complete data score functions, S{3(ol y, U), with respect to the conditional distribution of U given y. This gives, m
S{3(ol y) =
ni
L L Xij[Yij -
E{llij(Ui) IYi}]
= 0.
(9,2.7)
i=1 j=I
The score equations for G can similarly be obtained as
Sa(o/ y) =
~G-l 2
{f E(UiU~ I
l I Yi)} G- - ; C- = 0.
(9.2.8)
;",,1
, . d t' t f 0 a common strategy is To solve for the maximum hkehhoo es Ima e o , al 'h 't tes to use the EM algorithm (Dempster et al" 1977). This gont m I era
RANDOM EF
FECTS MODELS LOGISTIC REGRESSION FOR BINARY RESPONSES
1 74
. tlllg t h e expectations in the score
I which involves eva ua fthe parameters, and an M-step, between an E-step" the current values 0 d t d parameter estimates. b ve usmg t give up a e th core equations .0 d' the conditional expectaequations a 0 in which we sol:e efsthe integration mvol ve ~n ne or two, numerical d'mellSlOn 0 When q IS 0 C The. I the d'Ime nsion of U i · ably easily (e.g. rouch t d reason . es can be implemen e . I roblems Monte Carlo tions IS q, integra~ion technIi~90), For higher dimenSlOn~ ~he appli~ation of Gibbs and Spiegelman, bed. See, for examp e, , t' on methodB can e us mtegr~ I. Zeger and Karim (1991), . t the score equations in such amphng III , t approxlma e b d B A alternative strategy IS 0 'ded This approach has een use n ' n be avO!· . d i::r t that the integratlOnB ca" els with GaussIan ran om euec s, et al. (1984) for and Clayton (1993) for rand?m Karlm ' (1991) , Schall ,(1991), an (1990) for non-linear d nd Bates . regressIOn , acts GLMs and Lm strom a d rs The central Idea IS to use eue " d effects an erro , . £ models with GaussIan ran am d't' al means in the score equatIOn or th than con I IOn . . f U conditional modes ra er ' t ' the conditional distributIOn 0 i ' 'al t to approxlma mg d t t:I This IB eqUlv en " t h th same mode an curva ure, fJ· . d' tributlOn WI e . given Yi by a GaussIan IS lace the integration WIth an By using modes rather t~an mea:t~d~~t:efhe M-step. optimization that can b: mcorpor I tely let v" = Var(Yij IU ) and
~yW;~ratel1i
To
logl~t~r:~:w
~pecify t~e al~on~7 zm~: ;~~;u;rog~te res~onse
i
defined to have Qi = dJag{vijh (fl.ij) }. 1.. I .. ) • = 1,.", ni and define the elements Zij = h(f-Lij) + (Yij - f-L~J)h ~J.L1J the n. x q matrix whose n· x ni matrix V; = Qi + DiGDi , were i IS 1 . d b j~h row is dij , For a fixed G, updated values of {3 and U are obtame y iteratively solving
j}.
~
f3
m
I
-1
= ( t;Xi Vi Xi
)-1 t; m
~
= GDi Vi-I (Zi -
The quantity iii is1 an estimate ofE(Ui I lid· An estimate ofthe conditional variance is (D;Qi D i + G-l)-l. Note that the parameters appear on both sides of equatiollB (9.2,9) and (9.2.10) so that the algorithm proceeds by iteratively updating first the estimates of the regression coefficient.s and t.he random effects, and then the variance of the random effpcts, until the parameter estimates converge. A variety of slightly different algorithms for estimation of G have been proposed, See Breslow and Clayton (1993) for one specific implementation and an evaluation of its performance. This approximate method gives reasonable estimates of {3 in many problems. The estimates of U j and G are more sensitive to the Gaussian approximation to the conditional distribution. The approximat.ion breaks down when there are few observations per subject and the GLM is far from the Gaussian. Karim (1991) and Breslow and Clayton (1993) have evaluated this approximate method for some specific random effects GLMs. More recently, Breslow and Lin (1995) and Lin and Breslow (1996) proposed a bias correction method by expanding the approximate likelihood, which they termed penalized quasi-likelihood, at random e~ects parameters G. They found that bias reduction is satisfactory, especlally for large values ofG.
9.3 Logistic regression for binary responses 9.3.1
Conditional likelihood approach
In this section, we consider the random intercept logistic model for binary data given by
Xi Vi
-1
.
(9,2.9)
Zt
To simplify the diSCUSSIOn, does not include an intercept term. the 'Yi is proportional to m
IJ
To estimate G, note that the score equation (9.2,8) implies that
i=l
j=1
e Jom
]
(9,3.2)
. the sufficient statistics for the 1i has f3 gIVen
the form
)
n m
(9.2.10)
1=1
exp (l:j~1 YijX: j f3
l:R; exp( l::~1 X:t,8)
(9.3.3)
(no)
' all the Yi'. ways ' dex set ~ contams y" and t he m f ns where Yi. = L...tj=l 1J t of n. repeated observa 10 ' .. esponses oU 1 choosing Yi. pOSItIVe r ",ni
(9.2.11)
j=1
J-l
The conditional likelihood for
m i==1
j
(~ .'X~.) f3 _ flog {I + exp('Yi + X;j(3)} , II exp [n. 'Yi LYij + ~ Y'J
X i (3) ,
These equations are an application of Harville's (1977) method for linear random effects models to a linearized version of the possibly non-linear estimating equations in the GLM extension,
G== m- I LE(UiU: IYi)
11 Ui ) = Po + U + X;jf3. (9,3.1) ill write "Y' = f30 + Ui and assume that Xij we w Th' ~ t likelihood function for f3 and
logitPr(Y'ij =
,.
I
and Ui
175
0
f
RANDOM EFFECTS MODELS 176
LOGISTIC REGRESSION FOR BINARY RESPONSES
Table 9.1. Nota
10
triaL Group
177
t ' n for a 2 x 2 crossover (1,0)
(0,1)
(1,1)
(0,0)
AB BA 'equI'valent to the one derived in 'k I'h d a b ave IS The conditional lI e I ~o (B I and Day 1980). In that context, t 01 studIes res ow , stratified case-con ~ 'th tatum there are Yi. cases and ni - Yi. conr there are m strata; III the Z St t' 'that any statistical package suitable ection is impor an III d Th' troIs, IS conn 'fi d ase-control studies can be use to fit a i 's of the stratI e c , h l' h for t e ana ySI ,. d I t binary longitudinal data WIt Itt1e or random intercept logistIC mo e 0 no modification,
For the 2 x 2 crossover data on cerebrovascular deficiency, previOUSly discussed ~ E~mple 8.1, We have bl == 6, CI == 0, b2 == 4, and C2 == 2. For the model mdudlng the treatment variable only the treatment effect {3 . esti~~ted as log~ (6+4)/(0+2)} == 1.61 with ~timated standard err~r ~.'7~~ A SImilar result 18 obt~ined when the period effect is included in the model. T~e tre~tment effect IS now estimated as ~ log{(6 x 4)/(0.5 x 2)} == 1.59 With estimated standard error 0.85. Note that we have used the co v ntion of replacing the zero cell with 0.5 in this calculation. Nevertheles~ ~he data indicate that the odds of a normal electrocardiogram for the tre~ted patients are about five times (5 == exp(1.6)) greater than the odds for patients receiving the placebo. This finding from the conditional inference is to be contrasted with the results from fitting a marginal model in Example 8.1 of Section 8.3. For the marginal model, we estimated a rougWy two-fold increase in the odds for the treated group. The smaller value from the marginal analysis is consistent with the theoretical inequality stated in Section 7.4.
Example 9.1. The 2 x 2 crossover trial Let a, b, C, and d denote the numbers of response pairs for ,each I of hthe four . possible combinations of outcomes in a 2 x 2 crossover tna , as s own III Table 9,1. h 'd For example, b1 is the number of subjects in the first group, w 0 receIVe the active treatment (A) followed by placebo (B) with outcomes (1,0), that is with a normal response (Y == 1) at the first visit and an abnormal response (Y == 0) at the second. For the logistic model (9.3.1) which includes only the treatment variable XI, the conditional likelihood (9.2.4) reduces to ({3) { 1 :x:XP({31)
}b
1
+b
2 {
1 } 1 + exp(131)
C1 +C2
The estimate of (31 which maximizes this conditional likelihood is
~I == log{(b1 + b2)!(Cl + C2)}, Its variance c~n be estimated by (b i + b2 )-1 + h + C2)-I. . I~ the penod effect (X2) is now added to the model, the conditional lIkelIhood function becomes exp{ ({31
+ (32)bd
exp{(131 - 132)b
2
}
{I + exp({31 + .B2)}b + {I + exp(131 _ 132)}b2 +C2' The maximum conditi Irk l'h . . I e I ood estimate of 131 and the corresponding vanance estimate are, ona respectively 1
. (31
==
1 log
2
(b b) ~ ,
Example 9.2. The 3 x 3 crossover trial Table 9.2 gives the results of fitting a random intercept logistic regression model by conditional likelihood to the crossover data from Example 8.2. The table reports the estimated regression coefficients and their standard errors. Strong treatment effects are evident after adjusting for period and carryover effects. The chance of dysmenorrhoea relief for a patient is increased by a factor of exp(1.98) == 7.3 if her treatment is switched ~om placebo ~o a low dose of analgesic and by a factor of exp(1.71) == 5.5 If treatment 18 switched from placebo to a high dose. The principal advantage of the conditionallikelihoo~appro~ is that we remove the random effects from the likelihood by which we estIm~te {3, thus avoiding the assumption that they are a sample from a p~tlcu1ar probability distribution. The disadvantage is that we rely entIrely on within-subject comparisons. So persons with Yi. == ni or Yi. == 0 provide no information about the regression coefficients. In the 2 x 2 crossover
., all'k Table 9.2. Results for a condition I el'h I 00 d analysis of data from a 3 x 3 crossover trial.
C1
V;;(t11) ==
~ (bl
l
+ cll + b2"l + c2"l) .
Variable Coefficient Standard error
Carryover
Period
Treatment
B
C
2
3
1.98 (0.45)
1.71 (0.41)
0.69 (0.56)
0.85 (0.58)
B 0.14 (0.60)
C 1.24
(0.65)
RANDOM EFFECTS MODELS
LOGISTIC REG
178
. . th t a + d + a2 + d2 pairs are uninformative. In the trIals, thIs means I . 'd d at}lis 1accounts for 82% (55/67) af t he subjects under examp I,e · conSIC ere, . . . aeql1ently standard errors of regreSSIOn estImates tend to a 1)servat,IOn. on.~ , . or random effects analySIS. . For example, the ' a marginal e Iarger th an III . . . b standard error of the regression coefficient for the treatment :anahle IS here 0.91, BB opposed to a value of 0.23 obtained from ~he mar~mal model. At the extreme, the conditional analysis provides no mformatJOn about coefficients of explanatory variables which do not vary over time. This can be seen by examining (9.3.3). The product of any time-independent covariate and its coefficient will factor out of the sum in both the numerator and denominator and cancel from the conditional likelihood. This is sensible since we have conditioned away all information about each subject's intercept and thus cannot use estimates which are based entirely upon comparisons across subjects. We now turn to the situation in which the U i are treated as an independent.s~mplefron: a random effects distribution. We begin by reviewing the traditIOnal but Simpler random effects models for binary data and then consider the logistic-Gaussian model more specifically. 9.3.2
Random effects models for binary data
Historically, the for random effects models has bee n th e 0 b sert' th h motivation . va IOn at t e varIability among clustered binary responses exceeds what would be. expected due to binomial variation alone. Random effects mod-
~~;~:~a~~~:~:~dd::r~~~~~n:[~~e~~~~so;~:~)d extra-bi;omial variation.
REsSION FOR BINARY RESP
ONSES
The beta-binomial d' . . 179 of non-infectious diseas lS~nbutlOn has been used to m d malformed foetuses i ~~ m a household (Griffiths 197~) el the incidence somal aberrant cell n a Itter (Williams, 1975) and'th . the number of 1986). Prentice (19~6amo~g repeated samples'for an i:~~~ber of chromonot be positive in the l~tOalllbt.s out: that the correlat.ion c~:lffi,u~l (P~entice. .t ' I - Illomlal m d I Clent. () need I s ower bound is 0 e as previously tl"lOUg h" " t. but that. .
00 = max{ -fl./(n -
:1-
I
1), -(1 - fl)/(n
+ fl)},
The beta-bmomial frame wor k can be extend d b . may e Imposed on the cluster-spec"fi e so ,t.hat a parametric model h d , I e means J F e ~su.me to depend on duster-level f' ' . 1,. or .example. II, might a IOgIS.tl~ function, logit(fl;) = X;f3.xplanatory vanables. x" through Ongmally, it was assumed that th b I. . each response from the same I t e e a-bmomial distribution required e us er to have a t h e regression set-up this requ' d th . common probability. /1;. In . ' Ire e covanat . t b o bservatlOns within a cluster th t . e s o e the same for all Rosner (198 4) has extended th , ab tIS, b'Xi! = '" = X tn, -- Xi' However . . e e a- momial to . II h . ' vary wlthm clusters. His model for one I . a ow t e covanates to custer IS formally equivalent to the following ni logistic regressions: .. logit Pr(Y:' tJ = 1/ Ytl'···'Y'J-I,Y;j+l,···,Yin.,Xij)
=
log (
1-
Oil
Oil + Wij()i2 ) + (ni _ 1 _ Wij)(}i2 + X;jf3*,
j
= 1, ... , ni,
(9.3.4)
{Yib' .. ,Yin,} represent the n' bina'
was one 0 the earliest. Let cluster could be a litter in t ttl ry responses from cluster i. Here the a genetic study or an indi: der~ o~ ex~eriment, a family or household in distribution assumes that: I ua m a ongltudinal study. The beta-binomial
?
(1) conditional on fl.'" th e responses y;, . common probability J.li; ,1, ... , Yoin; are mdependent with
(2) the fl.; follow a beta distribution w'th Unconditionall h I mean J.l and variance 8p(1 - p). 1": y, t e total number of " ,I +." + Yin, has a beta-bino . I d~osl~lve responses for a cluster, Yi = mla Istnbution with .
and Var(Yi.) = nifl.i(1 ) The over-d' . - J.li {I + (ni - 1)8}. IsperSlOn p h arameter 6 is the c . responses fro m t e same cluster. orrelatlOn for each pair of binary
where w·· . t! ---:- y'•. - yij, (il) 'IS an mtercept parameter and ()i2 characterizes the as~oclatlOn between pairs of responses for the same cluster. This clever ~xtenslOn of the beta binomial does have some important limitations. First Its regression. coefficients, f3*, measure the effect of Xij on Yij which canna; first be explamed by the other responses in the cluster. Hence, the effects of cluster-level covariates may often be attributed to the other observations within the cluster, rather than to the covariate itself. This drawback is particularly severe when the cluster sizes vary so that different numbers of other responses are conditioned upon in the different clusters. This is a particular problem for longitudinal studies where the number of observations per person often varies. In addition, in longitudinal studies, it may be awkward to model the probability for the first response as a function of responses which come later in time. See, for example, Jones and Kenward (1987) who consider models for crossover trials. The logistic model introduced in Section 9.1 adds the random effects on the same scale as the fixed effects. To our knowledge, this approach was first considered in the biostatistical literature by Pierce and Sands (1975) in an unpublished Oregon State University report. They assumed
RANDOM EFFECTS MODELS
LOGISTIC REGRESSION FOR BINAR'" R
180
I
.' d m intercept. Since then, the . ., t' [, r a ullIvaflate ran 0 . I a Gaussian cllstrwu ,Ion 0 D: t has heen studied extensive y, .hG " n random euec s logistic model WIt ,al~ssJa 82 Stiratelli et al. (1984), Anclerson and including work by WIlliams (19 ») Ze er et al. (1988), Zeger and Karim Aitkin (198.5), Gilmol~r et ai. (~~:3~ 'anJWaclawiw and Liang (1993). One (1991), Breslow and Clayton) h' d a log-log link and a log gamma exception is Conoway (1990., w a u~e · t 'b 11,t'lon for the random mtercepts. d18,fI 9.3.3
Exo.mp ies
modei9 with Z . , 11OgZ8. t'c
0
Gaussian random effects
' the random effects model within the GLM frame. I' I' The approac h to fltt mg 9.2. No particular computatlOna SImp 1· Sec t'on work has been covere d III 1 . ' . anses ' when we 10CU, r , s on the logistic model wIth GaussIan random flcatlOn • effects. The likelihood function for (3 and
L((3,G; y) ;,;:
IT f fi i",1
{llij((3, Ui) }Yii {1 - llij((3, Ui)}I-Yi J f(U i ;G)dU i ,
j",1
where llij(l3, Ui) = E(Yij lUi; (3). With the logit link and Gaussian assumption on the Ui, this reduces to
.=1
exp
[13
1
~XijYij + U; L dijYij - L log{l + exp(x;j(3 + d;jUi)}] J
J
181
Table 9.3. Regression estimates and stand . ard errors (m parentheses) of random effects and m . I arglUa models fitt d ,e to the 2x2 crossover data for cerebrovQ~cul d fi . ...., and Kenward (1989) a dar, e Clency ' . ad ap t ed from Jones n presented In Table 8.1.
Intercept 1featment Period
G1 / 2
Random effects model
Marginal model
Ratio of random effects to marginal
2.2 (1.0) 1.8 (0.93) -1.0 (0.84) 5.0 (2.3)
0.67 (0.29) 0.57 (0.23) -0.30 (0.23)
3.4 3.3
3,3
a IS
(9,3.5)
IT/
ESPONSES
J
X (271"(1 I G l-q/2 exp( -U;a- 1U;j2)dU i ,
where G is the q x q variance matrix of each Ui' Crouch and Spiegelman (1990) present numerical integration methods tailored to the logisticGaussian integral above. Zeger and Karim (1991) have used a Gibbs sampling Monte Carlo algorithm to simulate from a posterior distribution similar to this likelihood function. In the examples below we use the ~pproximatio~ method by Breslow and Clayton (1993), whi~h has been Implemented III SAS (GLIMMIX) to obtain maximum likelihood estimates and their estimated standard r d ' W " . errors lor count ata regresslOn models, and e use .numencal IlltegratlOn methods implemented in SAS (NLMIXED) to obtam MLEs for binary response models. Example 9.1. (continued) For the 2 x 2 crossover trial b on cere rovascular deficiency we assume a I . t·· ogls ,Ic regreSSIOn model with dd" , random interce t ,_ a ltIve effects of treatment, period, and a tion with varia~ce "'Ie T,B~I + Ui , ~sumed. to follow a Gaussian distribuand GI/2 F ' . a e 9.3 gives maxImum likelihood estimates of (3 . or companson the t bl I obtained by fitting a .' I a e a so presents regression coefficients margma model.
Focusing firs~ on the maximum likelihood estimates, there is clear evidence of substantial heterogeneity among subjects. The standard deviation of the random intercept distribution is estimated to be 5.0 with a standard error of 2.3. By the Gaussian assumption for the intercepts on the logit scale, roughly 95% of subjects would fall within 9.8 logit units of the overall mean. But this range on the logit scale translates into probabilities which range from essentially 0 to 1. Hence, the data suggest that some people have little chance and others very high chance of a normal reading given either treatment. Assuming a constant treatment effect for all persons, the odds of a normal response for a subject are estimated to be 6.0 = exp(1.8) times higher on the active drug than on the placebo. The last column of Table 9.3 presents the ratios of regression estimates obtained from the random effects model and from the marginal model. The three ratios are all close to (0.3466 + 1)1/2;,;: 3.1, the theoretical value discussed in Section 7.4 and in Zeger et ai. (1988). Example 9.3. The pre-post trial on schizophrenia As discussed in Chapter 7, a distinctive feature of random effects models is that natural heterogeneity across subjects is modelled directly through subject-specific parameters. This example serves to illustrate that sometimes random intercepts alone may not sufficiently capture the variation exhibited in the data. Table 9.4 presents results from two random effects models: Model 1 with random intercepts to capture variations in baseline risk among subjects, and Model 2 with random int~rcepts ~d slo~es, the latter for subject variations in changes of risk over tIme. As 111 Sectl?n 8..3, We consider three independent variables: treatment status (xd, tIme 1ll weeks from baseline (X2) and their interaction (xs)· For Modell, the estimate for /33 is -0.877 (SE = 0.349) suggesting that o~ average, the rate of change for the risk of having PANSS ~ 80 is conSIderably lowher l' I 'd0 I group . For example, t e lOr the risperidone group than the h a open
LOGISTIC REGRESSION FOR BINAR Y RESPONSES
RANDOM EFFECTS MODELS 182
. estimates and standard errors Table 9.4. RegresslO~' ffects models for the pre~ (in parenthesp,S) of ran ~m e porLt tr 'lal on sehizophrema. Model Variable Intereept Treatment (xI) Time (X2) XI' X2
G11 G I2
G22
Table 9.5. Regression est.imates and t d n rd err~rs (in parentheses) of random effects models ~ st.: . . f . or e ndoneslan study on respIratory III ectlOn (Sommer et at., 1984). .
t
Model
1
2
Variable
2.274 (0.236) 0.236 (0.690) -1.037 (0.234) -0.877 (0.349) 11.066
2.400 (0.610) 0.319 (0.740) -1.034 (0.430) -1.247 (0.561) 12.167 0.143 3.532
Intercept
G and G22 denote the variances of random intercepts 11 h . and slopes respectively; 0 12 denotes t e covanance between random intercepts and slopes.
Sex Height for age Seasonal cosine Seasonal sine Xerophthalmia Age
risk for each patient in the haloperidol group at week 5 reduces by 17% (0.17 = 1- (6/5)-1.037) compared to that at week 4. However, for a patient receiving risperidone of 6 mg, their risk at week 5 reduces by as much as 29% (0.29 = 1 - (6/5)-1.037-0.877) relative to that at week 4. The estimate (G 11 = 11.07) for the variance of random intercepts suggests a strong degree of heterogeneity for baseline risks. When random slopes are added to the model, ~3 increases in magnitude, from -0.877 to -1.247, representing a 42% inflation. This addition of random slopes is justified by the substantial estimate of their variance, 0 22 = 3.532, as shown in Table 9.4. This extra variation across subjt;.cts is also more accurately reflected in the ~tandard error estimates of the (J's in Model 2. For example, the s.e. for fh III Model 2 (0.561) is almost twice the size of that for /33 (0.349) in Modell. Example 9.4. Respiratory infection in Indonesian preschool children Fitting a random effects. model to the data from the Indonesian study allows ~sf to .address the questIOn of how an individual child's risk for respiratory III eetlOn. would chang: if their vitamin A status were to change. This is accomphhsh:d by allo-:mg each child to have a distinct intercept which repT b resents t elr propensity for i f t' from mod 1 I n ec Ion. a Ie 9.5 gives regression estimates e sana ogous to Models 2-4 in Tabl 8 7 H for correlation by includin r e . '. ere, we have accounted and ' a Gaussian distributl' n ?th ~m llltercepts WhICh are assumed to follow o WI vanance G. Note that the estimates of the r d around 0.7, statistically si nific 1 an. am effects standard deviation are g ant y different from zero but smaller than
183
Age 2
2 -2.2 (0.24) -0.51 (0.25 ) -0.044 (0.022) -0.61 (0.17) -0.17 (0.17) 0.54 (0.48) -0.031 (0.0079) -0.0011 (0.00040)
Age at entry Age at entry 2 Follow-up time Follow-up time 2 GI(2
0.72 (0.23)
-1.9 (0..30) -0.56 (0.26) -0.052 (0.022)
0.57 (0.48)
-0.054 (0.011) -0.0014 (0.00049) -0.20 (0.074) 0.014 (0.0048) 0.71 (0.24)
3
-2.4 (0.37) -0.55 (0.26) -0.049 (0.022) -0.56 (0.22) -0.019 (0.22) 0.69 (0.49)
-0.055 (0.011) -0.0014 (0.00050) -0.085 (0.10) 0.0069 (0.0068) 0.72 (0.24)
in the crossover trial of Example 9.1. Among children with linear predictor equal to the intercept, -2.2, in Model 1 (average age,. ~eight,. fem~e, vitamin A sufficient), about 95% would have a probabIlIty of Illf~ctl~n between 0.03 and 0.31. This still represents considerable hetero.genelty. III · dd of infection assOCiated WIth the propensity for infection. T hereIa t Ive a ~ (054) = 1 7 xerophthalmia (vitamin A deficiency) are estimated to be ~~:'fi 'e in Modell and are not significantly different from 1. The lac 0 S~;I. c~~~s of the effect is due to the small number of xerophthalmia cases ( ) III I
RANDOM EFFECTS MODELS
COUNTED RESPONSES
184
185
.'
I d t Finally the longitudinal age effect illustrative subset of the ofl~m~ a a. I'n Model 2 can be explained, to a . f . tory inlectlOn seen on the flsk 0 resplra I d as shown by fitting Model 3. large extent, by the seasona tren' mong subjects, the regression estimt Because there is l~ss.7ete:o~::e~~r~inal model coefficients in Table 8.7. ates seen above are slhml ar o. I nd random effects coefficients are close Again the ratios of t e margma a to (0.3460 + 1)1/2 as discussed in Section 7.4.
Example 9.5. Epileptic seizures Returning to Example 8,5 we first cons'd '
Conditional likelihood method
'd condl'tl'onal maximum likelihood ,estimation of the ranWe now consl er dom intercept log-linear model for count data. Specifically, we assume that conditional on 'Yi = /30 + Ui ,
(1) Yij follows a Poisson distribution such that
= 1'i + X~j{j + log(tij),
logE(Yij hi)
j = 1, ... , ni;
and
(2) Yil,"., Yin. are independent. Under these assumptions, the likelihood function for f3 and 1'1, .. , ,'Ym is completely specified and is proportional to
IJ {n. +f
n,
m
exp Ii LYij
,_I
+ {j' LYijXij
,
Yiilog( tij) -
J=1
f:
tii exp( 'Yi +
X~ f3)}. .
J
m
i=1
(
.
Y,. Yll"",Yin
ni (
i) II J=1
exp(x~j{3)/
Xi]2
=
tije
x''Jf'J .r.I
Yij
ni t L e=1 x' {j) ,ee
(9.4.2)
'f
ni
L: tif exp(X~e{j) e=1
represents the probabiIitY that gory j', j = 1 n. W each of the Yi. events will fall into 'cate, . , " , . e now Use th d' . . continue the analysis of t h ' e con ItlOnal hkelihood approach to . e seizure data.
e model
+ f33 x 'JIX'J2 . +I ( ) og til ,
{~
if progabl'd e group, . the ith . subject is assigned to the' If the zth subject is assigned to the placebo group,
{~
if j = 1,2,3, or 4, if j = O.
Here, Ii is the exp~cted baseline seizure count for the ith subject, i = 1, .. , , 59. The coeffiCient (32 represents the log ratio of the seizure rates post versus pre-randomization for the placebo group. Note that this is assumed to take the same value for each subject. Similarly, f32 + /33 represents the log ratio of rates for the treated group so that /33 is the treatment effect coefficient. Because Xii = Xi2 = Xi3 = Xi4 and t;o 8 = til + ... + ti 4, the conditional likelihood for {j reduces to
(y ) ( 59
X
The contribution in (9 4 2) fo b' " which .. r su Jed Z IS a multinomial probability in
7':ij = tii
=
(9.4,1)
By conditioning on y',. -- ",n. d" . . L..i=1 Yii as was one III SectIOn 9.2,1, we obtam the followmg conditional likelihood which depends on {3 only:
n .
Xijl
g y:~
i=1
th
where
28
J=1
3=1
er
logE(Y;j IT;) = Ii + f3 IX ijl + f32x2 . . 'J J = 0, 1, ... ,4; Z = 1, ... , 59 ,
9.4 Counted responses
9.4.1
I
exp((3) ) 1 + eXPC(32)
(
Yi
II Yi~
)
(
1=29
Yi. -YiO (
1 ) 1 + exp(/32)
a) )Yi.-YiO exp ((32 + ,v3 1 + exp((32 + (33)
(
Y,o
1
1 + exp(/32
)YiO
+ (33)
Thus, the conditional likelihood reduces to a comparison of two sets of binomial observations with 'sample sizes' 28 and 31. In the placebo group, each subject contributes a statistic YiO/Yi. for estimating the parameter 7fl = 1/{I +exp((32)}, which is the common probabili~y t~at an.in~ividual's seizure occurred before rather than after the randomizatIOn. SImIlarly, the common probability in the progabide group is 7T2 = 1~{1 + exp(/32 + (33)}. Thus, a negative value for (33 indicates that a relatively larger fractIOn of the total seizures in the treatment group occurred before rather than after randomization as compared to the placebo group. In other words, a negative (33 indicates that the treatment is effec~ive: . Table 9 6 gives the conditional maximum likelIhood estlm~tes of f32 and (33 and'their standard errors. With the full data set, ther~ IS mod.est . ., ffi' than the placebo In reducmg eVIdence that progablde IS more e ectlve . ' l' 0 10 ± 0 065). WIth pOSSible out ler the Occurrence of seizures ((33 - - , ' ,
A
_
RANDOM EFFECTS MODELS
COUNTED RESPONSES
186
187
f conditional likelihood analysis Table 9.6. Re>su Its o . (coefficient ± standard error) of seIzure data. Complete data Patient# 207 deleted (32 (33
Pearson 's X2
0.11 -0.10
± 0.047 ± 0,065
289.9
0.11 -0.30
±
(2) the Jii are independent gamma d . variance ,pJi2. ran am vanables with mean Ji and
Then, the unconditional distribution of Y. _. '. I) IS negative bmomial with
0.047
± 0.070
227.1
patient number 207 deleted, a stronger treatment effect is suggested, (/33 = 2 -0.30 ± 0.070). However, the calculation of the Pearson X statistic
E(Y;))
= /l
and
Var(Y: -) I)
= I'~ + -1./12
,/-,,..,.
(9,4.3)
The use of the negative binomial model d at b k Greenwood and Yule (1920) who mOdelled es ~ at least to the work of . over-dispersed accid t The SImplest extension of the negati b" - en counts. ve momlal model is to that the /li depend on covariates x- thro h - assume . . ' ug some parametric function TlIe most common IS the log-lmear model for which . (9,4.4)
*2
where 1h = 1/{I + exp(P2)} and = 1/{I + exp(P2 + P3)}, also reveals that the fitted model is grossly inadequate. For example, with 57 degrees of freedom in the full data, the fitted model is rejected at the 0.01 level. The same conclusion is reached when the outlying individual is set aside. An important implication of this observation is that the estimated standard errors for the elements of /3 may be too small. This may be due to the inadequacy of the assumption that the change in seizure rate is common to everyone within a treatment group. One way to address this possibility is to introduce a random effect Ui2 for the pre-post explanatory variable X2. However, the conditional likelih~Od meth~d would no longer be appropriate since all relevant information a out WIll. be ~onditioned away. We must instead use the random effects approac whIch IS the topic of the following sub-section.
(3t
9.4.2
Random effects models for counts
The Poisson distribution has a 1 . . but in biomedical ap pl' . . ~ng tradItIon as a model for count data, is implied by the p . IcatlOns It I~ rarely the case that Var(Y) = E(Y) as OlSSon assumptIOn T . II ' mean (Breslow 1984) A d' " yplca y, the vanance exceeds the , . s Iscussed III St' 9 3 over-dispersion can be ex I' db ec IOn .. 2 for binomial data, this . p ame Y assumin th t th ' genelty among the expected res 0 g a :re IS natural heteroassumed to follow a gamm d' p ~bses. across observatIOns, If the means are " a Istn utlOn the . I d' . . margma IstnbutlOn of the counts IS the negative binomi I d' t 'b: arises from the assumptions t~at IS n utlon. Specifically, this distribution (1) conditional on /I- tl . """ Ie response variabl v . WIth mean Iti; e I ij has a POIsson distribution
One i~portant limitation of this model for application to longitudinal data I~ t~at th: explanatory variables in the regression above do not vary wlthm. sub.Jects. Morton, (1987) proposed a solution to this problem. In the longltudmal context, If we once again let v_ 1,1, ... , Y.-,,,. deno t e th e counted responses from the ith subject, Morton (1987) assum'ed that (1) conditional on an independent unobserved variable f E(Y,-)-I f t-) . " ( ,ij (3) fi,J=I, expx ... ,ni;
(2) Var(Yij lEi) = ¢>E(Y;j If;); (3) E(fi)
= 1 and Var(Ei)
= 17 2 .
Note that assumptions (1), (2), and (3) imply that E(Y;j) = exp(x:l-1) = /lij and Var(Y;j) = ¢>/lij + (j2/ltj . Morton (1987) extended this approach to include more complicated nesting structures as well. He used a quasilikelihood estimation approach, which is similar to GEE, for estimating (3,,p, and 17 2 . An attractive feature of this model is that it is not necessary to specify the complete distribution of fi, only its first two moments. The model that is the focus of the remainder of this chapter adds the random effects on the same scale as the fixed effects as follows:
(1') logE(Y;j lUi) = X:j{3 + d~jUi; (2') given U i , the responses Y;l,"" Yin. are independent Poisson variables with mean E(Y;j !Ui);
(3') the U i are independent realizations from a distribution with density function f (U i; G), This second approach allows the contribution of the random ~ffect~ to vary within a subject, that is, d ij need not be constant for a gIven z. If this flexibility is needed, the second approach is to be prefer~ed, In fact, . 1 e of (1') WIth d- - = 1 the expression (1) is readily seen as a specm cas . IJ and U i = log(Ei)' On the other hand, in order to make mferences about
RANDOM EFFECTS MODELS
FURTHER READING
188
189
ecify a distribution for the U i· The following d ta P ~ d G.' one sregressIOn . Wit . h G aUSSIan . random fJ an . nee Il th se of poisson sub-section Illustrates e u . effects by re-analysing the seizure data. 9.4.3
Table 9.7. . Estimates and stand ar d errors C effects POIsson regression models fi III parentheses) for random without patient number 207. tted to the progabide data with and
poisson-Gaussian random effects models
Example 9.5. (continued) . t' e fit two models to the progabide data which differ only In t hIS sec Ion w . . on how the random effects are incorporated. Modell IS a log-lInear model with a random intercept. In Model 2, we add a second random effect for the pre/post-treatment indicator (X2) so that logE(l'ij lUi)
= 130 + 131 Xijl + 132 x ij2 + 133 x ijl x ;j2 + Uil
+ Xij2Ui2 + log(tij) , where Ui == (Vii, Ui2) is assumed to follow a Gaussian distribution with mean (0,0) and variance matrix G with elements
(g~~ g~~). The inclusion of Un allows us to address the concern raised at the end of ~ection 9.4.1, that there might be heterogeneity among subjects in the ratIO of the expected seizure counts before and after the randomization. Th~ degree of heterogeneity can be measured by the magnitude of G 22 , the ~arI~nce of Ui2 .. Both mo?els were fitted using the approximate maximum hkelihood algOrithm outlined in Section 9.2.2. !able 9.7 presents results from fitting Models 1 and 2 with and without patient num?er 207. As expected, the results from M~del 1 are in close WIth those.from th e cond't' agreement h' H I lOnal approach given in Section 9.4.1. owever, t IS model IS refuted b th t . . in Model 2 . th yes atlstlcal significance of G 22 which ' error 0.062. usmg e complete dat a we es t'Imate to be 0.24 with standard Focusing on the results £ M or odel 2 fitted to the complete data, subjects in the placeb o group have expect d . e seIzure rates after treatment which are estimat d t b (exp(~2) = exp(0.002) =e1.00~) ~o:o~hlY the .same as before treatment are reduced after treatment b . b e progabIde group, the seizure rates !Ier;ce, the treatment seems t~ ~ out 27% (1 - exp(0.002 - 0.31) = 0.27). IS (33 = -0.31 with a stand d ave a modest effect: the estimated effect the moment patient numb ar207error of 0.15. Finally, if we set aside for e r , is whoif had. u ~usua IIy h'Igh seIzure . then the eVI'd ence that progabide ~ _ rates, 3 d-' -0.34 ± 0.15. The analysis 't~ ectlve IS somewhat stronger, with an IS carried out in order t wdl out patient 207 is only exploratory, overall result s. p atrent . 0 un erstand thO " s influence on the number 207 h IS patIent . seiZure Counts an d perhaps h a s ' as been ident·ft 1 ed b ecause of unuSU al speCial medical problems.
9.5 Further reading Throughout this chapter, we have used the Gaussian distribution as a convenient model for the random effects. When the regression coefficients are of primary interest, the specific form of the random effects distribution is less important. However, when the random effects are themselves the focus, as in the CD4 + example in Chapter 5, inferences are more dependent on the assumptions about their distribution. Lange and Ryan (1989) suggest a graphical way to test the Gaussian assumption when the response variables are continuous. When the response variables are discrete, the same task becomes more difficult. Davidian and Gallant (1992) have recently developed a non-parametric approach to estimating the random effects distribution with non-linear models. The statistical literature on random effects GLMs has grown enormously in the last decade. Key papers in the biostatistics literature include: Laird and Ware (1982); Stiratelli et al. (1984); Gilmour et af. (1985); Schall (1990); Zeger and Karim (1991); Waclawiw and Liang (1993); Solomon and Cox (1992); Breslow and Clayton (1993); Drum and McCullagh (1993); Breslow and Lin (1995) and Lin and Breslow (1996). These papers also provide useful additional references.
GENERAL 191
for suitable functions frO, and
vC: == v( /le)A. .)
"'ij
(10.1.2)
,+"
where h and v are known link and vanance " f ' unctIOns d t . A . e er~\Iled from the specific form of the density function ab details on GLMs. ave. ppendlX A.5 gives additional
10
Transition models
In wor~s, the transition model expresses the . " as a functIOn of both t h ' conditional mean /lc e CQvanates :c.. ' and )of'th.e past respon f)
Yij- q' P ast responses or functions th f .' ses as additional explanatory variables W erleo are SImply treated . e assume t lat the past affects the present t h rough the sum of s terms each f h' h . 1 Th £ . , 0 W IC may depend on the .1- f . .Illustrate . q prior va ues. e lollowmg examples with d'ffi ·. I erent I'IlUl. unctions t h e range 0 f transition models which are available. Linear link - a ?inear regression with autoregressive errors for Gaussian data (Tsay, 1984) IS a Markov model. It has the form Ytj -I, ... ,
' h t r considers extensions of generalized linear models (GLMs) for ThlScape h I " 'b' the conditional distribution of eac response YiJ as an exp lClt descrJ JDg . W '11 t: function of past responses Yij _ 1, ... , Yil and covanates x ij . e WI. LO~uS on the CaBe where the observation times tij are equally spaced. To slmphfy notation we denote the history for Bubject i at visit j by'Hij = {Yik, k = 1, ... Ii : I}. As above, we will continue to condition on the past and present values of the covariates without explicitly listing them. The most useful transition models are Markov chains for which the conditional distribution of Yij given ?tij depends only on the q prior observations Yij-lt.", Yij-q' The integer q is referred to as the model order. Sections 10.1 and 10.2 provide a general treatment of Markov GLMs and fitting procedures. Section 10.3 deals with categorical data. Section lOA briefly discusses models for counted responses which are in an earlier stage of development than the corresponding models for categorical data. 10.1 General As d~8:ussed .in ~ection 7.3, a transition model specifies a GLM for the condltlO~~ dlstnbution of Vij given the past responses 'H,.. The form of ' ~J the conditIOnal GLM is f(Yij I'H ij ) == exp{[YiAj -1j;(Oij)]j¢ + C(Yij, ¢)},
I
J
fJ
Vij = :Ci/f3
+~ ""' ar(Y,)· -r -
v5 == Var(Yij l'Hij ) = '1/1" (Oij )¢.
We will consider transition models satisfy the equations where the conditional mean and variance
+ Z·
f)'
where the Zij are independent, mean-zero, Gaussian innovations. This is a transition model with h(J.lc.) == Jl.cf)' v(lJ....)c ) == 1, and f,r -- a r (y I..J - r I lJ Xij-r (3). Note that the present observation, Vij, is a linear function of:c and of the earlier deviations Vij-r - :C~j_r{3, r == 1, ... , q. 'J Logit link - an example of a logistic regression model for binary responses that comprises a first-order Markov chain (Cox, 1970; Korn and Whittemore, 1979; Zeger et ai., 1985) is o
logit Pr(Y,j == 111tij) = :cd {3 + aYij -I, previously given as equation (7.3.1). Here C
.
C
h(J.lij) == lOglt(Jtij) = log
J.l5 ' (1 -J.lijC)
and
fr('Hij, a) and
:c'f ). - r {3)
r=l
(10.1.1)
for. known functions 1j;(() ij) and ( ) variance are c Yij, ¢. The conditional mean and
/If· == E(Y. . I'Hij) == 1j; (()ij)
q
= arYij-rt
s
=q =
1.
A simple extension to a model of order q has the form q
logit Pr(Vij = 111ti j) = Xi/f3 q
+
L
QrYij-r'
r=1
The notation f3 indicates that the value and interpretation of the q regression coefficients changes with the Markov order, q.
,
TRANSITION MODELS
FITTING TRANSITION MODELS
me a log-linear model where Yi j Log-link - with count dat,a w,e c~n assZu and Qaqish (1988) discussed P 'sson dIstrIbutIOn. eger h • 11 . I _ {I g(y~. ) - Xij-l '(3} , were Yij == given 'Hij fo ows a 01. a first-order Markov cham wIth I - a 0 .)-1 ( ., d) 0 < d < 1. This leads to
In the linear model we assume that Yo. en '1..J . 'b ' If ~tl, ... , vI iq are also multivariate . 'J ",.venG Ilij .follows a Gaussian dlstn utlon. . weakly stationary th ausslan. and the h v ij IS ance struc t ure £or teL ' . covari. f . ' e marginal dlstnbutlOn . ) b I(Yij,.·., Y,q can e ully determmed from the cond't' I d' 'b . . h aut add"ltionaI unknown parameters H IlOna mode1 Wit f II .Istn utlOn . lIke. " , ence, u maxImum lIhood estlmatlOn can be used to fit Gaussian aut . oregresslve models . See Tsay (1984) and references therein for details. In the logistic and log-linear cases f(Y'1 y. ) I'S n t d t . d fr • , " .. " ' q a e ermme am the GL~ assumptlOn abou~ th: conditional model, and the full likelihood is unavaIlabl:. ~n alternative IS to estimate f3 and Q by maximizing the conditional lIkebhood
192
m"" y.,>
'"g ~ E(Y;; [1t,,) ~ exp(x,/13) (exP~::;~'13))
0
- 0 from being an absorbing state whereby when a > 0, we have The constant d prevents Yij-I . = 0 forces all fut ure responses t 0 bONate e '. YiJ-;1 1/9 , when the prevIOUS outcome, Yij-l, exceeds an Increased expect a t'on I ''''') I I , (,I) Wh < 0 a higher value at tij-I causes a ower va ue exp(roi .1-11-' ' en a ,
mod~l
at tWithin the linear regression model, the transition can be formulated with Ir = ar(Yij-r - roij-l'(3) so that E(~j! = .Xij (3 whatever the value of q. In the logistic and log-linear cases, It IS dlffic~1t to formulate models in such a way that f3 has the same meaning for dIfferent assumptions about the time dependence, When f3 is the scientific focus, the careful data analyst should examine the sensitivity of the substantive findings to the choice of time-dependence model. This issue is discussed below by way of example, Section 7.5 briefly discussed the method of conditional maximum likelihood for fitting the simplest logistic transition model. We now consider estimation in more detail. 10.2 Fitting transition models
As indicated in (7.5.3), in a first-order Markov model the contribution to the likelihood for the ith subject can be written as
II I (Yij IH ij ).
In a Markov model of order q, the conditional distribution of ~. is .)
l7tij ) ==
f(Yij IYij-I, ... , Yij-q),
so that the likelihood contribution for the ,;th b' t b ' su Jec ecomes n,
f(Yil"",Yiq)
II j=q+I
II I(YiQ+l,""
m
= II
Yini IYiI, ... , Yiq)
i=I
n,
II
i=lj=q+l
f(Yij IH ij ).
m
n,
L L
8
C
:~j vf.1-
1
The GLM (10.1.1) specifies anI th d' . the likelihood of th fir t Y e c?n ItJonal distribution f(Yi.1 l7-£i.1); . e s q ObservatIOn f(y·,1, ... ,Yiq ).IS not spec iiied dIrectly.
(10.2.2)
(Yi.1 - /-tm = 0,
where 6 = ((3 Q). This equation is the conditional analogue of th~ GLM . A ppend'IX A.. 5 The derivative'Il8/-tij/85 IS anaScore equation, discussed III £ ulate the
logous to Xij but it can depend on 0: and (3. We can Stl o~~ ws Let estimation procedure as an iterative weighted least squares as 0 dO c't Itij row 1 S l'i be the (ni - q)-vector of responses £or J. -- q + 1,. 't': n·• an 'th kth expectation given H ij . Let X; be an (ni -q) x (p+s)) ma(nrIX_Wql) x (n' _q) • / C k - 1 n· -q an, 1 8/-tiq+k/86 and Wi = dmg(l vik+q' - , ... , • .. (Yi _ C). Then, . F'IIIally, let Z·• = X.'t5.+ Z• onIt.X· using d·lagonal weighting matrIX. an updated ~ can be obtained by iteratively regressmg A
f(Y··IY·· ) 'J ,)-1"",Yij_q'
(10.2.1)
When maximizing (10.2.1) there are two distinct cases to consider. In the first, fr(H ij jQ,f3) = arlr(Hi.1) so that h(J.l5) == roi/f3 + L:=l ar fr(7-£ij). Here, h(/-tB) is a linear function of both f3 and Q == (ab"" as) so that estimation proceeds as in GLMs for independent data. We simply regress Yij on the (p + s )-dimensional vector of extended explanatory variables (Xij, It (H i.1 ), ... ,Is (H i.1 )). The second case occurs when the functions of past responses include both Q and (3. Examples are the linear and log-linear models discussed above. To derive an estimation algorithm for this case, note that the derivative of the log conditional likelihood or conditional score function has the form
i=lj=Q+l
j=2
f(Yij
m
SC (6) =
n,
Li (Yn, ... , Yin.) == I (Yil)
193
A
£ th conditional mean and variweights W. When the correct model is assumed or t' ~IY as m goes to infinity, ance, the solution ~ of (10.2.2) asympto IC al to the true value, t5, and follows a Gaussian distribution with mean equ
TRANSITION MODELS FOR CATE
TRANSITION MODELS 194
V" = ('f:,Xt'Wi x ;*)-1
(10.2.3)
,:1 • IT d ds on a and o. A consistent estimate, VS ' is obtained The varlance v8 epen fJ . a d by their estimates f3 and o. Hence a 95% confidence by rep1acmg fJ an a " . . a1 C (3 • (3A ± 2 where V.. is the element m the first row mterv lor 1 IS I V V 811 ' °u A
In the regression setting we model th t .. . f' ' e ransltlOn prob a b'l' . 0 covanates Xi' = (1 x'' I Itles as functlOns J "Jl,Xij2, ... ,Xi') Avery I ~ separate logistic regression for Pr(Yij := 1rV;_ = genera _model uses IS, we assume that J I Y'J)' Yij - 0,1. That
A
q;;-
and column of V". . .. . Ii the conditional mean is correctly speCIfied and the condItIOnal vanance is not we can still obtain consistent inferences about t5 by using the robust vari~nce from equation (A.6.1) in Appendix A, which here takes
and logit Pr(Yij = 11 Yij-I
(f,Xt'WiXt) (f:X;'WiViWi X:) (f Xt'Wi x ;)-1 -1
,:1
,=1
(10.2.4) A consistent estimate VR is obtained by replacing Vi = Var(Yi l1td in the equation above by its estimate, (l'i - Mic)(Yi - Mi C )', Interestingly, use of the robust variance will often give consistent confidence intervals for 6 even when the Markov assumption is violated. How!lver, in that situation, the interpretation of 1; is questionable since JL5 (IS) is not the conditional mean of Y:,'J given 'IJ, , ' t'J' 10.3 Transition models for categorical data This section discusses Mar kov ch" . am regreSSIOn models for categorIcal responses" observed at equ II d . b' a y space mtervals. We begin with logistic deIs lOr mary responses rna . mult'n . al d and then b nefly consider extensions to 1 o~m an ordered categorical outcomes. As dIscussed in Section 7 3 fi' . . characterized by th t " ., a. rst-order bmary Markov cham IS e ransltlOn matnx
(
11"00 11"10
11"01) 1I"1l '
logit Pr(Yij =
11 Yij-l
=
YiJ'-I)
== x',a + y"'J- IX',,,, 'JfJ o 'J .... ,
(10.3.1)
so ~hat f31 .= .f3o + o. Equation (10.3.1) expresses the two regressions as a smgle lOgiStIC model which includes as predictors the previous response Yij-I as well as the interaction of Yij-I and the explanatory variables. An advantage of the form in (10.3.1) is that we can now easily test whether simpler models fit the data equally well. For example, we can test whether 0:: = (ao,O) so that in (10.3.1) Yij-lXi/O: = O:OYij-l' This assumption implies that the covariates have the same effect on the response probability whether Yij-l = a or Yij-l = 1. Alternately, we can test whether a more limited subset of a is zero indicating that the associated covariates can be dropped from the model. Each of these alternatives is nested within the saturated model so that standard statistical methods for nested models can be applied. In many problems, a higher order Markov chain may be needed. The second-order model has transition matrix Yij Yij-2
0 0 1
1 where 7I"ab == Pr(y;" == blY:' _ probability that ' / _ 1 'hJ - 1 - a), a, b = 0,1. For example ?f01 is the L 'J W en the . ' that each row of a transition rna . prevIOUS resp~nse is Yij-l = 0. Note a) + Pr(y;'J' == 11 Y ) tnx sums to one smce Pr(Y:' = 0 I Y:' 1 = 'J-I = a == 1. As its '. '3 'Jname Imphes, the transition matrix
= 1) == X~jf3l,
where f30 and f31 may differ. In words, this model assumes tht th IX '11' a e euects . bl a f exp1ana.t ory vana es WI dIffer depending on the prevIous . response. A more conCIse form for the same model is
the form
,=1
195
records the probabilities of making e h f . one visit to the next. ac 0 the pOSSible transitions from
(p + 8) X (p + 8) variance matrix
VR;:;:;
GORICAL DATA
Yij-l
0 1 0 1
0
1
1rooo
11"001
11"010
11"011
11"100
11"101
11"110
7l'1l1
"V b)' for example 11"011 is the H ere, 11"abc = Pr(Yij = C IYij-2 = a, L ij-l , _ B al with probability that Yi J' = 1 given Yij-2 = 0 and Yij-l - 1. fiy;n ogy ate · could now t lour separ t h e regression models for a first-or d er ch aIll, we
TRANSITION MODELS FOR CATEGORICAL DATA
TRANSITION MODELS 196
~ ossible histories (Yij-2, Yij-d, ach of the lOur p ffi' t a f3
~
logistic regressions, one or e d (1 1) with regression cae Clen ~ 1-"00'. 01' an it is'agalll . more convenient to wnte a smgle name Iy (0 , 0) , (0 , 1) '.(1,0), I But {3 10' a nd (3 11' respectIve y. equation as follows logit
Pr(Yij =
11 Yij-2
--
--
Y'J-2,
Yo-I 'J
= X ij f3 + Y-')-- IX-'J-0'1 + y"'J- 2X 'J- -02 I
I
I
= Yij-I)
10.3.1
+ Y'j-I -- y-'J--2XiJ-03'
(10.3.2) .
a,z' .
~~~~~rr;.~s~~;~~:ene~ndst~is~ituationp' l:e :u:tg~~l~t:~~s:i~~t:et~r~~:~~~ models of different or er. exam , rOT
model which can be written in the form
= 11 Yij-3 = Yij-3, Yij-2 = Yij-2, Yij-l =
Yij-d
= X~jf3 + O'tYij-1 + 0'2Yij-2 + 0'3Yij-3 + 0'4Yij-lYij-2
+ 0'5Yij-lYij-3 + 0'6Yij-2Vij-3 + 0'7Yij-IYij-2Yij-3'
(10.3.3)
A second-order model can be used if the data are consistent with aa = a5 = <16 = <17 = 0; a first-order model is implied if aj = a for j = 2, ... ,7. As with any regression coefficients, the interpretation and value of 13 in (10.3.3) depends on the other explanatory variables in the model, in particular on which previous responses are included. When inferences about 13 are the scientific focus, it is essential to check their sensitivity to the assumed order of the Markov regression model. As discussed in Section 10.2, when the Markov model is correctly specified, the transition events are uncorrelated so that ordinary logistic regression can be used to estimate regression coefficients and their standard errors. However, there may be circumstances when we choose to model Pr(Yij I fij-I, ... , fij-q) even though it does not equal Pr(Yij IHij)' For example, suppose there is heterogeneity across people in the transition matrix due to unobserved factors, so that a reasonable model is Pr(fij
intercept U makes the t:ansitions for a person correlated. Correct inferences about the populatIOn-averaged coefficients can be d . th . . Section 7.5. rawn uSlllg e GEE approach descnbed III
Indonesian children's study example
I
r and Y-')- 1 , we obtamUTf3 00 = 13; 'a t values lor Yij-2 ld By plugging in t he dIlleren . and = f3 + 0'1 + 0'2 + 0'3. vve wou {301 = {3 + al; f310 = 13 + f3l l odel fits the data equally well so h t e parsImonIous m again hope t a a mar f the a _would be zero. that many of the com~onents 0 f (10 3'2) occurs when there are no interAn important speCIal case 0 .. d __ and the explanatory t esponses Y- '-I an y,)-2, h actions betwee~ t e pas r t' ;~he a _are zero except the intercept variables, that IS, when all elemen so aae'ct the probability of a positive . th revious responses 11' term. In thIS c:e, ffi et p f the explanatory variables are the same regardless
logit Pr(Yij
197 i
= 11 Y;J-I = Yij-I, U;) = (/30 + Ui) + x~}f3 + aYij-l,
where Ui '" -:r~O,0"2). We may still wish to estimate the populationaveraged transItIon matrix, Pr(Y;j IY;j-l = Yij-I). But here the random
We illustrate the Markov models for binary longitudinal data with the respiratory disease data from our subset of 1200 records from the Indonesian Children's Health Study. Ignoring dependence on covariates for the moment, the observed first-order transitions are as presented in Table 10.1. The table gives the number and frequency of transitions from the infection state at one visit to the next. These rates estimate the transition probabilities Pr(Yij IYij-t). Note that there are 855 transitions among the 1200 observations, since for example we did not observe the child's status prior to visit one. For children who did not have respiratory disease at the prior visit (Yij-I = 0), the frequency of respiratory infection was 7.7%. Among children who did have infection at the prior visit, the proportion was 13.5%, or 1.76 times as high. An important question is whether vitamin A deficiency, as indicated by the presence of the ocular disease xerophthalmia, is associated with a higher prevalence of respiratory infection. Sommer et al. (l9~4) have demonstrated this relationship in an analysis of the entire IndonesIan data set which includes over 20 000 records on 3500 children. In our subse~, we can form the cross-tabulation shown in Table 10.2 of xeropht~alm~a and respiratory infection at a given visit, using the 855 observatIOns III Table 10.1.
Table 10.1. Number (frequency) of transitions from respiratory disease status Y_ij-1 at visit j - 1 to disease status Y_ij at visit j for Indonesian Children's Health Study data.

                 Y_ij = 0        Y_ij = 1       Total
Y_ij-1 = 0       721 (0.923)     60 (0.077)     781 (1.0)
Y_ij-1 = 1       64 (0.865)      10 (0.135)     74 (1.0)
                                                855
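A transition table of this kind can be assembled directly from the raw binary series. The following minimal sketch is not from the book; the data layout and the function name are hypothetical, and the toy series are for illustration only. With the study data it would reproduce the counts and frequencies in Table 10.1 (for example, the ratio 0.135/0.077 = 1.76 noted above).

```python
import numpy as np

def transition_table(series_list):
    """Tabulate first-order transitions y_{ij-1} -> y_ij over a list of
    binary series (one array of 0/1 outcomes per child, in visit order)."""
    counts = np.zeros((2, 2), dtype=int)
    for y in series_list:
        for prev, curr in zip(y[:-1], y[1:]):
            counts[prev, curr] += 1
    # row-wise frequencies estimate Pr(Y_ij = curr | Y_ij-1 = prev)
    freqs = counts / counts.sum(axis=1, keepdims=True)
    return counts, freqs

# Toy example with three short series
counts, freqs = transition_table([np.array([0, 0, 1, 0]),
                                  np.array([0, 1, 1, 0]),
                                  np.array([0, 0, 0, 0])])
print(counts)
print(np.round(freqs, 3))
```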
Table 10.2. Cross-tabulation of respiratory disease Y_ij against xerophthalmia status x_ij for Indonesian children's health study data from visits 2 to 6.

              Y_ij = 0        Y_ij = 1      Total
x_ij = 0      748 (0.920)     65 (0.080)    813 (1.0)
x_ij = 1      37 (0.881)      5 (0.119)     42 (1.0)
                                            855

The frequency of respiratory infection is 1.49 = 0.119/0.080 times as high among children who are vitamin A deficient. But there is clearly correlation among repeated respiratory disease outcomes for a given child. We can control for this dependence by examining the effect of vitamin A deficiency separately for transitions starting with y_ij-1 = 0 or y_ij-1 = 1 as in Table 10.3. Among children free of infection at the prior visit, the frequency of respiratory disease is 1.44 = 0.108/0.075 times as high if the child has xerophthalmia. Among those who suffered infection at the prior visit, the xerophthalmia relative risk is 1.54 = 0.200/0.130. Hence, the xerophthalmia effect is similar for y_ij-1 = 0 or y_ij-1 = 1 even though y_ij-1 is a strong predictor of Y_ij.

Table 10.3. Cross-tabulation of current respiratory disease status Y_ij against xerophthalmia x_ij and previous respiratory disease status Y_ij-1 for the Indonesian children's study data from visits 2 to 7.

Y_ij-1 = 0:
              Y_ij = 0        Y_ij = 1      Total
x_ij = 0      688 (0.925)     56 (0.075)    744 (1.0)
x_ij = 1      33 (0.892)      4 (0.108)     37 (1.0)
                                            781

Y_ij-1 = 1:
              Y_ij = 0        Y_ij = 1      Total
x_ij = 0      60 (0.870)      9 (0.130)     69 (1.0)
x_ij = 1      4 (0.800)       1 (0.200)     5 (1.0)
                                            74

This exploratory analysis suggests a model

logit Pr(Y_ij = 1 | Y_ij-1 = y_ij-1) = x'_ij β + α y_ij-1.

Table 10.4. Logistic regression coefficients, standard errors (within round parentheses), and robust standard errors (within square parentheses) for several models fitted to the 855 respiratory disease transitions in the Indonesian children's health study data.

Model 1: Intercept -2.44 (0.13) [0.14]; Current xerophthalmia (1 = yes; 0 = no) 0.44 (0.50) [0.54].
Model 2: Intercept -2.51 (0.14) [0.14]; Current xerophthalmia 0.40 (0.55) [0.51]; Y_ij-1 0.61 (0.39) [0.41]; Y_ij-1 by xerophthalmia 0.11 (1.3) [1.1].
Model 3: Intercept -2.51 (0.14) [0.14]; Current xerophthalmia 0.42 (0.50) [0.53]; Y_ij-1 0.62 (0.37) [0.39].
Model 4: Intercept -2.85 (0.19) [0.18]; Current xerophthalmia 0.79 (0.58) [0.49]; Age - 36 (months) -0.024 (0.0077) [0.0070]; Season (1 = 2nd quarter; 0 = other) 1.23 (0.29) [0.29]; Y_ij-1 0.82 (0.47) [0.44]; Y_ij-1 by xerophthalmia -0.11 (1.4) [1.1]; Y_ij-1 by age 0.00063 (0.029) [0.024]; Y_ij-1 by season -1.24 (1.2) [1.1].
Model 5: Intercept -2.81 (0.18) [0.17]; Current xerophthalmia 0.78 (0.52) [0.53]; Age - 36 (months) -0.023 (0.0073) [0.0065]; Season 1.11 (0.28) [0.27]; Y_ij-1 0.62 (0.38) [0.40].

Table 10.4 presents logistic regression results for that model and a number of others. For each set of predictor variables, the table reports the
regression coefficient, the standard error from (10.2.3) reported by ordinary logistic regression procedures, and the robust standard error defined by (10.2.4).

The first model predicts the odds of infection using only xerophthalmia status and should reproduce Table 10.2. The frequency of infection among children without xerophthalmia in Table 10.2 is 8.0%, which equals exp(-2.44)/{1 + exp(-2.44)}; -2.44 is the intercept for Model 1 in Table 10.4. The log odds ratio calculated from Table 10.2 is

log{(0.119/0.881)/(0.080/0.920)} = 0.44,

which is the coefficient for xerophthalmia in Model 1 of Table 10.4. The advantage of the logistic regression formulation is that standard errors are readily calculated.

Model 2 allows the association between xerophthalmia and respiratory disease to differ among children with and without respiratory disease at the prior visit, and reproduces the transition rates from Table 10.3. For children without infection at the prior visit (y_ij-1 = 0), the log odds ratio for xerophthalmia calculated from Table 10.3 is log{(0.108/0.892)/(0.075/0.925)} = 0.40, which equals the coefficient for current xerophthalmia in Model 2. The log odds ratio for children with infection at the prior visit is 0.40 + 0.11 = 0.51, the sum of the xerophthalmia effect and the xerophthalmia-by-previous-infection interaction. A comparison of Tables 10.2 and 10.3 indicates that the association of xerophthalmia and respiratory infection is similar for children who did and did not have respiratory infection at the previous visit. This is confirmed by the xerophthalmia-by-previous-infection interaction term in Model 2, which is 0.11 with approximate, robust 95% confidence interval (-2.1, 2.3). In Model 3, the interaction has been dropped.

If the first-order Markov assumption is valid, that is, if Pr(Y_ij | H_ij) = Pr(Y_ij | Y_ij-1), then the standard errors estimated by ordinary logistic regression are valid. The robust standard errors have valid coverage in large samples even when the Markov assumption is incorrect. Hence a simple check of the sensitivity of inferences about a particular coefficient to the Markov assumption is whether the ordinary and robust standard errors are similar. They are similar for both coefficients in Model 1.

Models 1, 2, and 3 illustrate the use of logistic regression to fit simple Markov chains. The strength of this formulation is the ease of adding additional predictors to the model. This is illustrated in Models 4 and 5, where the child's age and an indicator of season (1 = 2nd quarter, 0 = other) have been added. In Model 4, we fit all interactions with y_ij-1, which is the same as fitting separate logistic regressions for the cases y_ij-1 = 0 and 1. None of the interactions with prior infection are important so they are dropped in Model 5.
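The fitting strategy just described (ordinary logistic regression on the transition records, with robust standard errors reported alongside the model-based ones) can be sketched as follows. This is an illustrative sketch rather than the analysis used in the book: the data frame `ichs` and its column names (id, visit, y, xerophthalmia) are hypothetical, and a GEE fit with an independence working correlation is used here simply as a convenient way to obtain the ordinary logistic point estimates together with sandwich standard errors.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_first_order_markov(df):
    """Fit logit Pr(Y_ij = 1 | y_ij-1) = x'_ij beta + alpha * y_ij-1 to
    the observed transitions, with robust (sandwich) standard errors.
    df is assumed to be in long format with columns: id, visit, y, xerophthalmia."""
    df = df.sort_values(["id", "visit"]).copy()
    df["y_lag"] = df.groupby("id")["y"].shift(1)        # previous response
    trans = df.dropna(subset=["y_lag"])                 # keep transition records only
    X = sm.add_constant(trans[["xerophthalmia", "y_lag"]])
    # Independence working correlation: point estimates match ordinary
    # logistic regression, while the reported standard errors are robust.
    model = sm.GEE(trans["y"], X, groups=trans["id"],
                   family=sm.families.Binomial(),
                   cov_struct=sm.cov_struct.Independence())
    return model.fit()

# Example call with a hypothetical data frame `ichs`:
# print(fit_first_order_markov(ichs).summary())
```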
Having controlled for age, season, and respiratory infection at the prior visit in Model 5, there is mild evidence in these data for an association between xerophthalmia and respiratory infection: the xerophthalmia coefficient is 0.78 with robust standard error 0.53. As with any regression, it is important to check the sensitivity of the scientific findings to the choice of model. With transition models, we must check whether the regression inferences about β change with the model for the time dependence. To illustrate, we add Y_ij-2 as a predictor to Model 5. Because we are using two prior visits as predictors, only data from visits 3 through 7 are relevant, giving a total of 591 transitions. Interestingly, the inclusion of Y_ij-2 reduces the influence of season and Y_ij-1, and increases the xerophthalmia coefficient to 1.73. Controlling for respiratory infection at the two prior visits nearly doubles the xerophthalmia coefficient. This example demonstrates an essential feature of transition models: explanatory variables (e.g. xerophthalmia) and previous responses are treated symmetrically as predictors of the current response. Hence, as the time-dependence model changes, so might inferences about the explanatory variables. Even when the time dependence is not of primary scientific interest, the sensitivity of the regression inferences must be checked by fitting a variety of time-dependence models.

10.3.2 Ordered categorical data
The ideas underlying logistic regression models for binary responses carry over to outcomes with more than two categories. In this sub-section, we consider models for ordered categorical data, and then briefly discuss nominal response models. The books by McCullagh and Nelder (1989, Chapter 5) and Agresti (1990, Chapter 9) provide expanded discussions for the case of independent observations. Agresti (1999) provides an overview of methods for both independent and clustered ordinal data. To formulate a transition model for ordered categorical data, let Y_ij indicate a response variable which can take C ordered categorical values, labelled 0, 1, ..., C - 1. An example of ordered data is a rating of health status as: poor, fair, good, or excellent. While the outcomes are ordered, any numerical scale that might be assigned would be arbitrary. The first-order transition matrix for Y_ij is defined by π_ab = Pr(Y_ij = b | Y_ij-1 = a) for a, b = 0, 1, ..., C - 1. As with binary data (C = 2), a saturated model of the transition matrix can be obtained by fitting a separate regression for each of the C possible values of Y_ij-1. That is, we model Pr(Y_ij = b | Y_ij-1 = a) separately for each a = 0, 1, ..., C - 1. With ordered categorical outcomes, we can use a proportional odds model (Snell, 1964 and McCullagh, 1980) which we now briefly review. Since the response categories for ordinal data are usually arbitrary, we would like a regression model whose coefficients have the same interpretation when we combine or split categories. This is achieved by working
with the cumulative probabilities Pr(Y ≤ a) rather than the cell probabilities, Pr(Y = a). Given the cumulative probabilities, we can derive the cell probabilities since Pr(Y ≤ a) = Pr(Y ≤ a - 1) + Pr(Y = a), a = 1, ..., C - 1. The proportional odds model for independent observations has the form

logit Pr(Y ≤ a) = log{Pr(Y ≤ a)/Pr(Y > a)} = θ_a + x'β,    (10.3.4)

where a = 0, ..., C - 2. Here and for the remainder of this section, we write the model intercepts as θ and do not include an intercept term in x. Taking x = 0, we see that Pr(Y ≤ a) = exp(θ_a)/{1 + exp(θ_a)}. Since Pr(Y ≤ a) is a non-decreasing function of a, we have θ_0 ≤ θ_1 ≤ ... ≤ θ_C-2. If θ_a = θ_a+1, then Pr(Y ≤ a) = Pr(Y ≤ a + 1) and categories a and a + 1 can therefore be collapsed. The regression parameters β have log odds ratio interpretations, since for any a the log odds ratio for the event Y ≤ a comparing two covariate values x_1 and x_2 is (x_1 - x_2)'β.

Following Clayton (1992), it is convenient to introduce the vector of variables Y* = (Y*_0, Y*_1, ..., Y*_C-2) defined by Y*_a = 1 if Y ≤ a and 0 otherwise. If C = 3, Y* = (Y*_0, Y*_1) takes values as shown in Table 10.5. The proportional odds model is simply a logistic regression for the Y*_a, since

log{Pr(Y ≤ a)/Pr(Y > a)} = logit Pr(Y*_a = 1) = θ_a + x'β.

Here, each Y*_a is allowed to have a different intercept but the proportional odds model requires that covariates have the same effect on each Y*_a.

Table 10.5. Definition of Y* variables for proportional odds modelling of ordered categorical data.

Y     Y*_0    Y*_1
0      1       1
1      0       1
2      0       0

Our first application of the proportional odds model to a Markov chain for ordered categorical responses is the saturated first-order model without covariates. The transition matrix is π_ab = Pr(Y_ij = b | Y_ij-1 = a), a, b = 0, 1, ..., C - 1. We model the cumulative probabilities,

Pr(Y_ij ≤ b | Y_ij-1 = a) = π_a0 + π_a1 + ... + π_ab,

assuming that

log{Pr(Y_ij ≤ b | Y_ij-1 = a)/Pr(Y_ij > b | Y_ij-1 = a)} = θ_ab

for a = 0, 1, ..., C - 1 and b = 0, 1, ..., C - 2. Now suppose that covariates x_ij are available and that they have a different effect on Y_ij for each previous state Y_ij-1. The model can be written as

log{Pr(Y_ij ≤ b | Y_ij-1 = a)/Pr(Y_ij > b | Y_ij-1 = a)} = θ_ab + x'_ij β_a    (10.3.5)

for a = 0, ..., C - 1; b = 0, ..., C - 2. As with binary responses, we can rewrite (10.3.5) as a single (although somewhat complicated) regression equation using the interactions between x_ij and the vector of derived variables Y*_ij-1 = (Y*_ij-1,0, ..., Y*_ij-1,C-2):

log{Pr(Y_ij ≤ b | Y_ij-1)/Pr(Y_ij > b | Y_ij-1)} = θ_b + Σ_a α_ab Y*_ij-1,a + x'_ij β + Σ_a (x'_ij γ_a) Y*_ij-1,a.    (10.3.6)

Comparing (10.3.5) and (10.3.6), we see that θ_C-1,b = θ_b and α_ab = θ_ab - θ_a+1,b, a = 0, 1, ..., C - 2; b = 0, 1, ..., C - 2. Similarly, β_C-1 = β and γ_a = β_a - β_a+1, a = 0, 1, ..., C - 2. With this formulation, we can test whether or not the effect of x_ij on Y_ij is the same for adjacent categories of Y_ij-1 by testing whether γ_a = 0. A simple special case of (10.3.6) is to assume that γ_0 = γ_1 = ... = γ_C-2 = 0 so the covariates x_ij have the same effect on Y_ij regardless of Y_ij-1. As with binary data we can fit models with different subsets of the interactions between Y_ij-1 and x_ij.

The transition ordinal regression model can be estimated using conditional maximum likelihood, that is, by conditioning on the first observation, Y_i1, for each person. Standard algorithms for fitting proportional odds models (McCullagh, 1980) can be used by adding the derived variables Y*_ij-1 and their interactions with x_ij as additional covariates. As an alternative, Clayton (1992) has proposed using GEE to fit simultaneous logistic regressions to Y*_ij0, ..., Y*_ij,C-2, again with Y*_ij-1 and its interactions with x_ij as covariates. For the examples studied by Clayton, the GEE estimates were almost fully efficient.

When the categorical response does not have a natural ordering, we must model the transition probabilities, π_ab = Pr(Y_ij = b | Y_ij-1 = a), a, b = 0, 1, ..., C - 1. McCullagh and Nelder (1989) and Agresti (1990) discuss a variety of logistic model formulations for independent responses.
As we have demonstrated for binary and ordered categorical responses, these can be extended to transition models by using indicator variables for the previous state and their interactions with covariates as additional explanatory variables.
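The pooled logistic-regression strategy described above can be sketched as follows. This is an illustrative sketch, not code from the book: the data frame columns (id, y, y_prev, x) are hypothetical, and only the simple special case in which x_ij has a common effect across previous states (all γ_a = 0) is fitted; interactions with the previous-state indicators could be appended to the design matrix in the same way.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def expand_cumulative(df, C):
    """Expand ordinal transition records into binary records for the derived
    variables Y*_b = 1{Y_ij <= b}, b = 0, ..., C-2.
    df is assumed to hold columns: id, y (current, 0..C-1), y_prev, x."""
    rows = []
    for _, r in df.iterrows():
        for b in range(C - 1):
            rec = {"id": r["id"], "x": r["x"],
                   "cut": b, "ystar": int(r["y"] <= b)}
            # derived indicators for the previous state, Y*_{ij-1,a}
            for a in range(C - 1):
                rec[f"prev_le_{a}"] = int(r["y_prev"] <= a)
            rows.append(rec)
    return pd.DataFrame(rows)

def fit_po_transition(df, C):
    long = expand_cumulative(df, C)
    # cut-point dummies play the role of the intercepts theta_b
    X = pd.get_dummies(long["cut"], prefix="theta", dtype=float)
    X = pd.concat([X, long[["x"] + [f"prev_le_{a}" for a in range(C - 1)]]], axis=1)
    model = sm.GEE(long["ystar"], X, groups=long["id"],
                   family=sm.families.Binomial(),
                   cov_struct=sm.cov_struct.Independence())
    return model.fit()
```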
10.4 Log-linear transition models for count data

In this section, we consider extensions of the log-linear model in which the conditional distribution of Y_ij given the past H_ij is Poisson with conditional expectation μ^c_ij that depends both on past outcomes y_i1, ..., y_ij-1 and on explanatory variables x_ij. We begin by reviewing possible models for the conditional mean μ^c_ij. In each case we restrict our attention to a first-order Markov chain.

1. μ^c_ij = exp(x'_ij β){1 + exp(-α_0 - α_1 y_ij-1)}, α_0, α_1 > 0. In this model, suggested by Wong (1986), β represents the influence of the explanatory variables when the previous response takes the value y_ij-1 = 0. When y_ij-1 > 0, the conditional expectation is decreased from its maximum value, exp(x'_ij β){1 + exp(-α_0)}, by an amount that depends on α_1. Hence, this model only allows a negative association between the prior and current responses. For a given α_0, the degree of negative correlation increases as α_1 increases. Note that the conditional expectation must vary between exp(x'_ij β) and twice this value, as a consequence of the constraints on α_0 and α_1.

2. μ^c_ij = exp(x'_ij β + α y_ij-1). This model appears sensible by analogy with the logistic model (7.3.1). But it has limited application for count data because when α > 0, the conditional expectation grows as an exponential function of time. In fact, when exp(x'_ij β) = μ, corresponding to no dependence on covariates, this assumption leads to a stationary process only when α < 0. Hence, the model can describe negative association but not positive association without growing exponentially over time. This is a time series analogue of the auto-Poisson process discussed for data on a two-dimensional lattice by Besag (1974).

3. μ^c_ij = exp[x'_ij β + α{log(y*_ij-1) - x'_ij-1 β}], where y*_ij-1 = max(y_ij-1, d), 0 < d < 1. This is the model introduced by Zeger and Qaqish (1988) and briefly discussed in Section 10.1. When α = 0 it reduces to an ordinary log-linear model. When α < 0, a prior response greater than its expectation decreases the expectation for the current response and there is negative correlation between y_ij-1 and Y_ij. When α > 0, there is positive correlation.
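The behaviour of the third model is easy to explore by simulation. The sketch below is not from the book: it assumes no covariates, so that exp(x'_ij β) = μ, with μ = 5 chosen arbitrarily for illustration, and simulates series like those shown later in Figure 10.1.

```python
import numpy as np

def simulate_markov_poisson(n, mu, alpha, d=0.3, seed=0):
    """Simulate the third conditional model above:
    Y_j | y_{j-1} ~ Poisson(mu_c_j), with
    mu_c_j = exp(log mu + alpha*(log y*_{j-1} - log mu)) = mu * (y*_{j-1}/mu)**alpha,
    where y*_{j-1} = max(y_{j-1}, d)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n, dtype=int)
    y[0] = rng.poisson(mu)
    for j in range(1, n):
        y_star = max(y[j - 1], d)
        cond_mean = mu * (y_star / mu) ** alpha
        y[j] = rng.poisson(cond_mean)
    return y

# One short realization for each value of alpha used in Figure 10.1
for alpha in (-0.8, -0.4, 0.0, 0.4, 0.8):
    print(alpha, simulate_markov_poisson(10, mu=5.0, alpha=alpha))
```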
For the remainder of this section, we focus on the third Poisson transition model above. It can arise through a simple physical mechanism called a size-dependent branching process. Suppose that exp(x'_ij β) = μ. Suppressing the index i for the moment, let Y_j represent the number of individuals in a population at generation j. Let Z_k(y_j-1), k = 1, ..., y_j-1, be the number of offspring for person k in generation j - 1. Then for y_j-1 > 0, the total size of the jth generation is

Y_j = Σ_{k=1}^{y_j-1} Z_k(y_j-1).

If y_j-1 = 0, we assume that the population is restarted with z_0 individuals. Now, if we assume that the random variables Z_k are independent Poisson with expectation (μ/y*_j-1)^{1-α}, then the population size will follow the transition model with μ^c_j = μ(y*_j-1/μ)^α. The assumption about the number of offspring per person represents a crowding effect. When the population is large, each individual tends to decrease their number of offspring. This leads to a stationary process.

Figure 10.1 displays five realizations of this transition model for different values of α. When α < 0, the sample paths oscillate back and forth about their long-term average level since a large outcome at one time decreases the conditional expectation of the next response. When α > 0, the process meanders, staying below the long-term average for extended periods. Notice that the sample paths have sharper peaks and broader valleys. This pattern is in contrast to Gaussian autoregressive sample paths for which the peaks and valleys have the same shape. In the Poisson model the conditional variance equals the conditional mean. When by chance we get a large observation, the conditional mean and variance of the next value are both large; that is, the process becomes unstable and quickly falls towards the long-term average. After a small outcome, the conditional mean and variance are small, so the process tends to be more stable. Hence, there are broader valleys and sharper peaks.

To illustrate the application of this transition model for counts, we have fitted it to the seizure data. A priori, there is clear evidence that this model is not appropriate for this data set. The correlations among repeated seizure counts for each person do not decay as the time between observations increases. For example, the correlations of the square root transformed seizure number at the first post-treatment period with those at the second, third, and fourth periods are 0.76, 0.70, and 0.78, respectively. The Markov model implies that these should decrease as an approximately exponential function of lag. Nevertheless, we might fit the first-order model if our goal was simply to predict the seizure rate in the next interval, given only the rate in the previous interval. Because there is only a single observation prior to randomization, we have assumed that the pre-randomization means are the same for the two treatment groups. Letting d = 0.3, we estimate the treatment effect (treatment-by-time interaction in Tables 8.11 and 9.6) to be -0.10, with a model-based standard error of 0.083. This standard error
Fig. 10.1. Realizations of the Markov-Poisson time series model: (a) α = -0.8; (b) α = -0.4; (c) α = 0.0; (d) α = 0.4; (e) α = 0.8.
is not valid because the model does not accurately capture the actual correlation structure in the data. We have also estimated a robust standard error, which is 0.30, much larger than 0.083, reflecting the fact that the correlation does not decay as expected for the transition model. The estimate of treatment effect is not sensitive to the constant d; it varies between -0.096 and -0.107 as d ranges from 0.1 to 1.0. The estimate of α for d = 0.3 is 0.79 and also is not very sensitive to the value of d.

10.5 Further reading

Markov models have been studied by probabilists and mathematical statisticians for several decades. Feller (1968) and Billingsley (1961) are seminal texts. But there has been little theoretical study of Markov regression
models except for the linear model (e.g. Tsay, 1984). This is partly because it is difficult to derive marginal distributions when the conditional distribution is assumed to follow a GLM regression. Regression models for binary Markov chains are in common use. Examples of applications can be found in Korn and Whittemore (1979), Stern and Coe (1984), and Zeger et al. (1985). The reader is referred to papers by Wong (1986), and Zeger and Qaqish (1988) for methods applicable to count data. There has also been very interesting work on Bayesian Markov regression models with measurement error. See, for example, the discussion paper by West et al. (1985) and the volume by Spall (1988).
11
Likelihood-based methods for categorical data
11.1 Introduction

In a study of the natural history of schizophrenia, first-episode patients had disease symptoms recorded monthly for up to 10 years following initial hospitalization (Thara et al., 1994). In studies of the health effects of air pollution, asthmatic children recorded the presence or absence of wheezing each day for approximately 60 days (Yu et al., 2000). To determine whether maternal employment correlates with paediatric care utilization and both maternal stress and childhood illness, daily measurements of maternal stress (yes, no) and childhood illness (yes, no) were recorded for 28 consecutive days (Alexander and Markowitz, 1986). Lunn et al. (2001) analyse ordinal allergic severity scores collected daily for 3 months on subjects assigned either to placebo or to one of three doses of active drug. In each of these examples the primary scientific question pertains to the association between specific covariates and a high-dimensional categorical response vector. 'Group-by-time' regression methods that characterize differences over time in the expected outcome among patient subgroups can be used for each of these examples. Although several analysis options exist for discrete longitudinal data, there are few likelihood-based methods that can accommodate such increasingly common long discrete series. Likelihood-based analysis of longitudinal data remains attractive since a properly specified model leads to efficient estimation and to valid summaries under missing at random (MAR) drop-out mechanisms as discussed in Chapter 13. A likelihood approach can be used to construct profile likelihood curves for key parameters, to compare nested models using likelihood ratio tests, and to compare non-nested models by considering penalized criteria such as AIC or BIC. However, to be used successfully we need to correctly specify the model form, and this requires both tailored exploratory methods to characterize means and covariances, and methods for model checking. Finally, the sensitivity of likelihood-based methods to the underlying model assumptions needs to be assessed.

11.1.1 Notation and definitions
In this section we distinguish between conditional and marginal regression coefficients. We define the marginal mean as the average response conditional on the covariates for subject i, X_i = (x_i1, ..., x_in_i): μ^M_ij = E(Y_ij | X_i). A marginal generalized linear regression model links the marginal mean to covariates using a marginal regression parameter, β^M: h(μ^M_ij) = x'_ij β^M. In contrast, we define a conditional mean as the average response conditional on the covariates X_i and additional variables A_ij: μ^C_ij = E(Y_ij | X_i, A_ij). A conditional generalized linear regression model links the conditional mean to both covariates and A_ij using a conditional regression parameter, β^C, and additional parameters, γ: h(μ^C_ij) = x'_ij β^C + γ'A_ij. We introduce this additional notation to help highlight some of the differences between marginal regression models discussed in Chapters 7 and 8, and random effects models discussed in Chapter 9, where A_ij represents the unobserved latent variable U_i, and transition models discussed in Chapter 10, where A_ij represents functions of the response history, H_ij. In this chapter we show how a correspondence between conditional models and their induced margins can be exploited to obtain likelihood-based inference for either conditional regression models or marginal regression models.
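The numerical link between a conditional and a marginal mean can be made concrete with a small sketch. The example below is not from the book; it assumes a logistic link with a Gaussian latent variable U ~ N(0, σ^2), and evaluates μ^M = E_U{expit(η^C + U)} by Gauss-Hermite quadrature (a tool discussed later in this chapter). The values η^C = 1.0 and σ = 2.0 are arbitrary illustrations of how the marginal probability is attenuated relative to the conditional one.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def marginal_mean(eta_c, sigma, K=20):
    """Marginal mean mu_M = E_U{ expit(eta_c + U) } for U ~ N(0, sigma^2),
    computed with K-point Gauss-Hermite quadrature."""
    z, w = np.polynomial.hermite.hermgauss(K)
    return np.sum(w * expit(eta_c + np.sqrt(2.0) * sigma * z)) / np.sqrt(np.pi)

# The conditional (subject-specific) probability exceeds the induced
# marginal (population-averaged) probability when sigma > 0.
print(expit(1.0), marginal_mean(1.0, sigma=2.0))
```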
11.2 Generalized linear mixed models Recall from Section 9,1 that a GLMM structures multiple .sources of measured and unmeasured variation using a single model equatIOn:
h{E(Y;j I Xi, Ui)} = X:j{3C
+ d:jUi ,
. and U· represents unmeasured Where Xij represents measured covarlates, t'h central generalized random effects. For longitudinal data there are ree
THODS FOR CATEGORICAL DATA 210
LIKELIHOOD-BASED ME Random intercepts
linear mixed models: h{E(Yij I xi,Vd} = X;jj3C I j3c h{E(Yij I Xi' Vi)} = xij h{E(Yij I Xi, Vi)} COV
+ UiO, + Ui O + U-I'
= x~/3c + Uij,
-- U- ) = (J2 p1t;j -t;k I, (U'J' .k
(11,2.2) 'J'
(11.2.3) (11.2.4)
11 2 1) correlation among observations collected on de,I ( ." In the first rna , ., I tt 'b table to a subject-specI'£1c '1evel' , or rand om'mteran mdlvldua are a n u . . f . me that there is no addItIOnal aspect 0 tIme that cept, These dat a as su . t th dependence model In (11.2.2) we assume that each subenters moe ' " jeet has an individual 'trend', or follo~s a random.lmear trajectory (on the scale of the linear predictor), For bmary data thIS model assumes that c _ h-I(""'_j3C + U-o + U-I . t .. ) which approaches 0 or 1 as tij -+ 00. f.l' - ""') • • 'J Fi~allYI (11.2.3) assumes only 'serial' correlation implying that obse~vations close together in time are more highly correlated than observatIOns far removed in time. Figure 11.1 displays simulated data, }lij = 0 or 1, generated from each of these models for j = 1,2, ... ,25 time points. The first column of plots shows data for 10 subject generated using random intercepts, while the second column uses random lines, and the third column uses a latent autoregressive process to induce dependence, Even though these data are generated under fairly strong and different random effects parameters (var(UiO ) = 1.5 2 for random intercepts, var(UiO ) = 0.2 2 for random slopes, and cov(Uij , Uik) = 2.5 2 . O,glj-kl for random autoregressive process), there are only subtle aspects of the data that distinguish the models, First, the latent random lines model in the second column shows subjects in rows 2 and 6 (from the top) that appear to change over time from all D's to alII's, and the subject in row 5 switches from alII's early to all D's late. This pattern is expected from random lines since lie. -+ 0 or 1 as " The data generated using autoregressive random""'J J mcreases, effects exhibit more alternating runs of 0' s or l' s th an t he random mtercept ' data reflect, . mg 't'Ion. F'mally, these data suggest the challenge ' ' the dserlaI assocla that bmary ata present - det t bl d' a ' ec a e luerences m the underlying dependence st ructure may ' only be we aklY apparent even with long categorical series, In Sect Ion 9.2,2 we ' t d d ' methods refer d m r? uce approXimate maximum likelihood on a Lap'lace are to.as p,enahzed quasi-likelihood (PQL) that are based pprmomatlOn to the ' I I' . and binomial data PQL margma Ikehhood. For count data can work s .. I Clayton (1993) also d urpnsmg y well, However, Breslow and that PQL approXimatlOns '. . ld po t ent ·all I Y severely biased e emonstrate t' t yle Sima es of both . . vanance components and regressIOn parameters when used ~ b' ' lor mary respo d Fu dIrected at removing th b' nse ata. rther research has been . e las of PQL b I' . approxImations (Breslow a d L' Y ~mp oymg hIgher order Laplace n m, 1995; Lm and Breslow, 1996; Goldstein
R
VVLfU~~.
(11.2.1)
,t
Random In ' l + slopes
Ql CIl
C
o
0. CIl Ql
a:
VWV ~ VlNVV\NVV\MA V\lA ~ flNlJJ\ ~ JlJIJl JLNL ~ VVJVL ANWMJ A L AJJUNl VVWV JLN\J lJJNl/\J JWJVlJ lJ\JYIA WVlAN nJJJ\JL YVLJl flIWlN jIJ\J'{V NVlfl{1
Fig. 11.1. Binary data simulated under GLMMs: Random inte~cepts == logit E(Yij I Xi, U;o) = fie + UiO; Random intercepts and sl.opes == IOgit E(Y;j I Xi, U i ) = fie + U;o + Ui1 • j; Random autoregressive == IOgIt E(Y;j I Xi, U;) = c ,. 101 (J + Uij where COV(Uij, U'k) = Gp J- .
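Series of the kind displayed in Figure 11.1 can be generated with a few lines of code. The sketch below is illustrative only and not from the book: the conditional intercept β^C = -0.5 is an arbitrary choice, while the variance parameters mirror those quoted for the figure (intercept SD 1.5, slope SD 0.2, autoregressive SD 2.5 with correlation 0.9^|j-k|).

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_glmm_binary(n_subjects=10, n_times=25, beta=-0.5,
                         kind="intercept", seed=1):
    """Simulate binary series under the three GLMM dependence structures
    illustrated in Figure 11.1."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_times)
    Y = np.empty((n_subjects, n_times), dtype=int)
    for i in range(n_subjects):
        if kind == "intercept":
            u = np.repeat(rng.normal(0.0, 1.5), n_times)
        elif kind == "slopes":
            u = rng.normal(0.0, 1.5) + rng.normal(0.0, 0.2) * t
        else:  # latent autoregressive process, corr(U_j, U_k) = rho**|j-k|
            rho, sd = 0.9, 2.5
            u = np.empty(n_times)
            u[0] = rng.normal(0.0, sd)
            for j in range(1, n_times):
                u[j] = rho * u[j - 1] + rng.normal(0.0, sd * np.sqrt(1 - rho**2))
        Y[i] = rng.binomial(1, expit(beta + u))
    return Y

print(simulate_glmm_binary(kind="slopes")[0])
```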
212
LIKELIHOOD-BASED METHODS
FOR CATEGORICAL DATA GENERALIZED LINEAR MIXED MODELS
, I i ntage of approximate methods is 1996) Thp smgu ar a( va h d , . ~'. It mative numerical met o· s which and Rash ash , t'· I ea.:'iC relative to a e t their compu a ,JOna ~ ,,' h I'k I'hood function using either quadrat. . . , ,tl m' ximlzmg tel e I focus on (IIree , y a th ds We review numcncal maXlITmm () me, 0 ,. M te C'lr1. 1 ure meth(J( s or on, . <. S t' 11 2 1 Modern Bayesian computing , 1 (MI) (aches m ec lOn, . . . likehhoQ( ~ apprJ . d d th model forllls that can I){~ considered p ~"uss Markov Chain Monte Carlo algorithms have greatly ex an e for analysifl, In Section 11.2. 2 we ISC" (MCMC) analYflis for GLMMs, 11.2.1
tl J[g ~ tl J ~
C
f(y" I X" U,; Ii
)] •
p
p
I X" U,
{f)Ogj(Yi j I Xi, Ui = G I/2Zk; {3C)}] J=I
.
Monahan and Stefanski (1992) discuss alternatives to Gauss-Hermite and show that, the maximal error for Gauss-Hermite quadrature is increasing as th~ vanance of the random effects increases, Specifically, let M ('fJ, a) = explt('r/ + I j . z)ljl(z)dz, where expit(x) = exp(x)/[l + exp(x)], and ¢(z) IS the standard normal density function. The function M ('fJ a) represents the ma:ginal m~an induced by a GLMM where logit(JP) ~ 'fJ + U, U rv N(O,1j ). Let M('r/ 0") 't( ) to M('r/ ) . L.,k~1 Wk 'expl 'fJ + Ij , Zk , the approximation M h ,0" usmg auss-Hermlte quadrature with K nodes For K = 20, ona an and Stephanski (1992) show that sup IM('fJ a = M(TJ a = 6 2) I, ~ 10- while suP'1 I M('r/, Ij = 4) _ M( ~ ~ -3 ' SpIegelman (1990) also discus . . 'fJ,0" - 4) I rv 10 ,Crouch and quadrature begins t o f aI, 'I s condItIons under which 20-point Gaussian Adaptive Gaussian quadratu . technique that centr th G ~e IS an alternative numerical integration es e ausslan ap , . d pro>omatlOn to the likelihood at the posterior mode of th e ran om effects, To derive the adaptive quadrature
!
G - "K
2) _
~ U;Il
C
)}]
~ U;Il j} C
C-'I'¢(u/C'I') do.
C-'!>
¢(U/GI/2) ]
~ fjJ [ex x
~ IT t Wk' [ex
f(y"
x ¢([u _ al/b)
flU,; C) dU,
[exp{t,IOgf(Y'j I X" U,; lic) }] . C-'I'¢(U,/C'I') dU,
,=) k=1
~ tJ J[ex {t, log ~ fjJ [exp {t,
!ogf(Y'j I X" U,
Maximum likelihood algorithms
, , I I'andom effects standard numerical integration For low-dlmenslOlla , , ' luate the lrkelrhood, solve score, equatIOns, eva to d metho dscan be use Gauss-HermIte quadratan d compu t e rna de l- based information matrices, , K ' ure uses a fixed set of K ordinates and weights (Zk' wkh=1 to approximate the likelihood function, Consider a single scalar random effect Ui rv N(O, G) and the likelihood given by equation (9.2.5):
L(6, Y)
213
we begin with the likelihood given ab d OVe an then 'ct ' linear transformation for the placeme t f th COllSl er an arbItrary e quadrature points: n 0 L(t5,y)
p
{t,
¢([a +
m K ~g(;Wk'
[
exp
log
b·
'¢([u - aJ/b) du
fry"~ I X" U. ~ ("
+b ,); lie) } C-'I'
1]
z]/GI/2) ¢(z) . b 'ljl(z) dz
{
n, } ~logj(YijIXi,Ui=(U+b'Zk);I3C) C- I/ 2
I 2
!].
x ¢([a + b . zkl/C / ) . ¢(Zk) b
This shows that we can use the Gauss-Hermite quadrature points (Wk , Zk)K_ after any linear transformation determined by (a,b) (as long k-l . as the integrand is also modified to contain one additional term a ratIO of normal densities). If the function that we are trying to in~egrate were of the form exp( - ~ (u - a)2 /b 2) then evaluation using the .Imear transformation a + b . Zk would yield an exact result. An ada?tlVe approa~h uses different values, ai and bi for each subject, that proVIde a ~u~ratlc . ' bsu"Ject s cont n'b u tl'on to the log likehhood, apprmomation to the zth ".10 f( .. / X, lj .. ,qc) - U 2 /(2G). In Section 9.2.2 we showed that ~J g YtJ " t , fJ t e d b ' th . . • _ -V-I ( . - X·{3 ), an i IS e Ui IS the postenor mode, ai = Ui - CD, if Zt b. D'. +C-I 1-1/2, where approximate posterior curvature, bi = [Lj D ij V(Ji'J) IJ Dij = af..L~)ab. Liu and Pierce (1994) dis~uss ~ow this adaptive quadrature is related to higher order Laplace ~pproXlmatlOn, f PQL fixed quadratPinheiro and Bates (1995) studied the acc~acy 0 . d' model Their ure, and adaptive quadrature for the non-hnear rruxe .
214
LfKELfHOOD-BASED M
ETHODS FOR CATEGORICAL DATA
, d t btain high accuracy with fixed quadrature results SUggfl~'lt that mbor er a °adrature points may be necessary (100 or er 0 f qu " 'd tl1re methods proved accurate usmg 20 pomts me,th 0 ds a large num ') hile adaptIVe qua r a ' , ' software more. , w th d re noW implemented III commerCial or fewer Quadrature me a s a , ' I) d ' ,, l' ST'AT'A (fixed quadrature for logistic-norma an SAS packages mcluc mg I d ' I· . ) A k limitation of quadrature met 10 . S IS t lat I'Ikeh-, (fixed and ac1aptlve . , d' , r.(q quadrature points where q IS the ImenSlOn of . . ey hood evaIuatlOn reqUIres .l' ' . Lr tV· D q larger than two, the computatIOnal burden can t he ran dam euec t· C'or . ' " 'c Il'ml'tation makes numerIcal mtegratlOn usmg quadratbecome severe, Thl" ure an excellent choice for random intercept model~, .for n~sted random effects, or for random lines, but prohibitive for multIdimenSIOnal random effect models such as time series or spatial data models, Monte Carlo ML algorithms have been developed by McCulloch (1997) and Booth and Hobert (2000). Three methods are described and compared in McCulloch (1997): Monte Carlo Expectation-Maximization (MCEM); Monte Carlo Newton-Raphson (MCNR); and Simulated Maximum Likelihood (SML). Hybrid approaches that first use MCEM or MCNR and then switch to SML are advocated, Booth and Hobert (2000) develop adaptations that aim to automate the Monte Carlo ML algorithms and guide the increase in the Monte Carlo sample sizes that is required for the stochastic algorithm to converge. Booth and Hobert (2000) show that efficient estimation can be realized with random intercept models and crossed random effect models, but suggest that the proposed methods may break down when integrals are of high dimension. The advantage of Monte Carlo ML ~ethods is. that the approximation error can be made arbitrarily small ~Imply by Increasing the size of the Monte Carlo samples. In contrast, to Improve:ccuracy with quadrature methods a new set of nodes and weights, (Zk,Wkh:l' must be calculated for a larger value of K, In summary" fixed adapt'Ive, and St och ' numencal . ,mtegratlOn . astlC met hods have been devel d d d op,e an, rna e commercially accessible for use with GLMM h ' s aVlllg low-dimenSional random effect distributions However, none of the numerical ML h d ' met 0 s have been made computationally practical for mod I 'th e aspect makes it ims WI' random effects distributions with q > 5. This pOSSI ble to use ML f GLMM ' dom effects, such as (11.2.3 re ' o~ . s, th~t have senal r.anlongitudinal data anal . Fu), g atly hmltmg applIcatIOn for categorIcal ySIS, rther det 'I d' al regar mg methods for integral approximation can be f d' E oun In vans and Swartz (1995), 11.2.2
Bayesian methods
Zeger and Karim (1991) d' I . f ' ISCUSS use of G'bb ' ,YSIS 0 discrete data usin a GLM 1. S samphng for Bayesian anaImplementation of a wid g I M. Gibbs sampling is one particular , d' . er c ass of meth d £ lor Istnbution known as MCMC 0 s or sampling from a poster, MCMc methods constitute a technical
GENERALIZED LINEAR MIXED M
ODELS
215
breakthrough for statistical est' , , " Imatlon, These m th
1l'(o,U I y) ex f(y I U,o), f(V I 0) . 1T'(0), We note that the marginal posterior distribution for the parameters 0 is obtained by integrating over the random effects V,
7['(0 I y) = ex
Iu Iu
1T'( 0, U I y) dU f(y I V,o), !(V '0) '1T'(6) dU
ex L(y I 0) '1T'(6), indicating that the only difference between the maximization of the posterior, 7['(0 , y), and the marginal likelihood, L(y I 6), is the additional impact of the prior 1l'(0), We review these basic aspects of Bayesian methods for two key reasons. First, it shows that if we can actually solve the harder problem of characterizing the joint posterior 1T'(0, V I y) then we can compute the marginal posterior for regression and variance component parameters o. Second, this shows the well-known result that the posterior distribution closely approximates the likelihood function when weakly informative priors are used, The MOMO methods can be used to generate samples from the joint posterior 7['(0, U I y). General algorithm details are thoroughly presented
216
LIKELIHOOD-BASED METHO
OS FOR CATEGORICAL DATA
' d Louis 1996). The GLMM has 1 1996 CarIm an, C elsewhere (Cilks et a., ' h that f(y I V,o) = f(y , U, f3 ), and a separation of parameters suc d't' al independence relationships allow These can I IOn f( V 10) = f (V I G) ' 'th' MCMC algorithms (Zeger and Karim , ' . d C'bb teps WI III , use of slmphfie ISS . 11l'res that the samples generated via CMC in practIce req , ' . , 1991). Use 0' f Mb assessee1 tor t: vergence to theIr statIonary dIstnbu_ can M t Carlo sample sizes have heen generated Markov challlS e tion, and that adequate t ~n e ummaries such as means, medians, and , I estimate pos erwr s to precIse y" P bl' lly available software (BUGS) allows applicatandard deVIatIOns, u lea , dId d (11 2 1)-(11 23) and for more complIcate carre ate ata s tion for models " ' , f B /MCM 'th II likelihood-based methods the use a ayes C problems, As WI a 'd d I f( I~) , f I d thoughtful specificatIOn of the ata rna e y (J, reqUIres care u an , . h I" IS t , at rea Istrcally ' contribution of MCMC machmery However, th e major " , , t d models can now be routinely used m practIce, allowmg a nch I comp lea e I 'bl' d' family of correlated data models not previous y acceSSI e usmg Irect likelihood methods, Gelfand and Smith (1990) describe sampling-based approaches for calculation of marginal posterior densities, and Gelfand et at, (1990) provide illustration of Bayesian inference based on Gibbs sampling, Zeger and Karim (1991) discuss regression analysis of categorical response data within the GLMM framework, and Albert and Chib (1993), Chib and Greenberg (1998) overview MCMC methods for binary and polytomous data using multivariate probit models, Sun et at, (2000) discuss GLMMs where random effects have a serial or spatial correlation structure, Finally, Chib and Carlin (1999) discuss specific computational issues related to the use of MCMC for longitudinal data analysis, 11.3 Marginalized models The ~~al of this section is to show that log-linear, random effects and tranSItIOn models can b d 'th 1" . " ,. e Use WI mu trvarrate categoncal data to eIther model a condItIonal regression coefficient as described in Chapters 9 and 10 or as a, basis for constructing a correlated data likelihood with a marginai regressIon s~ructure that is directly modelled. In margrnal regression m d I h parameters represent the choa e s, : e mean (or first moment) regression lence with binar t nge III expected response, such as preva" . , y au comes per un't h I C ange III a gIven predIctor WIthout conditioning on the t h ' , 0 er responses 0 l ' , X r any atent varIables, CorrelatIOns among elements of y . ' ,given . even'f d unobservable latent v 'bl " I reasonably attributed to share ana es or thr h comes, is aCcounted for b ' oug a dependence on the past outy a separate d d 1 ad vantages of a direct m . I epen ence model. There are severa argma approa h F' parameters, (3M is invarI'a t 'h c, Irst, the interpretation of mean ' , ' n Wit respect t mo d e.I Two data analy t 'h 0 speCIficatIon of the dependence s s WIt the same mean regression but different
MARGINALIZED MODELS 217
association models have exactly the same tar get of . , M . sense, the mean model is 'separable' fro th e;;tlmahon, {3 . In thIS the joint distribution. Second marg'ln m, . e rlsemalllder of the model for t' d' , a rna d e can b using semi-parametric methods such as ener r ~ es~mate eIther (GEE), or using li.keli?ood methods descr~bed ~~~::.eshmatlllg equations Often the motIvatIon for adopting a rand a . ' condi tlOns on I atent vanables U· or for use of a tr om'fellects model that . ' ~]' ansI Ion model that conditions I I' on t 1e past outcomes, IS SImply to account for . .' . corre atlOn among repeated measurements. In discussIllg the role of statl'st' 'I die . ., , lea mo e s, ox (1990) comments: It is important to distinguish the parts of the model th t I f i ' , . . ' . a (e me aspects of subject matter mterest, the pnmary aspects and the secondary ~'spoct th t . d'
. , . " . ~ c S . a. efficient methods of estimatIOn and assessment of precision. (page 171)
III
Icate
Especially in empirical models, it is desirable that parameters ( t t e.g. con ,ras 5, ' " regreSSIOn coeffiCients and the lIke) have an interpretation largely independent of the secondary features of the models used. (page 173) Therefore, if the primary objective of longitudinal analysis is to make inference regarding the mean response as a function of cova;iates and time, then a marginal mean model may be useful. In this section we describe how the correlation among observations can be charaderized by embedding the marginal mean structure within a complete multivariate probability model based on either log-linear, random effects, or transition model dependence assumptions. Marginalized models separate the model for systematic variation (mean) from the model for random variation (dependence) when using regression models for longitudinal data. In the first component of a marginalized model a generalized linear model is used for the marginal mean:
(11,3.1) However, the marginal mean only identifies one facet of the complete multivariate distribution for the response Y i , In order to complete model specification we need to characterize the dependence among re~eated . I'Ize d rna d eI we specify a second regresSIOn to observations. In a margma characterize correlation or random variation: dependence: h{E(Yij ,Xi, A ij )} = Liij(Xi)
+ 'i':jAij'
(11.3.2)
. , 'bl A·· that are used to structure Here we introduce addItIonal varra es 'J al th I'nk funed rements In geneI', e 1 dependence among the repeate meas~ alth h we usually choose tions in (11.3.1) and (11.3.2) can be dIfferent d ~~gendence parameters them to be equal so that mean parameters, an, A P _ {Y. ' k =1= j}, 'ble chOIce IS i] ,k' are on a common scale, O ne pOSSI t I all other response ' case the parameter lij m ' dI'cates how S rang Y I n thIS
218
I- IKELIHOOD-BASED METHOD
S FOR CATEGORICAL DATA
Y.. Although we assume the ent response, ') . . fi . f . bl es, Y.,k, predict t h e curr. al mean gIVen . by (11.3.1), speci catIOn a vana . lysis focus is on the margm . f I for describing within-subject corana. , (11 32) IS use U 1St" 11 32 the conditional mean ~n, 'the 'oint distribution of Y i . n ec IOn .. J d I ' that are based on (11.3.1) and t ·on and for identlfymg I re a I . I' d I -linear rna e s U we describe 'margma Ize ,og , Alternatively, we may let Ai) = i, a (11 32) using A ij = {l'ik. k =1= J}. , bles In this case we also need coll~~tion of random effects or I~tbentt. var~~ th~ random effects. Here 'Yij I tion distrl u IOn . d f b to describe the popu a h t haracterize the magllltu e 0 uno C I t ' I . mponents t a represents varIance co " d all within-person carre a Ion. n , h . tion whIch III uces Served or ran d am varIa . I' d latent variable moclels t at are ' d 'b 'margma Ize Section 11.3,3 we escn e ' A _ U. Finally we may consider ,., I h (11 3 1) d (11 32) usmg ij based on ." ,an=?i " ;here (11.3.2) now descr.ibes how strong y ~ e A ij = {l'ik' k < J} ') t t orne In SectIOn 11.3.4 we descnbe past responses pred~c.t the ~~:~:~t~~ care based on (11.3.1) and (11.3.2) 'V" characterizes how strongly the past 'marginalized translItlOn ~ using A ij = ?iij' n th IS case I I) t' edict the current outcome. marginalized models the parameter !::.ij (X i) represent,s , and dependence · f the marainal means , ItM parameters, 'Y~J' t f a unc IOn 0 b' ') • 'h h 'I such that the conditional model in (11.3,2) is consIstent WIt t ,e margma .. )] . Stated alternatIvely, when mean structure: It··M = E A., [E(Y.' 'J I X ., A ')
obs~:v:~~~A~ese
the conditional model in (1i~3.2) is averaged over the distribution of A ij the value D. .. (X -) is chosen such that the resulting marginal mean structure . properly ' ) . III ' duce d , Itij M -- E A [h- I {!::.', J (X,) LikelihoodIS ' + ""I,A·}], I 'J IJ ij based estimation methods for marginalized models need to recover !::.ij (X i) and evaluate the likelihood function. In each of the following subsections we first review the model assumptions and then overview ML estimation issues. The structural specification of a marginalized model parallels the components of a generalized linear model for univariate outcomes. McCullagh and NeIder (1989, p. 27) identify a random component that specifies the distribution of the response and therefore determines the likelihood, and a separate systematic component that characterizes the mean response as a function of covariates. Marginalized models for longitudinal data similarly separate the distributional assumptions in (11.3.2) from the regression assumptions in (11,3.1), 11.3.1
An example using the Gaussian linear model
Although in this chapter we focus on the use of marginalized models for categorical response data, the model formulation in terms of (11.3.1) and (11.3.2) could equally be adopted for continuous response data. We briefly r~ent the G~ussian version of three marginalized models to derive anayt1cal expressIOns for D.ij(X i ) and to show that each specific pair of
MARGINALIZED MODELS
219
models, given in terms of the marginal mean (1131) d d' . ItlOnal " an. a con d mean ( 11.3.2), characterizes both the mean and the ' h . covanance, an Wit the ~sumPtI~n of normal errors specifies the multivariate likelihood FIrst conSIder the conditionally Specified Gaussian model: .
Yij I Yik:
k
of. j = /lM + '"' (Jk(Y.k__ /lM) + Etj ') ~.) Ik k#j
=( /l,) M_ fJ ,. 10.1.) .)f..L. E(Yij
+ fJ'.y'i + Eij, 1)
I Xi,A ij ) = D.ij(X i ) +fJ~jAij,
where ()ij is the vector ((Jijl, (Jij2, , , . , (Ji)n ) with () .. :::: 0 and A. :::: Y* 'lJ, I) i' the response vector Y i with Yi) set to zero. We see that D.;)(X ) :::: i (It~J - ()~jJJ,tt) a function of the marginal mean, f..L~I, and the dependence parameters Oij. The covariance of Y i is (1 - 6 i )-IR.;, where 6, is an ni x ni matrix with rows Oij, and R.; is a diagonal matrix with diagonal elements equal to var( Eij). We see that in order to be a valid model we assume i ) is invertible and require the product (1 - 6 )-1 R.; to be symmet(1 i ric, positive definite. Models specified in this fashion are also discussed in the spatial data literature and compared to alternative model specification approaches (see for example Sections 6.3.2 and 6.3.3 of Cressie, 1993). Next consider an AR(p) formulation: t
a
p
Yij
I Yik: k < j
=
It~J +
L rij,k(Yij-k -
/l~-k) + Eij
k=1
M - lijf..Lij , *) = ( Itij
E(Yij
I Xi,A ij ) =
+ lij' H..IJ-I + E'l' ..
!::.ij(X i ) +'")'~jAij,
where 'Yij is the vector ('")'ij,l, )'ij,2,.", rij,p ) , f..Lij* -- (M Itij-I' /lM ij-2,"".ItM) ')-P , A ..J = H,J I where H ij - I = (Yij-1,Yij-2, .. "Yij-p)· Agam we 't · A .. ('x -.)' = (Jl11 _ ",,1.11.*,) a function of the marginal mean, o b a1n l . . J . ' J ' """J 11)r"J . Y 's f-LM and the association parameters lij' The covarIance of i .1 (] ~ r·) -1 D. (I - r ' )-1 where r i is the ni x ni lower triangular m~t~lX , .. "i " r i d to a d'posItIveI with rows defined by lij' Here we find that any i ea sb · Lor r Y i (R; is assumed lagona 'fi d' tothO e f.a h'on are semi-definite covariance rnat nx 1) Models specI e 111 IS as 1 h d' matrix with var( Eij) on t e lagona . Ch ter 3 of Diggle ap , discussed in the time series literature (see for example t d as 1990). Finally, the random effects model can be represen e
Yij , Ui = It~
+ d~jUi + Eij
IlM + d' .a -- ""'J 1)
E(Yij
I Xi, A ij ) =
l 2 /
ei + Eij,
~ij(Xi) + a~jAij,
DS FOR CATEGORICAL DATA 220
MARGINALIZED MODELS
LIKELIHOOD-BASED METHO
N(O,I). Here we simply have , d' a and A··J ;: lor . D'~· h . 1 • d covariance of Yi IS DiG i +'<'1 were where aij;: ij A (X.) I'M. and the mduce ~ij l "IJ' d· is the jth row of D i . h t multivariate normal model can be IJ I I onstrate t a, a " . These examp es (~em E(Y... I X) ;: J'l.~, combmed with condlIJ 1 J b h . . . rgmal means specified usmg rna A' .. ) here A·· can either e t e remammg t' s E(Y; I X IJ' W IJ . { y, k tional expec~a ,lOn, ~J..t. I.'} the past response vanables, ik:' < J.} , response vanabIes {Yik ' k r J , or a latent variable, Ui·
ei
l/ 2
L"
C. rv '01
11.3.2
Marginalized log-linear models ' h t al 1975) have been widely used for the Log linear models (B IS op e. ., 1 d b' t - . _ lassified discrete observations. Ba ance mary vec ors, '_ 1 2 m can be considered as a crossanalysIs of cr~ss)c L" (}'il,Yi2, .. ·,.lin Jor ~ - , , ... " d' d' S t' 822 . . f h ponent responses. As Iscusse III ec Ion .. , clasSificatIOn 0 ten com ., b b'l' . , a Iog-Imeal' mo del'IS constructed directly for the multlvanate pro a 1 Itles (O) log Pro (}'il,' .. , Yin);: (Ji
j
j
+
j
l: em1l'ijYikYiI + .. , + e~n)Yil"'"
Yin·
j
L
=
PrO i (Yil, Yi2, ... , Yij
(2)
= 1, ... , Yin I Xi)
Yik,kf-j
yielding mixtures of exponential functions of the canonical parameters Oi' In a log-~i~ear model, the natural (canonical) univariate regressions are for the condItIOnal expectations logit E(Yij I Yik : k ::=
tP) lJ
f=. j)
+ "e(2)y ~ k
ijk Ik
h{E(Yij
I Xi)} = X;j,8M, f=. j)}
= .:iij(X,)
"(3)
+ L., eijk1YikYil + ... + e~ k
n )
II
Yil.
l#j
Therefore, although 10 -line d . variate dependenc'l g f ar mo els are well suited for describing multIes or or mod 11' '. , e mg Jomt and conditional distributions,
(11.3.3)
+ (};jW'j'
(11.3.4)
where W ij represents Yik, pairwise products Yidil, and higher order products of {Yik: k =J- j}. The fact that (11.3.3) identifie.s the joint distribution of Y i with Oij unconstrained parameters subject only to symmetry con-
egk
e(n)) , Here the canonIcal parameter vector Oi = (Oi ,Oi , ... , i IS unconstrained and e~O) is a normalizing constant. Given covariates, Xi, it is possible to allow Oi to depend on Xi or to extend the log-linear model to describe log P(Y i , Xi) when Xi is also discrete. However, in either case the log-linear model results in complicated functions for the marginal expectations, E(Yij I Xi), because these are obtained as sums over the response variable joint distribution •
E(Yij I Xi)
We now consider the formulation of marginaliz d I I' ' . an d Lair . d (1993) , . models. F 1't zmaunce margmalized' .e I ogI meal' . . l' . . canOlllca og-lmear models to permit Ikehhood-based regressIOn estimat' f h . .Ion 0 t e margmal means . by transformmg the canonical parameter, (}, ::= (0(1) (}(2) (n) , h . d t , , , . • . • B, ) mto t I' mlxe parameter, (}* = (/.lM 0(2) (J(nJ) h M M ' 1 , l ""', , Were /.li = M M M ) . ( /'1.; I , J'l.i2' ... , J'l.in , With /lij ::= E(Yi j I Xi). In their approach the underlying log-linear model parameters ' ((}(2) B(n)) ara llised t d ·b. h . . " ... , i ' · 0 escn e t e covanance of the response vector while the. average . ble IS . . . . .resJ)on • se vana directly modelled via the marginal mean. We use the following pair of regression statements to characterize the marginalized log-linear model
10git{E(Yij I Xi, Yik: k
~ (J(l) y, +"" e(2) y, . y; + L..J ij ij L..J ijk IJ Ik
221
they do not directly facilitate multivariate g l' d r . modelling of the marginal means. enera lze mear regressIOn
ditions such as = e~~~, is known as the Hammersly-Clifford Theorem (Besag, 1974). The term ~ij(Xi) is not a free parameter since it is constrained to satisfy the marginal mean model (11.3.3), The parameters for this model are 13 M and {Oij }j=I' As with all marginal regression models the mean model (11.3.3) is separated from the dependence model (11.3.4) and the parameter 13M retains the same interpretation for any model assumptions used in (11.3.4). The log-linear association parameters in Oij indicate how strongly other outcomes, Yik k =J- j, predict Yij. This model does not exploit time asymmetry and conditions on both past and future outcomes through {Yik: k =J- j} in order to characterize depe~dence. With regard to estimation, Fitzmaurice and Laird (1993) showed how iterative proportional fitting (Deming and Stephan, 1940) can be used ~o transform from the mixed parameter to the canonical parameter .Oi l~ n order to evaluate the likelihood function. In this approach the 2 multI:arI. . vector p 0 (V V V ). ate probabIlIty .1 iI, .1 i2, ... , .1 in IS recovered for each subject, , A related use of log-linea:r models that also permits marginal regresSIOn models is presented by Lang and Agresti (1994), Glonek an~ ~c~ullagh (1995) and Lang et al. (1999). Each of these approaches IS hmlt~d to applic~tions with small or moderate cl~ter siz~s due t~ ~o:rl~;~)o~ demands. In addition, the methods of Fltzmaunce an d aIr.. . the canonical asSOCiatIOn paraeffectively limited to balanced d ata SIllce ch meters (OV>, . .. ,ofn)) must be separately modelled and estimated for ea c1uster size ni.
0:
ZZ2
LIKELIHOOD-BASED
METHODS FOR
CATEGORICAL DATA MARGINALIZED MODELS
. ed atent variable models Marginalzz l er (2000) have
discusse~
how the 9) and Heagerty and. ~eg observed heterogeneity can be for We refer to these models flexlb~hty 0 . marginal regreSSIOn m . uivalently as marginalized combmed with a . ble models, or eq as marginalized latent varIa Md' dom effects models. . the marginal mean J1ij an an asSOCIran . we assume interest In . . S systematic variation. We Once agaIn I th t characterIze . generalized linear mode a bservations is induced Via unobated ume that correlation among 0 t' s Y; . would be conditionally furt her ass d that observa IOn tJ ed IX t s. The model can then be expressed serv latent variables Uij, andam euec independent given these ra~ through the pair of regressIOns 11.3.3
He~g~r.ty (1f9~LMMS
character~zIng ~~el
h{E(Yij I Xi)} = XijI 13M , h{E(Yij I Xi,Ud} = ~ij(Xi)
(11.3.5)
+ Uij·
Il~
h-1(X~jf3M)
(11.3.7)
U i rvN{O,G(a)},
. where the parameter a represents vanance compone nts that .determine t (U·) This random effects specification includes random. mtercep s, cov • u... == Iu.·' o random hnes, Uij =: UiO + Ui l ' t ij, an d autoregressIVe random eti:cts. that if G 1/ 2 G 1 / 2 = G(a) and Ai is multivariate standard normal, then U i = G 1 / 2 Ai and (11.3.6) becomes
~~te
h{E(Yij I Xi,A i )} = ~ij(Xi)
difficult for multilevel models with cluster-level covariates since no direct matching of Uij is observed for these contrasts. See Graubard and Korn (1994) or Heagerty and Zeger (2000) for further discussion. Figure 11.2 represents the marginalized latent variable model. The inner dashed box indicates that the marginal regression model characterizes the average response conditional only on covariates. The outer dashed box indicates that the complete multivariate model assumes dependence is induced via the latent variables Uij. The marginalized model in (11.3.5)-(11.3.7) also permits conditional statements via the implicitly defined ~iJ (X;), recognizing their dependence on model assumptions. The parameter ~iJ(Xd is a function of both the marginal linear predictor T/( Xij) = X~j 13M and the random effects distribution Fa(Uij ), and is defined as the solution to the integral equation that links the marginal and conditional means
(11.3.6)
. t' we assume a distributional model for To complete the model speclfica Ion t such as the normal random .. u... '" Fa, indexed by a parame er a, u.IJ' tJ effects model
= E(1l5), =
Jh-I{~ij(X;) +
r - - - -- - - - - - - -- - - - - - - -
~
(11.3.8) Uij}dFa(Uij).
(11.3.9)
- - -- - - -
;
,
,
, ,
' '
,7=-Y=7=7-: :
i
'EJ ~ ~ G::J
+ G 1/ 2 A i ,
which has the general marginalized model form that we describe in (11.3:2). The formulation in (11.3.5)-(11.3.7) is an alternative to the clasSical GLMM (Breslow and Clayton, 1993) which directly parameterizes the conditional mean function, ~ij (X i) = X~jf3c. Recall from Chapter 7 that there is a critical distinction between the marginal parameter, 13M , ang the conditional parameter, f3c. The conditional regression coefficient 13. contrasts the expected response for different values of the measured covanates, 'JJij, for equivalent values of the latent variable U . The marginal ij coefficient does not attempt to control for the unobserved U . when chari acterizing averages. For example, a marginal gender contrast ~ompares the mean among men to the mean among women, while a conditional gender contrast compares the mean among men with Uij = U* to the mean among women who also have Uij = U*. Interpretation of f3c can be particularly
223
c±JG5cJjg Marginal regression
_ _ _ _ _ _ _ _ _ _ .J
GLMM
,,
_ _ _ .J
inalized latent variable mo~el. ~he Fig. 11.2. Diagram representmg a ma:f s ecifies a marginal generalIzed Iminner dashed box indicates that ,theMm~dd ~d from an underlying GLMM for ear model, h{E(Yij I Xi)} = Xij(3 , III uc L
-
-
-
E(Yij I Xi, Ui).
-
-
-
-
-
-
-
•
ODS FOR CATEGORICAL DATA
224
MARGINALIZED MODELS
LIKELIHOOD-BASED METH
2 X.) allowing explicit dependence on We assume that var(Vij) = (7 ( . ' 'xed model formulation, V ij :::::, . th' represents a ml . h ndom intercepts have a dIfferent oovariates Xi smcc 18 d th case were ra lIo + Uil 'Xij, an e d' the value of a cluster-level covarim'agnitude of variation deP2 en Idng on(U. t X = 1) = (7?, In the common X - 0) = (7 an var ,0 I ' ate, var(Uw I i -N{ 2 (OX.)) we can rewrite Vij = (7 (X ij) , €, where 0, (1 ". b 8 case where Uij '" €'" N(O, 1) and the integral equatIOn ecome
h-I{TJ(Xij)}
=
J
h-I{Llij(Xi)
+ (7(Xi) . OcP(O d~,
A.. , th t dard normal density function, Given 1](Xij) and (7(X i ), where 'I' IS e s an A (X) S H t , al tl'on can be numerically solved for L..lij i · ee eager y (X) h h - 1 't the mtegr equa (1999)fordetailsofthelinkagebetween1](xij)an?~i! ,w ~n .- Ogl, , l'mk fun ctl'on and mixing distnbutlOn combmatlOns ' the For certam , and margmal mean can be obtamed ' between conditional transformat Ion . ' analytically, For example, using a probit li~k functIOn a,nd ~aussIan random effects, U = I1(X) for N(O, 1), yIelds the relatIOnshIp
,e e'"
{1](X)}
= E [{Ll(X) + (1(X) '0] = {
Jl
~(X)
+ (12(X)
}
showing that the marginal linear predictor, 1](X), is a rescaling of the conditional linear predictor, ~(X), If the variance of the latent variable is independent of X, then the marginal and conditional model structures will be the same (i.e, linear, or additive in multiple covariates),however, if I1(X) depends on covariates, then the marginal and conditional models will have different functional forms. A key example where heterogeneity or over-dispersion is assumed to depend on covariates is in teratologic applications where the intra-litter correlation is a function of the dose, X (see Prentice, 1986 and Aerts and Claeskens, 1997 for examples using the betabino~ial ~odel), Similarly for count data, Gr6mping (1996) discusses the relatIOnshIp between the marginal and conditional mean for a log link with normal mixing distribution, where exp{1J(X)}
= E[exp{~(X) + u(X) ,OJ = exp{~(X) + ~u2(X)}.
Again, if u(X) = 170 then th 'I , " e margma and conditional models are nearly eqUIvalent, but If heterog 't . II . . d £r enel y IS a owed to depend on covariates as m mIXe euects models the f t' al ' unc Ion form of the marginal and conditional models will d'ffi p' function of Xl er, dexample, a mixed model that has ~(X) a linear function. may ea to a marginal model where 1](X) is a quadratic
t
225
. Figure 11.2 characterizes the marginalized random effects model. The mner dashed box .shows that the marcnnal regresslon . mod d' . " o' e i escnbes systematIc. varIatIOn m the response as a function of . t Th . oovanaes. e margmal . . regressIOn structure IS Just one facet of the multl'va . t d' t ' b ' f . , na e Is.n utlOn 0 y . whIch IS assumed to be of GLMM form. f By introducing the marginally specified model, we allow a choice as to whether the marginal mean structure or the , . condl'tl'on aI mean s t.ructure IS t~e focus of modelling when using a latent variable formulation, There. e~st~ a general correspondence between 7J(X) and ~(X) so that the dIstmctlOn becomes purely one of where simple regression structure is usefully assumed, and what summaries will be presented through the estimated regression coefficients, The choice between marginal or conditional regression models can now be determined by the scientific objectives of the analysis rather than by the availability of only conditional random effects regression models. Estimation for marginalized latent variable models is just as computationally demanding as estimation for GLMMs described in Section 11.2. The likelihood function has the same form as the GLMM likelihood detailed in Section 11.2.1. However, f(~j I Xi, Ud depends on ~ij(Xi) which is a non-linear function of both 13M and the variance components a, The parameter ~ij(Xi) can be recovered using numerical integration to solve the one-dimensional convolution equation (11.3.9). Algorithm details are described in Heagerty (1999) and Heagerty and Zeger (2000). Similar to the GLMM, ML estimation for the marginalized random effects model is currently limited to low-dimensional random effects distributions due to the intractability of the likelihood function,
11,3.4
Marginalized transition models
In Chapter 10 we describe how transition models focus on the d~tribution of ~j conditional on past outcomes Yij-I, Yij-2,":' and covanates ~~. These models are particularly attractive for categoncal data, th~t exhibIt , 2 ." serial dependence smce the coeffi' clent s a f Y;.IJ-1> Y;'J-' , mdicate how strongly the past outcomes predict the current response, However, the;e Id t t to condition on past outcomes 0 are situations where we wou n~ wan 1 t clinical trials ' late X· For examp e, mos make inference regard mg a covar " t fixed final . . f t tent on the response a a are interested m the Impact 0, rea m file over time. In this case v.. measured follow-up time, or on the entIre response pro v d't' on outcomes .I'j-I,.I'J-2,··" we would not want to c,on ~ Ion din the effect of treatment on after baseline when making mference regar 1 gconsidered as intermediate Vij since earlier outcomes should be proper Y variables and not controlled ,for.. f erial dependence that a transition The attractive charactefl~atlOn~ h s marginal regression structure by model provides can be combmed WIt a
S FOR CATEGORlCAL DATA 226
MARGINALIZED MODELS
LIKELIHOOD-BASED METHOD
,.
d I (Azzalini, 1994; Heagerty, 2002), adopting a marginalized tran,sl~o~h:ofir:t_order Markov chain models of In this section we first reVle alt ative but equivalent, model speciern, 'd I d d cribe an Azzalini (1994 ) an es .' f a marginal regreSSIOn mo e used . f the combInatIOn 0 'd t fication III terms 0 f th response on covanates, an a ranto characterize the dependence; e ture the serial dependence in the sition model (Chapter 10).' usel'ktol'hcaPd function. We then generalize the d .d ntlfy a I e I 00 response process an I e . . del to allow pth-order dependence. I , alized tranSItIOn mo first-order margIn d b' y Markov chain mode to accomuce Azzalini (1994) introd athIntaris common in longitudinal data. a 'al depen d ence h dimes that the current response variable modate t e sen mo e assu , der Markov A fi rst -or . I through the immediate prevIOUS response, . d ndent on the hIstory on Y '1' . 18 epe, ') = E(li' I Yij-d, The transition probaI)J ItIes Pij,O = E(Yij I Yik, k < J) d .. J = E(Yi' I Yi-I = 1) define the Markov proE(Y.' I Y. I = 0 an P'J,I lJ J A I" (1994) d' tl parameterize the marginal mean. zza cess'3but d'J0 not Irec y . Inl F' arameterizes the transition probabilities through two ass~mptlOns. I~S~, a ~arginal mean regression model is adopted which constraInS the tranSitIOn probabilities to satisfy
J.l~ = Pij,l 'J.l~-I + Pij,O' (1 - J.l~-I) ,
(11.3.10)
Second, the transition probabilities are structured through assumptions on the pairwise odds ratio ,T,., _ ~IJ -
Pij,J/(I- Pij,d Pij,O! (1 - pij,O
)'
(11.3.11)
227
regression model for how strong) Yo. . d' (2000) describe the dependence :odellu~;:gl~ts Y;dj'l~eagehrty and. ~eger t f C _ E(V . mo e lOr t e condItIOnal I ij I Xi, Yik, k < J) with logit link expec a IOn f1ij 10git{E(Yij
I Xi, ?tij)}
=:
t.ij(X;) + I'ij,1 . Yij-I,
(11.3,12)
where H· = {Yo . k < J'} d The log oddS rat',10 I'i' I . . I 'J I" ,k . . an I'iJ ' I =: log W. 'J IS sImp y a OglStlC regressIOn coefficient in the model th t d't' J, hX Y: a, con I Ions on bot i and ij-l· The parameter t.iJ(X;) equals logit(p·, ) and' d t _ . d' r' I b M 'J,O IS e er mme Imp. IClt y y (3 and I'ij.1 through the marginal regression equation and .equatlOn (11.3.12). Furthermore, a general regression model can be speCIfied for I'ij,l, I'iJ'1 I
= Z'. 1 .1°1 J,)
(11.3.13)
where the parameter QI determines how the dependence of Yo. on v.. IJ I 'J-I varies as a function of covariates, Z;j,I, For example, lij.1 = O'j, allows serial dependence to change over time, and "Iij, I =: 0'0 +0'1 Zi allows subjects for whom Zi = 1 to have a different serial correlation as compared to subjects for whom Zi = O. In general, Zij is a subset of X; since we assume that equation (11.3.12) denotes the conditional expectation of Yij given both Xi and Zij. In summary, the marginalized transition model separates the specification of the dependence of Yij on Xi (regression) and the dependence of Yij on the history Yij-I, Yij-2, .. , ,Yil (auto-correlation) to obtain a fully specified parametric model for longitudinal binary data. A first-order Yil given model assumes that Yij is conditionally independent of Yij-2, Yij -I' The transition model intercept, t. ij (X i), is determined such that both the marginal mean structure and the Markov dependence structure are simultaneously satisfied. Equations (11.3,12) and (11.3,13) indicate how the ~rst-,order dep.e~d ence model can naturally be extended to provide a margmalIzed tranSItIOn model of general order, p. We assume that lij depends on the history only through the previous P responses, Yij -I, , .. , lij -p' ~ pth-order ~ependence model, or MTM(p) can be specified through the paIr of regressIOns: 0
which quantifies the strength of serial correlation. The simplest dependence model 83sumes a time-homogeneous association, I}J ij = I}J 0, however, models that allow Wij to depend on covariates or to depend on time are also possible. The transition probabilities, and therefore the likelihood, can be recovered ~,a function ~f the m~rginal means, J.l~, and the odds ratios I}J ij' ~zzahll1 (1994) prOVides detaIls on the calculations required for ML estimatIon and. establishes the orthogonality of the marginal mean and the odds ratio parameter in the restricted case of a time-constant (scalar) dependence model.
h{E(Yij
H~a~erty
and Zeger (2000) view the approach of Azzalini (1994) as combmmg a marginal mea d I h . ' n mo e t at captures systematic variatIOn III t he response 83 a fun t' f . I that d 'b ' Cion 0 covanates, with a conditional mean mode Th fiestcn des senal dependence and identifies the joint distribution of Yi' e rs -or er marginalized t 't' first assu . ransl IOn model, or MTM(I), is specified by mmg a regressIon structure for the marginal mean E(¥;' I Xi), . ~smg a generalized linear model h( M) _ , M .' lJ IS specified by assu' M k' J.lij - 'X ij (3 . Next, senal dependence mmg a ar ov structure, or equivalently by assuming a
I Xi)} = m~j(3M
0"
(11.3.14) p
10git{E(lij I Xi, Hij)} = D.ij(Xi ) +
L lij,k 'Yij-k
(11.3.15)
k=1
and we can further assume that the dependence parameters follow regres-
0
sion structure Ok
"II. nJ,
~k , k= = Z ''J' k OA I,
1, 0 " ,po
(11.3,16)
LIKELIHOOD-BASED
228
CATEGORICAL DATA METHODS F OR . .
,) rginalized transitIOn model additive ma For example, a second-order ( I ' Yij-I + "It},Z . Yi}-Z, and "Ii},1 ~ C I a depend on the interaction ' assumes.. logit{/LtJ } == Aij(Xi) +h "ItJ /I..e can a s , , == Z'· 2az, Althoug r'J d I J: r simplicity of presentatIOn, Z lal, "ItJ,z 'J, add'tive mO e 10 , 2 we assume an I meter 13M descri bes changes III the avery,..t}, I ' y,. tJ-' 1}-1 the MTM(2) the mean para . ithout controlling for previous n , f covanates, w d d e as a functIOn 0 age relspons ' 1 1 3' a d'lagra m that represents ,a secon -or er onse variables. Figure ,IS h ' er dashed box indicates that the res P " model T e Inn h' b t h inalized transitIOn 'h ar
-_._---------_._-----~-----------------------------~
, ,
MARGINALIZED MODELS
229
constrained and must yield the proper maruinal t' M C . d ' , 0' expec atlon J.L .. when J.L .. IS average over the dlstnbution of the history Ro fi' '}l 'J 1\ (X) , r any mte va ues of Ok as /.Jotj i ranges from -00 to +00 the induced mar' I ' . ' gina mean monotonI dd d Ically Illcreases from 0 to 1. Therefore uiven any fin't ' , " ' O' I ·e va ue epen ence model and any probablhty dlstnbution for the history a ' 1\ __ (X.) 'd 'fi d h ' , Ullique U'J , can b e I entl e t at satisfies both the transition model d th 'I ' an e margma mean assumptIOns, One disadvantage to directly using a transition model with I d ' bes l 'IS t hat the information in (Y. pagged response varIa y'.tp ) I'S cond't' I I ,.,., I lOne upon, and therefore does not contribute to the assessment of covariate effects. With an MTM(P) approach the information in the initial responses regarding J.Lit, and thus 13 M , is included through lower order marginalized transition models involving E(Y;k I X iko Y;k-I, .. "Y;I) for k < p, For example, when using an MTM(2) the likelihood for the initial responses, (Yil, Y;z), is obtained by factorization into Pr(Y;1 I XiI) (determined entirely by JL;1), and Pr(Y;2 I X i2 , Y;I) which involves and the conditional model: logit E(Y;z I X iZ , Y;d = Ai2(X i ) + ;hz,1 'Yil' Note that 1'iz,l is distinct from the first-order coefficient rij, 1, j > 2, in the MTM(2) dependence model, logit(J.t5) = ~ij(Xd+"Ijj,l' Yij-l + rij,Z ' Yij-2, since this model involves Y;j-2 in addition to l'ti-l' Therefore, in applications we estimate separate lower order dependence parameters, To carry out estimation within this general class of models, note that the MTM(p) likelihood factors into the distribution of the first p response variables times the subsequent Bernoulli likelihood contributions with parameters J.lg for j = (p + 1)"", nj:
ttM
Pr(Yil, Yiz, ' , , ,Yin; I X i) = Pr(Yil, Yiz"", Yip I Xi) . Pr(Y;p+l, Y;P+2,""
n,
= Pr(Yil, Yiz, ... , Yip),
II
l'tn; J'liiP+l,X.)
Pr(Y;j I 'liij, X d
j=p+l
Marginal regression : --- - - -- - --. - - - ---. --- -----
L----
_ a
...J
Transition model
-------------------------
g FTih · .11,3. Diagram representing a second-order marginalized transition model. e mner dashed box indo t h I' ed Ica es t at the model specifies a marginal genera IZ I·mear model, h{E{Y,.. I X-)} - 'M, . 'tioD model E(Y. I X '1 '. - Xij{3 , mduced from an underlymg trans l ,
'J
.,Yikk<J).
,h t t with a model for Pr(Yil,"" Ytp ) The basic maximization algorlt m s ar s t' . The key . aJ' d t 'fon model assump IOns. using lower order margm lze , ran~1 I hat transition probabilities, to subsequent likelihood evaluatIOn IS t (.1M d fu tion of the parameters fJ an can be sequentiaJly recovered as a nc t 'to 1J9. directly but the 01, ... , a p • The dependence. ~arame~ers e:;r~~ al"~,. ,a ' We obtain p intercept ~ij(X,;} is an impliCIt functIOn 0
ttS,
230
LIKELIHOOD-BASED MET
HODS FOR CATEGORICAL DATA
EXAMPLES
test the assumption
Aij (X i) by solving the marginal constraint equation:
0'
pH
=. .
"Y1J,p+1
m
Pr(Yij
M 1:tij -
= 1 I Yij-I = Yj-I,""
}j-p
= '!Ij-p)
Y'j-J ,,,,,Y'J-p X
Pr(Yij-1
= Yj-J,""
Up + 1
Yij-P = y,j-p)'
In order to obtain a solution we require the initial state probability Pr(Y;1 = , v, = ,) from which all subsequent p-dimensional probabilities Y,I,"" L,p Y,p y: can be obtained by multiplying Pr(Yik = Yib"" ik+{p-I) = Yik+(P-I)) times jlik+p and then summing over Yik· Details for MTM(2) estimation are provided in Heagerty (2002). . , ' The computational complexity of MTM(p) hkehhood evaluatIOn for subject i is O(ni 2P ). Calculations required to compute and update the p-dimensional history are O(2P ) and each observation requires such calculations, Therefore, with a fixed dependence order the computational burden only increases linearly with the length of the observation series, ni· The order of alternative likelihood methods is generally much greater with the iterative proportional fitting algorithm of Fitzmaurice and Laird (1993) requiring exponentially increasing computational time, O(2 n ;). Azzalini (1994) suggested that the MTM(1) has general robustness properties but only established a consistency result for a restricted scenario. By viewing the MTM(I) as adopting a logistic regression for allows us t~ show t~at the MTM(I) is a special case ofthe mixed-paramet~rmodel of ~Itzma~l'lce and Laird (1993) with "f = VeCbij,l) the canonical log-linear mteract~on parameter. Appendix 1 of Heagerty (2002) provides details that show {3 and 0:1 a:~ orthogonal. The implication of orthogonality is that the ML. estimate . £ . " {3 rem' alllS consistent lor (3 M even if the dependence
J.if
model IS mcorrectly specified. Use of the MTM(I) ML t' t {3.M d a sandw' h . es Ima e an • IC .var~ance estimator (Huber, 1967; White 1982' Royall 1986) proVides ' "in, . . a lIkelthood mativat ed versIOn of GEE appropriate serial data SituatIOns since the point estimat {3.M . . errors can be obt' d . h e WIll be consistent and valid standard allle WIt out requ' .' modelling of "0/' , F mng correct Markov order or correct "J,I· or general pth-order models 13M ( not be orthogonal and .t ' and 01,,," op) may , conslS ent esti t' f requires appropriate de d rna Ion 0 mean regression parameters pen ence modellin One practical advantage of rna' .g. . . several simple proced rgmahzed tranSItIon modelling is that ures can be used t assumptions. First to est bl' h h 0 assess the dependence model ·· a l s t e a propna P ' t e order we can use direct t rausltIon models 'and . regress Y;,J on X d' approxImate score tests for dd ij an pnor responses. Second, f rom p to p + 1) take simpl anf a itiona I Iagged response (increased order e orms. For exampI ' e, usmg a MTM(p) we can
231
. = 0 WIth the statistic
n'i
= '" '" ~.~
Y:ij-(p+l) (Y:ij - {If),
''''I ]=p+2
J
·C· wh:re Ilij IS t h e fitted conditional mean obt . ThIS statistic only approximates th l'k r~med from the MTM(p) model. ignores BAil / oO'p+ I. The approxilnet I e. I ood s~:o~e ~tat.istic because it . I a e score statistIc I . t " . simp y evaluates the correlation b t hS In tutlve SlUce it . . e ween t e (p + 1) I d the condItIOnal residual obtained b fitt" ' . agge response and with the first p lagged responses. y mg a margmahzed transition model L'
11.3.5
"
-
Summary
In this section we have introduced marginalized models for cat . II gitudinal data. These models separate specification of th egoflca onM f h . . , e average response Ilij' .rom t e dependence among the repeated measurements. Depend~ ence IS characterize~. by a second regression model for MC = E(Y.' I Xi, A ij where ~ddl~lOnal variables in A ij are introduced'\o struc;~re ~orr~latJon. MargmalIzed models expand analysis options to allow flexible hkehhood-based estimation of marginal regression parameters.
!
11.4 Examples 11.4.1
Crossover data
We now revisit the two period crossover data (Example 1.5) using GLMMs, and marginalized models.
GLMMs. In Section 9.3.3 we analysed the 2 x 2 crossover data using a GLMM with a random intercept. In Table 11.1 we present estimates obtained using PQL, fixed quadrature, adaptive quadrature, and posterior summaries obtained using MCMC. For Bayesian analysis we use independent normal priors with standard deviation = 10 2 for the regression coefficients, and a gamma prior for the precision, I/G '" f(I, 1). Not surprisingly, we find that PQL greatly underestimates the random effects standard deviation, G I / 2 , relative to the numerical ML estimators. We find only minor differences between estimates obtained using fixed and adaptive quadrature. The posterior mean for G 1/ 2 is larger than the MLE by (5.82 - 4.84)/4.84 = 20%. This is typical when posterior distributions are skewed - the median of G I / 2 is 5.09 with a 95% posterior credible region (2.48, 13.48).
Marginalized models. In Section 8.2.3 we analysed the crossover dat.a with GEE, using the pairwise odds ratio to model within-subject cor~e~atl~n. In Table 11.2 we display estimates obtained using marginal quasI-lIkelihood
232
THODS FOR CATEGORICAL DATA LIKELIHOOD-B ASED ME
EXAMPLES 233
'k I'h d-based estimates for a GLMM analysis of Table 11.1. LI e I 00 . G H . ethods are based on 20 pomt auss- ermlte crossover data. Quad rat ure m quadrature. Fixed, quadrature
PQL
SE
Est.
SE
Est.
Bayes MCMC
Adaptive quadrature Est.
Mean
SE
SD
Conditional mean {3C 0.768 (0.396) Intercept 0.692 (0.420) Treatment -0.356 (0.419) Period
2.153 (1.031) 1.841 (0.928) -1.020 (0.836)
2.170 (1.098) 1.836 (0.900) -1.016 (0.795)
2.644 (1.684) 2.249 (1.322) -1.330 (1.160)
Variance components GI/ 2 1.448 (0.602)
4.969 (2.263)
4.843 (1.747)
5.824 (2.925)
Table 11.2. Likelihood-based estimates for marginalized model analysis of crossover data. Marginalized GLMM Marginalized transition
MQL Est.
BE
Est.
BE
Est.
BE
M
Marginal mean 13 Intercept 0.668 (0.356) Treatment 0.569 (0.379) Period -0.295 (0.378)
0.651 0.577 - 0.326
(0.275) (0.227) (0.222)
Variance components 1.239 (0.549)
5.439
(3.718)
QI/2 0<1
log-likelihood
We can also use either a marginalized 10g-1' transition model for analysis In the h Iilear model, or marginalized . . case were n· - 2 th t are Identical. For analysis we use ' ese wo approaches logit E(Y;2 I X
-68.15
-68.17
log-likelihood
for a standard normal rand a . om euect Ai In th' . the varIance component G1/2 h '. IS specIfication we see that · . " c aractenzes th . to-su bJect vanation in the 10 dds f e magllItude of subjectnot explained by the covaria~e~ X. 0 Trhesponse .tha~ is unmeasured, or . . -M " e margmahzed GLMM swn estimates, {3 in Table 11 2 . regres' · . are nearly Id t' I ~b tamed usmg GEE (Table 82) d h . en Ica to the estimates G I / 2 is similar to that obtai~ed :~h t e Ivar~ance component estimate, GLMM. a c asslcal conditionally specified
-68.11
0.674 0.569 - 0.295
(0.278) (0.231) (0.229)
3.562
(0.907)
-68.32
(MQL), and two different marginalized models R I f ' that penalized . l'k I'h . eca I rom SectIOn 9.2.2 qUasl- I e I ood (PQL) . h . . imate ML inference £ GLMM .IS a ~et od for obtammg approxs. In discussmg the d I t f PQL B reslow and Clayton or (1993) I eve opmen 0 , used for a marginal m a so develop estimating equations that can be ean assumed to b . d d' Clayton (1993) refer t th ' . em uce Via a GLMM. Breslow and . 0 ese estImatlllg e t' M relatIOnship to method d' . qua Ions as QL, and note the s Iscussed m Z t I ( latent variable model fo th eger ea. 1988). The marginalized r e crossover data assumes t h ' . structure and e margmal regreSSIOn
1':1} ,, '
= U ,(X-.) 2 , + 0'1' Y.iI, A.
where Ctl is the log odds ratio measuring the association between y. d Y;2 that !s not expl~ined by covariates. Again, w~ obtain ML estt~:~ for margmal regreSSIOn parameters that are similar to GEE t' t Th f t . . es Ima es. e es Ima e Ct1 = 3.562 mdlCates striking dependence, or concordance between Yi1 and Yi2. ' In smaller sample sizes it is often desirable to consider inference based on the likelihood function directly rather than the Wald statistic. In this data set we have a total of m = 67 subjects. Figure 11.4 shows a profile likelihood curve for the treatment parameter in the marginal regression model. The MLE, t3{'1 = 0.569, is indicated by a vertical dotted line. A dashed line shows the quadratic approximation to the log-likelihood that is the basis for Wald inference. The solid line is log Lp(.er) = log L{y;8(,B{'1)}, the log-likelihood maximized over (.ert , .e~ Ct1) for fixed values of ,B{'1. This profile curve can be used for interval estimation based on likelihood ratio test inversion since the difference between the maximum, logL{y;J(t3{'1)}, and logL p (,B{'1 = b) represents the likelihood ratio test of Ho : ,B{'1 = b. The horizontal dotted line is at (log L{y; 6(,Br)} - 3.84/2), and characterizes a confidence interval based on the critical value ofaX2(dj = 1). We find excellent agreement between the Wald and likelihood ratio-based inference for ,Bf'1 near zero but find differences for greater than 1.0 that lead to a slightly wider confidence interval.
,
.er
Summary. We have demonstrated that it is feasible to obt~in likelih~od based inference using either a conditionally specified-regre~slO~ coeffiCIent with a GLMM, or using marginalized models. The margmalized models adopt the dependence formulation of random effects ~odels, or of .Ioglinear models, but directly structure the induced margmal means VIa a regression model.
S FOR CATEGORICAL DATA LIKELIHOOD-BASED METHOD
234
EXAMPLES
-68
'\
'\ \
-69
\ \
\ -70
.
.........................
......................................... \
\ --
,'' ' '
\
1
\
:
\
i
1
-71
\ \
i i i i
i
\
\ \ \
\
i i
\ \
!
\
i 0.5
1.0
Tx coefficient
Fig. 11.4. Profile likelihood for the marginal treatment parameter using a marginalized log-linear madellikelihood (--) , and the quadratic approximation based on the MLE and the model-based information (------).
11.4.2
Madras schizophrenia data
Th~ :vradras Longitudinal Schizophrenia Study investigated the course of poslt.lVe .an~ negative psychiatric symptoms over the first year after initial hospItalIzatIOn for disease (Thara, 1994). Scientific interest is in factors that c?rre~ate with the course of illness. Our analysis addresses whether the longltudmal symptom prevalence differs across patient subgroups defined by . 11y address the primary question we age-at-onset and gende r. 't0 st a t'IstIca GL MMs and mar . al' d use d . gm Ize mo I'Is to estImate the interaction between . t Ime and age-at-onset and th . t . . . ' e In eractlOn between time and gender III . . a IagIstlc regressIOn mod I W . d e. e COmpare parameter estimates obtame using ML und d'!l' er luerent d e p e n d ' . d using GEE. ence assumptIOns, and estimates obtaIne The Madras Lon it d' I S tudy collected data on six common schizophrenia sym t g ~ Ina p oms. ymptoms were classified into positive symptoms
235
(hallucinations, delusions, thought disorder . affect, apathy Withdrawal) E h s) and negatIve symptoms (flat , . ac symptom was d d 0, ... , 11 during the first year followin ho . r.eco~ e every month j == The prevalence of each positi g sP.ltahzatlOn for schizophrenia. . . . ve symptom declInes from . at InItIal hospitalization to <2007' b h approxImately 70% 10 y t e end of the first _ h'l prevaIence of each negative symptom d ec es r ' . year, w I e the t llld from approxImately 40% initially to 10% after one year H . . eager y an Zeger (1998) I data usmg graphical methods to dis la h .'. . ana ysed these p y t e wlthm-subJect correlation as a function of the time la b t . g e ween measurements The . I ~~~d~penden~e structure suggests strong seriai corr:I:~i~:a:~;~:;l~l~s~~:: ~ oms an strong long-term correlation for the ne ative s m Y ptoms. SpeCIfically, to characterize the dependence between b' g th '. Illary outcomes we u~e I' paIrwIse log .odds ratio I'i.(j,k) defined by (8.2.2) in S~ction 8.2.2. In FIg. ~ 1.5 we plot estImates of the pairwise log odds ratio log{ . . } ver the t~me separation It!j. - tik I ~or each of the six symPtorns.'Yi~Jt~~is fi~~ we dIsplay both empIrIcal estImates obtained using crude 2 x 2 t bl and ~ sm~oth function of It ij - tik I based on regression spline me:ho~~ descnbed I~ H~agerty and Zeger (1998). For the symptom 'thoughts' we find th: paIrWIse log odds ratio between observations 1 month apart is approxl~ately 3.0, but this association decays to approximately 1.0 for observatIOns 5 months apart, and further decays to 0 for observations 10 months apart. Serial dependence models such as the MTM(P) appear appropriate for this symptom and the other two positive symptoms. For regression analysis we focus on the outcome 'thought disorders'. Not surprisingly~ for this specific symptom a large fraction of the N == 86 subjects are symptomatic at the time of hospitalization (56/86 == 65% at month = 0). The crude prevalence of thought disorders decreases during the first year with only 6/69 = 9% presenting symptoms at month == 11. To evaluate if the course of recovery differs for subjects with early ageat-onset or for women we fit logistic regression models with main effects for month (j = 0, ... ,11), age (1 = age-at-onset less than 20 years old, o = age-at-onset ~ 20 years old), gender (0 = male, 1 = female), and the interaction between time and the two subject-level covariates. Evaluation of the interaction terms determines whether the rate of recovery appears to differ by subgroup. The majority of subjects have complete data but 17/86 subjects have only partial follow-up that ranges from 1 to 11 months. Regression analysis of subject discontinuation (drop-out) suggests that subjects who currently have symptoms, Yij = 1, are at increased risk to drop-out at time j + 1 (odds ratio 1.716, p-value = 0.345). Women are also at increased risk to discontinue (odds ratio = 2.375, p-value = 0.11). The potential association between drop-out and the observed outcome data warrant co~ideration.of an analysis that is valid if the drop-out mechanism is MAR. WIthout specIal modification GEE is valid only if missing data are missing completely at
EXAMPLES (d)
Hallucinations
(a)
,
..
6
random (MCAR). See Chapter 13 for further discussion of missing data issues.
o'
6 o.
237
Flat affect
G~MMs.
We first consider analysis of the schizophrenia symptom data GLMMs. Our primary interest is in a group-by-time model that mcludes month, age, gender, and both month: age and month: gender product terms:
~smg
a: o
~ 2
o
'0
........ ~.... ..... .... -2
J.l5 = E(Yij I month;j, age;, gender;, V
L.....----.--.,----,c---.--~--J
4
2
6 delta
8
10
logit(J.l5) =
t
6
4
2
6 delta
(c)
. age;
+ (3f . month;j . gender; + Uij.
Table 11.3 presents estimates using a random intercept model, Vij == U;o. Similar to the crossover data we find that PQL underestimates the variance component, with (;1/2 = 1.827, versus (;1/2 = 2.222 using adaptive quadrature. Our Bayesian analysis adopts independent priors for the regression parameters, f3'j (normal with a standard deviation of 10 2 ), and specifies a gamma prior for the precision (G r-v f(2,2». The estimates obtained using MCMC are quite similar to the ML estimates. Since the conditionally specified GLMM uses a single regression equation that contains both covariates and random effects, the regression estimates /Jc need to be interpreted recognizing that U;j is controlled for. However, in the random intercept model each subject is assumed to have the same change in their log odds
10
8
),
f3~ + f3f . month;j + f3~ . age; + f3f . gender; + f3f . month;j
Apathy
(b)
ij
t
Withdrawal
6 ~
a: o
9
4
;
···Z- : :
I
...:
., ..;...j~....~:....~.._..~._} ... ~~.. _-~~._ ....;/ -
~ 2
;,
10., ~
•
0
0
2
_~._._•••.• _.
:
- :
4
6
8
10
delta t
Fig.
1l.~.
Serial dependence for the Madras schizophrenia data. Pairwise log ratIos are plotted against the time lag for each of six symptoms. Solid hnes are based on a natural spline model using knots at t = 3,5,7 months. The dashed lines ar p . t· 95~ h de . . e om WIse 10 confidence bands. Also shown are t ecru palTWlse log odds ratios and corresponding asymptotic 95% confidence limits. The '0' represents the crude zero-cell corrected log odds ratio computed from a 2 x 2 table of Y., versus Y. Th h e . . 'J ike - represent the confidence limits for t es pomt estimates.
~dds
I
PQL
0
~o •.•~ ~ :._._.~~.. _ :
Table 11.3. Likelihood-based estimates for a GLMM analysis of Madras schizophrenia data using random intercepts, Var(V;o) = G. Bayes MCMC
Adaptive quadrature*
Mean
SD
SE
Est.
SE
(0.388) (0.048) (0.570) (0.533) (0.087) (0.081)
1.087 -0.437 1.438 -0.931 -0.251 -0.086
(0.457) (0.055) (0.678) (0.632) (0.100) (0.090)
1.085 (0.477) -0.439 (0.057) 1.511 (0.683) -0.965 (0.649) -0.262 (0.100) -0.089 (0.092)
Variance components 1.827 (0.643)
2.222
(0.285)
2.259 (0.288)
Est.
Conditional mean f3c Intercept 1.005 Month -0.387 Age 1.180 -0.748 Gender Month.Age -0.204 Month· Gender -0.079 Gl/2
,
*The maximized log-likelihood using adaptive quadrature is
369.88.
238
LIKELIHOOD-BASED MET
HODS FOR CATEGORICAL DATA EXAMPLES
of symptoms over time: , . c) _ f3c + fJc . age; logit(t15+1) -IOglt(l1i) - I 4
239
subject-specific rate of recovery:
+ (35C . gender;
logit(115+d -logit(115) =
I3f + 13; . age. + I3f· gender, + U'I ,
celled when computing the w.ithin. d . t rcept U '0 IS can since the ran am III e 'm '. t f month.age measures the difference · t d'ff: . 8U bJec I er.ence.. The cae clen 0 g early age-at-onset subjects rel. h ate of recovery amon III t e common r f mong the late age-at-onset subjects · to common rate 0 recovery a at IVe. . . II . 'ficant using the random intercept model and appears statlstlCa Y slgm d on MLE). -0251/0.100:::: -2.51, p:::: 0.012, base . . -. d ' t cept assumption to allow eIther random mterWe relax the ran om III er a rnes) or to allow autocorrelated random euects. cepts and sIopes (ran dam I , . d I 11 . 4. In the random hnes rna e we assume · Table Results are presen t ed III
(z -
logit{E(Y;j I Xi, Ui)} :::: X;jj3C
+ UiO + Un' tij,
where tt ' :::: (j -1) represents the month of observation. This model assumes that ea~h patient is following their own linear recovery course, and has a
Table 11.4. Likelihood-based estimates for a GLMM analysis of Madras schizophrenia data using random intercepts and slopes, var(UiO ) = G u , var(Uid :::: GZ2 , corr(Uw•Uii) = R, and with random autocorrelated random effects cov(Uij,Uik):::: G· plJ-kl. Adaptive quadrature* Est. Conditional mean (3c 1.620 -0.620 1.616 -0.953 -0.212 -0.188
Intercept Month Age Gender Month·Age Month·Gender
Bayes MCMC
SE
Mean
SD
Mean
SD
(0.684) (0.129) (0.978) (0.922) (0.180) (0.175)
1.805 -0.709 1.801 -1.108 -0.223 -0.227
(0.780) (0.152) (1.107) (1.064) (0.224) (0.211)
2.162 -1.002 2.641 -0.976 -0.429 -0.369
(1.304) (0.266) (1.748) (1.731) (0.303) (0.297)
(0.555)
3.895 (0.640) -0.604 (0.140) 0.662 (0.114)
Variance components
d/11 2
R QI/2 22 QI/2
3.490 -0.691 0.534
(0.101)
6.322 (1.284) 0.856 (0.031)
p
'The maximized log lik I'h d - e I 00
.
USIng
adaptive quadrature is -345.94.
I'
!he coefficients of month·age and month. gende . In the average rate of recovery rath th d'~ now re~resent differences er ,an luerences In a co of recovery. The maximized log lik l'h d' mmon rate . e I 00 mcreases from - 369 88 £ th ran dom Intercept model to -345 94 £ th . ,or , e increase in model fit. Allowing h~tero;:nei; rr:~:~~ hne.s ~lO~el- a la~ge s~opes increases ~he standard error of the ti~e mai; :~~~te~~dll~::~~slOn tlO~ t~rm coefficlent estimates. While a random intercept model i~~i;:: a slgmficant month·age interaction the estimate u . . r d r d I . " . . smg a more appropnate a~ am mes mo e IS no longer significant (Z = -0.212/180 = -1.18, PMC-MO C·238 , based on MLE). Once again, the estimates obtained using are comparable to the MLEs. Finally, we consider the autocorrelated random effects model logit{E(li)· I Xi, U;)} =
x',(.IC 'JfJ
+ U')'
where cov(Uij , Uik ) = G· p1t,;-tikl. The random intercept model is actually a special case of the autocorrelated random effects model where p ::;; 1, or all Uij are perfectly correlated. The autocorrelated random effects model is prohibitive to fit using ML since the dimension of the random effects in this example is ni = 12. Bayesian estimation using MCMC is felUlible. Table 11.4 presents posterior means and standard deviations using vague independent priors for I3f, a uniform prior for p, and a gamma prior (a r(2, 2) prior) for G I / 2 that is truncated to lie on (0.01,20). The correlation parameter p has a posterior mean of 0.856 with a posterior standard deviation of 0.031. The posterior 95% credible region (0.792, 0.913) is relatively far from the value that defines the random intercepts model, p = 1. The standard deviation of the random serial process is estimated IUl 6.322 (posterior mean). This estimate is substantially larger than the estimated standard deviation in the random intercepts model, or the estimated standard deviation for UiO+Ui1 · tij in the random lines model which ranges from 3.490 when t ij = 0, to 5.702 when tij = 11 (based on MLEs). In general, the posterior point estimates are larger using the autocorrelated random effects model relative to the random intercept or random line models. For example, comparing posterior means for the coefficient of month· age we find the random lines estimate is 1-0.223+0.4291/1- 0.4291 = 48% smaller than the autocorrelated process model estimate. Regression parameter interpretation in this model is somewhat complicated for cluster-level covariates since the regression coefficients are defined in a conditional model that controls for Uij , a random effect which varies over both subjects and time (see Heagerty and Zeger (2000) for a detailed
240
LIKELIHOOD-BASED ME
THODS FOR CATEGORICAL DATA
. t ts this regression model specifies diseussion). However, for tIme con ras C . ( c) _ f3c + f3c, agei + genderi + (ViJ+l - Vi})' logit (P'iH I) - loglt P-ij - I 4 C , . ' r sub'ect i yields f3f + age; + 135 . genderi, Averagmg over tIme, }, 0 .J fthese parameters as the average rate of . 'd' g the interpretatIOn 0 agam provI m . t bgroups In summary, GLMMs can be specific covana e su . recovery among t' nd allow a variety of dependence models d t ompare groups over Ime a H use 0 c h d'~ t random effect assumptions. owever, as g to be considered throu hI eren in the dependence structure need to be see in this example, C anges· , . b' we . t' the estimated regressIOn coeffiCIents 0 tamed considered when mterpre mg using different heterogeneity models.
fJf .
EXAMPLES
Table 11.5. GEE estimates for a 10 . t' . nia symptoms. Model based d ~ .IC regressIOn analysis of schizophrean empmcal standard errors are presented.
GEE ~ independence
fJr .
, I'zzed made1s. We now use marginalized models for analysis of the Mamma symptom data. We specify:
= E(Yij I monthij , agej, genderi), logit(J.t~) = fJff + f3r . monthij + 13:;: . agei + 13~ . genderi + f3!t ,month;j . agei + 13~ . month;j 'gender;. l.l~
In the marginal model we essentially use the subscript i to identify a specHic covariate subgroup rather than an individual, and characterize the difference in the prevalence of symptoms over time for subgroups:
logit(J.t~+I) - 10git(J.tm
= I3r + {3!t ' agei + 13~ . genderi ·
In contrast to the GLMMs the interpretation of the parameters in the marginal mean do not depend on the specific correlation model that is chosen for analysis. Table 11.5 presents GEE estimates for the marginal mean model, '( J.tij M) Ioglt
I M. = Xij{3
!here is a non-significant suggestion that the rate of recovery (time slope) IS faster am~ng women" but recovery does not appear to depend on ageat-onset. Usmg GEE WIth a working AR(I) model yields a coefficient for the age. ~y month interaction of &~ = -0.101 with a Z statistic based on = 119 ,p- vaIue -- 0235 Th empmcal standard ' errors of -0.101/0089 ' . . . e gender by month mteraction is weakly suggestive with 13M = _0.150, Z = -0.150/0.089 = -169 5 . . ,p-vaIue = 0.091. However if a working mdependence m d I ' d ' , 0 e IS use we obtain f3~ = -0.113 with Z = -0.113/0.096 ::::; 118 . ,p-value = 0 238 Un£ rt I model ' ". 0 unate y, the choice of working dependence can Impact pOlnt est' t d' objective criter' Ima es an slgnificance levels, and without Ion we cannot formall h . . II valid estimators. y c oose among varIOUS asymptotlca Y
241
Variable Marginal mean Intercept Month Age Gender Month·Age Month· Gender
Coef. Mod. SE Emp. SE
GEE - AR(I)* Coef.
Mod. SE Emp. SE
13 M 0.643 -0.254 0.811 -0.388 -0.137 -0.113
(0.202) (0.038) (0.305) (0.286) (0.064) (0.063)
(0.305) (0.059) (0.493) (0.449) (0.094) (0.096)
0.553 -0.235 0.638 -0.161 -0.101 -0.150
(0.296) (0.053) (0.440) (0.412) (0.089) (0.090)
(0.291 ) (0.055) (0.461) (0.420) (0.085) (0.089)
*The estimated lag-l correlation is p = 0.590.
By using a likelihood-based method rather than a semi-parametric method we are able to compare alternative dependence models using likelihood ratios. Table 11.6 presents ML estimates adopting both first-order and second-order marginalized transition models. The simplest marginalized transition model is the first-order time homogeneous dependence model, 'Yij.l = G'1,D· Table 11.6 presents the mean regression and dependence parameter estimates for this model. Point estimates and standard errors for (3M are quite close to those obtained using GEE. The estimated first-order coefficient ch,D = 3.166 indicates that the odds of symptoms at month = t are exp(3.166) times greater among subjects who previously had symptoms, }j-l = 1, compared to subjects who previously did not have symptoms, }j-l = O. Recall that since the MTM(I) has {3M orthogonal to 0, the resulting ML estimates, (3, are consistent even if the dependence model is incorrectly specified. To evaluate the specification of the dependence order we can compute score tests using only the MTM(l) fit, or use likelihood ratio tests comparing second-order to first-order models. The score test of 'Yij,2 = Q2,0 = 0 obtained from model 1 is 1.428 with p-value = 0.232. Model 2 is a secondorder marginalized transition model with scalar first- and second-order coefficients. The first-order coefficient model includes the variable 'initial' an indicator variable for month = 1, allowing 01 to be used for both the second-order model, Yij I Yij-l, Yij-2, and the initial state, Yo'I I"'~ which is a purely first-order distribution. The second-order LiD, .. coefficient, &2,0 = 0.650, is significant based .an the W~d stat.1Stlc, Z = 0.650/0.295 = 2.20, p-value = 0.028. ComparIson of devlances ylelds, 6.D = 2 x (337.19-334.44) = 5.50, p-value = 0.064 on 2 degrees of freedom.
242
LIKELIHOOD-BASED METHODS
FOR CATEGORICAL DATA
SUMMARY AND FURTHER READING
Marginalized transition models fitted . ML . ' · usmg permit a thorough mo d e I- base d anaIYSlS of schizophrenia sympt U1 . . 'k l'h d ' . oms. vve use the maximIzed d log- lI e I 00 to establish an appropriate d . '. epen ence model, and then evaluate the eVIdence regardmg differences i th d' n e Isease Course across su bgroups d efi ned by age-at-onset and gender W fi d h . . h . e n t at a second-order tIme m. omogeneous dependence model is appr opna . t e. C,orrespondmg . . regressIOn estImates of the marginal mean structu . d' t h h . re III lca.e t at t e rate of recovery does not vary significantly among tile c0 vanae 't su bgroups.
.' tions and ML estimates using Table 11.6. Generalized estlmatmg. equa ' . ms marginalized transition models for sc!llzophrema sympto . MTM(2) MTM(l) SE Coef. SE Coef. SE Coef. Variable Marginal mean 13 Intercept Month Age Gender Month· Age Month· Gender
M
0.534 -0.236 0.650 -0.142 -0.112 -0.144
First-order coefficient 01 3.166 Intercept Initial Month Second-order coefficient 02 Intercept
log-likelihood
-337.19
(0.300) (0.054) (0.442) (0.413) (0.086) (0.083)
0.576 -0.238 0.588 -0.150 -0.101 -0.140
(0.301 ) (0.052) (0.439) (0.412) (0.086) (0.084)
0.568 -0.234 0.619 -0.161 -0.100 -0.149
(0.295) (0.054) (0.434) (0.407) (0.091) (0.089)
(0.228)
2.911 -0.260
(0.291 ) (0.633)
2.099 0.403 0.156
(0.559) (0.740) (0.096)
0.650
(0.295)
0.597
(0.293)
-334.44
Summary,
W~ have used both GLMMs and marginalized models to
comp~re covanate subgroups over time. In the GUv[M we interpret the
-332.93
We can allow the dependence to change over time using the second-order model, 'Yij.l = 0:1,0 + 0:1,1 . initial + O:l,amonth, and 'Yij,2 = 02.0, whieh allows the coefficient of l'ij-l to depend on time (we could also allow the second-order coefficient to depend on time). Model 3 yields a maximized log-likelihood of -332.93 and 01.3 = 0.156, Z = 0.156/0.096 = 1.625, and reduction in deviance of 6.D = 2 x (334.44 - 332.93) = 3.02 on one degree of freedom (p-value = 0.082) relative to Model 2. Model 3 also achieves the smallest AIC value among the first-order and second-order m?dels considered. A score test for third-order effects leads to 0.072, indieatl~g the adequacy of a second-order model. The observed time trend in se~lal dep~ndence is expected in situations where patients stabilize (either WIth or WIthout symptoms). Fi~ally, ha:ing settled on a dependence model we can assess whether the difference m recove t . . ry ra es comparmg early age-at-onset to late age. at-onset subjects ((3 ) and 4 ~ompanng women to men (f35) are significant. U. Slllg Model 3 ~e obtain (34 = -0.100, with Z = -0.100/091 = -1.10, p = 0.271 and (35 - -0 149 'th Z ' - ' , WI = -0.149/089 = -1.67 p = 0.094. Although both early age t t b' , -a -onse su Jects and women appear to have faster recovery rates subgr d'fF . nominal0.05level. oup I erences m recovery are not significant at the
243
I
I
I'
,
coeffiClent of month as an average within-subject change in the log odds of ~Yn:Ptoms for ~ubj,ects in the reference group (age = 0, gender = 0), whIle m the margmalized models we interpret the coefficient of month as the change in the log odds of symptoms over time for the reference group. Primary scientific interest is in the interaction terms which have parallel interpretations in terms of differences in average rates of recovery for conditional regression coefficients, f3c, or as differences in the changes in the prevalence log odds across patient subgroups. Using likelihood-based methods we can compare the maximized log-likelihood for the various models. The GLMMs yield -369.88 and -345.94 for the random intercept and random slope models, respectively. The marginalized transition models yield -337.19 for the MTM(l) and -332.93 for the time inhomogeneous MTM(2). The maximized log-likelihoods suggest that the serial models provide a better fit for this symptom. Based on Fig. 11.5 we anticipate that serial models will fit the positive symptoms (hallucinations, delusion, thoughts) while models for negative symptoms (flat affect, apathy, withdrawal) will require a random intercept to characterize long-term within-subject dependence.
11.5 Summary and further reading In this chapter we overview ML methods for the analysis of longitudinal categorical data, We focus on the common case of a binary response. . . ,m Chapter 9 0 f cand't' We extend the dlscusslOn I 10 nally specified GLMMs to include details regarding ML and Bayesian estimation me~hods. Idn ' h 'fy the margmal rna Section 11 3 we discuss marginalized mo d eIs wh IC um . d . ffi t models dlscusse m . . d els discussed in Chapter 8 WIth the ran om e ec ~ 10 Chapter 9 and with the transition models discussed m Ch~Pterl' d . tructure to be dIrect y assume Marginalized models allow a regresSlOn s . d ea'ects . d f 'ther a log-lmear, ran am 11' , for the marginal mean mduce rom el t 'tI'on mod. I . dom effects or ranSI or transition model. Alternative y, ran 'fi t' n as discussed in d't' I mean specr ca 10 . els can be fit using theIr con 1 IOna . d d ginal means obtained Chapters 9 and 10 and then estimates of III uce mar
244
LIKELIHOOD-BASED METHODS FO
R CATEGORICAL DATA
dT I) model (see Lindsey (2000) for by marginalizing the fitted (con II JO~ta atl'ons where the marginal struc'f del) n SI u details using a transl IOn mo . back f an indirect approach which fits . . t est the draw 0 tum is of prImary m. er d i d then computes marginal summaries, a conditionally specified mo e a~ umed for a conditional mean, J.l0, . h ession structure IS ass IS that w en regr . M t have the same simple regres. d . I means J.l .. may no then the mduce margma .'. lJ, a roach may not facilitate simple sion structure. Therefore, an mdlrect pp . I ft t . . d t'mates or tests for margma e ec s. covariate adJuste es I ' d . GEE and the efficiency of GEE Mar inal models can be fitte usmg .., g . . d b d It tives is discussed III Fltzmauflce et al. relative to hkehhoo - ase a ema I 'fi d . . (1995) d Heagerty (2002). A proper y specl e , an . 'fi (1993) ' Fltzmauflce ad ffi' t parameter estimates, but when mlsspeCl ed likelihood Ie s to e Clen .' . · d can lead to bIll.'le es t'Imat es. The bias of mlsspeclfied ML estImates for clustered data is discussed in Neuhaus et al. (1992), TenRave et al, (1999), and Heagerty and Kurland (2001). In this chapter we have focused on binary response data. Approaches £ I ngitudinal ordinal data are overviewed in Molenberghs and Lesaffre (~~9~). Heagerty and Zeger (1996) discuss methods based. on log-linear models, while Agresti and Lang (1993) and Redeker and GIbbons (1994) present random effects models. We have focused on GLMMs and marginalized models. Alternative estimation approaches for hierarchical random effects models are discussed in Lee and NeIder (1996, 2001). Alternative ML approaches that directly model marginal means are found in Molenberghs and Lesaffre (1994, 1999), and Glonek and McCullagh (1995).
12
Time-dependent covariates 12.1 Introduction One of the main scientific advantages of conducting a lona-itudinal t d b o' su Y . h b'l' IS tea I Ity to 0 serve the temporal order of key exposure and outcome events. Specifically, we can determine whether changes in a covariate precede changes in the outcome of interest. Such data provide crucial evidence for a causal role of the exposure (see Chapter 2 in Rothman and Greenland 1998). ' There are important analytical issues that arise with time-varying covariates in observational studies. First, it is necessary to correctly characterize the lag relationship between exposure and the disease outcome. For example, in a recent study of the health effects of air pollution the analysis investigated association between mortality on day t and the value of exposure measured on days t, t-l, and t-2 (Samet et al., 2000). Subject matter considerations are crucial since the lag time from exposure to health effect reflects the underlying biological latency. Also, the relevance of cumulative exposure or acute (recent) exposure depends on whether the etiologic mechanisms of exposure are transient or irreversible. Second, there is the issue of covariate endogeneity where the response at time t predicts the covariate value at times s > t. In this case we must decide upon meaningful targets of inference and must choose appropriate estimation methods. In this chapter we adopt the following notation and definitions. ~e assume a common set of discrete follow-up times, t = 1, 2, ... , T, With a well-defined final study measurement time T. Let Yit be the response on subject i at time t. Let Xit be a time varying covariate and Zi a . of baselme, . . . ' t covan'ates . F10r simplicity we assume or tlme-Illvarlan , collectIOn that only a single time varying covariate is considered for analYSIs. We also assume that Y. t and Xit are simultaneously measured, and that for I I t v, d' ctly with Xt. However, cross-sectional analyses we.would corre a e ~ it Ire . • . that only prevIOus covarIate for etiologic or causal analyses we assume . are potential causes of Yit. Thus, III terms · measurement s. X it-I, X tt-2,'" the ex osure directly after of causal ordermg we assume that Xit represents P lit rather than before.
TIME-DEPENDENT COVARIATES
246
AN EXAMPLE: THE MSCM STUDY
. X·,t need to be determined ' fl the covanate . . I in order h that relate a longItudma response Factors that In uence " to select appropriate analySIS approac .es I nalysis Kalbfleisch and Prentice . , te In surVIVa a . , to a time-dependent covafla . , t as 'the output of a stochastIC t mal covana e . ( 1980 p. 124) de fi ne an zn e . d' 'dual under study' in contrast to ' . t d by the III IVI Process that IS genera e 'fi ' th t 's not III uenc ed by the individual under study. an external covanate a J,. I't t e the term endogenous is typically . . h nometncs I era ur Similarly, t'In t e ecO• bles t hat are s t chastically determined by measured o , used to reler to vafla d b ation while exogenous van abies are factors within the system ~dn etrho sertv under study (Amemiya, 1985). . d b f ctors OutSI e e sys em determme y ametncs . I'Iterat ure precise definitions of covariate exogeneI th n e econo t f onditional independence, and statements ity involve both statemen soc x - X . X. X } as the his'H i (t) - { ,1,, '1,2, I, .. ,d fi,t H (t) _ ' Define of parameter separat IOn. Y tor of the covariate through time t, and SImI ar y. e ne i _ y,y y, , .. " Yit}. A covariate process is wIth ,respect to the { ,1, ,2 process 1'f the covariate at time t is condItIonally mdependent of outcome , I all preceding response measurements. We define endogenous SImp y as the opposite of exogenous. Formally the definitions are
exogen~~s
I 'Hi (t), 'Hf (t f(X it I 'Hi (t), 'Hf (t -
exogenous: f(X it endogenous:
1), Zi) = f(X it 1), Zi)
i f(X it
IHf (t IHf (t -
1), Zi), 1), Zi),
where f(x) represents a density function for continuous covariates and probability otherwise. This definition is not the same as that given by Engle et al. (1983) since we have not further discussed specific implications of this assumption, nor commented on the relationship of the process Xit to parameters of interest (our statement is referred to as 'Granger noncausality' by Engle et al. (1983)). Our definition is essentially equivalent to that given for statistical exogeneity by Hernan et at. (2001). One implication of the assumption of exogeneity is the factorization of the likelihood for (XiI! Yit}:
!IX"
Y, I Z,; 9)
~ [n fIl-;, 111;It - 1),111' It - 1), Z,; 9)] x =
[n
!(X"
1111' (t -
1), Z,; 9)]
.c y (9) x .c x (9).
If we further assume th t 9 - (9 9) , . . a - I , 2, where (h and 9 2 are vanatIOn Independent (i.e. (9 1 E ( 1 ) X (9 E ( ), and that 9 is the 2 1X parameter of mterest, then Engle et al. (1983) d fi2 th as strongly ex £ e ne e process ,t ogenous or the parameter 9 1 , One motivation for introducing
~arameters
.
247
the concept of strong exogeneity is that wh th '". I"k I'h d b d' leI 00 - ase mference regarding 8 can en d't'e assumption, IS satisfied ' • t' • 1 con 1 IOn loss,of and therefore analysis can proceed 'thon tXit h WIthout " IlllormatJOn, an explicit model for X·It" WI ou aVlOg to speCIfy implication of exogeneity is that the e t t" f A second important I h ' xpec a. Ion o vL it con d't' 1 JOna on t e entire covariate process (X X X) d d I th ' 11, t2,,"", iT WIll epen on y on e covanates prior to time t Rorm II 1 . exogenous " a y, w len a process IS E(Vit!XiI,Xi2"",Xit"",XtT,Z,)=E(Y.tIXI X t
t,
X· 12,· .. ,
tt-l
Z.) 1
l'
Exogeneity is actually a stronger statement since it implies that Y.. . d ·· II' d It IS con ItIona y III ependent of all future covariates
12.2 An e~ample: the MSCM study Alexander and Markowitz (1986) studied the relationship ?etwe~n n:aternaI 1 ' t' health care utilization, The mvestlgatlOn was d emp oyment an paed Ia nc . that have occurred motivated by the major social and demographIC changes 'd 'th ' 1950 on Iy 12% of marne women in the US since 1950. For example, m . WI t I e while ey preschool children worked outSI'de th e ho m , in 1980 tapprOXlma f the labour 'ld d the age of SIX were par 0 . 45% of mothers WIth ChI ren un er . d the effect of mothers' force. A significant body of research has examme
248
TIME-DEPENDENT COVARIATES
AN EXAMPLE'. THE MSCM STUDY
work on cogm't'lve ane1 ROC!.'al ''''pects of child development while only limited , h ' t' tIthe impact on paedJatflc care utIlIzatIOn, The pnor rf'.Reare mvcs ,1ga ,cc ' , . Mothers' Stre.Rs and Children's Morbidity Study (MSCM) enrolled N = 167 I b et,we,.Dn th" ages of 18 months and.'} years that attended prf'.Rr:h00 I ehI'l (ren , 't ae(ll'atrl'c clinic To be eligible for the study chIldren needed an mner-ci y P '" " " . 'th tlleIr . nlother conditIOns, At , , WI, to be IIvmg ' and free . of chromc , . . entry, . mothers provided demographic and background mformatlOn regardmg their family and their work outside the home, During 4 weeks of foll()w~up daily measures of maternal stress and child illness were recorded, A final data source included a medical record review to document health care utilization, We use these data to illustrate statistical issues that pertain to regression analysis with time-dependent covariates, The specific scientific questions that we consider include:
,<0
, .
.'
"
'
•
249
Illness
,
30 25
. . '
c:
~ Q;
D..
20 15 10 5 0 0
5
10
15
20
25
Day
1. Is there an association between maternal employment and stress? Stress
2. Is there an association between maternal employment and child illness?
30
3. Do the data provide evidence that maternal stress causes child illness? A total of 55 mothers were employed outside the home (55/167 = 33%). We will refer to mothers that work outside the home as 'employed', and will refer to mothers that do not work outside the home as 'non-employed'. The analysis data contains additional baseline covariates including self-reported maternal and child health, child race, maternal educational attainment marital status, and household size. Time-dependent measures for household i included daily ratings of child illness, lit, and maternal stress, Xit, during a 28-day follow-up period t = 1,2" , . ,28. In the first week of follow-up the prev~ence of maternal stress was 17% but declined to 12%, 12%, and 1?% m weeks 2 through 4. The prevalence of child illness similarly declined shghtly ~om 16% in the first week to 14%, 11 %, and 11% in the subsequent weeks. 12 ' 1 shows th ecrud e weekly stress and Illness ' f TFIgure ' prevalence for amfil les With employed mothers and for non-employed mothers. For illness we nd large day-to-day vana 't'Ion b ut observe a trend of slightly decreased prevaIence among children wh th for stress is mar I' ose rna ers are employed. The time course e camp lcated with I d h .., , emp oye mot ers Illitially havmg a higher rate of stres b t f stress than non_empsl uda terhweek 14 the employed mothers report less aye mot ers. To meaningfully address the " '11 asSOCiatiOn between maternal employment and either st~ess or I ness we need t 1£ . confounders. For exam I T bl a contra or several potentIal ing mothers were high Psceh' la e 12.1 shows that the majority of work00 grad uates wh'l 1 OJ' mothers were high school ad 1 e on y 4170 of non-employed gr uates, Also c d th ' ompare to the non-employed mot hers the employed . rna ers were m rk I sus 43%) and to be white (62 W are 1 e y to be married (58% ver/0 versus 37o/c) S' . causaI pathway that lead f o. lnce stress may be m the s rom employment t 01'II ness we do not adjust for
o
25 20 E Q)
.-..
Q;
15
•
.
D··
0
·•..J;1 D
0
•
•
•
--_Ii!.. _
....... D'"
Employed = 0
--_. Employed =1
•• 0 .......................
0
1
D .0
° ·~·~::·~-""~~lJ.:::::::.:-I:J.. ~-- ...... --.G--...a..tJ.._~....
D..
10
o
5
0
o
•• .---.--0 0 •
•
-0
---_a •
•
0 0
5
10
0
• 15
20
25
Day
Fig. 12.1. The prevalence of maternal stress and child illness in the MSCM study during the 28 days of follow-up for those families where the mother worked outside the home (employed = 1) and those families where the mother did not (employed = 0).
any of the daily stress indicators when evaluating the dependence of illness on employment, Similarly, we do not adjust for illness in the analysis of employment and stress. Therefore, the only time-dependent variable in our initial analyses is the study day (time) - a non-stochastic time-dependent covariate. Table 12,2 presents unadjusted and adjusted log odds ratios for the primary exposure variable, maternal employment, and both of the longitudinal outcomes, Using generalized estimating equation (GEE) with working independence the crude association that adjusts for a common temporal trend indicates that working mothers are slightly less likely to
TIME-DEPENDENT COVARIATES
AN EXAMPLE: THE MSCM STUDY
250
251
. ~or mothers who were . t mmarles Table 12.1. Covafla e su d those who were not. employed outside the home an d 0 1 Employe n = 112 Employed n = 55 (o/c)
(%)
Married 0= no 1 = yes Maternal health 1,2 = fair/poor 3 = good 4 = very good 5 = excellent Child health 1, 2 = fair/poor 3 = good 4 = very good 5 = excellent Race 0= white 1 = non-white Education o S high school 1 = HS graduate Household size o = less than 3 1 = 3 or more
0
42 58
57 43
9 33 47 11
17 34 32 17
7 7 55 31
5 16 46 33
62 38
37 63
16 84
59 41
38 62
31 69
have ill children (estimated odds ratio = exp(-0.12) = 0.89) but are nearly equivalent in their rates of reporting stress. Adjustment for covariates has a minor impact on the coefficient of employment in the analysis of illness and indicates a non-significant difference between employed and non-employed mothers. In the adjusted analysis of stress we find a different time pattern for employed and non-employed mothers with a significant group-by-time interaction. Therefore, using GEE we conclude that there is a significant decline in the rate of child illness over the 28 days of follow-up but that there is no significant difference between employed and non-employed mothers. For stress we find a difference in the rate of decline comparing employed and non-employed mothers with a negative but non-significant time (week) coefficient of -0.14 for non-employed mothers, and a time (week) coefficient of -0.14-0.20 = -0.34 for the employed mothers. The regression methods
Table 12.2, Logistic regression analysis of the . t' b assocla ,Ion etween and b~th longitudinal illness and stress using GEE with an Illd:pendence workmg correlation matrix. Time is modelled usin the vanable week = (day-14)/7. g
~mployment
Coef.
SE
Z
Coe£.
SE Z Coef. SE Z Illness Intercept -1.86 (0.11) -16.44 -0.50 (0.39) -1.26 -0.50 (0.39) Employed -1.26 -0.12 (0.17) - 0.69 -0.14 (0.17) -0.83 -0.15 (0.18) Week -0.83 -0.19 (0.05) - 3.59 -0.19 (0.05) -3.59 -0.19 (0.06) Married -3.06 0.55 (0.15) 3.69 0.55 (0.15) 3.69 Maternal health -0.06 (0.10) -0.57 -0.06 (0.10) -0.57 Child health -0.32 (0.09) -3.68 -0.32 (0.09) -3.68 Race 0.48 (0.16) 2.91 0.48 (0.16) 2.90 Education -0.01 (0.20) -0.04 -0.Q1 (0.20) -0.04 House size -0.75 (0.16) -4.84 -0.75 (0.16) -4.84 Week x employed -0-02 (0.12) -0.17 Stress Intercept -1.91 (0.10) -18.50 -0.13 (0.45) -0.29 -0.12 (0.45) -0.27 Employed -0.04 (0.20) - 0.20 -0.25 (0.19) -1.28 -0.28 (0.19) -1.42 Week -0.20 (0.05) - 4.37 -0.21 (0.05) -4.41 -0.14 (0.06) -2.49 Married 0.34 (0.16) 2.12 0.34 (0.16) 2.12 Maternal health -0.29 (0.10) -2.91 -0.29 (0.10) -2.91 Child health -0.26 (0.10) -2.57 -0.26 (0.10) -2.58 Race 0.21 (0.18) 1.17 0.21 (0.18) 1.18 Education 0.52 (0.18) 2.85 0.52 (0.18) 2.86 House size -0.46 (0.16) -2.78 -0.46 (0.16) -2.79 Week x employed -0.20 (0.10) -2.13
that we have introduced in Chapter 7 are well suited for ~alyses that focus on the characterization and comparison of groups over tIme. The final scientific question seeks to determine the casual.effect of stress h'ld 'll Figure 12.2 shows raw stress and illness series for 12 r~ on C I I ness. .. I t f variation in the reportmg domly selected famIlies. We find a arge amoun 0 t during foIlowI bject (#219) reports no s ress of stress. For examp e, one su d f t in the first week up while another subject (#156) r~ports 3 h.a~s 0 sk re:.d 3 days in week 1 day in the second week, 5 days III the t Ir .wee , four. Analysis of these data raises several questIOns: . l'IOn between stress on day t and 1. What is the cross-sectional assoCia
illness on day t? d on dav (t _ k) 'J 2. Does illness at day t depend on prior stress measure for k = 1,2, ... ? t al stress on day t? Does ~~~ predict maternal stress? 3. What are the factors that influ~COe child illness on day (t - k) for k - , , 1
;n;
TIME-DEPENDENT COVARIATES
252
IIIn:SSW-W N
Vq
!'"
14
7
!
"i ~.i
.
21
t'ln~s:-vJL N ~ ~~
IL......i i i.....
Str:ss N ,u4/:......
14
7
28
--A.JJ\-.
IIIn~SS N
21
-"
Y If' Stress: ..c Nlli 1
14
7
28
21
7
28
Y
N ooeu.,t,U,,,.UUleet',,,e
Stress *,11II,:
'.'
14
...
21
N
7
28
14
60
..
21
28
Day
SubJecl =102
SUbject = 42
SUbject = 94
n .
IIln:s~ N Y •• • Stress ,,:: ': /': N 01.
14
21
V
7
28
14
21
Day
Subject = 7
Subject = 96
Subject=34
, ,.. , ::.
~"'_il." ~
7
21
...
Day
I.
N ",
14
..
Day
IIIn~ss...A .. " Stress .'\
7
28
. . **,......,u........
14 Day
21
28
IlIn~ULy Stress
t t~~~. ~" r r'\ 14 Day
21
28
28
Illn:s~ N y
.. ' ,'.:' ':, ' . N '_4..,6 .~ i. i ~ •• 10, ... ~ • 7
Stochastic covariates: full and partly conditional means
12.3
y
Day
7
Xii is endogenous meaning that it is both a predictor of the outcome of interest, and is predicted by the outcome measured at earlier occasions. Certain authors refer to scenarios where the covariate influences the response, and the response influences the covariate as feeAlback (Zeger and Liang, 1991). In this situation the response at time t-l, l'iI- , may be both 1 an intermediate variable for the outcome at time t. lit, and a c~nfounder for the exposure at time t, Xii. This situation leads to a consideration of proper targets of inference and appropriate methods of estimation and is discussed in Section 12.5.
Illness
Day
IlIn~s~
28
Subject = 129
'" '1'
21
Day
lI'n:s~ N ........ .., .. ,.." ........ Y t Stress.; N Ai ".Jr
14
N·;. •·
Subject = 117
Subject =112
7
y
Illness
Day
Day
253
Subject = 156
Subject = 110
Subject =41
Stress~! N .l: \
STOCHASTIC COVARIATES
~
~
Stress! \
N.'
~
w
Jr
7
r
u ",,:
i\i1
!\
14
21
': . •• 28
Day
Fig. 12.2. A random sample of data from the MSCM study. The presence or absence of maternal stress and child illness is displayed for each day of follow-up.
With longitudinal data and a time-varying exposure or treatment there are several possible conditional expectations that may be of scientific interest and thus identify useful regression models. We distinguish between partly conditional and fully conditional regression models since the taxonomy identifies models whose parameters have different interpretations, and relates to assumptions required for valid use of covariance weighted est~ation, such as with linear mixed models, or with non-diagonal GEE working models. For example, if we are interested in the relationship between a response on day t and an exposure on the same day, then we can use E(Y;t I Xjd to characterize whether the average response varies as a function of concurrent exposure measurement. We may also hypothesize a la~ between exposure and ultimate impact on the response so focus our analySIS on E(Y;t I X~t-d, or more genera11y on E(Y.:>t I X .t-k ) for some value ofh k. ~Alternatively, d I the entire exposure history may predict o~tcome .and t dere o~ we m~ .; X X·) erhaps allowmg a Simple epen ence 0 .t E(lit I XiI, i2,.... , >t-I, P * = ~ Xis' Finally, we may target on the cumulative exposure, Xit L:s
i
0
The first question considers the marginal association between a longitudinal response variable and a time-dependent covariate. In Section 12.3 we discuss regression methods that can be used in this context. The second question deals with the analysis of a stochastic time-dependent covariate and the specification of covariate lags. This issue has received attention in time-series literature, but we only discuss methods for finite lag models since longitudinal series are typically short and methods for infinite series are not necessary. Finally, question three addresses whether the covariate
•
,
••
0
•
0
TIME-DEPENDENT COVARIATES
STOCHASTIC COVAR.lATES
254
. . endo enous the full covariate condimay depend on any or all When the cavan ate process IS 1/ I X, ~ = 1, 2, ... , , k th tional mean, E( I it . . ' . te is exogenous we now at when a covana f H Xi.' owever, 1/ X X, 2 ., Xii)' If we urther couariates • T) - E( I ; I it-I, ,t-,· , E(Yit IXi. 8 = 1,2, ... , 'tt posures predict the response then I the k most recen ex X, ) _ assume that on Y . ' . ll' E(Y.'t I X·.t-, I X , t -2,"" Xit-k, ... , 21 we obtain further slmphficatlO . Th' fore under specific model assumpere , d" I X't-k). E(Yit I X it-I, X it-2, .. :' 'd a finite covariate lag, the partly can ItlOna ) may equal the full covariate conditions such as exogenelty an X. mean E(YitIXit-I,Xit-2,'''' ,t-k
f)
tional mean. . Iy be interested in the cross-sectional may simp . , In. s~me Sl't uationsY; weand X it . In Section 12.3.1 we discuss estimatIOn asSOCIatIOn between. 'alt d I However many longitudinal studies are issues for cross-sectIOn ma e 6 . , h Ih , . th' act of prior exposure on current ea t stainterested III asse6smg e Imp ) 1 S t ' 124 tUB and thus focus on characterizing E(Yit I X it -l, : ' . , Xii . n ec .lOn . th ds using single or multiple lagged covanate val,. . we dISCUSS regressIOn me 0 'n general we may not believe that future exposure causally nes. Althaugh I . ' d .infl uences c ut r health ager estatus n , when the covanate processy IS en . . nous, the fact that [Xit I fir (t), fif (t - 1) 1 depends on. ~i (t). ImplIes t). ThIS l~entIfies one that E(Yit IXi. 8 = 1, 2, .. , ,T) 'I E(Yit I Xis S important scenario where the full covariate conditIOnal ~ean IS n~t equal to the scientifically desired partly conditional mean. SectIOn 12.5 dIscusses methods of analysis when a covariate is endogenous,
255
Pepe and Anderson (1994) h i ' r.l ] " ave c anfied th E [S{3 (fJ, W) = 0 It IS sufficient to assume at to ensure that
/-lit =: E(Yit I Xit)
:=:
E(Yit I XI X t,
.2, ... ,
X)
iT·
(12.3.1)
Furthermore, if this condition is not l' fi d in the cross-sectional association betw::~sXe but substantive il~terest is pendence GEE should be used othe . b' It and ¥it then workmg inde. rWlse lased regr' . obtam. We refer to (12.3.1) as the full . esslOn estImates may assumption. Although Pepe and AndCOvarta(t1e99COTl).dltlOna.l mean (FCCM) . erson 4 focused on th f GEE , t h e Issue that they raise is important f aliI '. . e use 0 methods including likelihood-based method~r 'hong~tudlnal data analysis · . d rna d e Is , sue as lInear and generalized Imear mlxe To understand the importance of the FCCM d' , . h con Ihon we consider t e sums represented by matrix multiplications in the t' t' fun' S{3({3, W): es Ima lng ctlOn
(12.3.2)
.<.
12.3.1
Estimation issues with cross-sectional models
In Chapter 4 we showed that the multivariate Gaussian likelihood equations take the form m
(12.3.3)
:Z:j .
where wijk = Wijk, and Wijk is the (i,k) element of the weight matrix Wi. In order to ensure that E[S{3«(3, W)] = 0 we can consider the expectation of each summand in (12.2.3):
E[xijwijk(Yik - /-lik)] = E {E[Xijwijk(Yik - /-lik) I Xii, X i2 ,··· ,XinJ}
LX~Wi(Yi - X;(3) = 0,
= E {xijwijdE[Yik I XiI, X i2 ,." ,Xin.] - Itik)}'
i==1
l where Wi = [Var(Y i IXi)r . We noted that the resulting weighted least squares solution to these equations enjoys a consistency robustness since the estimator using a general weight matrix Wi remains unbiased even when I Wi =f. [Var(Y i IXiW . Similarly, in Chapter 8 we introduced estimating equation methods where the regression estimator is defined as the solution to the estimating equation
S(3(f3, W) :::::
f; (8;;; )' Wi(Y m
i -
J.£i) = O.
Consistency of the w . ht d I '. elg e east squares estimator and the GEE regres~Ion estImator relies on the assumption that S «(3 b' d' that IS, has expectation equal to zero. (3, IS un lase ,
W)' ,
If the FCCM condition is satisfied then /-lik = E(Yik IXiI, X i2 , .•• , Xini ) and the estimating function is unbiased. However, if /-ltk = E(Y;k I Xik) '1= E(Yik I XiI, X i2 , . , . , X ini ) then the estimating function will likely be biased and result in inconsistent estimates for the cross-sectional mean structure. Finally, if a diagonal weight matrix is used then 8{3((3, W) simplifies to
S.(f'J, W)
~ t, [t,XijW,;;(}\j -"'j)]
(12.3.4)
and S ({3 W) will have zero expectation provided that Itij =. E(Y,j IXij). In thi: c~e the FCCM condition is not required for conSIstent crosssectional estimation.
TIME-DEPENDENT cOVARIATES
256
STOCHASTIC C'QVARlAl'ES
257
12.3.2
A simulation illustration
and the failure of methods that use . lated data under the following non-diagonal covariance weJghtmg we slmu mechanism: (12.3.5) ViI = 'Yo + 'YI Xil + 'Y2 X il-1 + bi + eilt ·
t To I')1 us f,rat e fhp , . FCCM 1ISIiUmp . ' ,Ion
Xii == pXU-I bI>, Cit, (JI'
tv
(12.3.6)
+ fil,
(12.3.7)
mutually independent mean zero.
This model represents the plausible scenario where a ti~e-dependent . t e hfl8 an a utoregressive structure and a response vanable .depends covarla on both current and lagged values of the covariate. The model yIelds the full conditional and cross-sectional mean models E(Yit I Xii,"', Xin ) = 'Yo E(YiII Xit)
+ 'YIXit + 'l'2 X it-l,
= /30 + {3I X it,
where (30 = 'Yo and /31 = 'Yl + P . 'Y2· The induced cross-sectional model remains linear in Xii. In many applications the cross-sectional association between X it and lit is of substantive interest. For example, in assessing the predictive potential of biomarkers for the detection of cancer, the accuracy of a marker is typically characterized by the cross-sectional sensitivity and specificity. Although alternative predictive models may be developed using longitudinal marker series, these models would not apply to the common clinical setting where only a single measurement is available. Pepe and Anderson (1994) conclude that using longitudinal data to estima~e a cross-sectional model requires that either the FCCM assumption be verIfied or that working independence GEE be used. To demonstrate the bias that manifests through covariance weighting we generated data' under models (12.3.5)-(12.3.7) with bi '" N(O 1) e't '" N(O 1) and E'O '" N(O 1) ' N ( 2 ' , J " , , , fa "" 0,1 - P ). Under these assumptions it can be shown that E[xit_Iw*, It(Yt-(3o-(3X)] h-, t I it
* = wit-l,t . '1'2 • (1 -
p2)
indicating the potential for b' 'f th . 'd' d las I e covanates are time varying (p =F 1), pie Icte by X ( . ./.- 0) . . . . used (w* .. . ./.- 0)' ,t-I 12 r ,and a non-dIagonal weight matnx IS ,t-J,t r . For a range of correlations ( - 0 9 0) . each of which conta' d d P - . - .1 we SImulated 1000 data sets me ata on m - 200 b' . tions per subJ'ect Th b su Jects WIth up to 10 observa. e num er of obs t' I' generated as a uni'corl d erva IOns LOr each subject, ni, was . l' n ran om vari bl b mg dat.a missing completel at a e etween 2 and 10, representY random for a final scheduled follow-up of
v.' l,t IS
Table 12.3. Average estimates r {3 . {10 + (3 1 X it ~ased on models (l2.;.5)~(~;.~~e l~ea.r model E(Y;, I Xid =: "f'J = 1 for different values of the co .. ) With 'Yo = O. 1'1 =: 1. and V8rlate auto-correlation.
~
0.9
~l = 1.9
P 131
=:
0.7 1.7
P==0.5 {3 I -
Independence Exchangeable AR(I)
1.90 1.73 1.73
1.70 1.51 1.36
1.5
1.50
p==O.3
/3 1
1.3
1.33
1.30 1.19
1.11
0,89
P == 0.1 1.1
131
T = 10: We est.ima~ed the cross-sectional regression cot>fficient GEE With workmg. mdependence, compound symmetric (excha(31
1.10 1.01 0.74
.
~::g
and AR(I) correlatlOn structures. Table 123 shows tl . I ' ngea ), . d' . . ie Sllnu alton results m Icatmg that GEE using either exchangeable or AR(I) I' tId . corre atlon strueures ea to biased :stimates of (31. For example, when p =: 0.7 the excha~gea~le GEE estimator is negatively biased with a mean of 1.51, and a relative bIas of (1.51-1.7)/1.7 =: -11% while the AR(I) GEE t' at . . '1 I ' ' es 1m or is SImI ar y negatively biased with a mean estimate of 1 36 and I t' b' ( ., a re a lve .las of 1.3~ -.1.7)/1. 7 = :-20%. These simulations illustrate that if regres. slon analysIs mvolves a time-dependent covariate then either the FCCM condition should be verified, or a working independence GEE estimator should be used. 12.3.3
MSCM data and cross-sectional analysis
The results of GEE analysis of child illness, Yit, and maternal stress, Xit, are presented in Table 12.4. The children of mothers who report stress on day = t are estimated to have an illness odds of exp(O.66) = 1.93 the odds of illness among children of mothers that do not report stress. Unless we can verify the FCCM assumption, the GEE exchangeable and GEE AR(l) estimates cannot be assumed valid. Table 12.4 shows that we obtain smaller estimated regression coefficients using non-diagonal covariance weighting schemes. In particular, using an AR(I) correlation yields a coefficient estimate (0.37 -0.66)/0.66 = 43% smaller than the working independence estimate. In the next section we evaluate the FCCM assumption and find that illness, Yit, is associated with lagged maternal stress, Xit-k for k = 1,2, ... ,7. In addition, stress appears strongly ~utocorrela~ed. Therefore we suspect that GEE estimators based on non-diagonal ~Jght matrices are biased. There are important limitations to the cross-sectlO~al summaries that we use independence estimating equations (lEE) to obtam. The cross-sectional association does not imply causation and it is equally plausible that stress causes illness or that illness causes stress. In order
TIME-DEPENDENT COVARIATES
LAGGED COVARIATES
258
• sis of child illness, Yit, and stress, Xii, T~ble .12.4. GEE. analy relation models. Time is modelled using usmg different workmg cor. I adJ'usts for employment, marital 1 14)/7 AnalysIs a so week = ((ay. '1 j h It I race education, and household , status, maternal and ChI ( ea 1, size (not shown). Working correlation Independence
Stress (Xit ) Week
Exchangeable
Est.
SE
Est.
0.66 -0.18
(0.14) (0.05)
0.52 -0.18
SE
(0.13) (0.05) P= 0.07
AR(I) Est. 0.37 -0.20
P=
SE (0.12) (0.05) 0.40
to infer cause we need to address the temporal ordering of exposure and outcome.
12.4 Lagged covariates
In many applications an ent' . rId' Ire covartate history X X a ) e ~n . consIdered as potentiall d' .. ' il, .2, ... , Xii is availchromc disease epidemiol 't . y pre Ichve of the response Yo r X' _ " ogy I IS common to use . . If· n
12.3.4
Summary
Analysis of stochastic time-dependent covariates requires consideration of the dependence of the response at time t, Iit, on both current, past, and future covariate values. We have shown that GEE with working independence can provide valid estimates for the cross-sectional association between Iit and Xit but that covariance weighted estimation can lead to bias. One solution is to consider specification of the regression model as fully conditional on all of the covariates. In our simulation example in Section 12.3.2 this would require inclusion of the necessary current and lagged covariates. However, in other situations there is feedback where the current :e~ponse influences future covariate values, and satisfying the FCCM condItIon would require conditioning on the future covariates. This may no.t be desirable and therefore alternative methods would need to be conSIdered . . . Pepe and An derson (19)' 94 dISCUSS the FCCM assumptlOns requkil~'ed .to Use GEE with general covariance weighting and offer GEE with war ' ch' . E ng dmdependence as a' sa~' e anaI YSIS Olce. Related work is presented ~isc mon al. (1997) and Pan et al. (2000). The FCCM condition is also d cova~~:~es o~rg(e;er~1ustered data analysis where separate 'within-cluster' different ' ffi' I) i), and 'between-cluster' covariates or j( may have coe clCnts In this h f ,., tion of X .. and th' . case t e ull conditional mean /-Lij is a funcI) e covanate values f 11 h . through Xi. See Palta et or a ot er observations, Xik, k =1-), further discussion. al. (1997) or Neuhaus and Kalbfleisch (1998) for
/t
259
A single lagged covariate
In cer~ain applic.ations there is a priori justification to consider the covariate at a smgle lag tIme k time units prior to the assessment of disease status. For example, many pharmacologic agents are quickly cleared from the body so may only yield short duration effects. In this case analysis can use any of the methods discussed in Chapters 7-10 provided the FCCM condition is satisfied or appropriate alternative methods are selected, such as GEE with working independence. It is perhaps more common that the appropriate lag is unknown and several different choices are considered. If regression methods are used for a single time t* then we can formulate a general model using a lagged covariate as
where Zi represents a collection of additional time-invariant covariates, and /-Lit. = E(Iit* I Xjt* -k, Zi)' In this model the coefficient {31 (k) explicitly depends on the choice of the lag, k. When interest is in the coefficient function (3I (k) and comparisons between the coefficient at different lags k and k*, then parily conditional methods described by Pepe et at. (1999) can be used. Such metho~s allow inference on (31 (k) by forming the observations (Yit~, X~t*-k' Zi) usmg multiple values of k and then using GEE with workmg mdependence. More . . general'lze d rmear model'. generally, consider the partly condItiOnal
h{E(Y;;t I Xis, Zi)}
= (30(t, s) + (31(t, s)· Xis + (3~(t, s)· Zi·
TIME-DEPENDENT COVARIATES
260
,
LAGGED COVARIATES
X. may depend on both the response
Here the coefficient of t~e co~anate I '~ertain applications we may 8.'lsume time t and/or the covana~e tl~e:, In (t -.9) and may restrict analysis to that (3(t,.9) is only a functIOn 0 / (~9;~) refer t'o this as a partly conditional pairs such that t > .9, Pepe et a: t' e is included as a predictor, rather . I 'ngle covanate 1m , {X. 8 < t}, or the entire covanate process model smce on? a BI, than the covariate history '8', 12 T} when modelhng lit, {Xis .9 == , "':' I ~ ~ the covariate functions (lj (t, s) the partly Given funetIOna orms d ' GEE with working independd" I del can be estimate usmg con ItlOna mo, panded data set containing (Yit, Xis, Zi) for ence by constructmg an ex t ' n 2 records per subject derived from ni 11 ' (t ) and may con am i " a pairs. ' .9 . GEE 11 s the sandwich variance estImator to vahdly ffi' f t' observatIOns, Usmg a ow d make inference on. the coe clent unc Ions. compute stand ard errors' an . 't' I models that use a SlUgle value, of the The partIy cond IlOna . covanate 1990) process are st rong Iy rel ated to measures of .crosB-correlatlOn' (Dlggle, . · d providing a generalIzed cross-asSOCIatIOn measure, an d can b e vlewe "'" . To recognize this, recall that the cross-correlatIOn p(s, t) == corr(Yit, ~iS) is related to (3ds,t), where E(litIXis) == (lo(s,t)x+ (ll(S,t)· Xis sl~ce (31(S,t) == p(s,t) , ai/a;, ai == /Var(Yit), and as = /~a:(Yis). SImilarly, when lit and Xit are binary, the logistic partly condItIOnal model specifies (31 (s, t) which is the pairwise log odds ratio (Heagerty and Zeger, 1998). Therefore, the partly conditional models provide a method for characterizing the association between two stochastic processes that uses the flexibility of a regression formulation to capture the temporal structure of association between continuous, discrete, or mixed variables.
[Zo(j), Zl(j), Z2(j) model Z (.) - 'I '
261
Z (j)]
Zj =
.. "
p
.
For example'
h
- J ' The regression model for " In t e polynomial With sums of the products Z ( ') , X.. J.L.it IS then a linear model •
1
J
1
J
It-J
as COvanates:
L
0:
h(J.Lit) = (30
+
L(3J' X it _
j
= (30
j=1
~ i30 + ~ "
[t.
+ ~ [Z' ]. ~
JI
X tt - J ,
J=l
Z,(j)
X,,_,] ,
P
= (lo
+
L 'IXi~,I'
1=0
aQ
12.4,2
Multiple lagged covariates
-
(.1
fJO
(3j = '0
+ (3l ' X it-I + (32' X
. Although distributed lag models permit parsimonious modelling of multIple ~agged measurements, the specification of both the number of lagged cova~lates and the degrees of freedom for the coefficient model need to be consldere~. Selection of the number of lagged covariates, L, or the order of the coeffiCIent model, p, may be determined using likelihood ratio tests for nested model~, or ~sing score or Wald tests (Godfrey and Poskitt, 1975) or through conSIderatIOn of a loss function (Amemiya and Morimune, 1974).
12.4.3
When interest is in using the entire covariate history {Xis s < t} as a predictor then methods that use multiple lagged covariates may be needed. The time series literature has considered models for both infinite and finite co~ariate lags. Since longitudinal data are typically short series we review a tillite lag proposal that useB a lower dimensional model for the coefficients of lagged covari~tes, In distributed lag models (Almon, 1965; Dhrymes, 1971) lagge~ coeffiCIents are assumed to follow a lower order smooth parametric functIon, For example, with a finite lag L a polynomial model with p < L can be used to obtain smooth regression coefficients: h{E(Yit IXis S < t )} -
where X~ = " L Z (') X . ,t,1 Wj=1 1 J ' it-j' In matrIX form we obtain (3 = Z' d h(JLJ = X i{3 = XiZ', = (X;)',. I, an
it - 2
+ .. ,+ (lL . Xit-L,
+ 'I . j + '2 . j2 + ... + IP . jP. r
Polynomial models and mit use of t d' d sp me models (linear, cubic, natural) all pers an ar softwa ' h ' be represented l' re smce t e distributed lag model can as a mear model with appropriate basis elements,
MSCM data and lagged covariates
We first consider estimation of the association between illness, Yit, aIld stress, X it- k , using a single lagged stress covariate, We specify a logistic regression that adjusted for baseline covariates, Zi: logit E(Yit I X it -
k,
Zi) = (lo(k)
+ (l1(k) . X it - k + 13;{k). Zi
and used GEE with working independence for inference, In Fig. 12.3 we display the point estimates and 95% confidence intervals for (3I(k), k = 1,2, ... ,7, based on separate fitted models for each value of k, Next we specify a parametric function for 130 (k) and (31 (k) and assume a constallt (32. Using natural splines with knots at t j = 4,8,12,16 we estimate a lag coefficient function, /31 (k), and pointwise standard errors using all possible pairs (Yit, Xis) such that t > s. Figure 12.3 shows the estimated coefficient function and reveals a decaying association that is not significalltly different from 0 after k = 9. To investigate whether maternal stress measured 00 t~e previous? days is predictive of current child illness we use logistic regresslOo controllmg for
LAGGED COVARIATES
TIME-DEPENDENT COVARIATES
262
263
Tabl~ 1~.5. Coefficients of lagged str . of chIld Illness }';t Estirn t ess, X;t-k, as predIctors '.. aes are fro 1" using GEE with working independence rn a o~lSt~c regression employment status, marital status rna and adJust1ng.for week, at baseline, race, education and h ' ~~na~, and chIld health , ouse 0 d sIze.
Lag coefficient function 0.8 0.6
j (!!
0.4
~
0.2
g ~
Coefficient model
.........
~ ~
. . . . ..
0.0
. .
-- ..
. . . . ~ .. ~
;"
...... ,~.: .
~
.
\\........
'0
. . .
.
,.'
. .
.
. .
Saturated parameters = 7
. . .
....... -- .....
-0.2
X it - 1 X it - 2 X it - 3 X it - 4 X it - 5 X it - 6 X it -7
-0.4 -0.6
o
5
10
15
20
25
Time lag (days)
Coefficients of lagged stress in logistic regression m~dels for illness . X £ k- 1 2 Shown IS a smooth lag that use a single lagged covarIate, it-k or - , ,.... . ., . function with pointwise 95% confidence intervals, and the mdlvldual estImates 3
Natural spline parameters = 4
St.ep funct.ion paramet.ers ::::: 3
Est.
SE
Est.
SE
Est.
SE
0.34 -0.05 0.18 0.25 0.22 0.19 0.25
(0.16) (0.15) (0.13) (0.13) (0.14) (0.14) (0.14)
0.24 0.14 0.11 0.17 0.25 0.26 0.21
(0.16) (0.12) (0.12) (0.09) (0.12) (0.11) (0.13)
0.32 0.13 0.13 0.13 0.23 0.23 0.23
(0.16) (0.10) (0.10) (0.10) (0.11) (0.11) (0.11)
Fig. 12. .
for k = 1,2, ... ,7. baseline covariates. To account for the potential correlation in the longitudinal response we rely on empirical standard errors from GEE with working independence for inference. Table 12.5 shows the fitted coefficients for. the lagged covariates Xit-j for j = 1,2, ... ,7. Using separate unconstram:d coefficients we obtain a significant positive coefficient for Xit-l but obtam a non-significant negative coefficient for X it -2' Coefficient estimates for X it - 3 through X it -7 vary between 0.18 and 0.25. Alternatively, we adopt a distributed lag model using a natural cubic spline model for the coefficients ;3j and using knots at j = 3 and j = 5 requires only four estimated parameters rather than seven. Figure 12.4 shows the fitted coefficients and 95% confidence intervals using this model, and fitted coefficients from a monotone model that assumes ;3j = 1'0 + 1'1 . (1/j). The constraints imposed by the spline models lead to less variation in the estimated coefficients of Xit-j. The model assumptions also lead to decreased standard errors f?r the fitted stress coefficients, ~j = zj;y. The monotone model yieldS ;3j = 0.19+0.03· (1/ j) and exhibits minimal variation in the estimates. One disa?vanta~eof the spline models is that the parameterization does not lead to dIrectly mterpretable parameters. The fitted values from an alternative ste~ functio~ m~del,;3j = I'O+l'l'(j ~ 2)+1'2·(j ~ 5) is shown in Table 12.5. ThIS ~odel mdlcates that lagged stress is positively correlated with illness and yIelds lag coefficient values of 0.32, 0.13, and 0.23. Testing ')'1 = 0
Models for lagged stress coefficients
0.6
0-
~
0.4
-~--- ~----- t---- t-:-:-!
III
'C
"8
0.2
-'0::';"::,-......
....................
OJ
0
= 'E .!!?
0.0
0
IE 1Il 0
u
• Saturated. df= 7
-Q.2
c Spline. df=4
v Monotone. df=2 -Q.4
2
3
. I d t S Fig. 12.4. Logistic regression analysIS of agge B res, child illness using distributed lag models.
7
6
5 4 Time lag (days) X'I
k
•- ,
as predictors of
TIME-DEPENDENT COVARIATES
264
. _ (132 = 133 = 134) versus HI: 131 01 ((32 = 13:j = 134) and evaluates Ho· .131 'fi- t It wl'th a Z value of -1.18. Similarly, 1'2 tests . Id a non-slgm can resu Yle S I 13 - (33 = /34 equals the common value of the h ther the common va ue 2 • we. 13 _ a _ 13 We fail to reject equahty of these coefficients later coeffiCIents 5 - 1-'6 - 7· . •
a 66) E ch of these models suggests an assOCIatIOn between (Z for 'Y2 IS. . . tha revious week and current ch'ld h I I'11 ness a Itough maternaI stress III e p ." the statistical significance of the fitted coeffiCients vanes dependmg on the A
•
specific model choice. . . · e choose to use GEE with an mdependence workmg correlaSIllce w . . d I l'k I'h tion model for estimation we cannot use the maxImize og- I e I ood or information criterion such as AIC or BIC to compare the adequacy of different distributed lag models. As an alternative we can assess the predictive accuracy of each model by deleting individual subjects, re-fitting the model, and comparing observed and fitted outcome vectors. We use the c-index, or area under the ROC curve, as a global summary of model accuracy (Harrell et al., 1984). The c-index is 64.1% for the saturated model with 7 degrees of freedom, and is 63.8% for the spline model (p = 4), and 64.2% for the monotone model (p = 2). Thus, these models provide nearly identical predictive accuracy with the monotone model only slightly favoured. Figure 12.5 shows fitted models using k = 1,2, ... , L lagged stress variables for different choices of L. We can read this plot from right to left to determine the first model that has a significant coefficient for the last
Multivariate models with different lags 1.0
• lag =1:7 • lag = 1:5
" lag =1:4 • lag = 1:3 c lag = 1:2
'§
III
"C "C 0
0.5
Cl
g
c:
1
Q)
'0
:E Q)
0.0
.....................
0 ()
-0.5
2
3
4 Time lag (days)
5
265
lag. In the model with L - 7 ha . . we see t t the nfid . co ence mterval for {37 crosses 0 mdlcating non-significance S"1 find that the confidence interval t: {3' ~ml arly, for L := 5 and L == 6 we lor L mtersects a Th I' resent Wald-based hypothesis test £ . ese eva uatlOns rep• s or nested mod Is d . WIth L = 4 where (34 is significant. Finall F' e, an we first reject the coefficient estimates as we cha thY' Ig. 12.5 shows the changes in . nge e order L of th h " e cavanate lag. In general, the value of the coefficient t: lor eac remamIng t . d . erm Increases as we decrease L and remove more distant I agge covanates.
12.4.4
Summary
In this section we first discussed estimation of th coeffi . elent, I3d k),.of a single covariate measured k time units prio t teh r O e response Yit Either separat e parameters for every value k can be est' t d . . h d . Imae usmg standard met 0 s, o~ ~ smooth covanate function can be estimated by adoptin a partly condl~lOnal regression model. Partly conditional models can be .:ed to charactenze the longest lag at which X· and Y. . . S' . ' . tt-k it remam associated. Imllarly, usmg multIple lagged covariates we discussed both saturated m~thods that allow a separate coefficient for each covariate lag, and distnbuted l~g mod~ls that adopt regression structure for the lag coefficients. M~dels With multIple lagged covariates can be useful to describe the associatIOn between the full covariate process and the response process or can be used to parsimoniously predict the response as a function of th~ complete covariate history.
12.5 Time-dependent confounders
o lag = 1:6
~
TIM&DEPENDENT CONFOUNDERS
6
7
Fi.g. 12.5. Coefficients of lagged stress' '. USIng L lagged covariates X. £ k In lOgistIC regression models for illness , ,t-k or := 1,2,. " , L for different choices of L.
Thaditional epidemiologic regression analysis considers a classification of variables that are related to both an exposure of interest and the outcome as either confounders or intermediate variables. A confounder is loosely defined as a variable that is associated with both the exposure of interest and the response, and which if ignored in the analysis will lead to biased exposure effect estimates. An intermediate variable is one that is in the causal pathway from exposure to outcome and an analysis of exposure should not control for such a variable since the effect of exposure mediated through the intermediate variable is lost. In longitudinal studies a variable can be both a confounder and an intermediate variable, leading to analytical challenges. For example, using data from an observational study of HIV infected patients we may hope to determine the magnitude of benefit (or harm) attributable to treatment with AZT on either patient survival or longitudinal measures such as CD4 count. However, we may find that the CD4 count at time t predicts both later CD4 counts and subse~uent treatment choices. In this case CD4 at time 5 < t is the response varIable for treatment received prior to 5, but is also a predictor of, and therefore a potential confounder for, treatment given at future times, t > 5.
time t on treatment received prior to time A naive regressIOn of CD4 a t g treated subjects. Such a finding D4 I I er mean C amon t may revea a ow £' h t a tients who are more sick are the ones may simply reflect the ac1t tha Pb ct'Ions that follow we first summarize . treatment. n t e su se . . that are ~Jven . then we discuss methods of estimatIOn and issues uBmg a Simple examhPle , SCM data. Although the methods that we ly these methods to t e M . . d d app., develo ed for general analysis With a tIme- epen ent ~~:~:~:d:;,v~nbte:i~ seetio:we focus ~n the special case of an en~ogen~us . covariate. A more general an d theoretical treatment can be found m Robms et al. (1999) and the references therein. . f
12.5.1
Feedback: response is an intermediate and a confounder
To clarify the issues that arise with time-dependent covariates consider a . I . f s tudy tJ'mes , t -- 1"2 with exposure and outcome slDgepalfo . . . measureV) Let Y. be a disease or symptom seventy mdIcator (1 = t ment s (Xt-I,.it· disease/symptoms present, 0 = disease/symptoms absent) and let X t = 1 if treatment is given and 0 otherwise. Assume that the exposure X t - I precedes Yi for t = 1, 2 and that YI either precedes or is simultaneously measured with X I. Figure 12.6 presents a directed graph that represents the sequential conditional models: logit E(YI IX o = xo)
= -0.5 - 0.5 . xo, logit E(X I IYI = YI, X o = xo) = -0.5 + 1.0 . YI,
(12.5.2) Xl,
(12.5.3)
where 'H.f = {Xo, X d and hf = {xo, xI}. These models specify a beneficial effect of treatment X o on the outcome at time 1 with a log odds ratio of -0.5 in the model [VI I X o]. However, the second model specifies that the treatment received at time 2 is strongly dependent on the outcome at time one. For either X o = 0 or X o = 1 if patients have a poor initial response (VI = 1) they are more likely to receive treatment at time 2 than if they responded w~ll (YI ~ .O~. Finally, the response at the second time is strongly correlated With the 1llltlal response and is influenced by treatment at time 2.
-G 1
,
and response
0 500
Xo
n
YI n
0
1
0
1
311
189
365
135
XI
0
1
n
194
117
0
Y2 n
1
500
1
0
142 52 96
E(Y2
1 Xo
= 1, Xl
E(Y2
1 Xo
= 1, Xl
E(Y2
1 Xo
= 0, Xl
E(Y2
/ Xo
= 0, XI
1
21
0 71 0
1
27 44
0
1
0
1
0
1
118
227
138
51
84
1
59 59
0
1
166 61
0
1
113 25
0
1
0
1
19 32 42 42
+ 42)/(138 + 84) = 0.30 = 0) = (61 + 32)/(227 + 51) = 0.33 = 1) = (21 + 59)/(117 + 118) = 0.34 = 0) = (52 + 44)/(194 + 71) = 0.36
= 1) = (25
Table 12.6 shows the expected counts for each treatment/outcome path if 500 subjects initially received treatment and 500 subjects did not. This table illustrates the benefit of treatment at time 1 on the first response, Yl , showing that only 134 subjects (27%) are expected to have symptoms if initially treated as compared to 189 subjects (38%) among those not treated. The apparent benefit of treatment is diminished by the second measurement time with 30% of patients showing symptoms among those receiving treatment at both times (Xo = Xl = 1) versus 36% among those not receiving treatment at either time (Xo = Xl = 0). The conditional distributions in (12.5.1)-(12.5.3) lead to a marginal distribution of Y 2 conditional on the treatment at times 1 and 2 represented by the regression structure: logit E(Y21 rif = hf) = -0.56 - 0.13 . Xo
-
0.10·
Xl -
0.04·
XO • Xl·
(12.5.4)
/1 G ·G
Fig. 12.6. Time-dependent covariate, X t -
Table 12.6. Expected counts when 500 subjects are initially treated, X o = 1, and 500 subjects are not treated, X o = 0, when treatment at time 2, XI, is predicted by the outcome at time 1, Y I , according to the model given by (12.5.1)-(12.5.3).
(12.5.1)
logit E(Y2 1 'H.f = hf, YI = YI) = -1.0 + 1.5· YI - 0.5·
5J
267
TIME-DEPENDENT CONFOUNDERS
TIME-DEPENDENT COVARIATES
266
¥t.
This model indicates a beneficial impact of treatment among those subjects that were observed to have treatment at time 1 and at time 2. Note that the marginal expectation is computed by taking averages Over the distribution of the intermediate outcome. Since the intermediate variable influences both the second outcome and treatment assignment we
268
TIME-DEPENDENT CONFOUNDERS
TTME-DEPENDENT COVARTATES
to obtain both Pr(Yzl Xo, XI) and Pr(X I I Xo): need to average over Y:I
Table 12.7. Regression of stress, X it , on illness, Yit-k k == 0,1, and previous stress, X it - k k == 1,2,3,4+ using GEE with working independence.
== Pr(Yz == 1/ Xo == XO, XI == XI)' Pr(Yz == 1, XI == 11 X o == 1) /12(1,1) == Pr(XI == 11 Xu == 1)
/lZ(XQ, XI)
Pr(Y == I,X 1 == IjX o == 1) == z
L
Pr(Yz == I/X I == I,Y == 1
Intercept Yl,X O
Yit Yit-I
== 1)
YI =0,1
X it - I X it - 2 X it - 3 Mean(X it - k , k ~ 4)
x Pr(X I == 11 YI == YI, X o == 1) x Pr(YI == YI I X o == 1), Pr(X I == llXo == 1) ==
L
Pr(XI == I/YI == YI,XO == 1)
Employed Married Maternal health Child health Race Education House size
YI=O,I
x Pr(YI ==
YI
IX o ==
1).
Suppose that the scientific goal is to determine the effect of treatment at both time 1 and time 2 on the final patient status, Y z· We may consider reporting the observed marginal means lJ,z(xo,xd. However, since YI is correlated with both Xl and Yz we recognize that these observed marginal effects do not account for the confounder YI and therefore do not reflect the causal effect of treatment. The marginal structure does reflect a small beneficial effect of treatment at time 1 in addition to time 2, with the coefficient of X o in (12.5.4) equal to -0.13. On the other hand, if analysis controlled for YI then we would obtain the conditional model (12.5.3) that generated the data. In this model which adjusts for YI there is no effect of Xo on the (conditional) mean of Y z since we have conditioned on the intermediate variable YI and blocked the effect of Xo. Therefore the analysis that adjusts for YI does not reflect the causal effect of both X and ~I since by conditioning on an intermediate variable it only charact~rizes direct effects, This simple illustration forces realization that with longitudinal exposures (treatments) and longitudinal outcomes a variable can be both a confo~nder and an intermediate variable. No standard regression methods can t en be used to obtain causal statements. 12.5,2
MSCM data and endogeneity
To determine if there is feedback in th M current child illness d' t e SCM data we evaluate whether pre IC s current and f t that in our analysis in Section 12 4 u ure maternal stress. Note Xii-I, X it - 2 '" and did t thO we only used lagged values of stress no use e current val X t d' " f ore, If Xit is associated with y. th ue i! 0 pre Ict Yit. There,t en we have eVIdence of endogeneity,
269
Est.
SE
Z
-1.88 0.50 0.08 0.92 0.31 0.34 1.74 -0.26 0.16 -0.19 -0.09 0.03 0.42 -0.16
(0.36) (0.17) (0.17) (0.15) (0.14) (0,14) (0.24) (0.13) (0.12) (0.07) (0.07) (0.12) (0.13) (0.12)
-5.28 2.96 0.46 6.26 2.15 2.42 7.27 -2.01 1.34 -2.83 -1.24 0.21 3,21 -1.28
or feedback, where the response at time t (Yid influences the covariate at future times (X it is the covariate for time t + 1). Table 12.7 presents results from a regression of Xii on Yit-k for k == 0,1, prior stress values and covariates. Using GEE with working independence we find a significant association between Xit and Yit even after controlling for prior stress variables. 12.5.3
Targets of inference
Robins et al, (1999) discuss the formal assumptions required to make causal inference from longitudinal data using the concept of counterfaetual outcomes (Neyman, 1923; Holland, 1986; Rubin, 1974, 1978). Define Y;~Xt) as the outcome for subject i at time t that would be observed if a given treatment regime Xt = (xo, Xl! ... , Xt-I) were followed. For example, Y;~O) represents the outcome that would be observed if subject i received no exposure/treatment at all times s < t while y;~I) represents the outcome that this same subject would have if exposed/treated during all times. For a given subject we can only observe a single outcome at time t for a single s~ecified treatment course, All other possible outcomes, Y;~Xt) for Xt ix t are not observed and are called counterfactual or potential outcomes. Defining potential outcomes facilitates definition of estimands that characterize the causal effect of treatment for both individuals and for
x;.
TIME-DEPENDENT CONFOUNDERS
TIME-DEPENDENT CoVARIATES
270
(I) yeO) epresents the causal effect of treata .- it b: t' response at time t. We cannot opulations. For example, Y P , t - 1 on the zth su Jec s ment though tIme , 'fi a t since we only observe one , h' b eet-specI c el1ec directly estImate t IS su J b' t Bra well-characterized study popupotential outcome for each su Jec. 0 I a t as 0 = E[y(l) _ y,~U)] = d fi e the average causa el1ec t ,t lation we can also e n d 1 observe one outcome per sub(I) [(0)] S' any stu y can on y E[Yt, ] - E ~t . IDce . h'ch potential outcome is observed, h' that determmes w I ject, the mec allls~ ~ 1 t' g the observed outcomes to the becomes critically Important or1 re ,a m randomized study the average . I t s For examp e, III a ., potentia ou come. . 'E(Y: I x - I ) is an unbIased estimate res onse among treated subJects, it t '.. ' p (I). th entire study populatIOn smce the assIgnof the mean of Yi! m e d related to the outcome or covariates. . d d t to treatment IS random an un men . d ' d t ' 1 with full compliance havmg m treate an m Thus m a ran omlze fla , . f th t control subjects the treatment assignment IS mdependent 0 e po en. (y(t) yeo») and therefore the observed treatment effect, tlal outcomes i t ' it . d . , I ' " Y. at = mwi it' 1(X,t. -- 1) - .!..mE" Y. t . l(Xt, = 0) is an unblase estImate of the causal effect Ot· In general, we aEsume that subjects i = 1,2, ... , m are rando~ly sampled from a population of subjects and that we seek to ~ake mference regarding the effect that a treatment applied to this populatIOn would h~ve on the outcome of interest. Let Zi be a vector of time-invariant baselme covariates. Define xtl (z) = E[y/(x tl I Z = z] as the average outcome that would be observed at time t within the sub-population defined by Z = z if all subjects followed the treatment path Xt. For example, in the MSCM we are interested in the effect of stress so that Xt = (X I ,X2 , ••• , Xt-d, a possible pattern of maternal stress exposures, and tt~ 1) (z) represents the prevalence of child illness among children with covariate Z = z if their mother reported stress each day through day t - 1. Define the causal effect of continuous exposure as aT (z) = tt¥) (z) - j.J,~) (z ), where T represents the end of study (or other specified time). Robins et al. (1999) formalized the definition of 'no unmeasured confounders' that is required in order to estimate causal effects from longitudinal data. Assume that for each time t the exposure X t is independent of the vector of future potential outcomes given the observed exposure history x through ti~e t - 1, 1t (t - 1) = (X o, Xl, .. " Xt-I), and the observed outcome. hIstory through time t, 1tY (t) = (YI, Y2, ... , Vi). We state this assumptIOn as
ttl
{( yeo) y(l»). S
's
_
}
,s-t+1,t+2, ... ,T ..lXtl1tx(t-1),1tY(t).
~ote that we are assuming that exposure tIme t and therefore can causally effect v I
(12.5.5)
X t is ascertained or selected at but n t~.T Th d t+k 0 It. e no unmeasure
271
confounder assumption is also referred to as the sequential randomization assumption. This assumption states that given information through time t, exposure at time X t does not predict the value of the future potential outcomes. This assumption would be violated, for example, if there existed an unobserved variable u that influenced the likelihood of treatment or exposure and predicted the potential outcomes. A physician who prescribes treatment to the sickest patients is one such mechanism or variable. Although for treated patients we hope that the observed outcome y(1) is larger than the unobserved outcome y(O) (assuming a larger value of Y is better), we expect that both y(1) and yeO) are lower for patients selected for treatment, X = 1, as compared to the potential outcomes [y(O), y(l)] for the more healthy patients that are assigned X = O. Pearl (2000) cites two main reasons that statisticians have been hesitant to adopt causal models. First, causal inference requires assumptions such as (12.5.5) that are not empirically verifiable. Second, causal statements require new notation. For example, in our simple example given by (12.5.1)-(12.5.3) we structure data such that E(Y21 X o = 0, Xl = 0) = 0.30 and E(Y21 X o = 1, Xl = 1) = 0.36. These conditional expectations represent the average of Y2 among subjects observed to receive certain treatment paths. However, causal statements refer to the effect of interventions in the entire population rather than among possibly select, observed subgroups. That is, although we may observe a subgroup that experiences a treatment course, we may not be interested in the mean response in this specific subgroup since it does not generalize to the mean that would be observed if the entire population experienced the same treatment course. Pearl (2000) uses notation such as E[Y2 do(Xo = 1, Xl = 1)] to denote the outcome for the entire population, or j.J,~1) in our notation. Here the notation do(X = 1) indicates the situation where X = 1 was enforced for the entire study population. Pearl's notation emphasizes the fact that we are interested in an average outcome after assignment of the covariate value rather than the average of outcomes in subgroups after simply observing the covariate status, Table 12.8 presents the outcomes YI and Y2 determined by the conditional distributions (12.5.1)-(12.5.3) when the covariate values are controlled rather than allowed to be influenced by Yi. In this case we obtain /.L(l) = E[Y2 / do(Xo = 1, Xl = 1)] = 0.402 and j.J,~O) = E[Y21 do(Xo 2= 0, Xl = 0)] = 0.267 giving a causal risk difference of <>2 == 0.267 - 0.402 = -0.135. We can also calculate the causal odds ratio as [0.267/(1 - 0.267)]/[0.402/(1 - 0.402)] = 0.542. The causal effects are actually larger than associational comparisons conveyed by the observed marginal means where we find E(Y2 / X o = 1, Xl = 1) E(Y2 1 X o = O,X I = 0) = 0.302 - 0.362 = -0.060, and the odds ~atio is 0.760. The observed mean difference reflects the fact that subjects 1
TIME-DEPENDENT COVARIATES
272
TIME-DEPENDENT CONFOUNDERS
Table 12.8. Expec t ed outcomes when treatment is controlled and the causal path leading from Yi to X2 is blocked. All subjects Xo
= Xl = 1
YI
n
Y2
n
0 0
1 0
0 0
0 0
1 0
0
1 0
1 0
0 0
0 0
0 0
1
0 0
1
0 0
1 0
0 0
Xl n
1 269
0 731
1 0
0 0
731
1
0 0
1 133
O'
0 598
1 269 1 0
0 134
1 134
= (133 + 134)/1000 = 0.267 All subjects Xo = Xl = 0 /leI)
Xo n YI n
Xl n ~
n /leo)
1
0 1000
0
1
0 622 0 622
1 0
0 1 0 1 455 167 0 0
1
0 0
378
0 378
1 0
0 0
There have been several novel approaches proposed for obtaining estimates of causal effects from longitudinal data with time-varying covariates. One approach is termed 'g-computation' and is described in detail for binary response data by Robins et al. (1999) or more generally in Robins (1986). In this approach, causal effect estimates are obtained from estimates of the observed response transition model structure under the necessary assumption of no unmeasured confounders, or equivalently sequential randomization (Robins, 1987). Recall that we are interested in the average response at time T after receiving treatment for all times prior to T, compared to the response at time T after receiving no treatment for all previous occasions. Note that we can decompose the likelihood for a binary response ¥il,"" ¥iT and a binary time-dependent covariate X iO ,.'" X iT - I into the telescoping sequence
0
T
1 0
.c =
II Pr[¥it' ?tnt -
1), H; (t - 1), Zi]
t=l
0 1 143 235
0 0
1 0
0 0
1 0
0 0
1 0
0 0
1 0
0 0
1
x Pr[Xit -
0
= (167 + 235)/1000 = 0.402
with a poor response at t' 1 . Ime were more lIkely to seek treatment and th th b . us e su group wIth (X - 1 X ) subJ'ects (i e Y l'k 1° - , 1 = 1 represent somewhat sicker '. I more ley to be 1) d . ~ compare to those subjects observed to follow (X _ follow (X o = X =0 1, XI = 0)- SImilarly, subjects observed to as compared to' sU~J' t bare more lIkely to be subjects with YI = 1 . ee s a served to follow (X - 0 X ) 1 = O. Note that III order to calculate the stochastic assignment of t t causal. effects we have substituted the . . . assignment meeh' rea ment IY;I, X]0, Wit. h a determllllstlC b at tIme 1,[X I alllSm ut h t 1 ~ve. no a tered the response models Pr(YI I X o) or Pr(Y Iy, X X) can identify the st~blel 'b ?l~i' I . This SImple example shows that if we such as the sequence of U1 In~ .blocks of the data generating process, then we may predict th bcohndl:lOnal distributions in (12.5.1)-(12.5.3) e e aVlOur of th e populatlOn ' . ' under alternatIve,
1)
1
l?tr (t -
1),
H; (t -
2), ZJ
This likelihood factors into the likelihood for the response transition model = Pr[Yit !Hr (t - 1), H; (t - 1), Zi] and the likelihood for the covariate transitions .cx = Pr[Xit - 1 IHr (t - 1), H; (t - 2), Zi]' Unknown parameters in .c y and .c x can be estimated using maximum likelihood as described in Chapter 10. Under the assumption of no unmeasured confounders given by (12.5.5) the causal effect of treatment can be identified from the observed data since (12.5.5) implies (Robins et ai., 1999, p. 690):
.cy
°
Estimation using g-computation
0
1
0 0
manipulated conditions and thereby provide estimates of causal effects (Pearl, 2000). 12.5.4
1 1000
0 0
Xo n
273
ni=!
ni=l
°- ,
Thus, we can use the distribution of the observed response at time t among subjects with an observed treatment path Xt and observed response history HY (t - 1) to estimate the conditional distribution of the counterfactual outcome yt(Xt). We obtain the distribution of the outcome at the final time T by using the conditional probabilities to obtain the joint probability for ~lXt) , .•. ,Yi~Xt) and then summing over all possible intermediate paths for
275
TIME-DEPENDENT CONFOUNDERS
TIME-DEPENDENT COVARIATES 274 (x,)
the first t - 1 outcomes, Yil
12.5.5
V(x,);
, ... , it-I
Table 12.9 shows estimated coefficients for a transition model with Yit
= Pr[Yi~x,) = 1/ Zi = Z], /1 (x,) "" P [V(x,) V(x,) = yt-I, ... , Yil Pr[y;~Xf) I Zi = z] = L.J r it ' ,t-I (x,)(z)
as the response variable and with baseline covariates Z" lagged illness, Y,t-k k = 1,2, and lagged maternal stress, X it - k k = 1,2,3, as predic-
I Z , -- Z ],
_
- YI
y,-[
IV = "" L.J II P r [Vex,) r is t
(
)
~ = Ys-I, ... , 1
l8
Vex,) il
-
.Z _
YI,
i -
]
Z ,
Y,-1 s=1
=L
t
IT Pr[Yisl 'Hi (8 - 1) = Ys-I'
Y t-l s=1
'H;
(8 -
1) =
Xs-I,
Zi = z],
. th first 8 elements of the treatment or exposure path of interh were X B IS e l I t ' . a special case of the g-compu t at'zona l algon'th m ' ca est, Xt. ThIS cu a IOn IS . ' . formula of Robins (1986), In our simple example thIs computatIOn IS
/l~I) =
L Pr (Y21 XI = 1, Yi = YI,X
O=
1)· Pr(YI
= Yll X o = 1)
yl
since Y1 is the only intermediate path. Finally, since we can use the observed data to estimate response transition probabilities, Pr[Yis IHr (s - 1) = 1 Z·~ = z] , we can use the observed data to estimate I Lt (8 -1) = x S-, Y8-1' '1.IX /l(x,)(z) and 8t (z) = /l(I)(Z) - /l(O) (z), In general, the g-computational formula can be evaluated using Monte Carlo to simulate response series Yt, ... , Yi for given treatment sequences and specified covariates, and then we can marginalize over Y1 , ... , Yt - I to obtain /l (X,) (z). Such calculations make it clear that in estimating the causal effect we are controlling the sequence of exposures, but are allowing the intermediate outcomes ~, 8 < t, to unfold and therefore we capture any indirect effects that are mediated by these earlier outcomes. In summary, if we collect adequate covariates Z·, such that we are willmg to assume no unmeasured confounders, then we can use the observed da;a to model the sequence of conditional distributions, Pr[Yit IHr (t - 1), 1-f. i (t - 1); Zi], and then use these to calculate probabilities for a final end-point under exposure paths of interest. By so doing we can provide treatment comparisons that are not confounded by exposure feedback. S.tandard alternatives may not provide a satisfactory solution in this setg . Obs~rved marginal associations are biased since they do not control lOr the pnor outcom h' h es w IC are exposure confounders. Transition modt eIs can control for' pnor ou comes but only capture the direct effects of exposure and no effe t d' t d h c s me la e t rough earlier changes in the outcome.
.
;m
MSCM data and g-computation
tors. Although the dominant serial association is first order (the coefficient of Yit-I equals 2.36), we do find significant second-order dependence. Our model includes Xit-I,Xit-2, and X it - 3 but only the coefficient of X,t-3 obtains significance. We performed several checks including assessing further lagged values of both illness and stress but found no substantial departures. Using this model to perform the g-computation involves choosing a final end-point time, identifying a covariate value of interest, Zj, and then generating Markov chains with stress controlled at 1 for all times, and controlled at 0 for all times. Using 28 days as the endpoint time, and focusing on Zi = z* with employed = 0, married = 0, maternal and child health = 4, race = 0, education = 0, and house size = 0, we obtain p,(l)(z*) = 0.189, and p,(O)(z*) = 0.095. This implies that for this subpopulation, continual maternal stress is associated with a J(z) = (0.189 - 0.095) = 0.094 increase in the prevalence of child illness. A causal log odds ratio comparing continual stress to no stress over 28 days in the subpopulation Zi = z* is estimated from p,(l)(z*) and p,(O)(z*) as 0.80. Using a marginal regression model and GEE gives the
Table 12.9. Regression of illness, Yit, on previous illness, Yit-k k = 1,2, and stress, X it - k k = 1,2,3 using GEE with an independence working correlation matrix.
Intercept
Yit-l Yit-2 X it - 1 X it -
2
X it - 3 Employed Married Maternal health Child health Race Education House size
Est.
SE
z
-1.83 2.36 0.33 0.24 -0.14 0.40 -0.09 0.44 0.01 -0.24 0.31 0.01 -0.53
(0.29) (0.16) (0.14) (0.14) (0.15) (0.13) (0.13) (0.12) (0.07) (0.06) (0.13) (0.14) (0.12)
-6.29 14.83 2.31 1.72 -0.93 3.21 -0.70 3.79 0.10 -3.90 2.50 0.06 -4.42
TIME-DEPENDENT CO VARIATES
TIME-DEPENDENT CONFOUNDERS
277
276
. ,] I dds ratio as 1.38 (based on the sum of , ling assoclatJOna og 0 • corresponc ' f f i '. t . Table 125) and the transition model in saturated model coe, clen ,s m f I 0,14 + 0.40) = 0.,'50. Here a dlf(~et effed 0 on Y . .' Ta)t Je12. 9 gives ", , . timate the causal effect which the marginal associatIOn appears to.oveles . th I'k 1'1· d th t t: h X - I increases e I e I 100 a· should be anticipated when JOt tl-k . vI' . tIle likelihood that X,t+k = 1. The MSCM Yit = 1, and I it = mcreases . ' . " Its are in contra'lt to our example given by (12.5.1)- (12.5.3) anaIYSIS feSU f I Y 0 our 'lkelihood that Y 1 = 1. Fma I] y, t Ile va I'ICl't where Xl decrear;cc] th c I . t . I' on the assumption of no unmeasured confounders, y x causaI estlma ,BS Ie les and on the model form used to estimate Pr[Yit I Hi ~t - 1), Hi (t. - 1~, Zi)One limitation to use of the g-computational algonthm for estimatIOn IS that no direct regression model parameter represents the ~~ll hypothesis of no causa] effect, and thus the approach does not faclhtate formal
structures the causal effect of exposure and identifies a single parameter, fh, that can be used to quantify and test the causal effect of exposure. Estimation for MSMs can be obtained using IPTW estimation. Recall that the key reason we cannot use standard methods such as GEE is due to the fact that the prior response, Y;t-k, is a confounder. However. we do not want estimates that control for the prior response since it is also an int~rmediate variable ..I~ the absence of confounding association implies causatIOn and thus obtammg a population where confounding was nonexistent would allow use of standard regression methods. Robins (1998, 1999) and Hernan et at. (2001) discuss how to use weights to construct a pseudo-population that has the causal relationship of interest but is free from confounding. Define the stabilized weights:
testing.
SWi(t)
(024 _
o
0
•
0
12.5.6
Estimation using inverse probability of treatment weights (IPTW)
One of the advantages of using the g-computational algorithm approach to obtaining estimates of causal effects is the fact that a model for the treatment (or exposure) is not required. Stated alternatively, the likelihood £,y contains the parameters necessary to calculate Ot (z) and the telescoping likelihood corresponding to Xit given by LX is ancillary. However, the g-computational algorithm is based on an indirect parameterization of b"t(z) and does not facilitate testing for exposure effects by setting a parameter equal to zero, or permit structuring of exposure effects in the presence of covariates Zi. An alternative approach to estimation based on marginal structural models (MSMs), introduced by Robins (1998) and discussed by Hern~n e~ al. (2001), does require a model for the covariate process, but permits direct regression modelling of causal effects. In this section we first ~escribe the basic model and then discuss methods of estimation using mverse probability of treatment weighted GEE. . Marginal structural models specify the average counterfactual outcome directly a:s a function of exposure and covariates. Define the average If the subpo pu Iat'IOn WI'th Z i = z experienced a treatment outcome . regIme XI:
IL;X,)(z)
=:
E[Yi~xtll Zi
= z].
We formulate a regres . d I" sian mo e lOr the counterfactual outcomes such as
h{IL;Xt)(z)}
= 130 + I3I X,t* + fJ2 z. r-l,'
2,
where, for example, Xit represents cumulative ex * or any other function of the covariat h' t ~osure, Xit = L:. < t Xis, e IS ory. ThiS model parsimoniously
=
II
Pr(Xis=xi.I7t;1(,(S-1)=hi~(S-1),Zi). s < tPr(Xis = Xis l7tr (s - 1) = h~ (s - 1), 7ti'!. (s - 1) = hi'!. (s - 1), Zi)'
These weights compare the probability of the treatment received through time t - 1 conditional only on knowledge of the treatment histories to the probability of the treatment received conditional on both treatment and response histories. The weights SWi(t) would be identically one if the covariate process was exogenous (by definition) and can therefore be viewed as a measure of endogeneity. In practice, the weights SWi(t) will need to be estimated by choosing and fitting models for the numerator, Pr(Xis l7tf (s - 1), Zi), and the denominator, Pr(Xis l7tns - 1), 7tf (s - 1), Z;). Correct modelling of the denominator is necessary for consistent parameter estimation, while the numerator can be any function of (7tf (s), Zi), the choice of which only impacts estimation efficiency not validity. One assumption necessary for use of weighted estimation is that the weights are bounded away from zero, that is: Pr(Xis I (s - 1),
Hr
7tf (8 - 1), Zi) 2': € > O. GEE with working independence and estimated weights SWi(t) can be used to obtain an estimate of the causal regression parameter {3. Weights are necessary to obtain causal estimates, and GEE is used simply to obtain a sandwich variance estimator that accounts for the repeated meaSures. It is interesting to note that the variance of /3 is smaller when weights are estimated, making the sandwich variance estimator conservative. Formal justification for the IPTW estimation is given in Robins (1999) and references therein. To illustrate that weights can be used to construct a pseudo-population where Yit is still an intermediate but no longer a confounder we return to the simple example given by (12.5.1)-(12.5.3) whose expected counts are shown in Table 12.6. In Table 12.10 we reproduce the expected counts for
TIME-DE PEN •
278
TIM&DEPENDENT CONFOUNDERS
DENT COVARIATES
obtain W t e-weight data and Table 12.10. Examp IeofusingIPT d MSM or that corresponds to (1251) ..causal effect estimates for a saturate (12.5.3).
Xo 1 1 1 1 1 1 1 1
~ o o
o o
o o
1 1 1 1
1 1
0
0 0 1
0 1 1
0 1 1
1
0 0
~
1 0 0 0 0
",(Xo=l,Xl=l) ",(XO=1,Xl=O) ",(XO=O,Xl=l) t.t(Xo=o,x 1 =O)
~
1 1 0 0
1 0 1
~
0 1 0 1 0 1 0 1 0 1 0
Expected count
Weight
41.9 41.9 31.6 19.2 25.2 112.8 61.2 166.3 58.8 58.8 44.4 26.9 21.4 96.1 52.1 141.6
0.712 0.712 1.474 1.474 1.174 1.174 0.894 0.894 0.755 0.755 1.404 1.404 1.245 1.245 0.851 0.851
Re-weighted count 29.8 29.8 46.6 28.3 29.6 132.5 54.7 148.7 44.4 44.4 62.3 37.8 26.7 119.6 44.4 120.6
== (29.8 + 29.6)/(29.8 + 29.8 + 29.6 + 132.5) = 0.268 == (46.6 + 54.7)/(46.6 + 28.3 + 54.7 + 148.7) = 0.364
= (44.4 + 26.7)/(44.4 + 44.4 + 26.7 + 119.6) = = (62.3 + 44.4)/(62.3 + 37.8 + 44.4 + 120.6) =
0
re-weighted population YI no longer is associated with Xl. For example, Pr{X l I Y1
= 1, X o = 1)
= (29.8
+ 29.8)/{29.8 + 29.8 + 46.6 + 28.3)
= 0.443,
Pr{X l I Y1 = 0, X o = 1) = (29.6
+ 132.5)/(29.6 + 132.5 + 54.7 + 148.7)
= 0.443,
showing that YI does not predict Xl. However, Y1 does still predict the final outcome and therefore remains an intermediate variable in the pseudopopulation. Therefore, in the pseudo-population YI is only an intermediate variable and not a confounder, and we can use standard summaries in the pseudo-population to obtain causal summaries.
12.5.7
MSCM data and marginal structural models using IPTW
To obtain MSM estimates for the MSCM data we first estimate SWi(t) based on models for the exposure process Xit. For the denominator we use the logistic regression model presented in Table 12.7 where Y it , ¥it-I, lagged stress, and baseline demographic covariates are used as predictors. For the numerator we use the same regression model except that illness measures are excluded (not shown). Using the estimated weights we then use working independence GEE to obtain estimates of the causal regression parameters and valid (conservative) standard errors. Table 12.11 displays the MSM estimates and standard errors. In this model we use the first three
0.302 0.402
as a function of X o, Yl , and Xl. We also compute the stabilized weights SW(2) = Pr(Xll Xo)/Pr(Xll Yl,XO). Notice that subjects with (Xo = 1, Yl = 1, Xl = 1) are down-weighted, SW(2) = 0.712, while subjects with (X o = I'Yl = O,X l = 1) are up-weighted, SW(2) = 1.174. This suggests that our observed population has an over representation of subjects with (X~ = 1, Yl = I,X 1 = 1) relative to our desired pseudo-population where Y1 IS not a confounder. Intuitively this reflects the fact that when Yl = 1, subjects are more likely to obtain Xl = 1 and in order to correct for this selection we up-weight those with Yl = and down-weight those with l Y == 1. To verify that the pseudo-population (re-weighted count) has the causal structure of interest we compute the observed means as a function of X o and X . 1" e . I, margma lZlllg over Yl . We find agreement (to rounding rror) WIth the g-computation results given in Table 12.8. Note that in the
Y2
279
Table 12.11. MSM estimation of the effect of stress, Xit-k k 2' 1, on illness, ¥it.
Intercept
X it - l X it - 2 X it - 3 Mean(Xit _ k , k 2' 4) Employed Married Maternal health Child health Race Education House size
Est.
SE
Z
-0.71 0.15 -0.19 0.18 0.71 -0.11 0.55 -0.13 -0.34 0.72 0.34 -0.80
(0.40) (0.14) (0.18) (0.15) (0.43) (0.21) (0.17) (0.10) (0.09) (0.21) (0.22) (0.18)
-1.77 1.03 -1.05 1.23 1.65 -0.54 3.16 -1.27 -3.80 3.46 1.57 -4.51
TIME-DEP END
280
ENT COVARIATES
r X' lor k -- 1, 2, 3 , and the average of l"lTlTed values of maternal stress, .t-k this model to estimate the casual "'eO , t a t - 4, We can use. d by Bumming t h e cae ffi clents . f 'ndicators prIOr 0 st ress I 28~day perlO effect of continual stress over a 0 18 + 0.71) :: 0.85. The MSM log odds - 0.19 + ' bt . ed using the g-computational the stress predictors: (0.15 h timate 0 am ratio is comparable to tees. b t:: 0.80 in Section 12.5.5). One , f, e covanate su se algorithm (estImate or on h' th a t we can test the causal null, H o: /31 + advantage of the MSM approac, IS d fEcI'ents and standard errors, We , th estImate coe (32 + (33 + (34 = 0 usmg e d lue of 0,046, Therefore, using the obtain a Z statistic of 1.998 a~ ~~a is of no effect of continuous stress MSM we reject the causal null ypo es on the likelihood of child illness.
12.5.8
Summary
. d d usal targets of inference for analysis of This section has mtro uce ca d h . data WI'th endogeno us covariates. We have demonstrate t at longitudmal . t d d methods cannot provide causal summaries, and have mtroduced san ar t'Ion an d MSM with IPTW as alternative approaches. We have g-computa not presented the detailed theory that underlies the cau~al methodology and refer the interested reader to Robins (1998, 1999), Robms et al. (1999), and Hernan et al. (2001) for further detail and additional references,
12.6 Summary and further reading In this chapter we have provided a taxonomy for covariate processes and a taxonomy for conditional means, or regression models. When a covariate process is exogenous analysis focuses on specifying a regression model in terms of cross-sectional associations, E(Yit IXit}, or in terms of lagged relationships, E(tit IX it - I , X it - 2 , , •• ). We have shown that biased estimation can result if GEE with non-diagonal weighting is used unless the model /lit = E(tit I Xit) for cross-sectional models, or tLit :: E(¥itIXit-I,Xit-I,,,,,Xit_k) for finite lag models, equals the FCCM, E(tit IXis S = 1,2,.", T). For an exogenous covariate process we need o,nly consider dependence of tit on 11.f (t - 1) to satisfy the FCCM conditIon. Partly conditional and distributed lag regression models can adopt a fie~ible relationship between past exposure and current outcomes and can be Implemented using standard software programs, For endogenous covariates we ~ave shown that a prior response variable can be both a confounder and an mt,erm~diate variable. This issue has motivated the development of ;ausal estlma~lon methods by Robins and co-workers. Although we have locused on a smgle tim d d f £ '. e- epen ent endogenous covariate the same issue 0 a conwundmg mtermed' t . bl ' , ate that' d' t' f la e vana e may arise via a time-dependent covanIS IS mct rom th t e response and the exposure. The methods tha
SUMMARY AND FURTHER READING
281
we overview have been developed for this more general scenario and are discussed in Robins (1986, 1987, 1998, 1999) and Robins et ai, (1999) for the case of longitudinal binary response data. Lauritzen (2000) provides an excellent overview of graphical models and causal inference. Finally, we have not discussed another class of methods known as g-estimation and structural nested models. Robins (1999) overviews structural nested models and compares them to both g-computation and MSMs.
CLASSIFICATION OF MISSING VALUE MECHANISMS
283
be that a sequence of atypically low (or high) values on a particular unit foreshadows its removal from the study. 13.2 Classification of missing value mechanisms
13 Missing values in longitudinal data
13.1 Introduction ., I . , the analysis of longitudinal data whenever one or Mlssmg va ues aflse III . h' th t d more of the sequences 0 f measu rements from units Wit m e s u yare I are not.f taken, . mcomp et e, 'III the se nse that intended measurements . . h are I'lable The emphasis IS Important: I we c · lost or are ot herwlse unava . .oose in ~dvance to take measurements every hour on one-half of the subjects and every two hours on the other half, the resulting d~ta could also be described as incomplete but there are no missing v~lues ~n the se?se t~at we use the term; we call such data unbalanced. This IS not Just playmg WIth words. Unbalanced data may raise technical difficulties - we have seen that some methods of analysis can only cope with data for which measurements are made at a common set of times on all units. Missing values raise the same technical difficulties, since of necessity they result in unbalanced data, but also deeper conceptual issues, since we have to ask why the values are missing, and more specifically whether their being missing has any bearing on the practical questions posed by the data. A simple (non-longitudinal) example makes the point explicitly. Suppose that we want to compare the mean concentrations of a hormone in blood samples taken from ten subjects in each of two groups: one a control, the other an experimental treatment intended to suppress production of the hormone. We are presented with ten assayed values of the hormone concen~ration from the control subjects, eight assayed values from the treated subjects, and are told that the values from the other two treated subjects are 'missing'. If we knew that this was because somebody dropped the test t~bes on the way to the assay lab, we would probably be happy to proceed With ~ t,:,o-sample t-test using the 18 non-missing values. If we knew that tbhel mlsslllg values were from subjects whose hormone concentrations fell e ow the sensitivity threshold f th '. . a1 o e assay would mask the ve fl t . ' Ignormg the missmg v ues fl d . .ry e ec we were lookmg to detect. Lest this example o en the sophisticated read f 't that more subtle versions 0 I are not uncomm n . Ier,I we. suggest . o III rea ongItudmal studies. For example, it may
Little and Rubin (1987), give a general treatment of statistical analysis with missing values, which includes a useful hierarchy of missing value mechanisms. Let Y' denote the complete set of measurements which would have been obtained were there no missing values, and partition this set into y' = (y(o), y(rn») with y(o) denoting the measurements actually obtained and Vim) the measurements which would have been available had they not been missing, for whatever cause. Finally, let R denote a set of indicator random variables, denoting which elements of y' fall into yeo) and which into y(rn). Now, a probability model for the missing value mechanism defines the probability distribution of R conditional on y' = (y(o), y(m)). Little and Rubin classify the missing value mechanism as • completely random if R is independent of both yeo) and y(m) • random if R is independent of y(m) • informative if R is dependent on y(m).
It turns out that for likelihood-based inference, the crucial distinction is between random and informative missing values. To see this, let f(y(o), y(m), r) denote the joint probability density function of (y(o), y(m), R) and use the standard factorization to express this as
f(y(o), y(m), r) = f(y(o), y(m))f(r I y(o), y(m)).
(13.2.1)
For a likelihood-based analysis, we need the joint pdf of the observable random variables, (y(o), R), which we obtain by integrating (13.2.1) to give
f(y(O), r) =
J
f(y(o), y(m))f(r I y(o), y(m))dy(m).
(13.2.2)
Now, if the missing value mechanism is random, f(rjy(o),y(m)) does not depend on y(m) and (13.2.2) becomes
f(y(o),r) = f(rly(o»)
J
f(y(o),y(m))dy(m)
= f(r I y(o))f(y(o)).
(13.2.3)
Finally, taking logarithms in (13.2.3), the log-likelihood function is
L = log f(r I yeo))
+ log f(y(o)),
(13.2.4)
which is maximized by separate maximization of the two terms on the right-hand side. Since the first term contains no information about the
284
distribution of Y
MISSING VALUES IN L ONG (0)
ITUDINAL DATA
, 't for the purpose of makinp; inferences ,we can Ignore I
I t th completely random and random missabout y(o). e d t without distinction as Because of the above reSll t, )0. h . ometlmes relerre a ing value mec ams~s, a:c St' t t~ remember that 'ignorabiJity' in this i norable However, It IS nnpor ,an , c . c g ';. f h rk I'I ood function 8B the ba.'ils lor ll1Ierence. " ' . , 'd .. '1 d sense relles on the use 0 tel e I I e alized estImatll1g equatIOns, as etlcnue For example, the met ha d a f gen r , I I . , . ' l ' d 1 under the stronger a.'iSUmptlOn t lut t Ie mlss111 SectIOn 8.2.3, IS va I on y 'h' 'I'k l'h d , h ' ' . mpletely random. Also, even WIt 111 a I e I 00 mg value mec amsm IS co . . h '. of a random mlSSll1g value mec alllsm. a.'l . based anaIySIS, th e t rea tent m ignorable makes several tacit assumptions, The first of these, a.'l emphasIzed by Little and Rubin, is that f(y(o)) and f(r I y(o)) are separately parameterized, which need not be the case; if there a:e parameters common to f(y(o)) and f(r I y(o)), ignoring the first term 111 (13.2.4) leads. to a loss of efficiency, Secondly, maximization of t.h~ secon,d t~rm ~n the nro~t~hand side of (13.2.4) implies that the uncondItIOnal dlstnbutlOn of Y IS t~e correct inferential focus. Again, this need not be the case; for example, III a clinical trial concerned with a life-threatening condition in which missin~ values identify patients who have died before the end of the study and y(o measures health status, it may be more sensible to make inferences about the distribution of time to survival and the conditional distribution of y(o) given survival rather than about the unconditional distribution of y(o). 13.3 Intermittent missing values and dropouts
An important distinction is whether missing values occur intermittently or as dropouts. Suppose that we intend to take a sequence of measurements, Y1,,·., Yn , on a particular unit. Missing values occur as dropouts if whenever Yj is missing, so are Yk for all k 2 j; otherwise we say that the missing :alues ar~ intermittent, In general, dealing with intermittent missing values IS more dlfficu.lt :han dealing with dropouts because of the wider variety of patterns o~ mIsslllg values which need to be accommodated. 'h . When , Illtermittent ml'ssl'n g vaIues aflse t rough a known censonng mechamsm ' for example if all I b I k , v a ues e ow a nown threshold are missing , the EM algoflthm (Dem pst l 1977) provIdes . ea., a possible theoretical' . er t framework (Laud 1988' H h 1999) W . ' ' ,ug es, ,hen mtermittent missing values do not aflse from censoring th f' kn' ' . ' e reason or theIr being missing is often own, slllce the subjects III qu t' . . th' . C " es IOn remam m the study and in some cases IS IllIormatlOn WIll make it reaso bl ' unrelated to the m na e to assume that the missingness is easurement process In h h . be analysed by any method which c . sue cases, t e resultmg data can thermore if the method f l ' a~ accommodate unbalanced data. Fur, 0 ana YSIS IS likel'h d b d' . be valid under the weak . I 00 - ase , the mferences wIll er assumptIOn that th ' . random, e mlssmg value mechanism is
INTERMITTENT MISSING VALUES AND DROPOUTS
285
In contrast, dropouts are frequently lost to any form of follow-up and we have to admit the possibility that they arise for reasons directly or indirectly connected to the measurement process. An example of dropout directly related to the measurement process arises in clinical trials, where ethical considerations may require a patient to be withdrawn from a trial on the ba.'lis of their observed measurement history. Murray and Findlay (1988) discuss this in the context of long-term trials of drugs to reduce blood pressure where 'if a patient's blood pressure is not adequately controlled then there are ethical problems associated with continuing the patient on their study medication'. Note that a trial protocol which specifies a set of circumstances under which a patient must be withdrawn from the trial on the basis of their observed measurement history defines a random, and therefore ignorable, dropout mechanism in Little and Rubin's sense. In contrast, the 'dropouts' in the data on protein content of milk samples arose because the cows in question calved after the beginning of the experiment (Cullis, 1994), and there may well be an indirect link between calving date and milk quality. When there is any kind of relationship between the measurement process and the dropout process, the interpretation of apparently simple trends in the mean response over time can be problematic, as in the following simulated example. We simulated two sets of data from a model in which the mean response was constant over time, and the random variation within units followed the uniform correlation model described in Section 4.2,1. In each data-set, up to ten responses, at unit time-intervals, were obtained from each of 100 subjects. Dropouts occurred at random, according to the following model: conditional on the observed responses up to and including time t - 1, the probability of dropout at time t is given by a logistic regression model, logit(pt)
= -1 - 2Y'_1'
In the first simulation, shown in Fig. 13.1, the correlation between any two responses on the same unit was p = 0,9, and the empirical mean responses, calculated at each time-point from those subjects who had not yet dropped out, show a steadily rising trend. A likelihood-based analysis which ignores the dropout process leads to the conclusion that the mean response is essentially constant over time whereas the empirical means suggest a clearly increasing time-trend. There is no contradiction in this apparent discrepancy: the likelihood-based analysis estimates the mean response which would have been observed had there been no dropouts, whereas the empirical means estimate the conditional mean response in the sub-population who have not dropped out by time t. Another way to explain the discrepancy between the likelihood-based and empirical estimates of the mean response is that the former recognizes the correlation in
SIMPLE SOLUTIONS AND THEIR LIMITATIONS
287
the data and, in effect, imputes the missing values for a particular subject taking into account the same subject's observed values. To confirm this, Fig. 13.1(b) shows the result of the second simulation, which uses the same model for measurements and dropouts except that the within-unit correlation is p = O. Both the empirical means and a likelihood-based inference now tell the same story - that there is no time-trend in the mean response. These examples may convince the reader, as they convince us, that it would be useful to have methods for analyzing data with a view to distinguishing amongst completely random, random and informative dropouts. Fortunately, the additional structure of data with dropouts makes this a more tractable problem than in the case of intermittent missing values. In the rest of this chapter, we concentrate on the analysis of data with dropouts. In Section 13.4 we briefly mention two simple, but in our view inadequate, solutions to the problem. In Section 13.5 we describe methods for testing whether dropouts are completely random. In Section 13.6 we describe an extension to the method of generalized estimating equations which gives consistent inferences under the assumption of random dropouts. In Section 13.7 we review various model-based approaches which can accommodate completely random, random or informative droputs, henceforth abbreviated to CRD, RD and ID respectively.
13.4 Simple solutions and their limitations 13.4.1
2
o
-2
2
4
6 Time
8
10
Fig. 13.1. Simulated realizatio f d I . uniform correlatio d d ns 0 a mo e wIth a constant mean response, n a1n , ran om dropouts: (a) within-unit correlation p = 0.9; (b) within-unl't corre a t Ion p == 0 0 .d . . . . .... '" ataj - - : empIrIcal mean response.
Last observation carried forward
As the name suggests, this method of dealing with dropouts consists of extrapolating the last observed measurement for the subject in question to the remainder of their intended time-sequence. A refinement of the method would be to estimate a time-trend, either for an individual subject or for a group of subjects allocated to a particular treatment, and to extrapolate not at a constant level, but relative to this estimated trend. Thus, if Yij is the last observed measurement on the ith subject, flj(t) is their estimated time-trend and Tij = Yij - fli(tj), the method would impute the missing values as Yik = [J,i(tk) + Tij for all k > j. Last observation carried forward is routinely used in the pharmaceutical industry, and elsewhere, in the analysis of randomized parallel group trials for which a primary objective is to test the null hypothesis of no difference between treatment groups. In that context, it can be argued that dropout, and the subsequent extrapolation of last observed values, i~ an inh~r~nt feature of the random outcome of the trial for a given subject. ValIdIty of the test for no difference between treatment groups is then imparted by the randomization without requiring explicit modelling assumptions. A further argument t~ justify last observation carried forward is that. if, for example, subjects are expected to show improvement .over the duratIOn of a longitudinal trial (i.e. treatment is beneficial), carrymg forward a last
288
MISSING VALUES IN LONGITUDINAL DATA
r
observation should result in a conserva Ive
assessment of any treatment
benefits. I 't' te in particular contexts, we do not 'h WhIlst t ese argum ents are egl , d Ima ~ ard as a general met h0 d . recommend la.,t observation carrIe orw 13.4.2
Complete case analysis . I ay 0 f deaI ·mg
. ' 'th dropouts is to discard all IllcomWI , Another very slmp.e ':" 'ousl wasteful of data when the dropout process Yt 's Perhaps more seriously, it has plete sequences. ThIS IS obvl . I d t the measuremen proces. , 18 unre ate. o. b"f the two processes are related, as the comthe potentIal to mtroduce las I d t be a random sample with respect to plete cases cannot then be assume 0 the distribution of the measurements Yij' . h In general we d0 no t reeomm end complete case analysIs. Perhaps. t Ie · '. hen the scientific questions of interest are genullle y onIy excep t IOn IS w . . f thO k' d confined to the sub-population of completers, but s~tuatlOns .0 ~s III would seem to be rather specialized. There are many msta~ces III whIch the questions of interest concern the mean, or other propertIes, of the measurement process conditional on completion, but t~is is not quite the same thing. In particular, if we accept that dropout IS a random eve~t, a~d if we are prepared to make modelling assumptions about the relatI?nshlP between the measurement process and the dropout process, then the mcomplete data from the subjects who happened to be dropouts in this particu~aT realization of the trial provide additional information about the propertIes of the underlying measurement process conditional on completion, This information would be lost in a complete case analysis.
13.5 Testing for completely random dropouts In this section, we assume that a complete set of measurements on a unit would be taken at times t j , j = 1, ... ,n, but that dropouts occur. Then, the available data on the ith of m units are Yi = (YiI, ... , Yini)' with ni S n and Yij taken at time t j . Units are allocated into a number of different treatment groups. Our objective is to test the hypothesis that the ~ropouts are completely random, that is, that the probability that a umt drops out at time t j is independent of the observed sequence of mea:'u~ements on that unit at times t l , ... , t j - I . We view such a test as a prehmm~ry screening device, and therefore wish to avoid any parametric assumptIOns about the process generating the measurement data. Note that our. defi.nition of completely random dropouts makes no reference to the pOSSIble mfiuence of explanatory variables on the dropout process. For example, suppose that in a study comparing two groups with different mean response profiles, the dropout rate is higher in the group with the higher mean response If we ignored th ld' e group structure the dropout process wou appear to be informative in the undifferentiat~d data, even if it were
TESTING FOR COMPLETELY RANDOM DROPOUTS
289
completely random within each group. We therefore recommend that for preliminary screening, the data should first be divided into homogeneous sub-groups. Let Pij denote the probability that the ith unit drops out at time tj' Under the assumption of completely random dropouts, the Pij may depend on time, treatment, or other explanatory variables but cannot depend on the observed measurements, Yi' The method developed in Diggle (1989) to test this assumption consists of applying separate tests at each time within each treatment group and analysing the result.ing sample of p-values for departure from the uniform distribution on (0,1). Combination of the separate p-values is necessary if the procedure is to have any practical value, because the individual tests are typically based on very few cases and therefore have low power. The individual tests are constructed as follows. For each of k = 1, ... , (n - 1), define a function, hdvl, ... , Y~')' We will discuss the choice of hd·) shortly. Now, within each group and for each time-point tk, k = 1, .. , ,n-l, identify the R k units which have n; ~ k and compute the set of scores hik = hk (Yil, ... , Yid, for i = 1, ... ,Rk. Within this set of R k scores, identify those rk scores which correspond to units with mj = k, that is, units which are about to drop out. If 1 :s Tk < Rk, test the hypothesis that the Tk scores so identified are a random sample from the 'population' of Rk scores previously identified. Finally, investigate whether the complete set of p-values observed by applying this procedure to each time within each treatment group behaves like a random sample from the uniform distribution on (0,1). The implicit assumption that the separate p-values are mutually independent is valid precisely because once a unit drops out it never returns. Our first decision in implementing this procedure is how to choose the functions h k (.). Our aim is to choose these so that extreme values of the scores, hik , constitute evidence against completely random dropouts. A sensible choice is a linear combination, k
hk(Yl, ... ,Yk) =
LWjYj.
(13.5.1)
j=I
As with any ad hoc procedure, the success of this will be influenced by the investigator's ability to pick a set of coefficients which reflect the actual dependence of the dropout probabilities on the observed measurement history. If dropout is suspected to be an immediate consequence of an abnormally low measurement we should choose Wk =, 1 and all other Wj = O. If it is suspected to be the culmination of a sustamed sequence ~f low measurements we would do better using equal weights, Wj = 1 for all J. The next decision is how to convert the scores, hik, into a test statistic and p-value. A natural test statistic is hk, the mean of the Tk scores
290
MISSING VALUES IN LON
GITUDINAL DATA
. _ k h'ch are about to drop out. 't wIth n WI. I' corresponding to those um s 'andom the approximate samphng ( ISI H-" _ R- ",ilk hk, and variance completely at r If d ropouts occur _ . ith mean k - 'k ~t= I t _ 2 lribution of hk is GauRSIan, w,' ' 1)-1 ",Rk (hk - Hd , ThiH iR , h 52 - (Rk ~t=l t 52(Rk - rk)!(rkRk), w em k rng theory. See, for example, kif lementary samp I d R refill t rom e . the presen t context of the 1'k a n k a standard . . . some ' ' . atl'on may then be poor. For an Cochran (1977), However, III "an approxlm will be small and t he GaUSSI I t ndomization distribution of each e exact test, we can evaluate the comPd era mpling If (Rk) is too large for th sis of ran om sa . 1'k hk under the nil.II Ihypof, e, I t' Jrocedure is to sample from the 'ble a terna Ive I ' this to be practlca , a ~asl lH te h- after each of s - 1 indepen, . d' t 'butlOn vve recompu . k randomizatIOn IS r l ' h en wI'thout replacement from the I t ' s of rk scores c os dent ran~~m se ec I~n and let x denote the rank of the original hk amongst set hik, Z - 1, ... , Rk, Th _ xis is the p-value of an exact, Monte the recomputed values. en, P C I test (Barnard 1963). I ar;he final stage' consists of analysing the resultin~ ~et o~ p-.va ~es. · I I ses such as a plot of the empmcal distrIbutIOn . 'f h d'f Informa I grap hIca ana Y , function of the p-values with different plotting symbols to Identl y t e l ferent treatment groups, can be useful here. Diggle (1989) also sugg~sts a formal test of departure from uniformity using the Kolmogorov-Smlrnov statistic; in practice, we will usually have arranged that a preponderance of small p-values will lead us to reject the hypothesis of co~pletelY r~n~o~ dropouts, and the appropriate for~ of the Kolmogorov-Smlrnov statIstIc IS its one-sided version, D+ = sup{F(p) -pl. Another technical problem now arises because each p-value derives from a discrete randomization distribution whereas the Kolmogorov-Smirnov statistic tests for departure from a continuous uniform distribution on the interval a :'S p :'S 1. We would not want to overemphasize the formal testing aspect of this procedure, preferring to regard it as a method of exploratory data analysis. Nevertheless, if we do want a formal test we can again use Barnard's Monte Carlo testing idea to give us an exact statement of significance. We simply rank the observed value of D+ amongst simulated values based on the appropriate set of discrete uniform distributions which hold under the null hypothesis.
Example 13.1. Dropouts in the milk protein data Recall that these data consist of up to 19 weekly measurements of protein content in milk samples taken from each of 79 cows. Also, the cows were ~located amongst three different treatments, representing different diets, III a completely randomized design. Of the 79 cows, 38 dropped out during the study. There were also 11 intermittent missing values. In Section 5.4 we fitted ~ model to these data in which, amongst other things, the mean ~e~~onse Ill. each treatment group was assumed to be constant after an Imbal settling-in period . We not ed that an apparent nse . m . the observed mean response near the end of the experiment was not supported by testing
TESTING FOR COMPLETELY RANDOM DROPOUTS
291
against an enlarged model, and speculated that this might be somehow connected to the dropout process. As a first stage in pursuing this question we now test whether the dropouts are completely random. ' Th: dropo.ut~ are confined to four of the last five weeks of the study, and occur m 12 dlstmct treatment-by-time combinations. To construct the 12 test statistics we use h k (YI , ... , Yk) = Yk, the latest observed measurement and implem.ent.a M~nt~ Ca~lo tes~ using s = 999 random selections frOl~ the randomIzatIOn dlstnbutIOn of h k for each treatment-by-time combination. The resulting p-values range from 0.001 to 0.254. On the basis of these results, w~ reject firmly the hypothesis of completely random dropouts. For example, I~ ~e us: th.e K~lmogorov-Smirnov statistic to test for uniformity of the empIrIcal dlstnbutlOn of the p-values, again via a Monte Carlo implementation with s = 999 to take account of the discreteness of the problem, we obtain a p-value of 0.001. Note that 0.001 is the smallest possible p-value for a Monte Carlo test with s = 999. Table 13.1 shows the 12 p-values cross-classified by dropout time and treatment. It is noticeable that the larger p-values predominate in the third treatment group and at the later dropout times, although the second of these features may simply be a result of the reduced power of the tests as the earlier dropouts lead to smaller sample sizes at later times. We obtained very similar results for the milk protein data when we used hdYl, ... , Yk) = 1'-1 L~=k-r+I Yj, for each of r = 2,3,4,5. The implication is that dropouts predominate amongst cows whose protein measurements in the preceding weeks are below average. Now recall the likelihood-based analysis of these data reported in Section 5.4. There, we accepted a constant mean response model against an alternative which allowed for a rising trend in the mean response, whereas the empirical mean responses suggested a rise towards the end of the experiment. We now have a possible explanation. The likelihood-based analysis is estimating f..ll(t), the mean response which would apply to a population with no dropouts, whereas the
Table 13.1. Attained significance levels of tests for completely random dropouts in the milk protein data. Treatment (diet) Dropout time (wks)
Barley
Mixed
Lupins
15 16 17 19
0.001 0.016 0.022 0.032
0.001 0.001 0.053 0.133
0.012 0.011 0.254 0.206
292
MISS ., 'ING VALUES IN LO
GENERALIZED ESTIMATING EQUATIONS
NGITUDINAL DATA
. th ean response of a uuit condiempirical means are C8timatmg pdt)!, 't~ ~ t Under completely random , h ' u dropped out Jy 1m, . , 'th independent measurements at tional on Its not UVIll" dropouts WI ler random dropouts Wit . II s(~n-. dropouts, or nu(1er nndom < . • (.) L (t) whereat; un( . different, tImes, IL] t - 12 , (t f= 2(t). This underlines the danger of ally correlated mea..'mrementfJ, I~l ) ! /lIe I'U the collo(luial sense. . I d nt~ a"l Ignora ) 1 regardmg ranc om r?po ' .t·OII between Digglc's (19~9) proced'd (IDOl) mt~outaconneci ' Rl out . po ': .11" ls At each time-point, we could use the ure and logistiC regreHslOn an, YSl . to'ry variable in a logistic regression . h ( Y ) as an exp ana .. function k YI,"" ~. fd t Thus if Pk is the probablhty that a model for the probabIlity a ropon. , unit will drop out, we a'lsume that (13.5.2) Then, conditional on the observed valuc~, hik, for all the units who ha~e not dropped out previously, the mean, h, o~ those about to drop out IS the appropriate statistic to test the hypotheSIS that (3 = 0 (Cox and Snell, 1989, Chapter 2). . . Clearly, it is possible to fit the logistic model (13.5.2), or extensIOns of It, using standard methodology for generalized linear models as suggested by Aitkin et ai. (1989, Chapter 3). For example, we can introduce an explicit dependence of Pk on time and/or the experimental treatment administered. Example 13.2. Protein content of milk samples (continued) We now give the parametric version of the analysis described in Example 13.1. Let Pgk denote the probability that a unit in the gth treatment group drops out at the kth time. We assume that (13.5.3) with hk(YI, .. ' ,Yk) == Yk· In this analysis, we consider only those four of the 19 time-points at which dropouts actually occurred, giving a total of 24 parameters. The residual deviances from this 24-parameter model and various simpler models are given in Table 13.2. Note that the analysis is based on 234 ,binary responses. These consist of 79 responses in week 15, the first OccasIOn on which any units drop out 59 in week 11 from the units which did not drop out in week 15, and simil~rly 50 and 46 from weeks 17 and 19; there were no dropouts in week 18. From Jines 1 to 5 in Table 13.2, we conclude first that there is a strong dependence of the dropout probability on the most recently observed meas~rer.ncnt; for example, the log-likelihood ratio statistic to test model 5 f f d W wlthm model 4 is 197.66 - 119.32 == 78 34 on 1 d also conclud th t th . egree 0 ree am. e e, a e nature of the dependence does not vary between t rcat ments or 1 t 3' " . times' none f r the residual d . ' b 0 m~s o. gives a slgmficant reduction 1ll eVlance y companson With line 4. The dependence of the
293
Table 13.2. Analysis of deviance for a logistic regression analysis of dropouts in the milk protein data. Model for log
{Pgk /
+ (Jgkhk + {3gh k Ogk + (3khk Ogk + (3hk
1. Ogk
2. 3. 4.
Ogk
5.0 g k
6. 7. 8.
O:g
Og (l
+ o~ + (3h k + (Jhk
+ (3h k
(1 -
Pgk)}
Residual deviance 111.97 116.33 118.63 119.32 197.66 124.16 131.28 139,04
df 210 219 218 221 222 227 230 232
dropout rate on treatment or time is investigated in lines 6 to 8 of the table where we test different assumptions about the Ogk parameters in the model. We now have some evidence that the dropout rate depends on treatment. Comparing lines 7 and 8 the log-likelihood ratio statistic is 7,76 on 2 degrees of freedom, corresponding to a p-value of 0.021. The evidence for a dependence on time is rather weak; from lines 6 and 7 the log likelihood ratio statistic is 7.12 on 3 degrees of freedom, p-value = 0.089. The parametric analysis in Example 13.2 allows a more detailed description of departures from completely random dropouts than was possible from the non-parametric analysis in Example 13.1. As always, the relative merits of non-parametric and parametric approaches involve a balance between the robustness and flexibility of the non-parametric approach and the greater sensitivity of the parametric approach when the assumed model is approximately correct. Note that in both approaches the implied alternative hypothesis is of random dropouts. This underlines an inherent limitation of any of the methods described in this section - the hypothesis of completely random dropout is seldom of intrinsic scientific interest, and rejection of it does not determine the subsequent strategy for analysis of the data. As we have seen, the distinction between random and informative dropout is at least as important from a practical point of view as is the distinction between completely random and random dropout. 13.6 Generalized estimating equations under a random missingness mechanism One of the attractions of the method of generalized estimating equations (GEE), as described in Section 8.2.3, is that for problems in which the questions of interest concern the relationship between the population-averaged mean response and a set of explanatory variables, the GEE method provides
MODELLING THE DROPOUT PROCESS
MISSING VALUES IN LONGITUDINAL DATA
294
., I assum ption that .the model for d the mInima . consistent inference un er 'fi d In particular, If the analysIs . eetly specl e . the mean response IS carr. . atrix of the response vector • J: for the variance m assumes a workmg Jorm b i t but under reasonably general · y may e as , , which is incorrect, effi Clene.. I r the basic form of the GEE .t 's retamed. J~oweve, . h conditions, conSIS ency I, pletely random, otherWIse t e ~ethod assumes that any dropouts. are. COlm f ting equatIOn IS os t . . h Ima· t'll wish to estimate, under minconsistency of tees · d opouts we may s I When data con t am r , 11'lch would have pff~vailed in the . the mean response w imal assumptIOns, ins et al. (1995) present an extension of the GEE absence of dropouts. Rob d t which preserves the property of d t 'th random ropou s, method to a a WI h response without requiring correct consistent inference about t e mean s ecification of the covariance structure. . . p The basic GEE method uses the estimating equatIOn (8.2.4), whIch we reproduce here as (13.6.1)
Recall that in (13.6.1), Y i denotes the vector of responses on the ith s~b ject, Iti the corresponding mean response, and (3 ~he vector of regreSSI?n parameters defining the mean response. Expressed mformaIly, the essentIal idea in Robins et al. (1995) is that if Pij denotes the probability that subject i has not dropped out by time t j , given that subject's observed measurement history YiI,· .. ,Yi,j-l and any relevant covariate information, then the observation Yij from this one subject is representative of all subjects with comparable measurement and covariate information who would have been observed had they not dropped out. Hence, to restore the unbiasedness of (13.6.1) for the complete population we need to weight the contribution of Yij by the inverse of Pij. This leads to the extended estimating equation,
S{3({3, a) ==
8 (8;;; )' m
Var(Yi)-I p-I (Yi - Iti) == 0,
i~ which .p is a diagonal. matr!x with non-zero
(13.6.2)
elements Pij. Robins et al. ( 995) gIve a careful dIscussIOn of the precise conditions under which (13.6.2) does indeed lead to consistent inferences about (3 when the p" are themsleves estimated from the data using an assumed random dropo~~ model. This extension to GEE r . . . bl . . eqUlres, lllevlta y, that we can consIstently estunate the dropout prob b'l"t' J: • a Illes lOr each subject given their observed m easur~ment history and any relevant covariates. This makes the method b est SUIted to large-scale t d' A b " · dropout model fas u h' les.h th rgua ly, It IS a tall order to fit a paramet rIc ' r w IC e data necessarily provide relatively
295
sparse information, in circumstances where the analysts are reluctant to commit themselves to a parametric model for the covariance structure. Scharfstein et al. (1999) extend this approach to the case of informative dropout and argue in favour of presenting a range of inferences for the quantities of interest based on different values of informative dropout parameters, rather than attempting to estimate aspects of the dropout process on which the data provide little or no information. The idea of weighting observed measurements in inverse proportion to the corresponding estimated dropout probabilities also appears in Heyting et al. (1992). 13.7 Modelling the dropout process
In this section we review a number of approaches to parametric modelling of longitudinal data with potentially informative dropouts, highlighting the practical implications of each approach and the distinctions amongst them in respect of the assumptions which they make about the underlying dropout mechanism. The formal development will be in terms of a continuous response variable. The extension of these ideas to discrete responses will be discussed briefly in Section 13.9. Following Diggle and Kenward (1994), we adopt the notational convention that for anyone subject, the complete set of intended measurements, Y* == (Yj*, ... , Y;), the observed measurements Y == (YI , ... , Yn ) and the dropout time D obey the relationship
Yj ==
{lj*: 0:
j < D, j:::: D.
We emphasize that in this notation a zero value for Y is simply a code for missingness, not a measured zero. Note also that 2 :S D :S n + 1, with D == n + 1, indicating that the subject in question has not dropped out.
13.7.1
Selection models
In a selection model, the joint distribution of Y* and D is factorized as the marginal distribution of Y* and the conditional distribution of D, given Y*; thus P(Y*,D) == P(Y*)P(DjY*). The terminology is due to Heckman (1976), and conveys the notion that dropouts are selected according to their measurement history. Selection models fit naturally into Little and Rubin's hierarchy, as follows: dropouts are completely random ifF(D I Y*) == P(D), that is, D and Y* are independent; dropouts are random if P(D I Y*) == P(D IYt, ... , YD- I ); otherwise, dropouts are informative. * Let () and ¢ denote the parameters of the sub-model for Y and of the sub-model for D conditional on Y*, respectively. We now derive the joint distribution of the observable random vector, Y, via the sequence of
MISSING VALVES IN LO N
296
MODELLING THE DROPOU'"
GTTVDINAL DATA
.1
J
Pk(Hkl y; ¢)f;(y I Hk; O)dy
di = n
L(O, ¢) = L 1 (O)
m
LIO) =
.L log{fdi_I (Yin, i==1
!k(y I Hk; 0,
(13.7.3)
(1371-1373) determine the joint distribution of Y, and hence . ., " d h EquatlOns the likelihood function for 0 and ¢. Suppressing the depen ence on t e parameters, the joint pdf of a complete sequence, Y, is
m
L 2 (¢)
f(y) = R(YI)
II h(Yk IH
L 3 (0, ¢)
k)
/>(Yk IH.) } Pr(Yd = 0 IHd, Yd- k of 0)
g
d-I [
]
{I - Pk(Hk, Yk)} Pr(Yd = 0 I Hd, Yd- I
=
L
log{Pr(D
= di IYin.
Recall th~t under RD, L 3 (0, ¢) depends only on ¢, and can therefore be absorbed mto L 2 (¢). Hence, under RD, (13.7.6)
whilst for an incomplete sequence with dropout at the dth time-point,
=1d-l (y)
Pk(Hk , Yik)}
i:di'S.n
(13.7.4)
IT
= .L .L log{1 -
and
k==2
~ {mYd
di - l
i==1 k==1
m
l' 0), (13.7.5)
where 1a- I (y) deno:es. the joint pdf of the first d - 1 elements of Y* and the ~roduct term wlthm square brackets is absent if d = 2 (H . ate that under either CRD or RD y, the (unobserved) I f h ' Pk k, Y; ¢) does not depend on va ue 0 t e measur t t . emen a tIme tk· It follows that this term can be brought t' of (13,7,2), which then red~~e:I~~ the integral sign on the right-hand side
.
+ L 2 (¢) + L 3 (O,¢),
where
(13.7,2)
and for y 1= 0,
fry)
297
represents the sequence of observed m f' easurements on th . h ' + 1 1'the umt does not drop out and d. ' . e zt umt, where otherwise, Then, the log-likelihood for (0 A..) , IdentIfies the dropout time ''P can be partitioned as
. H where Hk = (Y1 ,,··, Yk-d· Let conditional distributions for Yk .g~venl kl. riate pdf of Y~ given Hk and • . 0
PROCESS
Pr(Yk = 0 I Hk_ 1 , Yk- I of- 0) = Pk(Hki ¢),
smce the integrand is now a pdf d' contribution to the likell'h d an mtegrates to one. This implies that the f 00 separates i t t one or ¢' We now consider the fo f no wo c.omponents, one for 0 and a set of data consisting of m 'trm,O th~ resultmg likelihood function for um S, m whIch Yi = {.. ._ Y'J' J - 1, ... ,di - I}
and maximization of L{O, ¢) is equivalent to separate maximization of L 1 (O) and L~ (¢). Hence the dropouts, which only affect L 2 (¢), are ignorable for .lzk~hhood-based inference about 0. Similarly, L 2 (¢) in (13.7.6) is the log-lIkelIhood associated with the sub-model for D conditional on the observed elements of Y*, and it follows that the stochastic structure of the measurement process can be ignored for likelihood-based inference about ¢. In summary, under random dropout, the log-likelihood associated with a selection model separates into two components, one for the parameters of the measurement sub-model, the other for the parameters of the dropout sub-mOdel, and it is in this sense that the dropout process is ignorable for inferences about the measurement process, and vice versa. Equally, it is clear that this would not hold if there were parameters in common between the two sub-models, or a more general functional relationship b~tween 0 and ¢. Moreover, the implicit assumption that the relevant SCIentific questions are addressed by inference about the Y* process may no.t be reasonable in particular applications. For example, if dropouts occur PrImarily because subjects with a poor response are withdrawn from the stUdy on ethical grounds, it might be relevant to ask what the trial would have shown if these subjects had not been removed, and this is precisely hat is addressed by an analysis of the Y* process. On the other hand, If dropouts OCcur because some subjects experience an adverse reaction to
:v
MISSING VALUES IN LO
MODELLING THE DROPOUT PROCESS
NGlTUDINAL DATA
J: bt . . I nse) then .ITuerences a ou , ' oar chmca respo , from a P . f bJ'eets for whom such adverse t re atment (as dlstmd .. ulatlOn 0 su I Y' relate to a fictitIOuS p~p . ht be of more practical relevance to ana reactions do not occur, and It mig t' and the pattern of responses , " d ' f adverse reac IOns . I' I d se both the mCI ence 0 '. 'th 0 adverse reactIOn. T llS ea s Y I t' n ofsubJeets WI n c . , .• amongst thesub-popu a 10 d 1 hich we discuss m SectIOn 13.7.2. , f tt n mixture, rna . ' I for () am1 on to the Idea 0 pa ,er . J: e s, wtive the log-hkehhooc d out process IS llllorma , I When t he rap , ' I nalysis becomes more romp ex. I the statlstlca a c/J does not separate. ane " he need to evaluate the integral (13.7.2) From a technical pomt of vl~ewt' t th computation of the likelihood. More comp Ica es c t t' d I th relationship between an observed at each dropou Ime, t Ily the need to rna e e . fun damen a , b' bi concomitant (the measurement whIch , t (dropout) and an uno serva e even b d h d the subJ'cct not dropped out) typIcally leads would have been 0 serve a , , 'ffi I 'd t'ft b'l't of the model parameters, makmg It dl cu t, or even d S J: to poor J en I a I I Y , 'ble, t 0 vaI'd ImpoSSI I at e the assumed model from the observed ata. ee, lor example, Fitzmaurice et ai. (1996).
298
Example 13,3. Protein content of milk samples (continued) We now fit the Diggle and Kenward model to the milk protein data, our objectives being to establish whether the dropout process is informative, and if so to find out if this affects our earlier conclusions about the mean response profiles, For the mean response profiles we use a simple extension of the model fitted in Section 5.4, where we implicitly assumed random dropouts. Note that this model allows the possibility of an increase in mean response towards the end of the experiment, If flg(t) denotes the mean response at time t under diet g, we assume that
Inf Section 5.4 ' Example 5.1, we use d a rna del for the covanance . structure a the complete measurement process Y*(t) which included three distinct components of variation' a d ' t . 11 . ran am m ercept component between animals, a sena y correlated compo t 'thO , urement err nen WI In ammals, and an uncorrelated measor component Howe th ' between animal , v e r , e estImated component of variation this componentSt:~e;~~YF~:tl, and in what follows we have chosen to set of the dropouts occur in th 1 y, ~r the dropout process we note that all ast for the probability of dr e t ve weeks of the experiment, Writing Pk opou at time k h for k :s; 14, whereas for k > 15 ' we t erefore assume that Pk = 0
-
,
299
Table 13.3. Likelihood analysis of dropout mechanisms for the milk protein data. Dropout mechanism
2L max
ID RD ({31 == 0) CRD ({31 == {32 == 0)
2381.63 2369.32 2299.06
Table 13.3 gives values of twice the maximized log-likelihood for the full model and for the reduced models with cP1 == 0 (random dropouts) and 4>1 = 4>2 = 0 (completely random dropouts). Comparison of the RD and CRD lines in Table 13.3 confirms our earlier conclusion that dropouts are not completely random. More interestingly, there is overwhelming evidence in favour of informative dropouts: from the ID and RD lines, the likelihood ratio statistic to test the RD assumption is 12,31 on one degree of freedom, corresponding to a p-value of 0.0005, In principle, rejection of the RD assumption forces us to reassess our model for the underlying measurement process, Y·(t). However, it turns out that the maximum likelihood estimates of the parameters are virtually the same as under the RD assumption. This is not surprising, as most ofthe information about these parameters is contained in the 14 weeks of dropoutfree data, With regard to the possibility of an increase in the mean response towards the end of the experiment, the maximum likelihood estimates of {32 and {33 are both close to zero, and the likelihood ratio statistic to test {32 == {33 = 0 is 1.70 on two degrees of freedom, corresponding to a p-value of 0.43, In the case of the milk protein data, our reassessment of the drop~ut process has not led to any substantive changes in our inferences concernmg the mean response profiles for the underlying dropout-free process Y*(t). This is not always so. See Diggle and Kenward (1994) for exa~ples. Note also that from a scientific point of view, the analYSIS rep~r~ed here is suspect, because the 'dropouts' are an artefact of the .defi~ItIOn of time as being relative to calving date, coupled with the ter~llnat.lOn of the study on a fixed calendar date. This leads us on to ~ dISCUSSlO~ of pattern mixture models as an alternative way of representmg potentIally informative dropout mechanisms,
13.7,2
Pattern mixture models
P attern mixture models introduced by L1'ttl e (1993) , work .with the " factor' .IzatlOn , of the joint distribution ' * d D' t th marmnal dIstrIbutIOn of Y an moe 0'
MODELLING THE DROPOUT PROCESS MISSING VALUES IN L
ONGITUDINAL DATA
f Y* given D, thus P(Y*, D) = ·' I distributIOn a . . ,'hl t of D and the can dItlOna . oint of view, it IS always POSSI e 0 P(D)P(Y*' D), From a theoretIcal p . t e model and vice versa, as they d I as a pattern mIX ur . . I , .' f the same joint distrihutJOlI. n pracexpress a selection mO e .' t' . · factorIzatIOns 0 are simply aIternat Ive ifferent kinds of simplIfymg assump IOns, " the two approaches lead to d t Ice, I and hence to different ana yses. , sible rationale for pattern mix' oint of VIeW, a pos , From a rna deIImg P " d t time is somehow predestmed, th t each subject s ropOu . . 'es between dropout cohorts, ThIS ture rna deIS IS a and that the rneasuremendt process I:~:;y to apply very often although, as , I 't tation waul seem un I htera III erpre . ' th 'Ik protein data originally analysed by noted above one exceptIOn IS e ml th 'd t . d d (1994) using a selection model. Because e ropou Dlggle an enwdar . I to different cohorts the literal interpretation times' correspon precIse y . ' of a pattern mixture model is exactly rIght for these d~ta. The arguments in favour of pattern mixture ,mod~llmg are. us~ally o~ a of subjects m a 10ngItudmai tnal ' k'III d " First classification more pragmat IC . . . , , according to their dropout time prOVIdes an obvIous way of dIvIdmg the subjects into sub-groups after the event, and it is sensible to ask whether the response characteristics which are of primary interest do or do not vary between these sub-groups; indeed, separate inspection of sub-groups defined in this way is a very natural piece of exploratory analysis which many a statistician would carry out without formal reference to pattern mixture models. See, for example, Grieve (1994), Second, writing the joint distribution for Y* and D in its pattern mixture factorization brings out very clearly those aspects of the model which are assumption-driven rather than data-driven. As a simple example, consider a trial in which it is intended to take two measurements on each subject, Y* == (Yt, Yn, but that some subjects drop out after providing the first measurement, Let f(y Id) denote the conditional distribution of Y* given D == d, for d = 2 (dropouts) and d = 3 (non-dropouts). Quite gen~rally, f(y Id) == f(YI Id)f(Y2\ YI, d) but the data can provide no informatIon about f(Y2 \YI, 2) since, by definition D = 2 means that Y* is not observed. ' 2 Extensions of the k'nd f l ' tern ' t d 1 I 0 examp e gIven above demonstrate that patmIx ure mo e s cannot b 'd t'fi d . the conditional distributions e 1 en I e WIthout pl~ing restrictions on the use of complete . !(y I d): For example, LIttle (1993) discusses case mtsszng vanable restr' t' h' h assuming that for each d IC Ions, w IC correspond to < n + 1 and t :::: d, 300
.
K
f(Yt\YI'''.,Yt-I,d)=f(y
t
ly1, ... ,Yt-l,n+1).
At first sight pattern ' t hierarchy, Ho~ever Mo~~: urehmodels do not fit naturally into Rubin's , , erg s et al (1997) h h corresponds precisely to the th £ 11 '. s ow t at random dropout e 0 Owmg set of restrictions, which they
301
call available case missing value restrictions: f{Yt I YI" , . ,Yt-I, d) = f(Yt I YI,·,. , Yt-I, D > t).
This result implies that the hypothesis of random drop t b . . . , , ou cannot e tested Without makmg. addItIOnal assumptIOns to restrict the 1· f . c ass 0 a Iternat, , smCe the available case Inissl'ng val ue res t.ne . t'Ions ives under conSIderatIOn, . . , . cannot he verIfied empmcally, The identifiability associated , .problems " . . ' with in~ormat'Ive d ropout models,. ~nd th: ImpOSSIbIlIty of vahdatmg a random dropout assumption on e~lpI~Ical eVIdence a,lone, serve as clear warnings that the analysis of a 10ngItudmai data-set WIth dropouts needs to be undertaken with extreme caution .. How~ver, in the author's opinion this is no reason to adopt the superfiCIally SImpler strategy of automatically assuming that dropouts are ignorable.
Example 13.3, Protein content of milk samples (concluded) The first stage in a pattern mixture analysis of these data is to examine the data separately within each dropout cohort. The result is shown in Fig. 13.2, The respective cohort sizes are 41, 5, 4, 9 and 20, making it difficult to justify a detailed interpretation of the three intermediate cohorts. The two extreme cohorts produce sharply contrasting results. In the first, 19-week cohort the observed mean response profiles are well separated, approximately parallel and show a three-phase response, consisting of a sharp downward trend during the initial three weeks followed by a gentle rising trend which then levels off from around week 14. In the IS-week cohort, there is no clear separation amongst the three treatment groups and the trend is steadily downward over the 15 weeks, We find it hard to explain these results, and have not attempted a formal analysis, It is curious that recognition of the cohort effects in these data seems to make the interpretation more difficult,
13,7.3
Random effect models
Random effect models are extremely useful to the longitudinal data anal?,st. They formalize the intuitive idea that a subject's pattern of ,resp~nses 1~ a study is likely to depend on many characteristics of that subject, mcludmg some which are unobservable. These unobservable characteristics are then included in the model as random variables, that is, as random effects. It is therefore natural to formulate models in which a sUb!ect's propensity to drop out also depends on unobserved variables, that IS, on random effects, as in Wu and Carroll (1988) or Wu and Bailey (1989). In the present context, a simple formulation of a model of this kind woul~ ?e t~ pO,stul~te ' . TT ) a b Ivanate random effect, U = (UI , U2 an d t 0 mo del the jomt distnbutIOn
MISSING VALUES IN
802
MODELLING THE DROPOUT PROCESS
LONGITUDINAL DATA
303
hierarchy, the dropouts in (13.7.7) are completely random if U and U
4.6
1 are independent, whereas if U1 and U2 are dependent then in general the2
4.5
dropouts are informative. Strictly, the truth of this last statement depends on the precise formulation of It (y I UI) and f2(d IU2). For example, it would be within the letter, but clearly not the spirit, of (13.7.7) to set either fr(ylud = fl(y) or h(dlu2) :::: h(d), in which case the model would reduce trivially to One of completely random dropouts whatever the distribution of U.
4.0 4.0
':'.,.,;<:: , ,,--~
.
""
•
3.0
2.5
:.-,
,
""
.
,I
:5 3.6 3.0
"
L----;;:;--1'5-' II 10 15 o
2.5
L---:----:;~-ffi--' 6 10 15 o Time (weeks)
Tim. (w.ek.)
4.0
3.0
2.5
10
15
Contrasting assumptions: a graphical representation
In our opinion, debates about the relative merits of selection and pattern mixture models per se are unhelpful: models for dropout in longitudinal studies should be considered on their merits in the context of particular applications. In this spirit, Fig. 13.3 shows a s~hematic view o~ selecti.on, pattern mixture and random effects models whIch takes the pomt of VIew
4.5
6
13.7.4
100
L--_~-~------:::-----
o
5
nm. (w.ek.)
10
15
(a)
(b)
(c)
(d)
Time (weeks)
80
4.5
60
2.5
40
o
5
10 TIme (weeks)
15
Fig. 13.2. Observed mean response profiles for milk protein data, by dropout
cohort.
of Y· , D and V as
f(y, d, u) :::: h (y Iul)h(d IU2)f3(U).
(13.7.7)
In (13.7.7), the dependence between Y· and D is a by-produot of the Uland V 2, or to put it another Way, Y* 'andD are conditIOnally mdependent given U. In terms of Little flJuj".Rubin's
depende~c~ betwe~n
MISSING VALUES IN LONG!
TUDINAL DATA
A LONGITUDINAL TRIAL OF DRUG THERAPIES
h ther or not we recognIze effects are almost always ';ith u~ (:i: formulating a model for that random d I ) and that one conslderatlO h' k what kinds of causal Id be to t i l l . h 'our roo e s , t em I~ , I t ial with dropouts wou ts of random vanables: plausibly,exist U. Each ts Y dropout tImes measuremen, 'd ndence grap h for these three. ranuom . , Fi, 13,3 is a conditional III epe d e between two vertices mdrcates III 'b gles in which the absence of an et,g are conditionally independent varra , 'bl 'n ques IOn I. that the two random vana es I e most general kind of mode rs, r~p. the third (Whittaker, 1990), Th I ft anel whilst the remammg gIven h in the top- e P , , resented by the complete grap, I d deleted correspond to selectIOn, 'h of which has a Slllg e e ge , graphs, eae d effects models, 'd h k' d of thought experiment that Pattern mixture and ran om , ' t d d as an aJ to t e III P' Figure 13,3 IS III en e . d 'd' how to deal with dropouts. 19. the data analyst must conduct III eCI. IllgI'fy'ng assumptions are possible. 'I that any SImp 1 I 13,3(a) represents a dema I compelled to express a model , uld be more or ess Under this scenano, we wo ., d' t 'b t' ns for Y and U conditional on 11 fon of Jomt IS n u 10 , for the data as a'bl co ec I 13 , 3(b) invites interpretatIOn d' crete vaIues 0 f D , F'g 1. each of the pOSS.I ~ IS , d ff ts or latent subject-specific charas a causal cham m whIch ran om e ec Y £ t 't' U influence the properties of the measurement process, ,or ~ee::bj~~t i~ question, with propensity to drop out subsequently det~r mined by the realization of the measurement process. In contrast" F~g, 13,3(c) invites the interpretation that the s~bject-specific ch~racte~rs~lCs initially determine propensity to drop out, wrth a cO,nsequentlal vanatlOn in the measurement process between different, predestmed dropout cohorts, Finally, Fig, 13,3(d) suggests that measurement and dropout p~ocesses are a joint response to subject-specific characteristics; we could thmk of th~se characteristics as unidentified explanatory variables, discovery of whIch would convert the model to one in which Y and D were independent, that is, completely random dropout. Figure 13.3 also prompts consideration of whether selection modelling of the dropout process is appropriate if the measurement sub-model for Y includes measurement error. To be specific, suppose that the measurement sub-model is of the kind described in Section 5.3.2, in which the stochastic variation about the mean response has three components,
304
:e~~~~~~~;~i:ht
;ma:g~~~~:~ :~eets
tij = J.Lij + {d:jU i + Wi(t ij )} + Zij,
305
.
diag.~am
(13.7,8)
The br~cketing of ~wo of the terms on the right-hand side of (13.7,8) is to emphasIse that {dijU i + Wi (t ij )} models the deviation of an individual subject's resp~~se trajectory from the population average, whereas Zij represents additive measurement error, that is, the difference between the ?bserve~ response Yij and an unobserved true state of nature for the subject m qUest,rOD" In a model of this kind, if dropout is thought to be a response to the subJect s true underlying state, it might be more appealing to model the
conditional dropout probability as a function of this underlying state rather than as a function of the observed (or missing) Yij, and the conclusion would be that a random effects dropout model is more natural than a selection dropout model. Even when the above argument applies, a pragmatic reason for cosidering a selection model would be that if a random dropout assumption is sustainable the resulting analysis is straightforward, whereas non-trivial random effects dropout models always correspond to informative dropout because the random effects are, by definition, unobserved. 13.8
A longitudinal trial of drug therapies for schizophrenia
In this section we present an analysis, previously reported in Diggle (1998), of data from a randomized trial comparing different drug therapies for schizophrenia, as described in Example 1.7. Table 13.4 lists the data from the first 20 subjects in the placebo arm, Table 13.4. PANSS scores from a subset of the placebo arm of the longitudinal trial of drug therapies for schizophrenia, from weeks -1, 0, 1,2,4, 6 and 8, where -1 is pre-randomization and 0 is baseline, A dash signifies a missing value,
-1
0
1
2
84 68 108
112 56
112
117
79
87
104
77
69 96 72 106 96
44
89 95 60
79 72 94 104 102 119 94 91 73
113 111 118 89
78 64
76 90 105
84
8
58
70
68
60 84
98
88
98 88
111
99
84 53 75 116 94 153 66 57
95
113
83
108
76
101
53
55
64 86
95
108
98 52 144
47
72
116 90 57
71
64
72
65
103
95 97
110 97
107
102
123 100
113 90
63
77
57
115
121
103
104 91 64 84
88 122
6
4
113 48
80
121
90 52
62
A LONGITUDINAL TRIAL OF DRUG
MISSING VALUES IN LONGITUDINAL DATA
THERAPIES
307
306
' t only 253 are listed as completing the 23 t s a complete sequence of PANSS scores, RecaJl th at , 0 f theh 5 16 po. len 'ded d Ith gh a furt er provi stu y, a o.u d' t 'b t' of the stated reasons for dropout, whilst gives the IS rJ U IOn . each 0 f Tabl e 16 '. h b f dropouts and completers III the'SIX Table 1 7 gIves t e num ers 0 . ,. . . N t th t the most common reason for dropout IS madtreatment groupS. 0 e a . h I b , d th t the highest dropout rate occurs III t e pace 0 equate response, an a d ' 'd group, followed by the haloperidol group and the lowest ose flspen one
100
group.
As shown in Fig. 1.6, all six groups show a mean response profile decretlBing over time post-baseline, with slower apparent rates of decrease
towards the end of the study. Figure 13.4 shows the observed mean response as a function of time within sub-groups corresponding to the different dropout times, but averaged across all treatments. The mean response profile for the compIeters is qualitatively similar to all of the profiles shown in Fig. 1.6, albeit with a steeper initial decrease and a levelling out towards the end of the study. Within each of the other dropout cohorts a quite different picture emerges. In particular, in every case the mean score increases immediately prior to dropout. This is consistent with the information that most of the dropouts are due to an inadequate response to treatment. It also underlines the need for a cautious interpretation of the empirical mean response profiles. Figure 13.4 provides strong empirical evidence of a relationship between the measurement and dropout processes. Nevertheless, we shall initially analyse.the m~asurement data ignoring the dropouts, to provide a basis for comparison with an integrated analysis of measurements and dropouts. From now on, we ignore the pre-baseline measurements because these preceded the establishment of a stable drug regime for each subject Ch T~s phase of the analysis follows the general strategy described in I ap er 5. Wfie first convert the responses to residuals from an ordinary eas t squares t to a model h' h 'fi combination of tl'm d t w IC speC! es a separate mean value for each e an reatment W th . this residual process u d th . e. en estimate the variogram of only on the time-separ' n. er eTh8SsumPt~on that the variogram depends , at IOn u. e resultmg' . III Fig. 13.5, Its salient £eat vanogram estImate is shown 11 Ures are' a sub t t' I . smooth increase with u. . . s an 10. mtercept; a relatively , a maximum value b t . cess variance These feat su s antlally less than the pro, . ures suggest fitt' III Section 5.2.3 whilst th lUg a model of the kind described . .' e general shape f th . consistent WIth the Gau . . a e VarlOgram appears to be sSlan correlation function (5.2.8). Thus,
Q)
rn
c::
o
c. ~
90
c::
rc Q) E
80
70
o
2
4
8
6
Time (weeks)
Fig. 13.4. Observed mean responses for the schizophrenia trial data, by dropout cohort an averaged across all treatment groups.
and
2
Recall that the three variance components in this model are a , the variance of the serially correlated component, 7 2 = a 2 al' the measurement error component, and v 2 = cr2a2' the between subject component. As a model for the mean response profiles J.li(t), Fig. 13.6 suggests a collection of convex, non-parallel curves in the six treatment groups. The
MISSING VALVES IN LON
308
A LONGITUDINAL TRIAL OF DRUG THERAPIES
GITVDINAL DATA
400
300 .1200 ~ 100
oLO---~2---~4---~6;;----';8--~ Lag
Fig. 13.5. The estimated variogram for the schizophrenia trial data. The horizontal solid line is an estimate of the variance of the measurement process. The dlUlhed horizontal lines are rough initial estimates of the intercept and asymptote of the variogram.
400
300
~
.g 200 III
> 100 0 0
2
4
6
Lag
:i :3,6,.
8
g. The estimated variogram for the schizophrenia trial data (~_), oget er With a fitted parametric model ( ).
simplest model consistent with th'Is beh' . aVlour IS /L;(t)
'=
/L
+ 81; + (hi + "Ike:
k = 1, ... ,6,
(13.8.1)
where k '= k(i) denotes the treatment . allocated. A more sophistl' t d . group to which the ith subject is . a constraint that the meanca e non-lInear m0 d e,I perh aps'mcorporatmg . response should b hOrIzontal asymptote as t' . e monotone and approach a lme mcreases , m'Ight be preferable on biological
309
grounds and we shall return to this question in Ch t 14 B ., I d .. f h ap er . ut for a purely ernprnca escnptlOn 0 t e data a low-order p l ' 1 . 0 ynoIDla model should be adequate. Note also that, as dIscussed earlier' thO h . . . III IS C apter, the empirIcal man response traJectones are not estimating th ·(t) d 'd d I h' e /-it ,an should be conSI ere on y as.rou.g ~Ides to their general shape. The focus of SCIentIfic mterest is the set of me . '" an response curves, /l i (t). In partIcular, we wIsh to mvestJgate possible simpl'Ifi cawns t' f h . 0 t e assumed mean . .response . , model by testmg whether correspond'Ing . se t,s 0 f contrasts are ~Igmficantl.y dIfferent fro.m ~ero. As described in Section 5.3.3, two ways of Implementmg tests ~f t~IS kmd are to use the quadratic form To defined at (5.3.14) or the log-hkehhood ratio statistic W k defined at (5.3.15). For the current example, two different kinds of simplification are of potential interest. First, we can ask whether the quadratic terms are necessary, that is, whether or ~ot the six quadratic parameters II; in (13.8.1) are all zero. The quadratIc form statistic is To = 16.97 on 6 degrees of freedom, correponding to a p-value of 0.009, so we retain the quadratic terms in the model. Second, in Fig. 1.6 the differences amongst the mean response curves for the four risperidone treatments were relatively small. Also, when we fitted the model (13.8.1) we found that the estimated values of the risperidone parameters were not related in any consistent way to the corresponding dose levels. This suggests a possible simplification in which all four risperidone doses are considered as a single group. Within the overall model (13.8.1) the reduction from the original six to three groups is a linear hypothesis on 9 degrees of freedom (3 each for the constant, linear and quadratic effects). The quadratic form statistic is To = 16.38, corresponding to a p-value of 0.059. We therefore arrive at a model of the form (13.8.1), but with k = 1,2 and 3 corresponding to haloperidol, placebo and risperidone, respectively. The parameter estimates for this model are shown in Table 13.5. Note that the treatment code is 1 = haloperidol, 2 = placebo, 3 = risperidone, and that estimation of the mean response parameters incorporates the constraint that 61 = 0, that is, 62 and 63 represent the differences between the intercepts in the placebo and haloperidol groups, and in t~e the risp~ridone and haloperidol groups, respectively. The ran~om all~catlOn of patIents t~ treatment groups should result in estimates 62 and (h close to zero, an this is indeed the case, both estimates being less than two standard erro~s in magnitude. The evidence for curvature in the mean response profiles ~s not strong; the log-likelhood ratio test of the hypothesis that the quadr~tIc parameters are all zero is D = 8.72 on 3 degrees of freedom, correspondmg to a p-value of 0.033. How well does the model fit the data? Figure 13.6. shows . th~ cor. an d parame t ric maxImum hkelihoo d respondence between non-parametnc estimates of the variogram. The fit is reasonably good, in that. the valu~s of V (u) are closely modelled throughout the observed range of tIme-lags m
MISSING VALUES I
A LONGITUDINAL TRIAL OF DRUG THERAPIES
N LONGITUDINAL DATA
311
ent model fitted to d th measurem 5 Parameter estimates fo~ medropout a,.'lsumption and un er 13 Table . ., . I data , under ran 0 h nia t na mption the SChIZOp re . d ro pout assu informatIve SE Parameter Estimate
310
Mean response t2 , k !J>i(t) = f.l + 15k + 8k t + 'Yk .
= 1,2,3
!J>
01 02 03 81 B2 83 1'1 1'2 1'3
88.586 0 0.715 -0.946 -0.207 0.927 -2.267 -0.113 -0.129 0.106
0.956 0 1.352 0.552 0.563 0.589 0.275 0.081 0.088 0.039
95
90 \ \ \ \
\
.............
\
, Q)
UI
85
c
0 C-
\
"
\
'\, " \
l!?
"
"
"
",
\
,
\
C1l
....................."'\\...
"
\
c
....,.......
,
\
UI
.•....
\.,\
\
Q)
~
.....
\
\ \
80
\ \ \
Covariance structure 2)} ,(u) =a2{al+1-exp(-a3u Var(Y) = q2(1 + al + a2)
CJ2
al a2 a3
170.091 0.560 0.951 0.056
the data. Notice, however, that the estimate of 01 involves an extrapol~~ tion to zero time-lag which is heavily influenced by the assumed parametn form of the correlation function. . e Figure 13,7 shows the observed and fitted mean responses m t~ haloperidol, placebo and risperidone groups. On the fac~ of i~, t~e fit IS qualitatively wrong, but the diagram is not comparing hke wIth hke. As discussed in Section 13.7, the observed means are estimating the mean response at each observation time conditional on not having dropped out prior to the observation time in question, whereas the fitted means are actually estimating what the mean response would have been had there been no dropouts. Because, as Fig. 13.4 suggested and as we shall shortly confirm, dropout is asociated with an atypically high response, the non-dropout subpopulation becomes progressively more selective in favour of low responding subjects, leading to a correspondingly progressive reduction in the observed mean of the non-dropouts relative to the unconditional mean. Note in particular that the fitted mean in the placebo group is approximately constant, whereas a naive interpretation of the observed mean would have suggested a strong placebo effect. It is perhaps worth emphasizing that the kind of fitted means illustrated in Fig. 13.7 would be produced routinely by software packages
\ ~,
75
70
~
~ o
2
4 Time (weeks)
6
8
. t he pace I b0 (•••: •• ) , haloperidol (( - ) Fig. 13.7. Observed and fitted means III ared .. p . WIth fitted ts means ...... , and risperidone (------) treatment groups, com and - - - - - -, respectively) from an analYSIS Ignormg dropou .
. corre1at ed data.. They the correresult which include facilities for modellmg the are likely ' d 1 hich recogmzes 1of .a lIkelihood-based fit to a mo e tsw on t h e same subJ'ect but treats ation between repeated me~ureme~ nd Rubin sense. As discussed lllissing values as ignorable III the LIttle a t be appropriate to ' Section 13.7, estimates 0 f t h'IS k"III d , mayor may no In l' t 'on but in any event th . tIcular app ICa 1 , d e scientific questions pose III a par d and fitted means surely the qualitative discrepancy between the observe deserves further investigation,
A LONGITUDINAL TRIAL OF DRUG THERAPIES
MISSING VALUES TN LONGITUDINAL DATA
312
of measurements and dropouts. d t joint ana Iy. S IS' . . We therefore procce 0 a. 134' which the mean response WIthIn . .' In ase I'mmediately before dropout, T he empirical behaviour of FIg. ' h sharp mcre .' , each dropout cohort 8 ows ~ derance of 'inadequate response as coupled with the overwhelmmg prePtonmo'delli~g the probability of dropout £i dropout sugges s the stated reason or ':I e that is a selection model. In the . of the meEl8urec respons " . as a ,functIOn . 'm Ie 10 istic regression model for dropout, WIth the first Instance, we fit a SI p g lanatory variable. Thus, if Pi) denotes t asurement as an exp . most recen me . t' d ops out at the jth time-pOInt (so that the the probability that patIen z r . . jth and all subsequent measurements are mIssIng), then
logit(pij)
= ¢o + ¢lYi,j-l.
(13.8.2)
The parameter estimates for this simple dropout model are ¢o = -4.~17 and ~1 = 0.031. In particular, the positive estimate of 1>1 confirms that .hl~h responders are progressively selected out by the dropout process. W1thm this assumed dropout model, the log-likelihood ratio statistic to test the sub-model with (it = 0 is D = 103.3 on 1 degree of freedom, which is overwhelmingly significant. We therefore reject completely random dropout in favour of random dropout. At this stage, the earlier results for the measurement process obtained by ignoring the dropout process remain valid, as they rely only on the dropout process being either completely random or random. Within the random dropout framework, we now consider two possible extensions to the model: including a dependence on the previous measurement but onei and inclUding a dependence on the treatment allocation. Thus, we replace (13.8.2) by (13.8.3)
where k :::: k(i) denotes the treatment allocation for the ith subject ~ot.h exte~sions yield a significant improvement in the log-likelihood, ~ mdlcated III the first three lines of Table 13.6.
Table 13.6. Maximized log-likelihoods under different dropout models. logit(pij) Log-likelihood (30
+ (3!Yi ,j-I
+ (3IYi,j_1 + (32Yi,j-2 + (3IYi,j-l + (32Yi,j_2 (301< + "fYij + {3IYi.j-l + (32Yi,j
Finally, .we test ~he random dropout assumption by embedding (13.8.3) within the InformatIve dropout model logit(pij)
= /3ok + "YYij + 131Yi,J-1 + 132Yi.J-2.
(13.8.4)
FraIn lines 3 and 4 of Table 13.6 we compute the log-likelihood ratio statistic to test the ~ub-model of (13.8.4) with') = a as D = 7.4 on 1 degree of freedom ThIS corresponds to a p-value of 0.007, leading us to reject the random dropout assumption in favour of informative dropout. We emphasize at this point that rejection of random dropout is necessarily pre-conditioned by the particular modelling framework adopted, as a consequence of the Molenberghs et al. (1997) result. Nevertheless, the unequivocal rejection of random dropout within this modelling framework suggests that we should establish whether the conclusions regarding the measurement process are materially affected by whether or not we assume random dropout. Table 13.7 shows the estimates of the covariance parameters under the random and informative dropout models. Some of the numerical changes are substantial, but the values of the fitted variogram within the range of time-lags encompassed by the data are almost identical, as is demonstrated in Fig. 13.8. Of more direct practical importance in this example is the inference concerning the mean response. Under the random dropout assumption, a linear hypothesis concerning the mean response profiles can be tested using either a generalized likelihood ratio statistic, comparing the maximized log-likelihoods for the two models in question, or a quadratic form based on the approximate multivariate normal sampling dis~ribution. of the estimated mean parameters in the full model. Under the mformatIve dropout assumption, only the first of these methods is available, ~ecause the current methodology does not provide standard errors for the estImated treatment effects within the informative dropout model. Under the random dropout model (13.8.3), the generalized likelihood ratio statistic to test the hypothesis of no difference between the three mean . D = 42.32 on 6 d egrees 0 f freedom , whereas under the response profiles IS Table 13.7. Maximum likelihood estimates ~ covariance parameters under random dropout an informative dropout models. Parameter
(30
(301<
2
-20743.85 -20728.51 -20724.73 -20721.03
313
Dropout Random Informative
170.091 137.400
0.560 0.755
0.951 1.277
0.056 0.070
A LONGITUDINAL TRIAL OF DRUG THERAPIES
LONGITUDINAL DATA . MISSING VALUES IN ~ __ .._
314
315
95
400
E
300
i 200
90
'I:
~
, \
100
. \~~ .. ".
,
\
,,
OL------::---------;4~----66---~88- 2 o Lag
\
....... \
85
,
\
" Q)
en c
0 0-
,,
~
( 8 4) th corresponding statistic is D = informative dropout model 13.., e profiles differ significantly 35.02. The conclusion tha~ the mean response The estimated treatment
\
Q)
.,
.\'\
'. ....
\'
\\ ,
c
~
\
, \, \\,\
~
..........
\
en
the schizophrenia trial data ( - ) , Fig. 13.8. The estimated ~ariogra~ or 'ng random dropouts ( ) and together with fitted parametflc mode s assuml informative dropouts (......). f
...........
\
80
\
,
\
\ .....
,,
, ~
\ ...
-- --
75 ............
~=~~"::::f:,:,::~~::::;y~~:~~7::~~their estimat~dl'tan~M~
errors when we move from the random dropout model to the m orma :h dropo~t model. For each parameter, the absolute difference betwe~n .e estimates under informative and under random dropout assumptIOns IS less than the standard error of the estimate under the random dropout assumption. Finally to emphasize how the dependence between the measurement ., processes affects the interpretation of the 0 bserved mean and dropout response profiles, Fig. 13.9 compares the observed means in the three treatment groups with their simulated counterparts, calculated from a simulated realization of the fitted informative dropout model with 10 000 subjects in each treatment group. In contrast to Fig. 13.7, these fitted means should correspond to the observed means if the model fits the data well. The correspondence is reasonably close. The contrast between Figs. 13.7 and 13.9 encapsulates the important practical difference between an analysis which ignores the dropouts and one which takes account of them. By comparison, the distinction between the random dropout model (13.8.3) and the informative dropout model (13.8.4) is much less important. In summary, our conclusions for the schizophrenia trial data are the follOWing. Figure 13.9 demonstrates that all three treatment groups show a downward trend in the mean PANSS Score conditional on not having
"-
70
-- '.
~
~ o
2
4 Time (weeks)
6
8
) haloperidol (--) and risperiFig. 13.9. Observed means in the placeboJ";jOth simulated means conditional done (- - ~ - - -) treatment groups, compare and , respectively), from an on not yet having dropped out (...... , informative dropout model. b t en . . the mean PANSS score e we dropped out of the study; the reductlOn7~8 If we want to adjust for the baseline and 8 weeks is from 86.1 to . ~ fitted means are those shown selection effect of the dropouts, the releva~ANSS score is almost constant in Fig. 13.7. In this case, t~e fitted n:e~~wnward trend in each ~f the ~~ in the placebo group but stIll shows , hange between baseline an c 86 5 to 71.4. I n VIe . w of the f m active treatment groups. The ' 0 n - average . . . P is now ro . . d t weeks In the flspendone grou h It of an ma equa e response, fact that the majority of dropou t s are t e resu
DISCUSSION
. LONGITUDINAL DATA MISSING VALUES IN
317
." gP. PANSS ~l:Or('. WI' I bserved avUd ' " r . t this artificially deflates t,w (~, ' '1'37 Ilrp morr' appropnatr' 1lI( ,land tha, " t.h fitted curves In F II!;, " j s t Inn are those' III .f the treatnwIl., . , 11 u(' that ,c ". wou ( arg , 'I' .h mi('a! effN:t,ivene..ss 0 , " , 'hipves H lower mean .. t 'of tJw llOe e " . 'spendone , I e , , ea ,ors Fr ('I'tlwr IJOlllt of VIPW, rJ, l t.' 'II 'l!Id tlip estllnatpd F' 13.!J, om·,· . 'hout tile ,1'1<, ' ,Ii' score than weeks dose to tllf' 20 Yr, t ]wtwpcn basehne. ,L! .1' ''''II'Hl!J/'OV('lllenl, Vii Ith ' .J Win rCf UC lOn ., -t 't ing a e 111)( " m '~ ~hi('h is regarded a,s demons ',ra" . 'fi' ant variation in the dropout teflon, " then'! IS SIg11I C. , tI ' 1, 'tl 1 till' higllest rates III, ,\(. reganI o·tll ('. r]ro!JOut process, , .t .groups, WI. ' . , t' es between the three trratmen., '1 nL> O'roup. There is also slglllfra, . . I ' t in the I'lspenr 0 ' M i I 'b group and the owes ' , l ,ii' !rarrw1/Jork r'('plY'8('n,f(~r pace 0 . . h' thf'. sdf'.dzrm m.or (, m y , I'. icant evidence that, 11Jzt.m. h ' m is infonnatlve, Howpver, t lIS " 's b'/ equation (J,J, 8,4/) " the, dropout Imec ' ams . egarding the mean PANSS " score, n . II ff 't the cone mllons r ' , does not matena y}l ec ., , d ' andom dropout assumptIOn, , ' h th un er a I' by companson WIt ,e,analYSIS ,
316
P~NSS
IwlopeTlrl(~J ~I~I:~t~
COllles
;:n-
C)
13.9 Discussion
. ' owin literature on dealing with dropouts ll1 longltud, There , . thl's chapter is necessarily incomplete, ' I t ISd'efast-gr and t IIeg revIew III " , ma SUI S, ' f hat different perspective to ours, IS A useful introductIOn, rom a somew Little (1995). 'h t 1.' By An emerging consensus is that analysis of data WIt po ~n Ia ' lt informative dropouts necessarily involves assumptlOns W h'IChare dIfncu . ( 97)' or even impossible, to check from the observed da.ta. Copas and LI 19 reach a similar conclusion in their discussion of lllference based on nonrandom samples, This suggests that it would be unwise to rely on the precise conclusions of an analysis based on a particular informativ~ ~ropout model. This of course should not be taken as an excuse for aVOldmg the issue, but it may well be that the major practical value of informative dropout models is in sensitivity analysis, to provide some protection against the possibility that conclusions reached from a random dropout model are critically dependent on the validity of this stronger assumption. The discussion published alongside Scharfstein et ai. (2000) contains a range of views on how such sensitivity analyses might be conducted, One striking feature of longitudinal data with dropouts, which we first encountered with the simulated example in Section 13.3, is the frequent divergence of observed and fitted means when dropouts are ignored. This arises with likelihood-based methods of analysis when the dropout mechanism is not completely random and (as is almost universal in our experience) the data are correlated. In these circumstances the likelihood implicitly recognizes the progressive selectivity of the ~on-dropout sub-population and adjusts its mean estimates accordingly, In fact, this represents a rather general counterexample to the commonly held view that ordinary least squares regression, which ignores the correlation structure of
the data, usually gives acceptable estimates of the mean response function, As WP have emphasized in the earlier sections of this chapter, the difficulty with ordinary least squares in this context lies not so much in the method itsp]f (which is, of course, unbiased for the mean irrespective of the true correlation structure), but rather in a failure to specify precisely what is the rpCjuired estimand. Our discussion in this chapter has focused on linear, Gaussian measurPHlf'nt models, both for convenience of exposition and because it is the context in which most progress has been made with respect to the analysis of data with potentially informative dropouts. However, the ideas extend to discrete response models, as do the attendant complications of interpretation, Kenward et al, (1994) and Molenberghs et al. (1997) discuss selection models for discrete longitudinal data under random and informative dropout assumptions, respectively, Fitzmaurice et al. (1996) discuss both random and informative dropout mechanisms in conjunction with a binary response variable, with particular emphasis on the resulting problems of non-identifiability, Follman and Wu (1995) extend random effect models for dropout to discrete responses, " Generalized linear mixed models (Breslow and Clayton, 1993) proVIde an extension of linear Gaussian selection modelling ideas to discl:ete or categorical response as follows. Consider the linear model for a contllluous response variable, v. 1 1) --
1/. t"'1)
+ d'U i + Wi(t ij ) + Zij, 1)
, h' h .. - E[Y:·] U· is a vector of random effects for the ith subIII w lC Pl) 1)' 1 , . TtY (t)' serially correlated ject with associated explanatory vanables d i ), 1 IS a 'd d 'h b' t d Z· is a set of mutually 1ll epen random process for the zt Sll Jec an 1) d set ofh' be trivially re-expresse as a ent measurement errors, T IS can . U d the Wi(t) as conditional distributions for the Yij gIven the ,an
Yij IVi, Wi(t)
,...... N(j.Lij
/ + dijUi + uri (t .. ) VI
1)'
7 2)
,
(13.9,1)
2 . mutually independent conditional where 7 = Var( Zij) and the Y;j ale. . t eneralized linear mixed on the U i and the Wi (t). To turn thlS 1ll 0 ~ g ly replace the Gaussian . I esponse we Slmp h model for a discrete or categonca r .' d' t 'bution for example t e . . . . ropnate IS n , 't ble link function to transcondItIOnal m (13.9.1) by a mOIl" app. P . t ther WIth a SUI a f h OlSson for a count response, oge I W.(t), onto the whole ate f: h I' d' t I I " + d..U i + 1 1) 't lOrm t e mear pre lC or, 'flij - t""1) 1) uld then pOSl a m odel dropout we co · t real line. To allow for informa Ive , d'ng dropout probab'l I. h 1.. and correspon 1 . f for the relationshlp between t e 'fl ) . d'ctor '1'1 is a functIOn 0 1.1 e Imear pre 1 '/ h ' h h 1 f 1 for thinking about t e ities. Models of this kind, in w lC . . J:use u tive dropout behaVIOur. Unobserved random variables, may be very k' . h ld 'nduce llllorma f IUds of mechanisms WhlC cou 1 ., 1 '11 is a form 0 rand 0 m Modelling the probability of dropou t condltIOna on '/
NCITUDINAL DATA VALUES IN 1.0 J MT uSTNC ,-"
;J18
, ' 1'}.J. 7'} Brcause TJ is a fUIH"tion of I f I in SectIOn ,.J, " effects morldling fiB ( e ~ne( , " , j W(t) these models generally f'qlllitl' , , • nabJefl U itnf unobserved ran(Iom V.I " I'" 'rks about thr Iw(,d to cOlisHler • J . t lind ear leI lema ' to informatIve (ropou., , , .:fi' assllmlJtions about /.Iw dropout ," f. e1uslOnH to Spf,(! (, , . y 0 con" I !vI f (Ia'lnc'ntally the tWnHl,tlvlt , ' t f apr> yore un . , , the exawple of Murm(lchanJflm contmue) .. h,t 1 ut can in sowl' cases be J (19H8) [(·mmds us t .1 (ropo ray lind Fmd ay, d' ,., ents whether or not these include a dimdly related to observe trWiJ.~IlI(,m ' , t error component. ,. meWmrAllTlCTfl tl' I' C'Vl" dl'ucussion is only indirectly. re.,levant to tlw I,JrobMUl: I 0 ,Ill a) J . h' j'ff"' ' , ' 'tt t 'ssing values willch can be rat (r (I crcnt In Il~m8 raIsed by mt.erml , ,em , mJ. , character from dropouts, , . ,_ ' 'tt t missing valueR can arise through explJCltly stated cenSOrIng Int,erml. en .. 'I l' bl rules, For example, values outside a stated range m,ay be ,SImp y unr~ I~ e because of the limitations of the measuring techmques ,m use tll1S IS a feature of many bioassay techniques, Methods for handlIng censored data are Vf~ry well eRtabliRhed in survival analysis (e.g. Cox and Oakes, 1984; Lawless, 1982), and have been developed more recently for the types of correlated data structures which arise with longitudinal data (Laird, 1988; Hughes, 1999), When censoring is not an issue, it may be reasonable to assume that intermittent missing values are either completely random or random, in which case a likelihood-based analysis ignoring the missing values should give the relevant inferences, An exception would be in longitudinal clinical trials with voluntary compliance, where a patient may miss an appointment because they are feeling particularly unwell on the day, From a practical point of view, the fact that subjects with intermittent missing values remain in the study means that there should be more opportunity to ascertain the ~e!lBons for the missingness, and to take corrective action accordingly, than IS often the case with dropouts who are lost to follow-up. Perhaps the most important conclusion from this chapter is that if dropouts are, not completely random we need to think carefully about what the relevant mfel'e~c:s are, Do we want to know about the dropout process; or ,about the condItIonal distribution of the measurements given that the um\~~ n~t dropped out; or about the distribution that the measurements wou tlO,l.ow in the ,absen,ce of dropouts? Diggle and Kenward (1994) use d len model t.o lllvestJgate f h some 0 t e consequences of informative dropouts USI'ng - ltd d , Slmu a e ata th h d h mative dropouts d d' ey s owe t at wrongly treating inforas ran om ropouts int d b' . 1'0 uces las m~o p~ra~eter estimates, and that likelihood-based mative dropout processes f d methods can be used to IdentIfy mforrom ata sets of l' t' , a rea IS Ie SIze. A much more difficult problem is to I'dent'f' , I Ya ulllque model f 1 or any rea dataset, where all aspects of the model are u k , _ h n nown a prwi'/. and random or informative dropout ' as we ave suggested above, time-trends in the mean respon~:~cesses may be partially confounded with l'
0',"
14 Additional topics
"
The ?eld of lon~itudinal~ata analysis continues to develop. In this chapter, we gIve a short mtroductlOn to several topics, each of which can be pursued in greater detail through the literature cited,
14.1
Non-parametric modelling of the mean response
The case study on log-bodyweights of cows, discussed in Section 5.4, introduced the idea of describing the mean response profile non-parametrically_ There, the approach was to use a separate value for the mean response at each time-point, with no attempt to link the mean responses at different times. This assumes firstly that the complete mean response curve is not of direct interest, and secondly that the times of measurement are COIDIDon to all, or at least to a reasonable number, of the units, so as to give the required replication at each time, An instance in which neither of these assumptions holds is provided by the CD4+ data of Example 1.1, For these data, the CD4+ count as a function of time since seroconversion is of direct interest, and the sets of times of measurement are essentially unique to each person in the study. To handle this situation, we now consider how.to fit smooth non-parametric models to the mean response profile as a,fu~ctlO~ of time, whilst continuing to recognize the covariance structure wlt~m umts, In Section 3.3 we discussed the use of smooth, non-parametrIc curves as exploratory to~ls, drawing mainly on well-established methods for cros:sectional data. If we want to use these methods for confirmatory analys~ , we need to consider more carefully how the correlation structure of longIt· , , fh much to smooth the data, ud mal data impinges on conSIderatIOns 0 ow and how to make inferences, , tal treatTo develop ideas we assume that there are no experImen t d ' 'bl Th data can be represen e as ments or other explanatory vana es, e b f ure.. to.) '_ . '_ 1 m} where ni is the Dum er 0 meas {(Y1J' 'J , J - 1, ... , ni, z.- , ... , . ' _ ~m n' for the total number ments on the ith of m umts. We WrIte N - £...i=I ' of measurements. Our model for the data takes the form
320
NON-PARAMETRIC MODELLING OF THE MEAN RESPONSE
ADDITIONAL TOPICS
,_ re independent copies of a stationary where {ci(t), t E R} fan -: 1,.,:, m a 2 and correlation function p(u), and a (t)}, wIth varIance random process, {E th f nction of t t) is a smoo u " . ( J' non parametric estImate of J1(t) IS the mean response, Jl , , d' t 't'vely appea mg, ~ sImple, an mUll data with large weights given to measurements t' t ""0 implement this idea, we define a a weIghted aver~ge of theI ' t whICh are case 0 . .11 , at t Imes ijt ' () b mmetric , non-negative valued functIOn I< u to easy kerne 1f unc zan, I" t _ 0 and small values when Iu I is large. For taking large values c ose a u , . what 11rollows we use the GaussIan kernel, exampIe, III I«u) = exp( _u 2 /2),
(14.1.1)
, Sect'10n 3"3 Now , choose a positive number, h, and define weights as III (14.1.2)
Note that w~.(t) is large when tij is close to t and vice versa, and that the rate at ~iuch the w;'(t) decrease as Itij - t I increases is governed by the value of h, a small :aJue giving a rapid decrease, Define standardized weights,
so that 2:::'12::;:"1 Wij(t) == 1 for any value of t, Then, a non-parametric estimate of Jl( t) is
321
which controls the ?verall degree of smoothing and takes on the role played by the constant h m non-adaptive kernel estimation. While the choice of h has a direct and maJ'or impact th I' on e resu tmg estimate J1( t), the chOIce of ~ernel function is generally held to be of secondary importance. ~he GaussIan kernel (14,1.1) is intUitively sensible but not uniquely compellmg. Estimators of this kind were introduced by Priestley and Chao (1972) for independent data, and have been studied in the longitudinal data setting by Hart and Wehrly (1986), Muller (1988), Altman (1990), Rice and Silverman (1991), and Hart (1991). From our point of view, the most important question concerns the choice of h. In some applications, it will be sensible to choose h to have a particular substantive interpretation; for example, smoothing over a 'natural' time-window such as a day if we wish to eliminate ciradian variation. If we want to choose h automatically from the data, the main message from the above-cited work is that a good choice of h depends on the correlation structure of the data, and in particular that methods for choosing h based on a false assumption of independence between measurements can give very misleading results, Rice and Silverman (1991) give an elegant, cross-validatory prescription for choosing h which makes no assumptions about the underlying correlation structure. Their detailed results are for smoothing spline estimates of M(t), but the method is easily adaptable to kernel estimates. For ~ g.iven h, let p,(k>(t) be the estimate of Jl(t) obtained from (14.1.3) but OIruttmg the kth subject, thus A
'
ni Tn
p,(t) =
p,(k) = LLwi;)(t)Yij, i#j=1
ni
L L Wij(t)Yij' i==1 j==l
(14.1.3)
A useful refinement of (14.1.3) is an adaptive kernel estimator in which wthe re~lace ~he cons~ant, h, by a function of t such that h(t) is s~all when ere IS a high densIty of data I t t d' ' h f case a ,an VIce versa. This is consistent wI'th th e vIew t at or data h' h h' h . times th b d w IC are Ig ly replIcated at a fixed set of , e 0 serve average at tim t' ate of Jl(t) without an t. e IS a reasonable non-parametric estimis, in effect setting / ~m~o~Illg ov,:r a range o~ neighbouring times, that ative behaviour can be h' so, USlUg a functIOn h(t) with this qualit. s own more gener 11 t . Jl(t). See, for example, Silverman 19 a y 0 Improve the estImates of kernel estimate is approxim t eI ( ~4) who demonstrates that a variable lent to a smoothing spline. In the remainder of this section w:c Y of h, The essential ideas ~pply ollSlIter Ihn detail the case of a constant value a so 0 t e case f d t' L where, typically, the function h(t) is i d 0 a ap Ive r;.ernel estimation n exed by a scalar quantity, b say,
::w:a
where wiJ>(t) = wij(t)/ {Eiik E;~l wij(t)}. The~, the Rice and . . . the qu antlty Silverman prescription chooses h to mlrumlze
S(h) =
ff
2
{Yij - P,(i)(tij)}
•
(14.1.4)
i=1 j=l . that it is estimating the mean The rationale for minimizing (14.1.4 ) IS h d ' points t·· up to , 'J , across t e eSlgn d ( ) square error of pet) for It t average h 'I1 this write an additive constant which does not depend on . 0 see ,
E [{y;; -
jl(;) (t;;)}']
~ E [ ({Yl' - ,,(t;,)) + {,,(t;,) - fie;) (~j)} )']
NON-PARAMETRIC MODELLING OF THE M ADDTTIONAL TOPICS
322
. d .d . to the following three terms: and expand the rl,e;ht-han SI e In
+ 2E [{1J. lJ
, ,)}2] E [{Yij - IJ. (t 'J
+ E [{/l( tij) -
- p(tij)} i/l(tiJ) - il(' 1(t'J)}]
() 2] .
P, '}
he first ofthe three terms in this expression is eq~al to Var(YiJ) and Now, t d I n " while the second is zero because E( Yij) = J.t( tij) and does not epene 0 ., • . llij is independent of p,(i)(tij), by constructIOn. Thus, E [hiij - jJ.(i)(tij)}2] = Var(Yij)
+ MSE(i)(tiJ , h),
(14.1.5)
where MSE(i)(t, h) is the mean square error of ('l(i)(t) for J.l(t). Substitution of (14.1.5) into (14.1.4) gives the result.. . . . One interesting thing about the Rice and SIlverman preSCrIptIOn IS that it does not explicitly involve the covariance structure of the data. However, this structure is accommodated implicitly by the device of leaving out all observations from a single subject in defining the criterion S(h), whereas the standard method of cross-validation would leave out single observations to define a criterion
Direct computation of (14.1.4) would be very time-consuming with large n~mbers of subjects. An easier computation, again adapted from Rice and SIlverman (1991), is the following. To simplify the notation, we temporarily suppress the dependence of Wij(t) on t and write w' = ~ni W" so that L.."J=l 'J' 4."i=1 Wi == 1. Note that
"m
"
YI),,-tJ(i)(t)_ .... .) -
== Yij ==
{/l.' (tij)
Yij -
{fi,(t
-
{Yij -
X
ij ) -
A(tij )}
{t J=I
~ ~
-
t
WijYij
} / (1 - Wi)
WijYij}
{I + wd(1 - Wi)}
J=l
+ {w;/ (1 -
WijYij!Wi -
Wi)}
A(ti j )}
•
(14.1.6)
Using (14.1.6) to compute S(h) 'd avO! s the need £ l' . th em 1eave-one-out estimates, A(i)(t). Fu or exp lCI: computation of rther, computatIOnal savings can
EAN RESPONSE
323
be made by collecting the time-points t,· int d . ' 'J' 0 a re uced set say t r _ 1, ... , p, and computmg only the p estimates //(t ) C h 'I . '" C , t'" ,. lor eac va ue of h . To rna k e mlerences about M(t), we need to know th . . e c,ovanance structure f the data. Let V be the N x N bl k-d' a _ m OC lagonal covanance matrix of the data, where N - Li=I ni· For fixed h, the estimate r/(t) d fi db ' ne y (14 .1 ,3) is a lmear com b'mat'IOn 0 f the data-vector y say' (.t),.. _ e()' _ 1 J1 . - w t y. Now, for p values t,., r - '.... let I-t be the vector with rth element' (t ) and W the p x N matnx WIth rth row w(t,.). Then, J1 ,.,
,.p,
A
'
it == Wy,
•
(14.1.7)
and it has an approximate multivariate Gaussian sampling distribution with
E(jL)
= WE(y)
and Var(ft) = WVW'.
(14.1.8)
Note that in general, {L(t) is a biased estimator for M(t), but that the bias will be small if h is small so that the weights, Wij, decay quickly to zero with increasing Itij - t I. Conversely, the variance of jl(t) typically increases as h decreases. The need to balance bias against variance is characteristic of non-parametric smoothing problems, and explains the use of a mean-squared-error criterion for choosing h automatically from the data. In practice, a sensible choice of h is one which tends to zero as the number of subjects increases, in which case the bias becomes negligible, and we can use (14.1.8) to attach standard errors to the estimates jl(t). The covariance structure of the data can be estimated from the timesequences of residuals, rij = Yij - P(tij). In particular, th.e empirical variogram of the rij can be used to formulate an appropr~ate model. Likelihood-based estimation of parameters in the assumed covarIance struc. al pom . t 0 f view , the ture is somewhat problematical. From apractIc computations will be extensive if the number of subjects is large and t~ere is little replication of the observation times, tij, which is pre~isely the SItuation in which non-parametric smoothing is most likely reqUIred. Also, the ab sence of a parametric model for the mean pu t s the inference on a rather . shaky foundation. Simple curve-fitting (moment) estimates of covarIance parameters may be preferable.. 1418) for the A final point to note is that expreSSIOns (14.1.7) and ( 'd' th value m . ' f me that both Vane ean vector and varIance matnx 0 JL assu h iF t of estimating whilst of h are specified without reference to the data. Tee. ec. 1 1S arge, Parameters in V will be small when th e nu mber of umts all hen JL(t) is the effect of calculating h from the data should also b~ sm llw nderstood a smooth function although the theoretical issues are ess we u A
NON-PARAMETRIC MODELLING OF THE MEAN RESPONSE
ADDITIONAL TOPICS 324
. . detailed discussion of these and other here. The references CIted above gIve issues. I . can be extended to incorporate .d how the ana ySls . We now comll er arametric analysis, Wf> can simply . I t tments For a non-p b' ts from each treatment group. From expenmenta ,rea .. th d separately to su Jec app Iy 1,he me, 0 f' .t' referable to use a common value of the . t' int a VIeW I IS P l' • • an mte~pre Ive po ,Wh' the data are from an experiment companng smoothmg constant, hI.. en uared-erro r criterion (14.1.4) for choosing several treatments, t e mean-sq . . . ,t . . f t 'butions from wlthm each treatrnen gIOUp. h consists of a sum 0 can n . . . f h • 1, 1, • estimated from the emplflcal vanogram a t e Covanance s rue ure IS . . . I t reatment . reSIduals pool ec across ,. groups , assummg a common covanance . . structure in all groups. Another possibility is to model treatment contrasts ~arame~ncal~y usmg a linear model. This gives the following semi-parametnc specIficatlOn of a model for the complete set of data,
(14.1.9) where reij is a p-element vector of covariates. This formulation also covers situations in which explanatory variables other than indicators of treatment group are relevant, as will be the case in many observational studies. For the extended model (14.1.9), the kernel method for estimating J1.(t) can be combined iteratively with a generalized least squares calculation for {3, as follows:
1. Given the cur:ent estimate, (3, calculate residuals, rij = Yij - X~j{3, and use these III place of Yij to calculate a kernel estimate, p,(t). 2. <;?iven p., calculate residuals, rij = Yij - fl(t), and update the estimate {3 using generalized least squares,
325
Example 14.1. Estimation of the population mean CD4+ curve For the CD4+ data, there are N = 2376 observations of CD4+ II b ' . ce num ers on m = 369. men mfected WIth the HIV virus Recall th t t" . . . a Ime IS measure d in years WIth the ongm at the date of seroconversion h' h'IS k nown . . " , W IC approximately for . each mdlvldual. As in our earll'er , paramettiC ' ' analYSIS of these data, we mclu~e the follOWing explanatory variables in the linear part of the model: smokmg (packs per da!); recreational drug use (yes/no); numbers of sexual partners; and de~resslve symptoms as measured by the CESD scale. For the non-parametnc part of the model, we use a mildly adaptive kernel estimator, with h(t) = b{f(t)}-025, where f(t) is a crude estimate of the local density of observations times, t ,J , and b is chosen by cross-validation, as described above. . Figure 14.1 shows the data with the estimate fi(t) and pointwise confidence limits calculated as plus and minus two standard errors. The standard errors were calculated from a parametric model for the covariance structure of the kind introduced in Section 5.2.3. We assume that the variance of each measurement is 7 2 + a 2 + 1/2 and that the variogram within each unit is ,(u) = 7 2 + a 2 {1- exp( -au)}. Figure 14.2 shows the empirical variogram and a parametric fit obtained by an ad hoc procedure, which gave estimates f2 = 14.1, 0- 2 = 16.1, f)2 = 6.9, and eX = 0.22. Figure 14.1 suggests that the mean number of CD4+ cells is approximately constant at close to 1000 cells prior to seroconversion. Within the first six months after seroconversion, the mean drops to around 700. Subsequently, the rate of loss is much slower. For example, it takes nearly three years before the mean number reaches 500, the level at which, at the
2500
where ?C is the N. >< p matrix with rows X~j' V is the assumed blo.ck-diagonal covarIance matrix of the data and r is the vector of reSIduals, rij.
3. Repeat steps (1) and (2) to convergence. This algorithm is an exa I f th Hastie and Tibshirani 19~6 eo. e back-fi~ting ~lgorithm described by there is a n f (d' ). Typically, few IteratlOns are required unless ear-con oun mg betw th r of the model. This might well ha ee: . e mear and non-~arametric parts nomial time-trend 'In the l' pp n If, for example, we mcluded a polymear part Furth d t '1 . a discussion of the asym t t' ' . er e al s of the algOrIthm, and given in Zeger and Diggl~(~~~4P)roPdertles of the resulting estimators, are an Moyeed and Diggle (1994).
Q;
.c
~
c 'ijl 1500
o
+
2!i ()
-2
o Years since seroconversion
. t and pointwise confidence Fig. 14.1. CD4+ cell counts with kernel estlma e limits for the mean response profile.
NON-LINEAR REGRESSION MODELLING ADDITIONAL TOPICS 326
327
variance (72, whilst the mean response function 11(.) , I' , '''''' ,IS a non- mear function of explanatory varIables Xi measured on the ith sub' t d Jec an parameters . . 1 I (3 For examp Ie, an exponentIal growth model wI'th ' . a smg e exp anatory variable x would speCIfy
40
30
Some non-li~ear models can be con~erted to a superficially linear form by transformatlOn, For example, the SImple exponential growth model above can be expressed as
y(u) 20
10
0
3
2
0
4
5
u
Fig. 14.2. CD4+ cell counts: observed and fitted variograms *: sample variogram - - -: sample variance ._; fitted model.
time the data were collected, it was recommended that prophylactic AZT therapy should begin (Volberding et al., 1990), As with Example 5.1 on the protein contents of milk samples, the interpretation of the fitted mean response is complicated by the possibility that subjects who become very ill may drop out of the study, and these subjects may also have unusually low OD4+ counts,
14.1.1 Further reading
Non- and semi-parametric methods for longitudinal data are areas of current research activity. See Brumback and Rice (1998), Wang (1998), and Zhang et al. (1998) for methods based on smoothing splines. Lin and Carroll (2000) develop methods based on local polynomial kernel regression, and Lin and Ying (2001) consider methods for irregularly spaced longitudinal data.

14.2 Non-linear regression modelling
In this section, we consider how to extend the framework of Chapter 5 to non-linear models for the mean response. Davidian and Giltinan (1995) give a much more detailed account of this topic. Another useful review, from a somewhat different perspective, is Glasbey (1988). The cross-sectional form of the non-linear regression model is
Y_i = μ(x_i; β) + Z_i,   i = 1, ..., n,   (14.2.1)

where the Z_i are mutually independent deviations from the mean response and are assumed to be normally distributed with mean zero and common
where μ*(·) = log μ(·) and β*_j = log β_j. However, it would not then be consistent with the original, non-linear formulation simply to add an assumed zero-mean, constant variance error term to define the linear regression model,
Y*_i = μ*(x_i; β*) + Z_i,   i = 1, ..., n,
as this would be equivalent to assuming a multiplicative, rather than additive, error term on the unlogged scale. Moreover, there are many forms of non-linear regression model for which no transformation of μ(·), β and/or x can yield an equivalent linear model. Methods for fitting the model (14.2.1) to cross-sectional data are described in Bates and Watts (1988). In the longitudinal setting, we make the notational extension to
Y_ij = μ(x_ij; β) + Z_ij,   j = 1, ..., n_i;  i = 1, ..., m,   (14.2.2)
where, as usual, i indexes subjects and j occasions within subjects, and consider two generalizations of the cross-sectional model:
1. Correlated error structures: the sequence of deviations Z_ij, j = 1, ..., n_i, within the ith subject may be correlated;
2. Non-linear random effects: one or more of the regression parameters β may be modelled as subject-specific stochastic perturbations of a population average value; thus in (14.2.2) β is replaced by a random vector B_i, realised independently for each subject from a distribution (usually assumed to be multivariate Gaussian) with mean β and variance matrix V_β.
Recall that in the linear case considered in Chapter 5 we could treat these two cases as one, because in that context the assumption of randomly varying subject-specific parameters leaves the population mean response unchanged and affects only the form of the covariance structure of the Y_ij.
For non-linear models, as for the generalized linear models discussed in Chapter 7, the two cases have different implications, both for statistical analysis and for the interpretation of the model parameters.

14.2.1 Correlated errors
The model of interest is (14.2.2) with a parametric specification for the covariance structure of the Z_ij. Let Z_i = (Z_i1, ..., Z_in_i) denote the vector of Z_ij associated with the ith subject, and t_i = (t_i1, ..., t_in_i) the corresponding vector of measurement times. We assume that Z_i follows a multivariate Gaussian distribution with mean zero and variance matrix
V_i(t_i; α).
For exploratory analysis, we can use the general approach described in Chapter 3 to identify suitable parametric families for the mean function μ(·) and the covariance matrices V_i(t_i; α). However, often the reason for adopting a non-linear model will be that a particular form for μ(·) is suggested by the scientific context in which the data arise. For example, in pharmacokinetic studies, μ(·) is often obtained as the solution to a differential equation model of the underlying biochemical processes involved (Gibaldi and Perrier, 1982). The form of the covariance matrices V_i(t_i; α) may also be derived from an explicit stochastic model, but this appears to be rare in practice. More commonly, the correlation structure is chosen empirically to provide a reasonable fit to the data, its role being to ensure approximately valid inferences about μ(·). If this is the case, it would be appropriate to develop the model for the V_i(·) from the variogram of the ordinary (non-linear) least squares residuals after fitting the chosen parametric model for μ(·). Once parametric forms have been chosen for the mean and covariance structure, the log-likelihood function for the complete set of model parameters (β, α) follows as
L(α, β) = Σ_{i=1}^m L_i(α, β),   (14.2.3)

where

-2 L_i(α, β) = log |V_i(t_i; α)| + (y_i - μ_i)' V_i(t_i; α)^{-1} (y_i - μ_i),   (14.2.4)
y_i is the n_i-element vector with jth element y_ij and μ_i is the n_i-element vector with jth element μ(x_ij; β). Likelihood-based inference follows according to the same general principles as for the linear models discussed in Chapter 5 although, in the non-linear setting, there is not the same opportunity to exploit the existence of an explicit solution for β conditional on α.
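As a concrete illustration of how (14.2.3) and (14.2.4) can be evaluated, the following Python sketch computes -2 times the Gaussian log-likelihood for user-supplied mean and covariance functions. The exponential mean and exponential-correlation covariance shown at the end are placeholders chosen for the example, not forms prescribed by the text.

```python
import numpy as np

def neg2_loglik_subject(beta, alpha, x, t, y, mu, V):
    """-2 * L_i(alpha, beta) as in (14.2.4) for one subject."""
    mu_i = mu(x, beta)                    # n_i-vector of means mu(x_ij; beta)
    V_i = V(t, alpha)                     # n_i x n_i covariance matrix V_i(t_i; alpha)
    resid = y - mu_i
    sign, logdet = np.linalg.slogdet(V_i)
    return logdet + resid @ np.linalg.solve(V_i, resid)

def neg2_loglik(beta, alpha, subjects, mu, V):
    """-2 * L(alpha, beta): sum of the subject contributions in (14.2.3)."""
    return sum(neg2_loglik_subject(beta, alpha, x, t, y, mu, V)
               for (x, t, y) in subjects)

# Illustrative choices (assumptions): exponential growth mean and
# exponential-decay correlation with a measurement-error nugget.
mu_exp = lambda x, beta: beta[0] * np.exp(beta[1] * x)

def V_exp(t, alpha):
    sigma2, phi, tau2 = alpha
    R = sigma2 * np.exp(-phi * np.abs(t[:, None] - t[None, :]))
    return R + tau2 * np.eye(len(t))
```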
14.2.2 Non-linear random effects
In this second generalization of the cross-sectional non-linear regression model, equation (14.2.2) is assumed to hold conditionally on the regression parameters β, which are then allowed to vary randomly between subjects. Thus, for the ith subject the parameter β is replaced by B_i, where the B_i for different subjects are mutually independent multivariate Gaussian random variables with common mean β and variance matrix V_β. Let b = (b_1, ..., b_m) denote the set of realized values of B_1, ..., B_m. Then, the likelihood function for the model parameters α, β and V_β is obtained by taking the expectation of the conditional likelihood given b with respect to the assumed multivariate Gaussian distribution of b. Thus, if ℓ_i(α, b) = exp L_i(α, b), where L_i(·) is the contribution to the conditional log-likelihood from a single subject, as defined by (14.2.4), then the overall log-likelihood for the random effects model applied to data on m subjects is
L(α, β, V_β) = Σ_{i=1}^m log ∫ ℓ_i(α, b) f(b; β, V_β) db,   (14.2.5)
where f(·) denotes the multivariate Gaussian density with mean β and variance matrix V_β. Exact likelihood calculations for this model therefore require repeated numerical evaluation of an integral whose dimension is equal to p, the number of elements of β. While this task is computationally feasible for typical values of p, it would require a specially written program for each particular class of non-linear models. Lindstrom and Bates (1990) therefore propose an approximate method of evaluation in which the contribution to the conditional likelihood, ℓ_i(α, b), is replaced by a multivariate Gaussian density whose expectation is linear in b. This allows explicit evaluation of the integral terms. The resulting algorithm provides a computationally fast, albeit approximate, method for a wide class of non-linear models and assumed covariance structures for the error terms Z_ij. The method is implemented in the Splus function nlme().
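To illustrate the integration problem that exact likelihood evaluation poses, here is a sketch of one possible approach, Gauss-Hermite quadrature, for the simplest case of a single scalar random effect. This is not the Lindstrom and Bates (1990) approximation nor the nlme() implementation; the function cond_loglik is assumed to return the conditional log-likelihood L_i(α, b) of (14.2.4) for a given value b of the random effect.

```python
import numpy as np

def log_integral_one_subject(cond_loglik, beta, v_beta, n_nodes=20):
    """Approximate log INT exp{L_i(alpha, b)} f(b; beta, v_beta) db for scalar b."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    b = beta + np.sqrt(v_beta) * nodes          # transform standard-normal nodes
    logf = np.array([cond_loglik(bj) for bj in b])
    # log-sum-exp for numerical stability
    m = logf.max()
    return m + np.log(np.sum(weights * np.exp(logf - m)) / np.sqrt(2.0 * np.pi))
```

The overall log-likelihood (14.2.5) is then the sum of such terms over subjects; for a p-dimensional random effect the quadrature grid grows exponentially in p, which is why approximate methods are attractive in practice.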
14.3 Joint modelling of longitudinal measurements and recurrent events
. h' d'gm which either treats Untl!. now we have worked mostly WIt lD a para 1 . b
times of ~easurements as fixed by the study design or, m lan dO tservath~ · t times are unre a t e 0 t lanaI setting assumes that the measuremen . t' phenomenon ~f interest and from a statistical modellmg per~pec Ive c~ examp e, th erefore be treated as if' they had been fi xed'lD advance:1 Thus,d lor ta the data . d V. '1 (1990) epl epsy a , III Our earlier discussions of the Thall an al . fix d t' 'ntervals b f ents lD e lme-I \V~re presented as counts of the num ers 0 eVl Id have been to treat the PrIor to analysis. An alternative approach wou
actual times of the individual seizures as the response variable. Often, the aggregation of individual event-times into interval-counts will be a reasonable data-analytic strategy, but it represents a discarding of potential information. Indeed, a major branch of statistical methodology has developed in its own right to deal with such data, which are variously known as point process data, event history data or recurrent event data. See, for example, Andersen et al. (1993). Survival analysis, also a major branch of statistical methodology in its own right, refers to the special case in which the outcome of interest is the time of occurrence of a single,
We assume that the data-format for a single subject is a measurement sequence Y_j: j = 1, ..., n at times t_j: j = 1, ..., n, and a counting process {N(u): 0 ≤ u ≤ d} which identifies event-times within the interval (0, d). Events which occur after time d are therefore censored. One aim is to provide a bivariate modelling framework which includes the standard methods of choice for separate, univariate analyses of the measurement and event processes. For the measurement process, we therefore assume that
terminating event.
In this section, we consider briefly how we might analyse data in which each subject in a longitudinal study provides both a sequence of measurements at a set of fixed times and a set of random times at which events of substantive interest occur. The resulting methods are potentially relevant to many kinds of investigation, including the following examples. In AIDS research, many studies are concerned with using a longitudinally measured biomarker such as CD4 cell count or estimated viral load to predict the time to onset of clinical AIDS. In longitudinal clinical trials, inferences about mean response profiles may need to be adjusted to take account of informative dropout; in Chapter 13, we discussed this area under the implicit assumption that dropout occurs at one of a number of prespecified measurement times, but in some applications dropout will be a well-defined event occurring at a precisely recorded time. In psychiatric studies, it may be of interest to model jointly the times of onset of acute episodes and the longitudinal evolution of a general measure of psychiatric disturbance such as the Positive and Negative Syndrome Scale (PANSS) used in the case study of Section 13.8. Models and methods for dealing with data of this kind have become widely studied in recent years. Hogan and Laird (1997a) give an excellent review. Other contributions include Pawitan and Self (1993), Tsiatis et al. (1995), Faucett and Thomas (1996), Lavalley and De Gruttola (1996), Hogan and Laird (1997b), Wulfsohn and Tsiatis (1997), Finkelstein and Schoenfeld (1999), Henderson et al. (2000) and Xu and Zeger (2001). Note that, according to the particular context, the focus for inference may be on modelling the distribution of the time to a terminal event conditional on a longitudinal measurement sequence, on adjusting inference about a longitudinal measurement sequence to allow for informative dropout, or on modelling the joint evolution of a measurement and an event-time process. Much of the literature cited above is motivated by specific applications, and makes extensive use of random effects, or more generally underlying latent stochastic processes, to induce association between the measurement and event-time processes. Here, we give an outline of the general formulation given in Henderson et al. (2000) and note some unresolved issues concerning inference for the resulting class of models.
Y_j = μ(t_j) + W_1(t_j) + Z_j,   (14.3.1)
where the mean response μ(t) is described by a linear model, Z_j is a measurement error, normally distributed with mean zero and variance τ², and W_1(t) is a Gaussian stochastic process which can be decomposed into a random effect term and a stationary term, say

W_1(t) = d_1(t)'U_1 + V_1(t),
as discussed in Chapter 5. For the event process, we adopt a semi-parametric proportional hazards formulation as in the seminal work of Cox (1972), with a second latent stochastic process providing a time-dependent frailty term. Thus, we model the intensity of events as
λ(t) = λ_0(t) exp{α(t) + W_2(t)},   (14.3.2)
where λ_0(t) is a non-parametric baseline intensity, α(t) is a linear model which describes the proportional effects of explanatory variables measured at time t, and the Gaussian process W_2(t) has a decomposition comparable to that of W_1(t), namely
W_2(t) = d_2(t)'U_2 + V_2(t).
Association between the measurement and event-time processes is induced by postulating a joint multivariate Gaussian distribution for the two random effect vectors U_1 and U_2, and a non-zero cross-covariance structure, γ_12(u) = Cov{W_1(t), W_2(t - u)}, for the bivariate process W(t).
Inference in this model is not entirely straightforward. Using Y and N to denote the measurement and event data respectively, and W_2 to denote the complete path of W_2(t) for 0 ≤ t ≤ d, we can express the likelihood contribution from a single subject in the form
f(θ) = f_1(θ; Y) E_{W_2|Y}[f_2(θ; N | W_2)].   (14.3.3)
In (14.3.3), the term f_1(θ; Y) is of the standard form corresponding to the multivariate Gaussian distribution of Y, as discussed in Chapters 4 and 5.
The second term is generally more complicated. It reduces to the standard form for a proportional hazards model with frailty if the component latent processes W_1(·) and W_2(·) are independent, although this case is of limited interest here. A simple method of estimation is to use standard methods to analyse the measurement data, including evaluation of the minimum mean square predictor for W_2(·), then to analyse the event-time data using predicted values of W_2(t) in place of the true, unknown values. It will usually be preferable to base inference on the full likelihood function (14.3.3). Faucett and Thomas (1996) and Xu and Zeger (2001) use Bayesian methods, implemented via Markov chain Monte Carlo. Wulfsohn and Tsiatis (1997) obtain maximum likelihood estimators using an EM algorithm, in the special case when W_2(t) = γ W_1(t) and W_1(t) = d_1(t)'U_1. Henderson et al. (2000) extend the method of Wulfsohn and Tsiatis to a wider class of models and note the identifiability problems which can arise when W_2(t) is allowed to be time-varying in conjunction with a non-parametric specification of the baseline intensity λ_0(t).
Some authors have questioned the wisdom of relying on the assumption of Gaussian random effect or latent process models, on the grounds that the resulting inferences can be sensitive to assumptions which cannot easily be checked from the available data. This echoes the concerns expressed in Chapter 13 about reliance on specific informative dropout models to adjust inferences about the mean response in incomplete longitudinal measurement data. These issues are raised, for example, in the discussion of Scharfstein et al. (1999).
developed latent variable models and generalized estimating equations for regression analysis with multivariate responses. We illustrate the approach to multivariate longitudinal data by considering a simple pre-post design in which a vector response is observed at baseline and then again after a period of treatment, for two groups, one receiving a placebo, the other a new treatment. The basic model takes the form
Y_ijk = β_0k + β_1k Post_j + γ_k Post_j * Trt_i + ε_ijk,   (14.4.1)
where Y_ijk is the response for item k at time t_j for person i, Post_j is the indicator of the post-treatment time and Trt_i indicates the treatment group to which the ith person belongs (0 = placebo; 1 = active treatment). If we let Y_ij = (Y_ij1, Y_ij2, ..., Y_ij30) be the 30-dimensional vector response, then the model can be written as
Y_ij = β_0 + β_1 Post_j + γ Post_j * Trt_i + ε_ij,
where β_0 = (β_01, ..., β_030) is the vector of expected responses at baseline for each of the 30 items, β_1 = (β_11, ..., β_130) is the vector of changes from baseline for the placebo group and γ = (γ_1, γ_2, ..., γ_30) comprises the 30 item-specific differences γ_k in the average change from baseline between the treatment and placebo groups. Here, γ is the parameter vector of interest. In the classic linear model, we would further assume that ε_i = (ε_i1', ε_i2')' is a 60 × 1 vector of mean zero residuals with 60 × 60 covariance matrix
14.4 Multivariate longitudinal data
There is a trend in biomedical and public health research toward outcomes of increasing complexity. This book primarily considers complexities that arise with repeated measures of a scalar outcome. But it is also common to observe multivariate outcomes repeatedly. For example, we measure the severity of schizophrenic symptoms with the PANSS, which comprises 30 different symptom reports, producing a 30 × 1 response vector at each time. In neuroimaging, we record repeated images on an individual, each of which is a high-dimensional vector of voxel intensities. A final example is time series studies of gene expression arrays, where the outcome is a 10,000-dimensional vector of estimates of mRNA levels. In this section, we briefly discuss extensions of longitudinal data analysis methods appropriate for multivariate responses. We focus on the PANSS example with a 30-dimensional outcome. We also only consider linear models; non-linear models are set up similarly. Early work on this topic was by O'Brien (1984) and Pocock et al. (1987), who focused on statistical tests. Dupuis Sammel and Ryan (1996) and Gray and Brookmeyer (1998, 2000) have more recently
V = ( V_11   V_12
      V_12'  V_22 ),
where V_11 = Var(ε_i1), V_22 = Var(ε_i2), and V_12 = Cov(ε_i1, ε_i2). The goal of statistical inference is to estimate the regression coefficients as efficiently as possible. Suppose we observe Y_i = (Y_i1', Y_i2')' for each of m persons and let X_i* be the three (predictors) by two (times) design matrix for each item, with columns Int, Post_j and Post_j × Trt_i:
X_i* = ( 1  0  0
         1  1  Trt_i ).
Let X_i = X_i* ⊗ I_30 be the Kronecker product, with X_i* on the diagonal and 0 elsewhere. Finally, collect the regression coefficients into β = {(β_0k, β_1k, γ_k), k = 1, ..., 30}. Then, we can write the model for a single person as
Y_i   =   X_i      β    +   ε_i,        ε_i ~ N(0, V),
(60 × 1) (60 × 90) (90 × 1)  (60 × 1)
Suppose V is known. Then, by the Gauss-Markov theorem, the minimum variance unbiased estimate of β (and the MLE under the Gaussian assumption) is given by

β̂ = ( Σ_{i=1}^m X_i' V^{-1} X_i )^{-1} Σ_{i=1}^m X_i' V^{-1} Y_i,

and we then have β̂ ~ N(β, (Σ_{i=1}^m X_i' V^{-1} X_i)^{-1}).
Figure 14.3(a) shows estimates of γ_k for the 30 items along with approximate 95% confidence intervals for a data set of PANSS scores for m = 174 schizophrenic patients who participated in a clinical trial comparing risperidone (6 mg) to placebo. We can see that for every item, the estimate of γ_k is negative, indicating less severe symptoms for patients receiving risperidone. To obtain these estimates, we used the REML estimate of V as discussed in Section 4.5.
The challenge in the analysis of multivariate longitudinal data, particularly with a high-dimensional response, is to choose a sensible approximation to V. Note that, even in this simple pre-post design, V is 60 × 60 and has (60 choose 2) = 1770 parameters. We are likely to lose any gains in efficiency that are available from joint longitudinal modelling of the 30 items by weighting the data with V̂^{-1} if V is poorly estimated. One simple approach is to weight the data by W = diag(σ̂_1², ..., σ̂_30²)^{-1}, where σ̂_k² is an estimate of Var(ε_ijk). The resulting estimator

β̂_W = ( Σ_{i=1}^m X_i' W X_i )^{-1} Σ_{i=1}^m X_i' W Y_i

is approximately normal with mean β and variance which can be estimated by the empirical variance formula

Var̂(β̂_W) = A^{-1} { Σ_{i=1}^m X_i' W (Y_i - X_i β̂_W)(Y_i - X_i β̂_W)' W X_i } A^{-1},   where A = Σ_{i=1}^m X_i' W X_i.
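A minimal numpy sketch of the two calculations just described, the weighted estimator β̂_W and its empirical ('sandwich') variance, is given below. The lists X_list and Y_list are assumed to hold the 60 × 90 design matrices and 60 × 1 response vectors for the m subjects, and W is whatever 60 × 60 working weight matrix has been chosen; none of this is code from the original analysis.

```python
import numpy as np

def weighted_estimator(X_list, Y_list, W):
    """beta_hat_W = (sum_i X_i' W X_i)^{-1} sum_i X_i' W Y_i."""
    A = sum(X.T @ W @ X for X in X_list)
    b = sum(X.T @ W @ Y for X, Y in zip(X_list, Y_list))
    return np.linalg.solve(A, b)

def empirical_variance(X_list, Y_list, W, beta_hat):
    """Sandwich estimate A^{-1} [sum_i X_i' W r_i r_i' W X_i] A^{-1}."""
    A = sum(X.T @ W @ X for X in X_list)
    M = np.zeros_like(A)
    for X, Y in zip(X_list, Y_list):
        r = Y - X @ beta_hat          # residual vector for subject i
        u = X.T @ W @ r
        M += np.outer(u, u)
    A_inv = np.linalg.inv(A)
    return A_inv @ M @ A_inv
```

The same two functions give the Gauss-Markov estimator and its variance when W is replaced by V^{-1}, which is one way to see why a poorly estimated V can erode the efficiency gains of joint modelling.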
Here, we are accounting for the possibly different variances of the items but ignoring correlation among items at the same or different times. A better strategy might be to consider a hierarchical random effects model with random effects not only for individuals but also for items. Returning to the basic model, we might assume, for the first level, that

Y_ijk = x_ij*' β_ik + ε_ijk,
where β_ik = (β_0ik, β_1ik, γ_ik)'. In a second level, we assume, for example, that

β_ik  =  β  +  δ_k  +  b_i,      (each term 3 × 1)

where β comprises the population and item-average regression coefficients, δ_k is the deviation of the coefficients for item k from β, and b_i is the deviation of subject i's coefficients from β. Note that this particular second-level model assumes there are no interactions of item and subject; the subject deviation is the same for all items. To complete the specification, we can assume δ_k and b_i are independent, mean zero Gaussian variables with variances D_δ and D_b, respectively. In some applications, we might allow only a subset of the coefficients to have random effects, so that D_δ or D_b might be degenerate. This multilevel formulation provides a lower-dimensional parameterization of the variance matrix Var(Y) = V. To see the specifics, we write
Y_i   =   X_i      β_i   +   ε_i,
(60 × 1) (60 × 90) (90 × 1)  (60 × 1)

where β_i collects the subject-specific coefficients {β_ik}. Writing 1_30 for a 30 × 1 vector of ones and δ = (δ_1', δ_2', ..., δ_30')', the second-level model gives β_i = β ⊗ 1_30 + δ + b_i ⊗ 1_30, so that

Y_i = X_i(β ⊗ 1_30 + δ + b_i ⊗ 1_30) + ε_i,

where Var(Y_i) = X_i(D_δ ⊗ I_30 + D_b ⊗ 1_30 1_30')X_i' + V and V = Var(ε_i). If we assume V = diag(σ_1², ..., σ_30², σ_1², ..., σ_30²), then the only correlation among items at the same or different times derives from shared random effects. This model reduces the parameters necessary to specify V from (60 choose 2) = 1770 down to (3 choose 2) + (3 choose 2) + 30 = 36.
We can estimate the fixed effects β and the covariance parameters by restricted maximum likelihood. Also of interest are the empirical Bayes estimates of β_k = β + δ_k, k = 1, ..., 30, the population average coefficients for item k. These can be estimated by a simpler approximate method as follows. First, obtain the maximum likelihood estimates β̂_k and V_k = Var(β̂_k | β_k), k = 1, ..., 30. If we assume the β_k are independent with mean β and variance D_δ, then β̂_k ~ N(β, D_δ + V_k) and the empirical Bayes estimate is given by
β̃_k = (D_δ + V_k)^{-1}[D_δ β̂_k + V_k β̂].
Figure 14.3(b) shows these estimates of the treatment effect for the 30 PANSS items. Comparing these results to those from Fig. 14.3(a), one can see the value of borrowing strength across items.
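The approximate empirical Bayes calculation for a single item is a weighted combination of the item-specific estimate and the overall mean, with weights determined by D_δ and V_k. The sketch below illustrates it with made-up numbers; none of the values are taken from the PANSS analysis.

```python
import numpy as np

def empirical_bayes(beta_hat_k, V_k, beta_bar, D_delta):
    """beta_tilde_k = (D_delta + V_k)^{-1} (D_delta beta_hat_k + V_k beta_bar)."""
    A = D_delta + V_k
    return np.linalg.solve(A, D_delta @ beta_hat_k + V_k @ beta_bar)

# Hypothetical 3 x 1 coefficient vector (intercept, Post, Post*Trt) for one item.
beta_hat_k = np.array([3.9, -0.4, -0.9])
V_k = 0.05 * np.eye(3)          # sampling variance of beta_hat_k
beta_bar = np.array([4.0, -0.5, -0.7])   # estimated overall mean of the beta_k
D_delta = 0.02 * np.eye(3)      # between-item variance
beta_tilde_k = empirical_bayes(beta_hat_k, V_k, beta_bar, D_delta)
```

When V_k is large relative to D_δ the item estimate is pulled strongly towards the overall mean, which is the sense in which strength is borrowed across items.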
Appendix: Statistical background

A.1 Introduction
This appendix provides a brief review of some basic statistical concepts used throughout the book. Readers should find Sections A.2 and A.3 useful for the material that is presented in Chapters 4, 5, 6 and 13, which deal with methods for analysing data with a continuous response variable. These four chapters also make extensive use of the method of maximum likelihood, which is the subject of Section A.4. Sections A.5 and A.6 outline the basic concepts of generalized linear models, which provide a unified methodology for the analysis of data with continuous or discrete responses. This material is most relevant for Chapters 7-11.
A.2 The linear model and the method of least squares
In many scientific studies, the main goal is to predict or describe an outcome or response variable in terms of other variables, which we will refer to as predictors, covariates or explanatory variables. The explanatory variables can either be fixed in advance, such as the treatment assignment in an experiment, or uncontrolled, such as smoking status in an observational study. Regression analysis has been used since the early nineteenth century to describe the relationship between the expectation of a response variable, Y, and a set of explanatory variables, x_j: j = 1, ..., p. The linear regression model assumes that the response variable and the explanatory variables are related through

Y_i = Σ_{j=1}^p x_ij β_j + ε_i,   i = 1, ..., m,
where Y_i is the response for the ith of m subjects and x_ij is the value of the jth explanatory variable for the ith subject. Usually, x_i1 = 1 for every subject so that β_1 is the intercept of this regression model. The ε_i are random variables which are assumed to be uncorrelated with each other, and to have E(ε_i) = 0 and Var(ε_i) = σ². This implies that the first two moments of Y_i are E(Y_i) = x_i'β and Var(Y_i) = σ², where x_i and β are p-element vectors. Note that these statements involve no distributional
assumption about Y_1, ..., Y_m, although a common assumption is that the joint distribution of Y_1, ..., Y_m is multivariate Gaussian. Models of this kind have been widely used in both experimental and observational studies. In fact, linear models include as special cases:
(1) the analysis of variance, in which the x_j are dummy variables, used to indicate the allocation of experimental units to treatments;
(2) multiple regression, in which the x_j are quantitative variables;
(3) the analysis of covariance, in which the x_j are a mixture of continuous and dummy variables.
Each regression coefficient, β_j, describes the change in the expected value of the response variable, Y, per unit change of its corresponding explanatory variable, x_j, all other variables held fixed.
The method of least squares (Legendre, 1805; Gauss, 1809) is a long-standing method for estimating the vector of regression coefficients, β. The idea is to find an estimate, β̂ say, which minimizes the sum of squares

RSS = Σ_{i=1}^m (Y_i - x_i'β)².

This procedure is formally equivalent to solving ∂RSS/∂β = 0, which gives rise to the estimating equation Σ_{i=1}^m x_i(Y_i - x_i'β) = 0. An alternative, and more familiar, form of the least-squares estimate β̂ is obtained by defining an m-element vector Y = (Y_1, ..., Y_m) and an m by p matrix X with ijth element x_ij. Then,

β̂ = (X'X)^{-1} X'Y.

The least-squares estimate β̂ enjoys many desirable statistical properties. Firstly, it is an unbiased estimator of β, that is E(β̂) = β. Its variance matrix is

Var(β̂) = σ²(X'X)^{-1}.

Secondly, for any vector a of known coefficients, if we let φ = a'β then φ̂ = a'β̂ has the smallest possible variance amongst all unbiased estimators for φ which are linear combinations of the Y_i. This optimality property of least-squares estimation is known as the Gauss-Markov Theorem. The constant variance, σ², of the ε_i is usually unknown but can be estimated by

σ̂² = Σ_{i=1}^m (Y_i - x_i'β̂)²/(m - p).

Many books, including Seber (1977) and Draper and Smith (1981), give more detailed discussions of least-squares estimation.
A.3 Multivariate Gaussian theory
This section reviews some of the important results for multivariate Gaussian observations. For more detailed treatment of multivariate Gaussian theory see, for example, Graybill (1976) or Rao (1973). A random vector Y = (Y_1, ..., Y_n) is said to follow a multivariate Gaussian distribution if its probability density is of the form

f(y; μ, V) = (2π)^{-n/2} |V|^{-1/2} exp{-(y - μ)'V^{-1}(y - μ)/2},

where -∞ < y_j < ∞, j = 1, ..., n. As in the univariate case, this distribution is fully specified by its first two moments, μ = E(Y) and V = Var(Y). A convenient shorthand notation is Y ~ MVN(μ, V). The following properties of the multivariate Gaussian distribution are used extensively in the book:
1. Each Y_j has a univariate Gaussian distribution.
2. More generally, if Z_1 = (Y_1, ..., Y_n1) with n1 < n, then Z_1 also follows a multivariate Gaussian distribution with mean μ_1 = (μ_1, ..., μ_n1) and covariance matrix V_11, which is the upper left n1 by n1 submatrix of V.
3. If, additionally, Z_2 = (Y_{n1+1}, ..., Y_n), then the conditional distribution of Z_1 given Z_2 = z_2 is multivariate Gaussian. Its conditional mean vector is

μ_1 + V_12 V_22^{-1}(z_2 - μ_2)

and its conditional variance matrix is

V_11 - V_12 V_22^{-1} V_12',

where μ_2 = (μ_{n1+1}, ..., μ_n) and V is partitioned as

V = ( V_11   V_12
      V_12'  V_22 ).
4. If B is an m × n matrix of rank m < n, then BY is also distributed as a multivariate Gaussian, with mean vector Bμ and variance matrix BVB'.
5. The random variable U = (Y - μ)'V^{-1}(Y - μ) has a chi-squared distribution with n degrees of freedom, which we write as U ~ χ²_n.

A.4 Likelihood inference
Likelihood inference is based on a specification of the probability or probability density for the observed data, y. This expression, f(y; θ), is indexed by a vector of unknown parameters, θ. Once the data are observed, the only quantities in f(·) that are unknown to the investigators are the parameters θ. Then, the likelihood function for θ is the function
L(θ | y) = f(y; θ).

Note that the likelihood is interpreted as a function of θ, with y held fixed at its observed value. The maximum likelihood estimate of θ is the value, θ̂, which maximizes the likelihood function or, equivalently, its logarithm. That is, for any value of θ,

L(θ | y) ≤ L(θ̂ | y).

According to the likelihood principle, θ̂ is then regarded as the value of θ which is most strongly supported by the observed data. In practice, θ̂ is obtained either by direct maximisation of log L, or by solving the set of equations

S(θ) = ∂ log L/∂θ = 0.   (A.4.1)

The function S(θ) is known as the score function for θ. Very often, numerical methods are required to evaluate the maximum likelihood estimate. Popular methods include the Nelder and Mead (1965) simplex algorithm for direct maximisation of log L, or Newton-Raphson iteration for solution of the score equations (A.4.1). The maximum likelihood estimate is known to enjoy many optimality properties in large samples. In particular, under mild regularity conditions, θ̂ is asymptotically unbiased, and asymptotically efficient in the sense that the elements of θ are estimated with the smallest possible asymptotic variances of any asymptotically unbiased estimators. The asymptotic variance matrix of θ̂ is given by the expression

V = {-E(∂² log L/∂θ²)}^{-1}.

The matrix V^{-1} is also known as the Fisher information matrix for θ.

Example A.1. Consider Y_1 and Y_2 to be two independent binomial observations with sample sizes and probabilities (n_1, p_1) and (n_2, p_2), respectively. A typical example for which this setting is appropriate is a clinical trial where two treatments are being compared. Here, Y_i denotes the number of patients responding negatively to treatment i, to which n_i subjects were assigned, and p_i is the corresponding probability of negative response, for i = 1, 2. It is convenient to transform the parameters (p_1, p_2) to (θ_1, θ_2), where

θ_1 = log{p_1(1 - p_2)/(p_2(1 - p_1))}   and   θ_2 = log{p_2/(1 - p_2)}.

This leads to a likelihood function for θ = (θ_1, θ_2) of the form
L(θ | y_1, y_2) ∝ p_1^{y_1}(1 - p_1)^{n_1 - y_1} p_2^{y_2}(1 - p_2)^{n_2 - y_2}
              = {p_1/(1 - p_1)}^{y_1} {p_2/(1 - p_2)}^{y_2} (1 - p_1)^{n_1} (1 - p_2)^{n_2}
              = exp{θ_1 y_1 + θ_2(y_1 + y_2) - n_1 log(1 + e^{θ_1 + θ_2}) - n_2 log(1 + e^{θ_2})}.
The parameter θ_1 is called the log-odds ratio. A zero value for θ_1 denotes equality of p_1 and p_2. The maximum likelihood estimate, θ̂, can be derived as the solution of the pair of equations

y_1 - n_1 exp(θ_1 + θ_2)/{1 + exp(θ_1 + θ_2)} = y_1 - n_1 p_1 = 0,

and

y_1 + y_2 - n_1 exp(θ_1 + θ_2)/{1 + exp(θ_1 + θ_2)} - n_2 exp(θ_2)/{1 + exp(θ_2)} = y_1 + y_2 - n_1 p_1 - n_2 p_2 = 0.

This gives

θ̂_1 = log{y_1(n_2 - y_2)/(y_2(n_1 - y_1))},   θ̂_2 = log{y_2/(n_2 - y_2)}.
Fisher's information matrix for θ can be obtained by straightforward algebra, and is
V^{-1} = ( n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}²    n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}²
           n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}²    n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}² + n_2 e^{θ_2}/{1 + e^{θ_2}}² ).

The asymptotic variance of θ̂_1 is the upper left entry of V, namely

[n_1 exp(θ_1 + θ_2)/{1 + exp(θ_1 + θ_2)}²]^{-1} + [n_2 exp(θ_2)/{1 + exp(θ_2)}²]^{-1},
which can be estimated consistently by

1/y_1 + 1/(n_1 - y_1) + 1/y_2 + 1/(n_2 - y_2).
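In code, the closed-form estimate of the log-odds ratio and its estimated standard error are immediate; the counts in the usage line are hypothetical.

```python
import numpy as np

def log_odds_ratio(y1, n1, y2, n2):
    """theta1_hat and its estimated standard error for two binomials."""
    theta1_hat = np.log(y1 * (n2 - y2) / (y2 * (n1 - y1)))
    var_hat = 1 / y1 + 1 / (n1 - y1) + 1 / y2 + 1 / (n2 - y2)
    return theta1_hat, np.sqrt(var_hat)

theta1_hat, se = log_odds_ratio(y1=15, n1=50, y2=30, n2=50)   # hypothetical counts
approx_ci = (theta1_hat - 2 * se, theta1_hat + 2 * se)
```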
The word 'asymptotic' in this example means that both n_1 and n_2 are large.
Likelihood inference proceeds by fitting a series of sub-models which are nested. This means that each sub-model in the sequence is contained within the previous one. In Example A.1, an interesting hypothesis, or sub-model, to test is that θ_1 = 0, corresponding to equality of p_1 and p_2. The difference between this sub-model and the full model with no restriction on θ_1 can be examined by calculating the likelihood ratio test statistic, which is defined as
G = 2{log L(θ̂ | y) - log L(θ̂_0 | y)},

where θ̂_0 and θ̂ are the maximum likelihood estimates of θ under the null hypothesis or sub-model, and the unrestricted model, respectively. Assuming that the sub-model is correct, the sampling distribution of G is approximately chi-squared, with number of degrees of freedom equal to the difference between the numbers of parameters specified under the sub-model and the unrestricted model. An alternative testing procedure is to examine the score statistic, S(θ), as in (A.4.1). The score test statistic is
S(θ̂_0)V(θ̂_0)S'(θ̂_0),

whose null sampling distribution is also chi-squared, with the same degrees of freedom as for the likelihood ratio test statistic. In either case the sub-model is rejected in favour of the unrestricted model if the test statistic is too large.
Example A.1. (continued) Suppose that we want to test whether the probabilities p_1 and p_2 from two treatments are identical. This is equivalent to testing the sub-model θ_1 = 0. Note that the value of θ_2 is unspecified by the null hypothesis and therefore has to be estimated. The algebraic form of the likelihood ratio test statistic G is complicated, and we do not give it here, although in applications it can easily be evaluated numerically. The score statistic has the simple form
(Y_1 - E_1)²/E_1 + (Y_2 - E_2)²/E_2,

where E_i = n_i(y_1 + y_2)/(n_1 + n_2) is the expected value for Y_i under the null model that the two groups are the same. This statistic is also known as the Pearson's chi-squared test statistic. The number of degrees of freedom in this example is one, because the sub-model has one parameter, whereas the unrestricted model has two.
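The score statistic in the form given above is equally simple to compute. The sketch below follows that two-term expression and refers it to a chi-squared distribution on one degree of freedom; again the counts are hypothetical.

```python
from scipy import stats

def pearson_score_test(y1, n1, y2, n2):
    """Score test of theta1 = 0 (equal probabilities), in the form given in the text."""
    p0 = (y1 + y2) / (n1 + n2)          # common probability under the null
    e1, e2 = n1 * p0, n2 * p0           # expected counts E_i
    statistic = (y1 - e1) ** 2 / e1 + (y2 - e2) ** 2 / e2
    p_value = stats.chi2.sf(statistic, df=1)
    return statistic, p_value

stat, p = pearson_score_test(y1=15, n1=50, y2=30, n2=50)
```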
A.5 Generalized linear models
Regression models for independent discrete and continuous responses have been unified under the class of generalized linear models, or GLMs (McCullagh and Nelder, 1989), thus providing a common body of statistical methodology for different types of response. Here, we review its salient features. We begin by considering two particular GLMs, logistic and Poisson regression models, and then discuss the general class. Because GLMs apply to independent responses, we focus on the cross-sectional situation as in Section A.2, with a single response Y_i and a vector x_i of p explanatory variables associated with each of m experimental units. The objective is to describe the dependence of the mean response, μ_i = E(Y_i), on the explanatory variables.
A.5.1 Logistic regression
This model has been used extensively for dichotomous response variables such as the presence or absence of a disease. The logistic model assumes that the logarithm of the odds of a positive response is a linear function of explanatory variables, so that
log{Pr(Y_i = 1)/Pr(Y_i = 0)} = log{μ_i/(1 - μ_i)} = x_i'β.
Figure A.1 shows plots of Pr(Y = 1) against a single explanatory variable x for several values of β. A major distinction between the logistic regression model and the linear model in Section A.2 is that the linearity applies to a transformation of the expectation of Y_i, in this case the log odds transformation, rather than to the expectation itself. Thus, the regression coefficients β represent the change of the log odds of the response variable per unit change of x. Another feature of the dichotomous response variable is that the variance of Y_i is completely determined by its mean, μ_i. Specifically, Var(Y_i) = μ_i(1 - μ_i).
This is to be contrasted with the linear model, where Var(Y_i) is usually assumed to be a constant, σ², which is independent of the mean.
Fig. A.1. The logistic model, p(x) = exp(βx)/{1 + exp(βx)}. -: β = -0.5; ···: β = 1; - - -: β = 2.
A.5.2 Poisson regression
Poisson regression, or log-linear, models are applicable to problems in which the response variable represents the number of events occurring in a fixed period of time. One instance is the number of seizures in a given time-interval, as in Example 1.6. Because of the discrete and non-negative nature of count data, a reasonable assumption is that the logarithm of the expected count is a linear function of explanatory variables, so that
log E(Y_i) = x_i'β.

Here, the regression coefficient for a particular explanatory variable can be interpreted as the logarithm of the ratio of expected counts before and after a one unit increase in that explanatory variable, with all other explanatory variables held constant. The term 'Poisson' refers to the distribution for counts derived by Poisson (1837),

p(y) = exp(-μ)μ^y/y!,   y = 0, 1, ...

As with logistic regression, the assumption that Y_i follows a Poisson distribution implies that the variance of Y_i is determined by its mean. In this case, the mean and variance are the same:

Var(Y_i) = E(Y_i) = exp(x_i'β).

A.5.3 The general class
Linear, logistic and Poisson regression models are all special cases of generalized linear models, which share the following features. First, the mean response, μ_i = E(Y_i), is assumed to be related to a vector of covariates, x_i, through

h(μ_i) = x_i'β.

For logistic regression, h(μ_i) = log{μ_i/(1 - μ_i)}; in Poisson regression, h(μ_i) = log(μ_i). The function h(·) is called the link function. Second, the variance of Y_i is a specified function of its mean, μ_i, namely

Var(Y_i) = V_i = φ v(μ_i).

In this expression, the known function v(·) is referred to as the variance function; the scaling factor, φ, is a known constant for some members of the GLM family, whereas in others it is an additional parameter to be estimated. Third, each class of GLMs corresponds to a member of the exponential family of distributions, with a likelihood function of the form

f(y_i) = exp[{y_i θ_i - ψ(θ_i)}/φ + c(y_i, φ)].   (A.5.1)

The parameter θ_i is known as the natural parameter, and is related to μ_i through μ_i = ∂ψ(θ_i)/∂θ_i. For example, the Poisson distribution is a special case of the exponential family, with

θ_i = log μ_i,   ψ(θ_i) = exp(θ_i),   c(y_i, φ) = -log(y_i!),   φ = 1.

Other distributions within this family include the Gaussian or Normal distribution, the binomial distribution and the two-parameter gamma distribution.
In any GLM, the regression coefficients, β, can be estimated by solving the same estimating equation,

S(β) = Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} {y_i - μ_i(β)} = 0,   (A.5.2)

where V_i = Var(Y_i). Note that S(β) is the derivative of the logarithm of the likelihood function. The solution β̂, which is the maximum likelihood estimate, can be obtained by iteratively reweighted least squares; see McCullagh and Nelder (1989) for a detailed discussion.
Finally, in large samples, β̂ follows a Gaussian distribution with mean β and variance

V = φ ( Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} (∂μ_i/∂β) )^{-1}.   (A.5.3)

This variance can be estimated by V̂, which is obtained by replacing β with β̂ in the expression (A.5.3).
A.6 Quasi-likelihood
One important property of the GLM family is that the score function, S(β), depends only on the mean and the variance of the Y_i. Wedderburn (1974) was the first to point out that the estimating equation (A.5.2) can therefore be used to estimate the regression coefficients for any choice of link and variance functions, whether or not they correspond to a particular member of the exponential family. The name quasi-score function was coined for S(β) in (A.5.2), since its integral with respect to β can be thought of as a 'quasi-likelihood' even if it does not constitute a proper likelihood function. This suggests an approach to statistical modelling in which we make assumptions about the link and variance functions without attempting to specify the entire distribution of Y_i. This is desirable, since we often do not understand the precise details of the probabilistic mechanisms by which data were generated. McCullagh (1983) showed that the solution, β̂, of the quasi-score function has a sampling distribution which, in large samples, is approximately Gaussian with mean β and variance given by equation (A.5.3).
Example A.2. Let Y_1, ..., Y_m be independent counts whose expectations are modelled as

log E(Y_i) = x_i'β,   i = 1, ..., m.

In biomedical studies, frequently the variance of Y_i is greater than E(Y_i), the variance expression induced by the Poisson assumption. This phenomenon is known as over-dispersion. One way to account for this is to assume that Var(Y_i) = φE(Y_i), where φ is a non-negative scalar parameter. Note that for the Poisson distribution φ = 1; if we allow φ > 1, we no longer have a distribution from the exponential family. However, if we define β̂ as the solution to

Σ_{i=1}^m x_i{Y_i - exp(x_i'β)} = 0,

then a simple calculation gives the asymptotic variance matrix of β̂ as

φ ( Σ_{i=1}^m exp(x_i'β) x_i x_i' )^{-1}.

Thus, by comparison with (A.5.3), the variance of β̂ is inflated by a factor of φ. Clearly, ignoring over-dispersion in the analysis would lead to under-estimation of standard errors, and consequent over-statement of significance in hypothesis testing. In the above example, φE(Y_i) is but one of many possible choices for the variance formula which would take account of over-dispersion in count data. Fortunately, the solution, β̂, is a consistent estimate of β as long as h(μ_i) = x_i'β, whether or not the variance function is correctly specified. This robustness property holds because the expectation of S(β) remains zero so long as E(Y_i) = μ_i(β). However, the asymptotic variance matrix of β̂ has the form

V_2 = V { Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} Var(Y_i) V_i^{-1} (∂μ_i/∂β) } V.

Note that V_2 is identical to V in (A.5.3) only if Var(Y_i) = V_i. When this assumption is in doubt, confidence limits for β can be based on the estimated variance matrix

V̂_2 = V̂ ( Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} {Y_i - μ_i(β)}² V_i^{-1} (∂μ_i/∂β) ) V̂,   (A.6.1)

evaluated at β̂. We call V̂ a model-based variance estimate of β̂ and V̂_2 a robust variance estimate, in that V̂_2 is consistent regardless of whether the specification of Var(Y_i) is correct. An alternative to the variance function V_i = φμ_i is the form induced by the Poisson-gamma distribution (Breslow, 1984), V_i = μ_i(1 + μ_iφ).
Example A.2. (continued) With a limited amount of data available, it is difficult to choose empirically between these two variance functions (Breslow and McCullagh, 1993). The availability of the robust variance estimate, V̂_2, helps to alleviate the concern regarding the choice of variance formula in larger samples.
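A small simulation sketch contrasting the model-based and robust variance estimates for the intercept-only log-linear model discussed below; the negative binomial data-generating mechanism is purely an illustrative choice of over-dispersed counts, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
y = rng.negative_binomial(n=2, p=2 / 7, size=m)    # over-dispersed counts, mean 5

mu_hat = y.mean()                                  # beta0_hat = log(ybar)

# Model-based variance of beta0_hat under the Poisson working variance Var(Y_i) = mu
v_model = 1 / (m * mu_hat)

# Robust (sandwich) variance: sum (y_i - ybar)^2 / (m * ybar)^2
v_robust = np.sum((y - mu_hat) ** 2) / (m * mu_hat) ** 2
```

With over-dispersed data the robust estimate is typically noticeably larger than the model-based one, which is the under-estimation of standard errors referred to above.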
It is interesting to note that in the special case where μ_i = μ and hence log μ_i = β_0, the estimate V̂_2 reduces to

Σ_{i=1}^m (Y_i - Ȳ)²/(mȲ)²,
the sampling variance of β̂_0 = log Ȳ (Royall, 1986).

Bibliography
Aerts, M. ~nd. Claeskens, G. (1997). Local polynomial estimation in multiparameter lIkelIhood models. Journal of the American Statisti I A . t· 92, 1536-45. ca ssocza wn, Afsarinejad, K. (1983). Balanced repeated measurements designs. Bio etrik 70, 199-204. m a, Agresti, A. (1990). Categorical data analysis. John Wiley, New York. Agresti, A. and Lang, J. (1993). A proportional odds model with subject-specific effects for repeated ordered categorical responses. Biometrika, 80, 527-34. Agresti, A. (1999). Modelling ordered categorical data: recent advances and future challenges. Statistics in Medicine, 18, 2191-207. Aitkin, M., Anderson, D., Francis, B., and Hinde, J. (1989). Statistical modelling in GLIM. Oxford University Press, Oxford. Albert, J.H. and Chib, S. (1993). Bayesian analysis of binary and polytomous data. Journal of the American Statistical Association, 88, 669-79. Alexander, C.S. and Markowitz, R. (1986). Maternal employment and use of pediatric clinic services. Medical Care, 24(2), 134--47. Almon, S. (1965). The distributed lag between capital appropriations and expenditures. Econometrica, 33, 178-96. Altman, N.S. (1990). Kernel smoothing of data with correlated errors. Journal of the American Statistical Association, 85, 749-59. Amemiya, T. (1985). Advanced econometrics. Harvard University Press, Cambridge Massachusetts. Amemiya, T. and Morimune, K. (1974). Selecting the optimal order o.f ~olyno mial in the Almon distributed lag. The Review of Economics and Statzstzcs, 56, 378-86. Andersen, P.K., Borgan, 0., Gill, R.D., and Keiding, N. (1993). Statistical models based on counting processes. Springer-Verlag, New York. Anderson, J.A. (1984). Regression and orde~ed categorical variables (with Discussion). JournoJ, d{ the Royal Statistical Soczety, B, 46, 1-30.
Anderson, D.A. and Aitkin, M. (1985). Variance component models with binary response: interviewer variability. Journal of the Royal Statistical Society, B, 47, 203-10.
Atkinson, A.C. (1985). Plots, transformations and regression: an introduction to graphical methods of diagnostic regression analysis. Oxford University Press, Oxford.
Azzalini, A. (1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika, 81, 767-75.
Bahadur, R.R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies in item analysis and prediction (ed. H. Solomon), pp. 158-68. Stanford Mathematical Studies in the Social Sciences VI, Stanford University Press, Stanford, California.
Breslow, N.E. and Day, N.E. (1980). Statistical methods in cancer research, Volume J. lARC Scientific Publications No. 32. Lyon. Breslow, N.E. and Lin, X. (1995). Bias correction in generalized linear mixed models with a single component of dispersion. Biometrika, 82, 81-91. Brumback, B.A. and Rice, J.A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (C/R: p976-994). Journal of the American Statistical Association, 93, 961-76. Carey, V.C. (1992). Regression analysis for large binary clusters. Unpublished PhD thesis, Department of Biostatistics, The Johns Hopkins University, Baltimore, Maryland. Carey, V.C., Zeger, S.L., and Diggle, PoOL (1993), Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517-26.
Barnard, G.A. (1963). Contributions to the discussion of Professor Bartlett's paper. Journal of the Royal Stati.~tical Soezety, B, 25, 294.
Carlin, B.P. and Louis, T.A. (1996), Bayes and empirical Bayes methods for data analysis, Chapman and Hall, London.
Bartholomew, D.J. (1987). Latent variable models and factor analysis. Oxford University Press, New York.
Chambers, J.M. and Hastie, T.J. (1992). Stati8tical models in S. Wadsworth and Brooks-Cole, Pacific Grove.
Bates, D.M. and Watts, D.C. (1988). Nonlinear regression analysis and its Applications. Wiley, New York.
Chambers, J.M., Cleveland, W.S., Kleiner, 8., and Thkey, P.A. (1983). Graphical methods for data analysis. Wadsworth, Belmont, California.
Becker, R.A., Chambers, J.M., and Wilks, A.R. (1988). The new S language. Wadsworth and Brooks-Cole, Pacific Crove.
Chib, S. and Carlin, B. (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and Computing, 9, 17-26.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, B, 36, 192-236.
Chib, S. and Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347-61.
Billingsley, P. (1961). Statistical inference for Markov processes. University of Chicago Press, Chicago, Illinois.
Clayton, D.G. (1974). Some odds ratio statistics for the analysis of ordered categorical data. Biometrika, 61, 525-31.
Bishop, S.H. and Jones, B. (1984). A review of higher-order cross-over designs. Journal of Applied Statistics, 11, 29-50.
Clayton D.G. (1992). Repeated ordinal measurements: a generalis~ esti~t, . ing equation approach. Med~cal Researc h Counct'I Biostatistics Umt Technacal Reports, Cambridge, England.
Bishop,Y.M.M., Fienberg, S.E., and Holland, P.W. (1975). Discrete multivariate analysis: theory and proctice. MIT Press, Cambridge, Massachussetts. Bloomfield, P. and Watson, G.S. (1975). The inefficiency of least squares. Biometrika, 62, 121-28.
~oo~h, J.C, and Hobert, J.P. (1999). Maximizing generalized linear mixed model hkeh.ho.ods with an automated Monte Carlo EM algorithm. Journal of the Royal Stattstlcal Society, B, 61, 265-85. Box, G.P, ~nd Jenkins, C.M. (1970). Time series analysis _ forecasting and control (revised edn). Holden-Day, San Francisco, California. Breslow, N.E. (1984). Extra-Poisson variation in log linear models. Applied Statlstlcs, 33, 38-44.
rBreslow,' N.E. d
and Clayton D G (1993) A . ' . . . pprmomate inference in generalized mear mlxe models. Journal of the American Statistical Association, 88, 125-34.
Cleveland, W.S. (1979). Robust locally, w~ighted r~gr~ion a~~9~;~othing scatterplots. Journal of the American Statzstzcal ASSOCIatIOn, 14, Cochran, W.G. (1977). Sampling techniques. John Wiley, New York. y data Biometrics, 46, d I fi b' Conaway, M.R. (1990). A random effects mo e or mar . 317-28. d 'ifi ce in regre.ssion. Cook, D. and Weisberg, S. (1982). Residuals an an uen Chapman and Hall, London. amples (with d Copas, J.B. and Li, H.G. (1997). Inference for ~n-;:5~~5~ Discussion). Journal of the Royal Statistical Soctety, " . 06 Second supplement published 182? Courcier. Reissued with a supplement, 18 . 929 576-9 in A source book In A portion of the appendix was translated, 1 ,pp.
BIBLIOGRAPHY BIBLIOGRAPHY
352 b H A Ruger and H.M. Walker, McGraw , DESmith ed. trans. y . . y: k mathematzcs, . d' 1959 in 2 volumes, Dover, New or. H'Il New York' reprmte ' d t Chapman and Hall, London. I " Cox, D.R. (1970). Analysis of bmary a a. .' . d i d life tables (with dIl'ICUsslOn). Journal of Cox, D.R. (1972). RegressIOn mo e S an _ the Royal Statistical Soczety, B, 74, 187 200. . " , ' · t fstical analysIs. Statzstzcal sczence, 5, I d Cox, D.R. (1990). Role of rna e SIllS a I J.
169-74.
353
Diggle, P.J. (1990). Time series: a biostatistical introduction. Oxford University Press, Oxford.
•
.
Cox, D.R. and MIller, Wiley, New York.
H D (1965). The theory of stochastic processes. John .,
1984). Analysis of survival data. Chapman and Hall, Cox, D.R. an d 0 a kes, D . ( London. J (1989) Analysis of binary data. Chapman and Hall, II E .. Cox, .R . an d Sne, . London.
n
Diggl e , P.J. and Kenward, M.G. (1994). Informative dropout in long't d' I d t . ( . h d' .) A . I U ma a a analySIS WIt Iscusslon. pphed Statistics, 43, 49-73. Diggle, P.J. (1998). Dealing with missing values in longitudinal studies. In Advances zn the statzstzcal analysis of medical data (ed. B.S. Everitt and G. Dunn), pp. 203-28. Edward Arnold, London. Diggle, P.J. and Verbyla, A. (1998). Nonparametric estimation of covariance structure in longitudinal data. Biometrics, 54, 401-15. Draper, N. and Smith, H. (1981). Applied regression analysis (2nd edn), Wiley, New York. Drum, M.L. and McCullagh, P. (1993). REML estimation with exact covariance in the logistic mixed model. Biometrics, 49, 677-89.
Cressie, N.A.C. (1993). Statistics for spatial data. Wiley, New York.
Dupuis Sammel, M. and Ryan, L.M. (1996). Latent variable models with fixed effects. Biometrics, 52, 650-63.
Crouch, A.C. and Spiegelman, E. (1990). The evaluation of integrals of the form J f(t) exp(_t 2 ) dt: application to logistic-normal models. Journal of the American Statistical Association, 85, 464-69.
Emond, M.J., Ritz, J., and Oakes, D. (1997). Bias in GEE estimates from misspecified models for longitudinal data. Communications in Statistics, 26, 15-32.
Cullls, B.R. (1994). Contribution to the Discussion of the paper by Diggle and Kenward. Applied Statistics, 43, 79-80.
Engle, R.F., Hendry, D.F., and Richard, J.-F. (1983). Exogeneity, Econometrica, 51, 277-304.
Cul1is, B.R. and McGilchrist, C.A. (1990). A model for the analysis of growth data from designed experiments. Biometrics, 46, 131-42.
Evans, J.L. and Roberts, E.A. (1979). Analysis of sequential observations with applications to experiments on grazing animals and perennial plants. Biometrics, 35,687-93.
Davidian, M. and Gallant, A.R. (1992). The nonlinear mixed effects model with a smooth random effects density. Department of Statistics Technical Report, North Carolina State University, Campus Box 8203, Raleigh, North Carolina 27695. Davidian, M. and Giltinan, D,M. (1995). Nonlinear mixed effects models for repeated measurement data. Chapman and Hall, London. Deming, W.E. and Stephan, F.F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 11, 427-44.
~empster, A.P., L~ird, N.M., and Rubin, D.B. (1977). Maximum likelihood from mcomplete data B, 39, 1-38.
VIa
the EM algorithm. Journal of the Royal Statistical Society,
Dhrymes ' P "J (1971) . D'zs t n 'b ut ed Iags: problems of estimation and formulation. Holden-Day, San Francisco, DBiggle't P,J · 4(1988). An approach to the analysis of repeated measures. lOme ncs, 4 ,959-71. Diggle, P.J. (1989). Testing for rand d . Biometrics, 45, 1255-58. om ropouts 1ll repeated measurement data.
Evans, M. and Swartz, T. (1995). Methods for approximating int~~als instatistics with special emphasis on Bayesian integration problems. Statzstzcal Sctence, 10,254-72. Faucett, C.L. and Thomas, D.C. (1996). Simultaneously model.lin g censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine, 15, 1663-86. Fearn, T. (1977). A two-stage model for growth curves which leads to Rao's covariance-adjusted estimates. Biometrika, 64, 141-43. 'l' th d its applications (3rd Feller, W. (1968). An introduction to probabz zty eory an edn). John Wiley, New York. d D A (1999) Combining mortality and . . . . .' Moo" 18 1341-54. Finkelstein, D.M. and Schoenfel, longitudinal measures in clinical trials. Statzstzcs m zcme, , . . d pendence estimating equaFitzmaurice, G.M. (1995). A caveat c~ncern~ng III e309-17 . tions with multivariate binary data. Bzometncs, 51,
BIBLIOGRAPHY BIBLIOGRAPHY
354 and Clifford, P. (1996). Logistic regression Fitzmaurice, G.M., Heath, A.F.; . . Journal 01 the Royal Slalzstzcal models for binary panel data wIth attntlOn.
Society, A, 159, 249 63.
. . • J N M (1993). A hkehhood-based method for ., " "k 80 141 .51. Fitzmaurice, G.M. and Lam, analysing longitudinal binary responses. Bwmetrz a, , " . . M 1 n t itzky A.G. (HJ9:3). RegresSIon models Fitzmaurice, G.M., LaIrd, N. ., all( .0 n ," 8 284 99 for discrete longitudinal responses. Statz.9tzcal Sczence" .
355
Godambe, V.P. (1960). An optimum property f l ' " " A if" 0 regu ar maxImum likelihood estImatIOn. nna so Mathemattcal Statistics, 31, 1208-12. Godfrey, L.G. and Poskitt, D.S. (1975). Testing the r t· t" f . l 1 esrIe IOns 0 the Almon lag technIque. Journa 0 the American Statistical Assoc,a ; t"lOn, 70 ,105-8. Goldfarb, N. (1960). An introduction to longitudinal stat1..stical analysis: the method of repeated observations from a fixed sample. Free Press of Glencoe, Illinois.
M (1995) An approximate generalized linear model with Follman, D. and Wu,. '.. " .. , 51 15168 random effects for informative mlssmg data. Bwmetrzcs" .
Goldstein, H. (1979). The design and analysis of longitudinal studies: their role in the measurement of change. Academic Press, London.
. k S J (1992). Repeated measures in clinical trials: anaF'r1son L J and Pococ, ., . S t" t" " , ... t t' tics and its implication for deSIgn. ta zs zcs m lysis using mean summary s a IS • Medicine, 11, 1685--1704. . LJ d Pock S.J. (1997). Linearly divergent treatment effects in F'r1son, ... an oc , . . t t' t' clinical trials with repeated measures: efficient analYSIS usmg summary s a IS ICS. Statistics in Medicine, 16, 2855-72.
Goldstein, H. (1986). Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika, 73, 43-56.
Gabriel, K.R. (1962). Ante-dependence analysis of an ordered set of variables. Annals of Mathematical Statistics, 33, 201-12.
Gauss, C.F. (1809). Theoria motus corporum celestium. Hamburg: Perthes et Besser. Translated, 1857, as Theory of motion of the heavenly bodies moving about the sun in conic sections, trans. C.H. Davis. Little, Brown, Boston. Reprinted, 1963; Dover, New York. French translation of the portion on least squares, pp. 111-34 in Gauss, 1855.
Gelfand, A.E. and Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Gelfand, A.E., Hills, S.E., Racine-Poon, A., and Smith, A.F.M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 972-85.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995). Bayesian data analysis. Chapman and Hall, London.
Goldstein, H. (1995). Multilevel statistical models (2nd edn). Edward Arnold, London.
Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, A, 159, 505-13.
Gourieroux, C., Monfort, A., and Trognon, A. (1984). Pseudo-maximum likelihood methods: theory. Econometrica, 52, 681-700.
Graubard, B.I. and Korn, E.L. (1994). Regression analysis with clustered data. Statistics in Medicine, 13, 509-22.
Gray, S.M. and Brookmeyer, R. (1998). Estimating a treatment effect from multidimensional longitudinal data. Biometrics, 54, 976-88.
Gray, S. and Brookmeyer, R. (2000). Multidimensional longitudinal data: estimating a treatment effect from continuous, discrete or time to event response variables. Journal of the American Statistical Association, 95, 396-406.
Graybill, F. (1976). Theory and application of the linear model. Wadsworth, California.
Gibaldi, M. and Perrier, D. (1982). Pharmacokinetics. Marcel Dekker, New York.
Greenwood, M. and Yule, G.U. (1920). An enquiry into the nature of frequency distributions to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society, Series A, 83, 255-79.
Gilks, W., Richardson, S., and Spiegelhalter, D. (1996). Markov chain Monte Carlo in practice. Chapman and Hall, London.
Grieve, A.P. (1994). Contribution to the Discussion of the paper by Diggle and Kenward. Applied Statistics, 43, 74-6.
Gilmour, A.R., Anderson, R.D., and Rae, A.L. (1985). The analysis of binomial data by a generalized linear mixed model. Biometrika, 72, 593-99.
Griffiths, D.A. (1973). Maximum likelihood estimation for the beta-binomial distribution, and an application to the household distribution of the total number of cases of a disease. Biometrics, 29, 637-48.
Glasbey, C.A. (1988). Examples of regression with serially correlated errors. The Statistician, 37, 277-92.
Glonek, G.F.V. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, B, 57, 533-46.
Gromping, U. (1996). A note on fitting a marginal model to mixed effects loglinear regression data via GEE. Biometrics, 52, 280-5.
Guo, S.W. and Lin, D.Y. (1994). Regression analysis of multivariate grouped survival data. Biometrics, 50, 632-39.
Hall, D.B. and Severini, T.A. (1998). Extended generalized estimating equations for clustered data. Journal of the American Statistical Association, 93, 1365-75.
Härdle, W. (1990). Applied nonparametric regression. Cambridge University Press, New York.
Harrell, F.E., Lee, K.L., Califf, R.M., Pryor, D.B., and Rosati, R.A. (1984). Regression modelling strategies for improved prognostic prediction. Statistics in Medicine, 3, 143-52.
Hart, J.D. (1991). Kernel regression estimation with time series errors. Journal of the Royal Statistical Society, B, 53, 173-87.
Hart, J.D. and Wehrly, T.E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81, 1080-88.
Harville, D. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61, 383-85.
Heckman, J.J. and Singer, B. (1985). Longitudinal analysis of labour market data. Cambridge University Press, Cambridge.
Hedayat, A. and Afsarinejad, K. (1975). Repeated measures designs, I. In A survey of statistical design and linear models (ed. J.N. Srivastava). North-Holland, Amsterdam.
Hedayat, A. and Afsarinejad, K. (1978). Repeated measures designs, II. Annals of Statistics, 6, 619-28.
Hedeker, D. and Gibbons, R. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics, 50, 933-44.
Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and recurrent events. Biostatistics, 1, 465-80.
Hernan, M.A., Brumback, B., and Robins, J.M. (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96, 440-8.
Harville, D. (1977). Maximum likelihood estimation of variance components and related problems. Journal of the American Statistical Association, 72, 320-40.
Heyting, A., Tolboom, J.T.B.M., and Essers, J.G.A. (1992). Statistical handling of dropouts in longitudinal clinical trials. Statistics in Medicine, 11, 2043-62.
Hastie, T.J. and Tibshirani, R.J. (1990). Generalized additive models. Chapman and Hall, New York.
Hogan, J.W. and Laird, N.M. (1997a). Model-based approaches to analysing incomplete longitudinal and failure-time data. Statistics in Medicine, 16, 259-72.
Hausman, J.A. (1978). Specification tests in econometrics. Econometrica, 46, 1251-72.
Hogan, J.W. and Laird, N.M. (1997b). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16, 239-57.
Heagerty, P.J. (1999). Marginally specified logistic-normal models for longitudinal binary data. Biometrics, 55, 247-57.
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945-61.
Heagerty, P.J. (2002). Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics 58 (to appear).
Huber, P.J. (1967). The behaviour of maximum likelihood estimators under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, LeCam, L.M. and Neyman, J. editors, University of California Press, pp. 221-33.
Heagerty, P.J. and Kurland, B.F. (2001). Misspecified maximum likelihood estimates and generalized linear mixed models. Biometrika, 88, 973-85.
Heagerty, P.J. and Zeger, S.L. (1996). Marginal regression models for clustered ordinal measurements. Journal of the American Statistical Association, 91, 1024-36.
Heagerty, P.J. and Zeger, S.L. (1998). Lorelogram: a regression approach to exploring dependence in longitudinal categorical responses. Journal of the American Statistical Association, 93, 150-62.
Heagerty, P.J. and Zeger, S.L. (2000). Marginalized multilevel models and likelihood inference. Statistical Science, 15, 1-26.
Heagerty, P.J. and Zeger, S.L. (2000). Multivariate continuation ratio models: connections and caveats. Biometrics, 56, 719-32.
Heckman, J.J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimation method for such models. Annals of Economic and Social Measurement, 5, 475-92.
Hughes, J.P. (1999). Mixed effects models with censored data with application to HIV RNA levels. Biometrics, 55, 625-9.
Jones, B. and Kenward, M.G. (1987). Modelling binary data from a three-point cross-over trial. Statistics in Medicine, 6, 555-64.
Jones, B. and Kenward, M.G. (1989). Design and analysis of cross-over trials. Chapman and Hall, London.
Jones, M.C. and Rice, J.A. (1992). Displaying the important features of large collections of similar curves. The American Statistician, 46, 140-45.
Jones, R.H. (1993). Longitudinal data with serial correlation: a state-space approach. Chapman and Hall, London.
Jones, R.H. and Ackerson, L.M. (1990). Serial correlation in unequally spaced longitudinal data. Biometrika, 77, 721-31.
Jones, R.H. and Boadi-Boateng, F. (1991). Unequally spaced longitudinal data with serial correlation. Biometrics, 47, 161-75.
Journel, A.G. and Huijbregts, C.J. (1978). Mining geostatistics. Academic Press, New York.
Jowett, G.H. (1952). The accuracy of systematic sampling from conveyor belts. Applied Statistics, 1, 50-9.
Kalbfleisch, J.D. and Prentice, R.L. (1980). The statistical analysis of failure time data. John Wiley, New York.
Lauritzen, S.L. (2000). Causal inference from graphical models. Research Report R-99-2021, Department of Mathematical Sciences, Aalborg University.
Lavalley, M.P. and De Gruttola, V. (1996). Models for empirical Bayes estimators of longitudinal CD4 counts. Statistics in Medicine, 15, 2289-305.
Lawless, J.F. (1982). Statistical models and methods for lifetime data. John Wiley, New York.
Lee, Y. and Nelder, J.A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society, Series B, 58, 619-78.
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 34-45.
Lee, Y. and Nelder, J.A. (2001). Hierarchical generalised linear models: a synthesis of generalised linear models, random-effects models and structured dispersions. Biometrika, 88, 987-1006.
Karim, M.R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Unpublished PhD thesis from the Johns Hopkins University Department of Biostatistics, Baltimore, Maryland.
Legendre, A.M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. John Wiley, New York.
Kaslow, R.A., Ostrow, D.G., Detels, R. et al. (1987). The Multicenter AIDS Cohort Study: rationale, organization and selected characteristics of the participants. American Journal of Epidemiology, 126, 310-18.
Kaufmann, H. (1987). Regression models for nonstationary categorical time series: asymptotic estimation theory. Annals of Statistics, 15, 863-71.
Kenward, M.G. (1987). A method for comparing profiles of repeated measurements. Applied Statistics, 36, 296-308.
Kenward, M.G., Lesaffre, E., and Molenberghs, G. (1994). An application of maximum likelihood and estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random. Biometrics, 50, 945-53.
Korn, E.L. and Whittemore, A.S. (1979). Methods for analyzing panel studies of acute health effects of air pollution. Biometrics, 35, 795-802.
Laird, N.M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305-15.
Laird, N.M. and Wang, F. (1990). Estimating rates of change in randomized clinical trials. Controlled Clinical Trials, 11, 405-19.
Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-74.
Lang, J.B. and Agresti, A. (1994). Simultaneously modeling joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association, 89, 625-32.
Lang, J.B., McDonald, J.W., and Smith, P.W.F. (1999). Association-marginal modeling of multivariate categorical responses: a maximum likelihood approach. Journal of the American Statistical Association, 94, 1161-71.
Lange, N. and Ryan, L. (1989). Assessing normality in random effects models. Annals of Statistics, 17, 624-42.
Lepper, A.W.D. (1989). Effects of altered dietary iron intake in Mycobacterium paratuberculosis-infected dairy cattle: sequential observations on growth, iron and copper metabolism and development of paratuberculosis. Res. Vet. Sci., 46, 289-96.
Liang, K.-Y. and McCullagh, P. (1993). Case studies in binary dispersion. Biometrics, 49, 623-30.
Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
Liang, K.-Y. and Zeger, S.L. (2000). Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya, B, 62, 134-48.
Liang, K.-Y., Zeger, S.L., and Qaqish, B. (1992). Multivariate regression analyses for categorical data (with Discussion). Journal of the Royal Statistical Society, B, 54, 3-40.
Lin, D.Y. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association, 96, 103-26.
Lin, X. and Breslow, N.E. (1996). Bias correction in generalized linear mixed models with multiple components of dispersion. Journal of the American Statistical Association, 91, 1007-16.
Lin, X. and Carroll, R.J. (2000). Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association, 95, 520-34.
Lindstrom, M.J. and Bates, D.M. (1990). Nonlinear mixed effects models for repeated measures data. Biometrics, 46, 673-87.
. " t'mg equations for correlated binary data: . . S (1989) Generalized estlma D LipSitZ,.. f ciation Technical report, epartment using the odds ratio as a measure 0 as~o Ith' of Biostatistics, Harvard School of Public Hea .
Molenberghs, G. and Lesaffre, E. (1994). Marginal modelling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association, 89, 633-44.
Lipsitz, S., Laird, N., and Harrington, D. (1991). Generalized estimating equations for correlated binary data: using odds ratios as a measure of association. Biometrika, 78, 153-60.
Molenberghs, G. and Lesaffre, E. (1999). Marginal modelling of multivariate categorical data. Statistics in Medicine, 18, 2237-55.
Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-34.
Little, R.J.A. (1995). Modelling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-21.
Little, R.J.A. and Rubin, D.B. (1987). Statistical analysis with missing data. John Wiley, New York.
Little, R.J. and Rubin, D.B. (2000). Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annual Review in Public Health, 21, 121-45.
Liu, G. and Liang, K.-Y. (1997). Sample size calculations for studies with correlated observations. Biometrics, 53, 937-47.
Liu, Q. and Pierce, D.A. (1994). A note on Gauss-Hermite quadrature. Biometrika, 81, 624-9.
Lindsey, J.K. (2000). Obtaining marginal estimates from conditional categorical repeated measurements models with missing data. Statistics in Medicine, 19, 801-9.
Molenberghs, G., Michiels, B., Kenward, M.G., and Diggle, P.J. (1997). Missing data mechanisms and pattern-mixture models. Statistica Neerlandica, 52, 153-61.
Monahan, J.F. and Stefanski, L.A. (1992). Normal scale mixture approximations to F(x) and computation of the logistic-normal integral. In Handbook of the logistic distribution (ed. N. Balakrishnan), pp. 529-40. Marcel Dekker, New York.
Morton, R. (1987). A generalized linear model with nested strata of extra-Poisson variation. Biometrika, 74, 247-57.
Mosteller, F. and Tukey, J.W. (1977). Data analysis and regression: a second course in statistics. Addison-Wesley, Reading, Massachusetts.
Moyeed, R.A. and Diggle, P.J. (1994). Rates of convergence in semi-parametric modelling of longitudinal data. Australian Journal of Statistics, 36, 75-93.
Müller, M.G. (1988). Nonparametric regression analysis of longitudinal data. Lecture Notes in Statistics, 41. Springer-Verlag, Berlin.
Munoz, A., Carey, V., Schouten, J.P., Segal, M., and Rosner, B. (1992). A parametric family of correlation structures for the analysis of longitudinal data. Biometrics, 48, 733-42.
Lunn, D.J., Wakefield, J., and Racine-Poon, A. (2001). Cumulative logit models for ordinal data, a case study involving allergic rhinitis severity scores. Statistics in Medicine, 20, 2261-85.
Murray, G.D. and Findlay, J.G. (1988). Correcting for the bias caused by dropouts in hypertension trials. Statistics in Medicine, 7, 941-46.
Mason, W.E. and Fienberg, S.E. (eds) (1985). Cohort analysis in social research: beyond the identification problem. Springer-Verlag, New York.
Nelder, J.A. and Mead, R. (1965). A simplex method for function minimisation. Computer Journal, 7, 303-13.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, B, 42, 109-42.
Neuhaus, J.M., Hauck, W.W., and Kalbfleisch, J.D. (1992). The effects of mixture distribution misspecification when fitting mixed-effects logistic models. Biometrika, 79, 755-62.
McCullagh, P. (1983). Quasi-likelihood functions. Annals of Statistics, 11, 59-67.
McCullagh, P. and Nelder, J.A. (1989). Generalized linear models. Chapman and Hall, New York.
McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92, 162-70.
Mead, R. and Curnow, R.N. (1983). Statistical methods in agriculture and experimental biology. Chapman and Hall, London.
Molenberghs, G., Kenward, M.G., and Lesaffre, E. (1997). The analysis of longitudinal ordinal data with informative dropout. Biometrika, 84, 33-44.
Neuhaus, J.M. and Jewell, N.P. (1990). Some comments on Rosner's multiple logistic model for clustered data. Biometrics, 46, 523-34.
Neuhaus, J.M. and Kalbfleisch, J.D. (1998). Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics, 54, 638-45.
Neuhaus, J.M., Kalbfleisch, J.D., and Hauck, W.W. (1991). A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review, 59, 25-36.
Neyman, J. (1923). On the application of probability theory to agricultural experiments: essay on principles, section 9. Translated in Statistical Science, 1990, 5, 65-80.
Neyman, J. and Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1-32.
O'Brien, P.C. (1984). Procedures for comparing samples with multiple endpoints. Biometrics, 40, 1079-87.
Paik, M.C. (1992). Parametric variance function estimation for non-normal repeated measurement data. Biometrics, 48, 18-30.
Palta, M., Lin, C.-Y., and Chao, W. (1997). Effect of confounding and other misspecification in models for longitudinal data. In Modelling longitudinal and spatially correlated data: methods, applications and future directions (Springer Lecture Notes in Statistics, Volume 122), 77-87.
Pan, W., Louis, T.A., and Connett, J.E. (2000). A note on marginal linear regression with correlated response data. American Statistician, 54, 191-5.
Pantula, S.G. and Pollock, K.H. (1985). Nested analysis of variance with autocorrelated errors. Biometrics, 41, 909-20.
Patterson, H.D. (1951). Change-over trials. Journal of the Royal Statistical Society, B, 13, 256-71. Patterson, H.D. and Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-54. Pawitan, Y. and Self, S. (1993). Modelling disease marker processes in AIDS. Journal of the American Statistical Association, 88, 719-26. Pearl, J. (2000). Causal inference in the health sciences: a conceptual introduction. Contributions to Health Services and Outcomes Research Methodology, Technical report R-282 , Department of Computer Science, University of California, Los Angeles. Pepe, M.S. and Anderson, G.A. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communication in Statistics - Simulation, 23(4), 939-51. Pepe, M.S. and Couper, D. (1997). Modeling partly conditional means with longitudinal data. Journal of the American Statistical Association, 92, 991-8. Pepe, M.S., Heagerty, P.J., and Whitaker, R. (1999). Prediction using partly conditional time-varying coefficients regression models. Biometrics, 55, 944-50. Pierce, D.A. and Sands, B.R. (1975). Extra-Bernoulli variation in binary data. Technical Report 46, Department of Statistics, Oregon State University.
Pinheiro, J.C. and Bates, D.M. (1995). Approximations to the log-likelihood function in the non-linear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12-35.
Plewis, I. (1985). Analysing change: measurement and explanation using longitudinal data. John Wiley, New York.
Pocock, S.J., Geller, N.L., and Tsiatis, A.A. (1987). The analysis of multiple endpoints in clinical trials. Biometrics, 43, 487-98.
Poisson, S.D. (1837). Recherches sur la Probabilite des Jugements en Matiere Criminelle et en Matiere Civile, precedees des Regles Generales du Calcul des Probabilites. Bachelier, Imprimeur-Libraire pour les Mathematiques, la Physique, etc., Paris.
Pourahmadi, M. (1999). Joint mean-covariance models with application to longitudinal data: unconstrained parameterisation. Biometrika, 86, 677-90.
Prentice, R.L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321-27.
Prentice, R.L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-48.
Prentice, R.L. and Zhao, L.P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics, 47, 825-39.
Priestley, M.B. and Chao, M.T. (1972). Non-parametric function fitting. Journal of the Royal Statistical Society, B, 34, 384-92.
Rao, C.R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika, 52, 447-58.
Rao, C.R. (1973). Linear statistical inference and its applications (2nd edn). John Wiley, New York.
Ratkowsky, D.A. (1983). Non-linear regression modelling. Marcel Dekker, New York.
Rice, J.A. and Silverman, B.W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, B, 53, 233-43.
Ridout, M. (1991). Testing for random dropouts in repeated measurement data. Biometrics, 47, 1617-21.
Robins, J.M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393-512.
Robins, J.M. (1987). Addendum to 'A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect.' Computers and Mathematics with Applications, 14, 923-45.
Robins, J.M. (1998). Correction for non-compliance in equivalent trials. Statistics in Medicine, 17, 269-302.
Robins, J.M. (1999). Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology: the environment and clinical trials (ed. M.E. Halloran and D. Berry). IMA Volume 116, Springer-Verlag, New York.
Robins, J.M., Greenland, S., and Hu, F.-C. (1999). Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome (with Discussion). Journal of the American Statistical Association, 94, 687-712.
Robins, J.M., Rotnitzky, A., and Zhao, L.P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106-21.
Rosner, B. (1984). Multivariate methods in ophthalmology with application to other paired-data situations. Biometrics, 40, 1025-35.
Rosner, B. (1989). Multivariate methods for clustered binary data with more than one level of nesting. Journal of the American Statistical Association, 84, 373-80.
Rothman, K.J. and Greenland, S. (1998). Modern epidemiology. Lippincott-Raven.
Rowell, J.G. and Walters, D.E. (1976). Analysing data with repeated observations on each experimental unit. Journal of Agricultural Science, 87, 423-32.
Royall, R.M. (1986). Model robust inference using maximum likelihood estimators. International Statistical Review, 54, 221-26.
Rubin, D.B. (1974). Estimating causal effects of treatment in randomized and non-randomized studies. Journal of Educational Psychology, 66, 688-701.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-92.
Rubin, D.B. (1978). Bayesian inference for causal effects: the role of randomization. Annals of Statistics, 6, 34-58.
Samet, J.M., Dominici, F., Curriero, F.C., Coursac, I., and Zeger, S.L. (2000). Fine particulate air pollution and mortality in 20 US cities, 1987-1994. New England Journal of Medicine, 343(24), 1798-9.
Sandland, R.L. and McGilchrist, C.A. (1979). Stochastic growth curve analysis. Biometrics, 35, 255-71.
Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika, 78, 719-27.
Scharfstein, D.O., Rotnitzky, A., and Robins, J.M. (1999). Adjusting for nonignorable dropout using semiparametric non-response models (with Discussion). Journal of the American Statistical Association, 94, 1096-1120.
Seber, G.A.F. (1977). Linear regression analysis. John Wiley, New York.
Self, S. and Mauritsen, R. (1988). Power/sample size calculations for generalized linear models. Biometrics, 44, 79-86.
Senn, S.J. (1992). Crossover trials in clinical research. John Wiley, Chichester.
Sheiner, L.B., Beal, S.L., and Dunne, A. (1997). Analysis of nonrandomly censored ordered categorical longitudinal data from analgesic trials (with Discussion). Journal of the American Statistical Association, 92, 1235-55.
Shih, J. (1998). Modeling multivariate discrete failure time data. Biometrics, 54, 1115-28.
Silverman, B.W. (1984). Spline smoothing: the equivalent variable kernel method. Annals of Statistics, 12, 898-916.
Silverman, B.W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with Discussion). Journal of the Royal Statistical Society, B, 47, 1-52.
Skellam, J.G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society, B, 10, 257-61.
Snedecor, G.W. and Cochran, W.G. (1989). Statistical methods (8th edn). Iowa State University Press, Ames, Iowa.
Snell, E.J. (1964). A scaling procedure for ordered categorical data. Biometrics, 40, 592-607.
Solomon, P.J. and Cox, D.R. (1992). Nonlinear components of variance models. Biometrika, 79, 1-11.
Sommer, A. (1982). Nutritional blindness. Oxford University Press, New York.
Sommer, A., Katz, J., and Tarwotjo, I. (1984). Increased risk of respiratory infection and diarrhea in children with pre-existing mild vitamin A deficiency. American Journal of Clinical Nutrition, 40, 1090-95.
Spall, J.C. (1988). Bayesian analysis of time series and dynamic models. Marcel Dekker, New York.
Stanek, E.J. (1988). Choosing a pre-test-post-test analysis. American Statistician, 42, 178-83.
Stefanski, L.A. and Carroll, R.J. (1985). Covariate measurement error in logistic regression. Annals of Statistics, 13, 1335-51.
Stern, R.D. and Coe, R. (1984). A model fitting analysis of daily rainfall data. Journal of the Royal Statistical Society, A, 147, 1-34.
Stiratelli, R., Laird, N., and Ware, J.H. (1984). Random effects models for serial observations with binary responses. Biometrics, 40, 961-71.
Stram, D.O., Wei, L.J., and Ware, J.H. (1988). Analysis of repeated ordered categorical outcomes with possibly missing observations and time-dependent covariates. Journal of the American Statistical Association, 83, 631-37.
Sun, D., Speckman, P.L., and Tsutakawa, R.K. (2000). Random effects in generalized linear mixed models (GLMMs). In Generalized linear models: a Bayesian perspective (ed. D. Dey, S. Ghosh, and B. Mallick), pp. 23-39. Marcel Dekker, New York.
TenHave, T.R., Kunselman, A.R., and Tran, L. (1999). A comparison of mixed effects logistic regression models for binary response data with two nested levels of clustering. Statistics in Medicine, 18, 947-60.
TenHave, T.R. and Uttal, D.H. (1994). Subject-specific and population-averaged continuation ratio logit models for multiple discrete survival profiles. Applied Statistics, 43, 371-84.
Volberding, P.A., Lagakos, S.W., Koch, M.A. et al. (1990). Zidovudine in asymptomatic human immunodeficiency virus infection. The New England Journal of Medicine, 322, 941-9.
Waclawiw, M.A. and Liang, K.-Y. (1993). Prediction of random effects in the generalized linear model. Journal of the American Statistical Association, 88, 171-78.
Wakefield, J. (1996). The Bayesian analysis of population pharmacokinetic models. Journal of the American Statistical Association, 91, 62-75.
Wang, Y. (1998). Smoothing spline models with correlated random errors. Journal of the American Statistical Association, 93, 341-48.
Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46, 657-71.
Ware, J.H. (1985). Linear models for the analysis of longitudinal studies. The American Statistician, 39, 95-101.
Thara, R., Henrietta, M., Joseph, A., Rajkumar, S., and Eaton, W. (1994). Ten year course of schizophrenia - the Madras Longitudinal study. Acta Psychiatrica Scandinavica, 90, 329-36.
Ware, J.H., Lipsitz, S., and Speizer, F.E. (1988). Issues in the analysis of repeated categorical outcomes. Statistics in Medicine, 7, 95-107.
Tsay, R. (1984). Regression models with time series errors. Journal of the American Statistical Association, 79, 118-24.
Ware, J.H., Dockery, D., Louis, T.A. et al. (1990). Longitudinal and cross-sectional estimates of pulmonary function decline in never-smoking adults. American Journal of Epidemiology, 32, 685-700.
Tsiatis, A.A., De Gruttola, V., and Wulfsohn, M.S. (1995). Modelling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association, 90, 27-37.
Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss-Newton method. Biometrika, 61, 439-47.
Tufte, E.R. (1983). The visual display of quantitative information. Graphics Press, Cheshire, Connecticut.
West, M., Harrison, P.J., and Migon, H.S. (1985). Dynamic generalized linear models and Bayesian forecasting (with Discussion). Journal of the American Statistical Association, 80, 73-97.
Tufte, E.R. (1990). Envisioning information. Graphics Press, Cheshire, Connecticut.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-25.
Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley, Reading, Massachusetts.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. John Wiley, New York.
Tunnicliffe Wilson, G. (1989). On the use of marginal likelihood in time series model estimation. Journal of the Royal Statistical Society, B, 51, 15-27.
Velleman, P.F. and Hoaglin, D.C. (1981). Applications, basics and computing of exploratory data analysis. Duxbury Press, Boston, Massachusetts.
Verbeke, G. and Molenberghs, G. (2000). Linear mixed models for longitudinal data. Springer, New York.
Verbyla, A.P. (1986). Conditioning in the growth curve model. Biometrika, 73, 475-83.
Verbyla, A.P. and Cullis, B.R. (1990). Modelling in repeated measures experiments. Applied Statistics, 39, 341-56.
Verbyla, A.P. and Venables, W.N. (1988). An extension of the growth curve model. Biometrika, 75, 129-38.
Williams, E.J. (1949). Experimental designs balanced for the estimation of residual effects of treatments. Australian Journal of Scientific Research, 2, 149-68.
Williams, D.A. (1975). The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics, 31, 949-52.
Williams, D.A. (1982). Extra-binomial variation in logistic linear models. Applied Statistics, 31, 144-48.
Winer, B.J. (1977). Statistical principles in experimental design (2nd edn). McGraw-Hill, New York.
Wishart, J. (1938). Growth-rate determinations in nutrition studies with the bacon pig, and their analysis. Biometrika, 30, 16-28.
Wong, W.H. (1986). Theory of partial likelihood. Annals of Statistics, 14, 88-123.
Wu, M.C. and Bailey, K.R. (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics, 45, 939-55.
Wu, M.C. and Carroll, R.J. (1988). Estimation and comparison of changes in the presence of right censoring by modeling the censoring process. Biometrics, 44, 175-88.
Wulfsohn, M.S. and Tsiatis, A.A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics, 53, 330-39.
Xu, J. and Zeger, S.L. (2001). Joint analysis of longitudinal data comprising repeated measures and times to events. Applied Statistics, 50, 375-87.
Yu, O., Sheppard, L., Lumley, T., Koenig, J.Q., and Shapiro, G. (2000). Effects of ambient carbon monoxide and atmospheric particles on asthma symptoms, results from the CAMP air pollution ancillary study. Environmental Health Perspectives, 12, 1-10.
Yule, G.U. (1927). On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philosophical Transactions of the Royal Society of London, A, 226, 267-98.
Zeger, S.L. and Diggle, P.J. (1994). Semi-parametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics, 50, 689-99.
Zeger, S.L. and Karim, M.R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association, 86, 79-86.
Zeger, S.L. and Liang, K.-Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42, 121-30.
Zeger, S.L. and Liang, K.-Y. (1991). Feedback models for discrete and continuous time series. Statistica Sinica, 1, 51-64.
Zeger, S.L. and Liang, K.-Y. (1992). An overview of methods for the analysis of longitudinal data. Statistics in Medicine, 11, 1825-39.
Zeger, S.L., Liang, K.-Y., and Albert, P.S. (1988). Models for longitudinal data: a generalized estimating equation approach. Biometrics, 44, 1049-60.
Zeger, S.L., Liang, K.-Y., and Self, S.G. (1985). The analysis of binary longitudinal data with time-independent covariates. Biometrika, 72, 31-8.
Zeger, S.L. and Qaqish, B. (1988). Markov regression models for time series: a quasi-likelihood approach. Biometrics, 44, 1019-31.
Zhang, D., Lin, X., Raz, J., and Sowers, M.F. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association, 93, 710-19.
Zhao, L.P. and Prentice, R.L. (1990). Correlated binary regression using a generalized quadratic model. Biometrika, 77, 642-48.

Index
Note: Figures and Tables are indicated by italic page numbers.
adaptive quadrature technique 212-13 examples of use 232, 238 age effect 1, 157 in example 157-9, 159 AIDS research 3, 330 see also CD4+ cell numbers data alternating logistic regressions (ALRs) 147 see also generalized estimating equations analysis of variance (ANOVA) methods 114-25 advantages 125 limitations 114 split-plot ANOVA 56, 123-5 example of use 124-5 time-by-time ANOVA 115-16, 125 example of use 116, 118 limitations 115, 125 ante-dependence models 87-9, 115 approximate maximum likelihood methods 175, 210 advantage 212 autocorrelated/autoregressive random effects model 210, 211 in example 239 autocorrelation function 46, 48 for CD4+ data 48, 49 for exponential model 57, 84 autoregressive models 56-7, 87-8, 134 available case missing value restrictions 300 back-fitting algorithm 324 Bahadur representation 144 bandwidth, curve-fitting 45
Bayesian methods for generalized linear mixed models 214-16 examples of use 232, 238 beta-binomial distribution 178-9 uses 179 bias 22-4 bibliography 349--68 binary data, simulation under generalized linear mixed models 210, 211 binary responses logistic regression models 175-84 conditional likelihood approach 175-8 with Gaussian random effects 180-4 random effects models 178-80 log-linear models 142-3 marginal models 143--6 examples 148--60 sample size calculations 30-1 boxcar window 43 boxplots 10, 12 BUGS software 216 calf intestinal parasites experiment data 111 time-by-time ANOVA applied 116, 118 canonical parameters, in log-linear models 143, 153 carry-over effect experimental assessment of 7, 151-3 ignored 149 categorical data generalized linear mixed models for 209-16 examples 231,232,237-40
conditional generalized linear regression model 209 conditional likelihood advantages of approach 177-8 for generalized linear mixed models 171-2 maximization in transition model 138,
categorical data (cont,) £ 208-44 Iikelihood-ba.Oled methods or , Il'zed margms. , models for 216-31 examples 231-3, 240·-3 4 ordered, transition models for 201·transition models for 194-204 examples 197-201 193 " for random intercept logistIC regressIOn categorical responses, aBsociation model 175-8 among 52-3 random intercept log-linear model for causal estimation methods 273-4 count data 184-6 causal models 271 see also maximum likelihood causal targets of inference 269-73 conditional maximum likelihood CD4+ cell numbers data 3-4 estimation correlation in 46-8 and depressive symptoms score 39-41 random effects model for count data 184-6 estimation of population mean generalized linear mixed curve 325-6 graphical representation 9, 35-9 model 171-2 for transition models 138, 192-3, 203 marginal analysis 18 parametric modelling 108-10 conditional means 13, 191, 209 full covariate conditional mean 253 prediction of individual trajectories 110, 112-13 likelihood-based estimates 292, 238 random effects models 18, 130 partly conditional mean 253 time-dependent covariates 247 conditional models 153, 190 variograms 50, 51, 326 conditional modes 174 cerebrovascular deficiency treatment conditional odds ratios 144, 146-7 trial 148 confirmatory analysis 33 conditional likelihood estimation 177 confounders data 148 meaning of term 265 marginal model used 148-50, 181 time-dependent 265-80 maximum likelihood estimation 180-1 connected-line graphs 35, 35, 36, 37 random effects model used 181 alternative plots 37-8 chi-squared distributions 342 continuous responses, sample size chi-squared test statistic, in cow-weight calculations 28-30 study 107 correlated errors c-index 264 general linear model with 55-9 clinica.l trials non-linear models with 327,328 dropouts in 19, 285 correlation as prospective studies 2 among repeated observations 28 see also epileptic seizure .. ,; in longitudinal data 46-52 schizophrenia clinical trial consequences of ignoring 19 cohort effects 1 correlation matrix 24-5, 46 complete case analysis 288 for CD4+ data 46, 48 complete case missing variable correlation models restrictions 300 exponential 56-7 complete data score functions 173 uniform 55-6 completely random dropouts count data testing for 288-90 examples 160-1 in examples 290-3 generalized estimating equations completely random missing values 283, for 162-3 284 log-linear transition models for 204-6
marginal model for 137, 160-5 over-dispersed 161, 186 parametric modelling for 160-2 random effects model for 137, 186-8 counted responses marginal model used 160-5 random effects model used 184-9 counterfactual outcomes 269 regression model for 276-7 covariance structure modelling of 81-113, 323, 324 reasons for 79-80 covariate endogeneity 245, 246 covariates 337 external 246 internal 246 lagged 259-65 stochastic 253-8 time-dependent 245-81 cow weight data 103-4 parametric modelling 104-8 crossover trials 148 examples 7,9, 10,148-53,176-7 further reading recommended 31-2, 168 GLMMs compared with marginalized models 231-3 marginal models applied 148-53 random effects models applied 176-7 relative efficiency of OLS estimation 63 time-dependent covariates in 247 see also cerebrovascular deficiency treatment trial; dysmenorrhoeal pain treatment trial cross-sectional analysis, example 257-8 cross-sectional association in data 251, 254 cross-sectional data, fitting of non-linear regression model to 327 cross-sectional models correlated error structures 327, 328 estimation issues for 254-5, 256 non-linear random effects 327, 329 cross-sectional studies bias in 22-4 compared with longitudinal studies 1, 16-17,22-31,41 ,159-60 cross-valida-tion 45 crowding effect 205 cubic smoothing spline 44 curve-fitting methods 41-5
data score functions 173 derived variables 17, 116-23 examples of use 119-23 design considerations 22-32 bias 22-4 efficiency 24-6 further reading recommended 31-2 sample size calculations 25-31 diagnostics, for models 98 Diggle-Kenward model 295 fitted to milk protein data 298-9 informative dropouts investigated by 298, 318 distributed lag models 260-1 in examples 262, 269 dropout process modelling of 295-8 in examples 298-9, 301 graphical representation of various models 303-5 pattern mixture models 299-301,304 random effects models 301-3, 304, 305 selection models 295-8, 304-5 dropouts 284-7 in clinical trials 13, 285 completely random, testing for 288-90 divergence of fitted and observed means when ignored 911,316 in milk protein study 285 random 285 reasons for 12, 285, 299 ways of dealing with 287-8 complete case analysis 288 last_observation-earried-forward method 287-8 dysmenorrhoeal pain treatment trial 7, 9 data 10, 151 , ' GLMMs compared with margmallzed models 231-3 marginal model used 150-3 random intercept model fitted 177 efficiency, longitudinal co~pared with cross-sectional studIes 24-6 EM algorithm 173-4, 284, 332 " I Bayes estimates 112 emplrlC3 endogeneity, in example 268-9 endogenous variables 246',253 epileptic seizure clinical trIal 10 boxplots of data 12
372
epileptic seizure clinical trial (cont,) data 11 summary statistics 163 marginal model 163-.5 Poisson model used 161-2 random effects model 185-6, 188, 189 estimation stage of model-fitting process 95-7 event history data 330 exogeneity 245, 246-7 exogenous variables 246 explanatory variables 337 exploratory data analysis (EDA) 33-53, 19B-9,32B exponential correlation model 56-7, 84, B9 compared with Gaussian correlation model B7 efficiency of OL9 estimator in 61-2 variograms for 85 external covariates 246 extra-binomial variation 178
feedback, covariate-response 253, 258, 266-B first-order autoregressive models 56-7 87-8 ' first-order marginalized transition modeI/MTM(l) 226-7 230 241 in example 241, 242 ' , Fisher's information matrix 340 in example 341 fixed quadrature technique 212 examples of use 232 formulation of parametric model 94-5 F-statisti~ 115, 120, 122-3, 123-4, 125 full covanate conditional mean 253 full covariate conditional mean (FCCM) assumption 255 f~rther reading recommended 258 sImulation illustration 256-7 Gauss ~ Herml't e quadrature 212 213 GaussIan adapt' , , lve quadrature 212-13 GaussIan assumptions further reading recommended 189 general linear model 55 maximum likelihood estimation ,under 64-5, 180, 181 GaussIan correlation model 86
compared with exponential correlation model 87 variograms for 86 Gaussian kernel 43, 320, :321 Gaussian linear model, marginalized models using 218-20 Gaussian random effects logistic models with 180-4 Poisson regression with 188 Gauss-Markov Theorem 334, 339 g-computation, estimation of causal effects by 273-4 advantages 276 in example 275-6 generalized estimating equations (GEEs) 138-40, 146-7 advantages 293-4 for count data 162-3 example 163-5 and dropouts 293-5 further reading recommended 167, 168 for logistic regression models 146-7 203,240, 241 ' in examples 149, 150, 154 for random missingness mechanism 293-5 and stochastic covariates 257, 258, 258 and time-dependent covariates 249-50, 251 generalized linear mixed models (GLMMs) 209-16 Bayesian methods 214-16 and conditional likelihood 171-2 and dropouts 317 examples of use 231, 232, 237-40 maximum likelihood estimation for 172-5,212-14 generalized linear models (GLMs) 343-6 contrasting approaches 131-7 extensions 126-40 marginal models 126-8, 141-68 random effects models 128-30 169-89 ' tra?sition models 130--1, 190--207 genenc features 345--6 inferences 137-40 gen~rallinear model (GLM) 54-80 WIth correlated errors 55-9 ex~onential correlation model 56-7 umform correlation model 55-6 geostatistics 49 Gibbs sampling 174, 180,214
Granger non-causality 246 graphical representation of longitudinal data 6-7, 12,34-41 further reading recommended 53 guidelines 33 growth curve model 92 Hammersly-Clifford Theorem 21 hat notation 60 hierarchical random effects model 334, 336 holly leaves, effect of pH on drying rate 120-3 human immune deficiency virus (HIV) 3 see also CD4+ cell numbers data
es~imation of causal effects using 277-9
III example 279-80 iterative proportional fitting 221
joint modelling, of longitudinal measurements and recurrent events 329-32 joint probability density functions 88-9 kernel estimation 42--3 compared with other curve-fitting techniques 42, 45 in examples 42,43, 325 kernel function 320 Kolmogorov-Smirnov statistic 290, 291
ignorable missing values 284 independence estimating equations (lEEs) 257 lag 46 individual trajectories lagged covariates 259--65 prediction of 110-13 example 261-5 example using CD4+ data 112-13 multiple lagged covariates 260-1 Indonesian Children's Health Study single lagged covariate 259--60 (ICHS) 4, 5 last observation carried forward 287-8 marginal model used 17-1B, 127, 132, latent variable models 129 135-6, 141, 156-60 marginalized models 222-5 random effects model used 18 129 least-squares estimation 338-9 130, 132-3, 182--4 " further reading recommended 339 time-dependent covariates 247 optimality property 33B-9 transition model used 18, 130--1, 133, least-squares estimator 33B 197-201 bias in 23--4 inference(s) variance of 63 about generalized linear models 137-40 weighted 59--64, 70 about model parameters 94, 97-8 likelihood-based methods informative dropout mechanisms 295,316 for categorical data 208-44 in example 313-16 for generalized linear mixed models 209-16 representation by pattern mixture for marginalized models 216-31 models 299-301 informative dropouts for non-linear models 328 likelihood functions 138, 171, 173, 340 consequences 318 likelihood inference 340--3 investigation of 298, 318, 330 examples 341-3 informative missing values 80, 283 intercept likelihood ratio testing 98, 342 likelihood ratio test statistic 342 of linear regression model 337 linear links 191 random intercept models 90-1, 170, linear models 337-8 210,211 and least-squares estimation intermediate variable, meaning of method 338-9 term 265 marginal modelling approach 132 intermittent missing values 284, 2B7, 318 random effects model 132-3 internal covariates 246 transition model 133-4 inverse probability of treatment weights linear regression model 337 (IPTW)
374 link functions 191-2, 345 logistic regression models 343, 344 and dropouts 292 generalized estimating equations for 146-7 example 251 and lagged covariates 261-5 marginal-modelling approach 127, 135-6, 146-7 and Markov chain 191 random effects modelling approach 134-5, 175-80 examples 176-7,180-4 logit links 191 log likelihood ratio (test) statistic 98, 309 log-linear models 142-3, 344 canonical parameters in 143, 153 marginalized models 220-1 marginal-modelling approach 137, 143-6, 162, 164-5 random effects modelling approach 137 log-linear transition models, for count data 204-6 log-links 191 log odds ratios 52, 129, 147, 341 in examples 200, 235, 236 standard error (in example) 148-9 longitudinal data association among categorical responses 52-3 collection of 1-2 correlation structure 46-52 consequences of ignoring 19 curve smoothing for 41-5 defining feature 2 example data sets 3-15 calf intestinal parasites experiment 117 CD4+ cell numbers data 3-4 cow weight data 108 dys~enorrhoeal pain treatment tflal 7, 9, 10 epileptic seizure clinical trial 10 11, 12 ' Indonesian children's health stud 4,5 y milk protein data 5-7 8 9 pig. weight data 34 ' , s~hlzophrenia clinical trial 10-13 14 SItka spruce growth data 4-5 6' gener~1 linear models for 54-80' ,7 graphIcal representation 6-7 1" 34 , >G, -41
INDEX further reading recommended 53 guidelines .'33 missing values in 282-318 longitudinal data analysis approaches 17-20 marginal analysis 17-18 random effects model 18 transition model 18 two-stage/derived variable analysis 17 classification of problems 20 confirmatory analysis 33 exploratory data analysis 33-53 longitudinal studies 1-3 advantages 1,16-17,22,245 compared with cross-sectional studies 1, 16-17,22-31 efficiency 24-6 lorelogram 34, 52-3 further reading recommended 53 lowess smoothing 41,44 compared with other curve-fitting methods 42, 45 examples 3, 36, 4 a
Madras Longitudinal Schizophrenia Study 234-7 analysis using marginalized models 240-3 marginal analysis 18 marginal generalized linear regression model 209 marginalized latent variable models 222-5, 232 maximum likelihood estimation for 225 marginalized log-linear models 220-1 233 marginalized models ' for categorical data 216-31 examples of use 231-3,240-3 example using Gaussian linear model 218-20 marginalized random effects models 223, 225 222, marginalized transition models 225-31 advantages 230-1 in examples 233,241-3 fir~t-order/MTM(l) 226-7, 230, 241 III example 241, 242 se~ond-order/MTM(2) 228 III example 242
INDEX 375 marginal mean response 17 marginal means definition 209 likel~hood-based estimates 232, 242 log-lInear model for 143-6 marginal models 17-18, 126-8, 141-68 advantages of direct approach 216-17 assumptions 126-7 examples of use 17-18,127,132, 135-6, 148-60 further reading recommended 167-8 and likelihood 138 marginal odds ratios 145, 147 marginal quasi-likelihood (MQL) methods 232 marginal structural models (MSMs) 276 advantage(s) 280 estimation using IPTW 277-9 in example 279-80 Markov Chain Monte Carlo (MCMC) methods 214-16, 332 in examples 232,238 Markov chains 131, 190 Markov models 87, 190-206 further reading recommended 206-7 see also transition models Markov-Poisson time series model 204-5 realization of 206 maximum likelihood algorithms 212 maximum likelihood estimation 64-5 Compared with REML estimation 69, 95 for generalized linear mixed models 212-14 in parametric modelling 98 for random effects models 137-8, 172-5 restricted 66-9 for transition models 138, 192-3 see also conditional likelihood; generalized estimating equations maximum likelihood estimator 60, 64, 340 variance 60 MCEM method see Monte Carlo Expectation-Maximization method MCMC methods see Markov Chain Monte Carlo methods MCNR method see Monte Carlo Newton-Raphson method mean response non-parametric modelling of 319-26 parametric modelling of 105-7
mean response profile(s) for calf intestinal parasites experiment 118 for cow weight data 106 defined in ANOVA 114 for mil~ protein data 99, 100, 102,302 for schIzophrenia trial data 14 307 309,311,315 " measurement error and random effects 91-3 and serial correlation 89-90 and random intercept 90-1 as Source of random variation 83 measurement variation 28 micro/macro data-representation strategy 37 milk protein data 5-7 8 9 dropouts in 290-1 ' , reasons for 285 testing for completely random dropouts 291-3 mean response profiles 99, 100 102 302
'
,
parametric model fitted 99-103 pattern mixture analysis of 301, 302 variogram 50, 52, 99 missing value mechanisms classification of 283-4 completely random 283, 284 random 283, 284 missing values 282-318 effects 282 ignorable 284 informative 80, 283 intermittent 284, 287, 318 and parametric modelling 80 model-based variance 347 model-fitting 93-8 diagnostic stage 98 estimation stage 95-7 formulation stage 94-5 inference stage 97-8 moments of response 138 Monte Carlo Expectation-Maximization (MCEM) method 214 Monte Carlo maximum likelihood algorithms 214 Monte Carlo Newton-Raphson (MCNR) method 214 Monte Carlo test(s), for completely random dropouts 290, 291
376 Mothers' Stress and Children's Morhidity (MSCM) Study 247-53 cross-sectional analysis 257-8 and endogeneity 268-9 g_computation 275-6 and lagged covariates 261-5 marginal structural models using IPTW 279-80 sample of data 252 Multicenter AIDS Cohort Study (MACS) 3 CESD (depressive symptoms) scores 39-40, 41 objective(s) 3-4 8ee also CD4+ cell numbers data multiple lagged covariates 260-1 multivariate Gaussian theory 339-40 multivariate longitudinal data 332-6 examples 332
natural parameter 345 negative-binomial distribution 161, 186-7 NeIder-Mead simplex algorithm 340 nested sub-models 342 Newton-Raphson iteration 340 non-linear random effects, in crosB-sectional models 327, 329 non-linear regression model 326-7 fitting to crosB-sectional data 327 non-linear regression modelling 326-9 non-parametric curve-fitting techniques 41-5 see also kernel estimation; lowess; smoothing spline non-parametric modelling of mean response 319--26 notation 15-16 causal models 271 conditional generalized linear model 209 dropout models 295 mar~inal generalized linear model 209 mIDClmum likelihood estimator 60 multivariate Gaussian distribution 339-40 non-linear regression model 326-7 parametric models 83-4 time-dependent covariates 245 no- unmeasured-confounders assumption 27Q-.1 273 numerical integration m~thods 212-14
INDEX odds ratio, in marginal model 127, 128 ordered categorical data 201-4 proportional odd modelling of 201-3 ordering statistic, data representation using 38 ordinary least squares (OLS) estimation and ignoring correlation in data 19 naive use 63 errors arising 63-4 in nonlinear regression modelling 119 relative efficiency in crossover example 63 in exponential correlation model 61-2 in linear regression example 62 in uniform correlation model 60-1 in robust estimation of standard errors 70, 75, 76 and sample variogram 50, 52 outliers, and curve fitting 44-5 over-dispersed count data, models for 161, 186-7 over-dispersion 162, 178, 346 ozone pollution effect on tree growth 4-5 see also Sitka spruce growth data
panel studies 2 parametric modelling 81-113 for count data 16~2 example applications 99-110 CD4+ data 108-10 cow weight data 103-8 milk protein data 99-103 fitting model to data 93-8 further reading recommended 113 notation 83-4 pure serial correlation model 84-9 random effects + measurement error model 91-3 random intercept + serial correlation + measurement error model 90-1 serial correlation + measurement error model 89-90 and sources of random variation 82-3 partly conditional mean 253 partly conditional models 259-60 pattern mixture dropout models 299-301 graphical representation 908 304 Pearson chi-squared statistic 186 Pearson's chi-squared test statistic 343
INDEX penalized Quasi-likelihood (PQL) methods 175, 210, 232 example of use 282 period 1 pig weight data 34 graphical representation 34-5, 35, 36 robust estimation of standard errors for 76-9 point process data 330 Poisson distribution 161, 186, 344 Poisson-gamma distribution 347 Poisson-Gaussian random effects models 188-9 Poisson regression models 344 population growth 205 Positive And Negative Syndrome Scale (PANSS) measure 11, 153, 330, 332 subset of placebo data 305 treatment effects 334, 335 potential outcomes 269-70 power of statistical test 28 predictive squared error (PSE) 45 predictors 337 principal components analysis, in data representation 38 probability density functions 88-9 proportional odds model 201-2 application to Markov chain 202-3 prospective collection of longitudinal data 1,2
quadratic form (test) statistic 97, 309
quadrature methods 212-14
  limitations 214
quasi-likelihood methods 232, 346-8
  in example 347-8
  see also marginal quasi-likelihood (MQL) methods; penalized quasi-likelihood (PQL) methods
quasi-score function 346
random dropout mechanism 285
random effects + measurement error models 91-3
random effects dropout models 301-3
  in example 312-14
  graphical representation 303, 304, 305
random effects models 18, 82, 128-30, 169-89
  assumptions 170-1
  basic premise 129, 169
  examples of use 18, 129, 130, 132-3
  fitting using maximum likelihood method 137-8
  further reading recommended 189
  hierarchical 334, 336
  marginalized 222, 223, 225
  multi-level 93
  and two-stage least-squares estimation 57-9
random intercept models 90-1, 170, 210, 211
  in example 239
random intercept + random slope (random line) models 210, 211, 238
  in example 238-9
random intercept + serial correlation + measurement error model 90-1
random missingness mechanism 283
  generalized estimating equations under 293-5
random missing values 283, 284
random variation
  separation from systematic variation 217, 218
  sources 82-3
  two-level models 93
reading ability/age example 1, 2, 16
recurrent event data 330
recurrent events, joint modelling with longitudinal measurements 329-32
regression analysis 337
regression models
  notation 15
  see also linear ...; non-linear regression model
relative growth rate (RGR) 92
repeated measures ANOVA 123-5
  see also split-plot ANOVA approach
repeated observations
  correlation among 28
  number per person 28
respiratory disease/infection, in Indonesian children 4, 131-6, 156-60, 182-4
restricted maximum likelihood (REML) estimation 66-9
  compared with maximum likelihood estimation 69, 95
  in parametric modelling 96, 99, 100
  in robust estimation of standard errors 70-1, 73-4, 79
retrospective collection of longitudinal data 1-2
Rice-Silverman prescription 321, 322
robust estimation of standard errors 70-80
  examples 73-9
robust variance 194, 347
roughness penalty 44
sample size calculations 25-31
  binary responses 30-1
  continuous responses 28-30
  for marginal models 165-7
  parameters required 25-8
sample variogram(s) 49
  examples 51, 90, 105, 107
SAS software 180, 214
saturated models 50, 65
  graphical representation 303
  limitations 65
  robust estimation of standard errors in 70-1, 73
scatterplots 33, 40
  and correlation structure 46, 47
  examples 36, 38-43, 45, 47
schizophrenia clinical trial 10-13
  dropouts in 12, 13, 306-16
  marginal model used 153-6
  mean response profiles 14, 307, 311, 315
  multivariate data 332, 334, 335
  PANSS measure 11, 153, 330, 332
    subset of placebo data 305
    treatment effects 334, 335
  random effects model used 181-2
  variograms 308, 314
schizophrenia study (Madras Longitudinal Study) 234-7
  analysis of data 237-43
score equations 173-4, 340
score function 340
score test statistic(s) 241, 242, 342
second-order marginalized transition model/MTM(2) 228
  in example 241-3
selection dropout models 295-8
  in example 312-16
  graphical representation 303, 304-5
semi-parametric modelling 324
sensitivity analysis, and informative dropout models 316
serial correlation 82
  plus measurement error 89-90
    and random intercept 90-1
  pure 84-9
  as source of random variation 82
Simulated Maximum Likelihood (SML) method 214
single lagged covariate 259-60
Sitka spruce growth data 4-5, 6, 7
  derived variables used 119-20
  robust estimation of standard errors for 73-6, 77
  split-plot ANOVA applied 124-5
size-dependent branching process 204-5
smallest meaningful difference 27
smoothing spline 44, 320
  compared with other curve-fitting techniques 42
smoothing techniques 33-4, 41-5, 319
  further reading recommended 41, 53
spline 44
  see also smoothing spline
split-plot ANOVA approach 56, 123-5
  example of use 124-5
split-plot model 92, 123, 124
stabilized weights 277
  in example 278
standard errors
  robust estimation of 70-80
    examples 73-9
standardized residuals, in graphical representation 35, 96
STATA software 214
stochastic covariates 253-8
strong exogeneity 246-7
structural nested models, further reading recommended 281
survival analysis 2
systematic variation, separation from random variation 217, 218
time-by-time ANOVA 115-16, 125
  example of use 116, 118
  limitations 115-16
time-dependent confounders 265-80
time-dependent covariates 245-81
time series analysis 2
tracking 35
trajectories see individual trajectories
transition matrix 194, 195
transition models 18, 130-1, 190-207
  for categorical data 194-204
    examples 197-201
  for count data 204-6
  examples of use 18, 130-1, 133, 197-201
  fitting of 138, 192-4
  marginalized 225-31
  for ordered categorical data 201-4
  see also Markov models
transition ordinal regression model 203
tree growth data see Sitka spruce growth data
Tufte's micro/macro data representation strategy 37
two-level models of random variation 93
two-stage analysis 17
two-stage least-squares estimation 57-9
type I error rate 26-7
unbalanced data 282
uniform correlation model 55-6, 285
variance functions 345
variograms 34, 48-50
  autocorrelation function estimated from 50
  in examples 51, 52, 308, 314, 326
  for exponential correlation model 85
  further reading recommended 53
  for Gaussian correlation model 86
  for parametric models 102, 105, 107
  for random intercepts + serial correlation + measurement error model 91
  for serial correlation models 84-7
  for stochastic process 48, 82
  see also sample variogram
vitamin A deficiency
  causes and effects 4, 197
  see also Indonesian Children's Health Study
Wald statistic 233, 241
weighted average 320
weighted least-squares estimation 59-64
working variance matrix 70
  choice not critical 76
  in examples 76, 78
xerophthalmia 4, 197
  see also Indonesian Children's Health Study