This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
d approache w h IC Improve on the approximate meth d d 'b d s .. d I' 0 s esen e here. WI' consider IOglStiC rna e s for bmary data in Section 9 O.3'an d P Olsson . regressIOn m d lJ; for count data in Section 9.4, . 0 es o
0
•
,
1
o
0
9.2
-
-
Estimation for generalized linear mixed models
In the random effects GLM we assume: (1) the conditional distribution of Yij given U i follows a distribut.ion from the exponential family with density !(Yij IU;;(3); (2) given Vi, the repeated measurements, Yil,.' Yn ., are independent; (3) the Vi are independent and identically distributed with density function !(V i ; G), 0
,
Let V = (V I , , , , , V m). In the subsection below on conditional likelihood , we will treat the random effects as if they were fixed parameters to be removed from the problem, so that we need not rely on the third assumption above, In the subsection on maximum likelihood estimation, we will treat Vasa set of unobserved variables which we then integrate out of the likelihood, adopting the assumption that the random effects distribution is Gaussian with mean zero and variance matrix G. 9.2.1
Conditional likelihood
In this subsection, we review conditional maximum likelihood estimation for /3. McCullagh and NeIder (1989, Section 7,2) present a more general treatment for GLMs. The main idea is to treat the random effect.s,. Ui, 'mate (3 using the condItIOnal as a set of nuisance parameters an d t 0 es t 1 likelihood of the data given the sufficient statistics for the U~. Treating V as fixed, the likelihood function for (3 and U IS (9,2,1)
RANDOM EFFECTS MODELS ESTIMATION FOR. GENERA
172
. . rfy th discussion we restrict attention to where Oij := Oij(f3, U)- To ~:Ju~lagh :nd Neider: 1989, p. 32) for which canonical link functIons ( h I'k l'hood above can be written as - , f3 + d' .U, Then tel e I
Oij
-
xii
tJ
t
eXP{f3'LXiiYii+ , . t,J
LU~LdijYij
- '2;7/J(Bi j )},
J
i
(9.2.2)
',J
the sufficient statistics for 13 and U i are Li,j XijYii and Lj dijYij E d·y· is sufficient for Ui for fixed 13, respec lve Y", I 'f. 'l~h'Jod is proportional to the conditional distribution The cand ItlOua Iike I a . . of the data given the sufficient statistics for the U i , The contnbutlOn from
H
ence~, I and
LIZED LINEAR MIXED MODELS
implies that we can learn abo t '" " u one mdlvldual' ffi . mg the vanability in coeffic'le t s coe clents by understandWh , " n s across the po I f vanablhty, we should rely on the 0 ula . pu a lOn, .en there is little P those for an individual Whe th P, tIon average coeffiCIents to estimate . n ere 18 sub t f l ' . more heavily on the data from each' d' . s a~ la :anatlOn, we must rely ficients. This idea was illustrat d ,mSlvl~Ual m estImating their own coef. e III ectlOn 55 wh b" to estimate an individual subJ'ect' CD4 .., ere Our 0 Jectlve was The I1'keh'h ood function for thes k trajectory. to include both 13 and the element un nGo~n parameter 6, which is defined f , IS so
= II
f II
t=1
J=1
m
L(o; y)
subject i has the form
f ( Yi I LdijYij j
= bi jf3) =
f(Yi;f3,Ui) f (", d .. .. - b··f3 U) LJ j 1J Y'J " , , f (Lj XijYij = ai, Lj dijYij = bi ;(3, U i ) f(LjdijYij=bi;{3,Ui)
=
(9.2,3) For a discrete GLM this expression can be written as
nj
b" and
R;2
II m
i=1
'" wRit
L: R
i2
exp(,8, ai)
exp (,81 ",ni ) Wj=l XijYij
.
(9,2.4)
For simple cases such as the random ' hood is reasonably eas t " mtercept model, the conditionallikeli, y a maximIze (B I d D llltercept models for b' res ow an ay, 1980). Random ' mary and co t d detall below. un ata are considered in more 9.2.2
ni
L L Xij{Yij -llij(U
i )}
= 0,
(9,2.6)
i=l j=l
= ai and IS the set of values for y. such that L' d. 'y" = b,i' T~e conditional likelihood for f3 given the data for all . J ,'J'd'J SImplifies to m III d IVI uaIs 1JYl3
(9.2.5)
, To fin~ the maxim,um likelihood estimate, we can solve the score equatIons obtamed by settmg to zero the derivative with respect to 0 of the log likelihood, If we imagine that the 'complete' data for an individual comprise (Yi' U i ) and if we restrict attention for the moment to canonical link functions, then the complete data score function for f3 has a particularly simple form
S ,8(01 Y, U) =
whe~e R.il : the set of pO,ssible values for Yi such that Lj XijYij
2: 3
!(Yij IU i ;,8)!(U i ;G) dU j •
This is just the marginal distribution of Y bt . ed b . . . 0 am y mtegratmg the joint d ·Istn'b utlOn of Y and U with respect to U I . . . ' n some special cases such as t£h e GaussJan lInear model (Chapters 4- 6) ' the 'Inte , graI above has a cIosed or~, but for ~ost non-Gaussian models, numerical methods are required for Its evaluatIOn.
m
E Ri2 exp (f3' Lj XijYij + U~bi) ,
173
Maximum likelihood estimation
Here, we will treat the U. abIes fr 1 as a sample of ind d epen ent unobservable variam a random effects d' t 'b' IS n utIon , Qual't ' I atlvely, this assumption
where llij(U i ) = E(Yij lUi) = h-I(X~jf3 + d~jUi)' The observed data score equations are obtained by taking the expectation of the complete data equations with respect to the conditional distribution of the unobserved random effects given the data, That is, we define the observed data score functions, S{3(O J y), as the expectations of the complete data score functions, S{3(ol y, U), with respect to the conditional distribution of U given y. This gives, m
S{3(ol y) =
ni
L L Xij[Yij -
E{llij(Ui) IYi}]
= 0.
(9,2.7)
i=1 j=I
The score equations for G can similarly be obtained as
Sa(o/ y) =
~G-l 2
{f E(UiU~ I
l I Yi)} G- - ; C- = 0.
(9.2.8)
;",,1
, . d t' t f 0 a common strategy is To solve for the maximum hkehhoo es Ima e o , al 'h 't tes to use the EM algorithm (Dempster et al" 1977). This gont m I era
RANDOM EF
FECTS MODELS LOGISTIC REGRESSION FOR BINARY RESPONSES
1 74
. tlllg t h e expectations in the score
I which involves eva ua fthe parameters, and an M-step, between an E-step" the current values 0 d t d parameter estimates. b ve usmg t give up a e th core equations .0 d' the conditional expectaequations a 0 in which we sol:e efsthe integration mvol ve ~n ne or two, numerical d'mellSlOn 0 When q IS 0 C The. I the d'Ime nsion of U i · ably easily (e.g. rouch t d reason . es can be implemen e . I roblems Monte Carlo tions IS q, integra~ion technIi~90), For higher dimenSlOn~ ~he appli~ation of Gibbs and Spiegelman, bed. See, for examp e, , t' on methodB can e us mtegr~ I. Zeger and Karim (1991), . t the score equations in such amphng III , t approxlma e b d B A alternative strategy IS 0 'ded This approach has een use n ' n be avO!· . d i::r t that the integratlOnB ca" els with GaussIan ran om euec s, et al. (1984) for and Clayton (1993) for rand?m Karlm ' (1991) , Schall ,(1991), an (1990) for non-linear d nd Bates . regressIOn , acts GLMs and Lm strom a d rs The central Idea IS to use eue " d effects an erro , . £ models with GaussIan ran am d't' al means in the score equatIOn or th than con I IOn . . f U conditional modes ra er ' t ' the conditional distributIOn 0 i ' 'al t to approxlma mg d t t:I This IB eqUlv en " t h th same mode an curva ure, fJ· . d' tributlOn WI e . given Yi by a GaussIan IS lace the integration WIth an By using modes rather t~an mea:t~d~~t:efhe M-step. optimization that can b: mcorpor I tely let v" = Var(Yij IU ) and
~yW;~ratel1i
To
logl~t~r:~:w
~pecify t~e al~on~7 zm~: ;~~;u;rog~te res~onse
i
defined to have Qi = dJag{vijh (fl.ij) }. 1.. I .. ) • = 1,.", ni and define the elements Zij = h(f-Lij) + (Yij - f-L~J)h ~J.L1J the n. x q matrix whose n· x ni matrix V; = Qi + DiGDi , were i IS 1 . d b j~h row is dij , For a fixed G, updated values of {3 and U are obtame y iteratively solving
j}.
~
f3
m
I
-1
= ( t;Xi Vi Xi
)-1 t; m
~
= GDi Vi-I (Zi -
The quantity iii is1 an estimate ofE(Ui I lid· An estimate ofthe conditional variance is (D;Qi D i + G-l)-l. Note that the parameters appear on both sides of equatiollB (9.2,9) and (9.2.10) so that the algorithm proceeds by iteratively updating first the estimates of the regression coefficient.s and t.he random effects, and then the variance of the random effpcts, until the parameter estimates converge. A variety of slightly different algorithms for estimation of G have been proposed, See Breslow and Clayton (1993) for one specific implementation and an evaluation of its performance. This approximate method gives reasonable estimates of {3 in many problems. The estimates of U j and G are more sensitive to the Gaussian approximation to the conditional distribution. The approximat.ion breaks down when there are few observations per subject and the GLM is far from the Gaussian. Karim (1991) and Breslow and Clayton (1993) have evaluated this approximate method for some specific random effects GLMs. More recently, Breslow and Lin (1995) and Lin and Breslow (1996) proposed a bias correction method by expanding the approximate likelihood, which they termed penalized quasi-likelihood, at random e~ects parameters G. They found that bias reduction is satisfactory, especlally for large values ofG.
9.3 Logistic regression for binary responses 9.3.1
Conditional likelihood approach
In this section, we consider the random intercept logistic model for binary data given by
Xi Vi
-1
.
(9,2.9)
Zt
To simplify the diSCUSSIOn, does not include an intercept term. the 'Yi is proportional to m
IJ
To estimate G, note that the score equation (9.2,8) implies that
i=l
j=1
e Jom
]
(9,3.2)
. the sufficient statistics for the 1i has f3 gIVen
the form
)
n m
(9.2.10)
1=1
exp (l:j~1 YijX: j f3
l:R; exp( l::~1 X:t,8)
(9.3.3)
(no)
' all the Yi'. ways ' dex set ~ contams y" and t he m f ns where Yi. = L...tj=l 1J t of n. repeated observa 10 ' .. esponses oU 1 choosing Yi. pOSItIVe r ",ni
(9.2.11)
j=1
J-l
The conditional likelihood for
m i==1
j
(~ .'X~.) f3 _ flog {I + exp('Yi + X;j(3)} , II exp [n. 'Yi LYij + ~ Y'J
X i (3) ,
These equations are an application of Harville's (1977) method for linear random effects models to a linearized version of the possibly non-linear estimating equations in the GLM extension,
G== m- I LE(UiU: IYi)
11 Ui ) = Po + U + X;jf3. (9,3.1) ill write "Y' = f30 + Ui and assume that Xij we w Th' ~ t likelihood function for f3 and
logitPr(Y'ij =
,.
I
and Ui
175
0
f
RANDOM EFFECTS MODELS 176
LOGISTIC REGRESSION FOR BINARY RESPONSES
Table 9.1. Nota
10
triaL Group
177
t ' n for a 2 x 2 crossover (1,0)
(0,1)
(1,1)
(0,0)
AB BA 'equI'valent to the one derived in 'k I'h d a b ave IS The conditional lI e I ~o (B I and Day 1980). In that context, t 01 studIes res ow , stratified case-con ~ 'th tatum there are Yi. cases and ni - Yi. conr there are m strata; III the Z St t' 'that any statistical package suitable ection is impor an III d Th' troIs, IS conn 'fi d ase-control studies can be use to fit a i 's of the stratI e c , h l' h for t e ana ySI ,. d I t binary longitudinal data WIt Itt1e or random intercept logistIC mo e 0 no modification,
For the 2 x 2 crossover data on cerebrovascular deficiency, previOUSly discussed ~ E~mple 8.1, We have bl == 6, CI == 0, b2 == 4, and C2 == 2. For the model mdudlng the treatment variable only the treatment effect {3 . esti~~ted as log~ (6+4)/(0+2)} == 1.61 with ~timated standard err~r ~.'7~~ A SImilar result 18 obt~ined when the period effect is included in the model. T~e tre~tment effect IS now estimated as ~ log{(6 x 4)/(0.5 x 2)} == 1.59 With estimated standard error 0.85. Note that we have used the co v ntion of replacing the zero cell with 0.5 in this calculation. Nevertheles~ ~he data indicate that the odds of a normal electrocardiogram for the tre~ted patients are about five times (5 == exp(1.6)) greater than the odds for patients receiving the placebo. This finding from the conditional inference is to be contrasted with the results from fitting a marginal model in Example 8.1 of Section 8.3. For the marginal model, we estimated a rougWy two-fold increase in the odds for the treated group. The smaller value from the marginal analysis is consistent with the theoretical inequality stated in Section 7.4.
Example 9.1. The 2 x 2 crossover trial Let a, b, C, and d denote the numbers of response pairs for ,each I of hthe four . possible combinations of outcomes in a 2 x 2 crossover tna , as s own III Table 9,1. h 'd For example, b1 is the number of subjects in the first group, w 0 receIVe the active treatment (A) followed by placebo (B) with outcomes (1,0), that is with a normal response (Y == 1) at the first visit and an abnormal response (Y == 0) at the second. For the logistic model (9.3.1) which includes only the treatment variable XI, the conditional likelihood (9.2.4) reduces to ({3) { 1 :x:XP({31)
}b
1
+b
2 {
1 } 1 + exp(131)
C1 +C2
The estimate of (31 which maximizes this conditional likelihood is
~I == log{(b1 + b2)!(Cl + C2)}, Its variance c~n be estimated by (b i + b2 )-1 + h + C2)-I. . I~ the penod effect (X2) is now added to the model, the conditional lIkelIhood function becomes exp{ ({31
+ (32)bd
exp{(131 - 132)b
2
}
{I + exp({31 + .B2)}b + {I + exp(131 _ 132)}b2 +C2' The maximum conditi Irk l'h . . I e I ood estimate of 131 and the corresponding vanance estimate are, ona respectively 1
. (31
==
1 log
2
(b b) ~ ,
Example 9.2. The 3 x 3 crossover trial Table 9.2 gives the results of fitting a random intercept logistic regression model by conditional likelihood to the crossover data from Example 8.2. The table reports the estimated regression coefficients and their standard errors. Strong treatment effects are evident after adjusting for period and carryover effects. The chance of dysmenorrhoea relief for a patient is increased by a factor of exp(1.98) == 7.3 if her treatment is switched ~om placebo ~o a low dose of analgesic and by a factor of exp(1.71) == 5.5 If treatment 18 switched from placebo to a high dose. The principal advantage of the conditionallikelihoo~appro~ is that we remove the random effects from the likelihood by which we estIm~te {3, thus avoiding the assumption that they are a sample from a p~tlcu1ar probability distribution. The disadvantage is that we rely entIrely on within-subject comparisons. So persons with Yi. == ni or Yi. == 0 provide no information about the regression coefficients. In the 2 x 2 crossover
., all'k Table 9.2. Results for a condition I el'h I 00 d analysis of data from a 3 x 3 crossover trial.
C1
V;;(t11) ==
~ (bl
l
+ cll + b2"l + c2"l) .
Variable Coefficient Standard error
Carryover
Period
Treatment
B
C
2
3
1.98 (0.45)
1.71 (0.41)
0.69 (0.56)
0.85 (0.58)
B 0.14 (0.60)
C 1.24
(0.65)
RANDOM EFFECTS MODELS
LOGISTIC REG
178
. . th t a + d + a2 + d2 pairs are uninformative. In the trIals, thIs means I . 'd d at}lis 1accounts for 82% (55/67) af t he subjects under examp I,e · conSIC ere, . . . aeql1ently standard errors of regreSSIOn estImates tend to a 1)servat,IOn. on.~ , . or random effects analySIS. . For example, the ' a marginal e Iarger th an III . . . b standard error of the regression coefficient for the treatment :anahle IS here 0.91, BB opposed to a value of 0.23 obtained from ~he mar~mal model. At the extreme, the conditional analysis provides no mformatJOn about coefficients of explanatory variables which do not vary over time. This can be seen by examining (9.3.3). The product of any time-independent covariate and its coefficient will factor out of the sum in both the numerator and denominator and cancel from the conditional likelihood. This is sensible since we have conditioned away all information about each subject's intercept and thus cannot use estimates which are based entirely upon comparisons across subjects. We now turn to the situation in which the U i are treated as an independent.s~mplefron: a random effects distribution. We begin by reviewing the traditIOnal but Simpler random effects models for binary data and then consider the logistic-Gaussian model more specifically. 9.3.2
Random effects models for binary data
Historically, the for random effects models has bee n th e 0 b sert' th h motivation . va IOn at t e varIability among clustered binary responses exceeds what would be. expected due to binomial variation alone. Random effects mod-
~~;~:~a~~~:~:~dd::r~~~~~n:[~~e~~~~so;~:~)d extra-bi;omial variation.
REsSION FOR BINARY RESP
ONSES
The beta-binomial d' . . 179 of non-infectious diseas lS~nbutlOn has been used to m d malformed foetuses i ~~ m a household (Griffiths 197~) el the incidence somal aberrant cell n a Itter (Williams, 1975) and'th . the number of 1986). Prentice (19~6amo~g repeated samples'for an i:~~~ber of chromonot be positive in the l~tOalllbt.s out: that the correlat.ion c~:lffi,u~l (P~entice. .t ' I - Illomlal m d I Clent. () need I s ower bound is 0 e as previously tl"lOUg h" " t. but that. .
00 = max{ -fl./(n -
:1-
I
1), -(1 - fl)/(n
+ fl)},
The beta-bmomial frame wor k can be extend d b . may e Imposed on the cluster-spec"fi e so ,t.hat a parametric model h d , I e means J F e ~su.me to depend on duster-level f' ' . 1,. or .example. II, might a IOgIS.tl~ function, logit(fl;) = X;f3.xplanatory vanables. x" through Ongmally, it was assumed that th b I. . each response from the same I t e e a-bmomial distribution required e us er to have a t h e regression set-up this requ' d th . common probability. /1;. In . ' Ire e covanat . t b o bservatlOns within a cluster th t . e s o e the same for all Rosner (198 4) has extended th , ab tIS, b'Xi! = '" = X tn, -- Xi' However . . e e a- momial to . II h . ' vary wlthm clusters. His model for one I . a ow t e covanates to custer IS formally equivalent to the following ni logistic regressions: .. logit Pr(Y:' tJ = 1/ Ytl'···'Y'J-I,Y;j+l,···,Yin.,Xij)
=
log (
1-
Oil
Oil + Wij()i2 ) + (ni _ 1 _ Wij)(}i2 + X;jf3*,
j
= 1, ... , ni,
(9.3.4)
{Yib' .. ,Yin,} represent the n' bina'
was one 0 the earliest. Let cluster could be a litter in t ttl ry responses from cluster i. Here the a genetic study or an indi: der~ o~ ex~eriment, a family or household in distribution assumes that: I ua m a ongltudinal study. The beta-binomial
?
(1) conditional on fl.'" th e responses y;, . common probability J.li; ,1, ... , Yoin; are mdependent with
(2) the fl.; follow a beta distribution w'th Unconditionall h I mean J.l and variance 8p(1 - p). 1": y, t e total number of " ,I +." + Yin, has a beta-bino . I d~osl~lve responses for a cluster, Yi = mla Istnbution with .
and Var(Yi.) = nifl.i(1 ) The over-d' . - J.li {I + (ni - 1)8}. IsperSlOn p h arameter 6 is the c . responses fro m t e same cluster. orrelatlOn for each pair of binary
where w·· . t! ---:- y'•. - yij, (il) 'IS an mtercept parameter and ()i2 characterizes the as~oclatlOn between pairs of responses for the same cluster. This clever ~xtenslOn of the beta binomial does have some important limitations. First Its regression. coefficients, f3*, measure the effect of Xij on Yij which canna; first be explamed by the other responses in the cluster. Hence, the effects of cluster-level covariates may often be attributed to the other observations within the cluster, rather than to the covariate itself. This drawback is particularly severe when the cluster sizes vary so that different numbers of other responses are conditioned upon in the different clusters. This is a particular problem for longitudinal studies where the number of observations per person often varies. In addition, in longitudinal studies, it may be awkward to model the probability for the first response as a function of responses which come later in time. See, for example, Jones and Kenward (1987) who consider models for crossover trials. The logistic model introduced in Section 9.1 adds the random effects on the same scale as the fixed effects. To our knowledge, this approach was first considered in the biostatistical literature by Pierce and Sands (1975) in an unpublished Oregon State University report. They assumed
RANDOM EFFECTS MODELS
LOGISTIC REGRESSION FOR BINAR'" R
180
I
.' d m intercept. Since then, the . ., t' [, r a ullIvaflate ran 0 . I a Gaussian cllstrwu ,Ion 0 D: t has heen studied extensive y, .hG " n random euec s logistic model WIt ,al~ssJa 82 Stiratelli et al. (1984), Anclerson and including work by WIlliams (19 ») Ze er et al. (1988), Zeger and Karim Aitkin (198.5), Gilmol~r et ai. (~~:3~ 'anJWaclawiw and Liang (1993). One (1991), Breslow and Clayton) h' d a log-log link and a log gamma exception is Conoway (1990., w a u~e · t 'b 11,t'lon for the random mtercepts. d18,fI 9.3.3
Exo.mp ies
modei9 with Z . , 11OgZ8. t'c
0
Gaussian random effects
' the random effects model within the GLM frame. I' I' The approac h to fltt mg 9.2. No particular computatlOna SImp 1· Sec t'on work has been covere d III 1 . ' . anses ' when we 10CU, r , s on the logistic model wIth GaussIan random flcatlOn • effects. The likelihood function for (3 and
L((3,G; y) ;,;:
IT f fi i",1
{llij((3, Ui) }Yii {1 - llij((3, Ui)}I-Yi J f(U i ;G)dU i ,
j",1
where llij(l3, Ui) = E(Yij lUi; (3). With the logit link and Gaussian assumption on the Ui, this reduces to
.=1
exp
[13
1
~XijYij + U; L dijYij - L log{l + exp(x;j(3 + d;jUi)}] J
J
181
Table 9.3. Regression estimates and stand . ard errors (m parentheses) of random effects and m . I arglUa models fitt d ,e to the 2x2 crossover data for cerebrovQ~cul d fi . ...., and Kenward (1989) a dar, e Clency ' . ad ap t ed from Jones n presented In Table 8.1.
Intercept 1featment Period
G1 / 2
Random effects model
Marginal model
Ratio of random effects to marginal
2.2 (1.0) 1.8 (0.93) -1.0 (0.84) 5.0 (2.3)
0.67 (0.29) 0.57 (0.23) -0.30 (0.23)
3.4 3.3
3,3
a IS
(9,3.5)
IT/
ESPONSES
J
X (271"(1 I G l-q/2 exp( -U;a- 1U;j2)dU i ,
where G is the q x q variance matrix of each Ui' Crouch and Spiegelman (1990) present numerical integration methods tailored to the logisticGaussian integral above. Zeger and Karim (1991) have used a Gibbs sampling Monte Carlo algorithm to simulate from a posterior distribution similar to this likelihood function. In the examples below we use the ~pproximatio~ method by Breslow and Clayton (1993), whi~h has been Implemented III SAS (GLIMMIX) to obtain maximum likelihood estimates and their estimated standard r d ' W " . errors lor count ata regresslOn models, and e use .numencal IlltegratlOn methods implemented in SAS (NLMIXED) to obtam MLEs for binary response models. Example 9.1. (continued) For the 2 x 2 crossover trial b on cere rovascular deficiency we assume a I . t·· ogls ,Ic regreSSIOn model with dd" , random interce t ,_ a ltIve effects of treatment, period, and a tion with varia~ce "'Ie T,B~I + Ui , ~sumed. to follow a Gaussian distribuand GI/2 F ' . a e 9.3 gives maxImum likelihood estimates of (3 . or companson the t bl I obtained by fitting a .' I a e a so presents regression coefficients margma model.
Focusing firs~ on the maximum likelihood estimates, there is clear evidence of substantial heterogeneity among subjects. The standard deviation of the random intercept distribution is estimated to be 5.0 with a standard error of 2.3. By the Gaussian assumption for the intercepts on the logit scale, roughly 95% of subjects would fall within 9.8 logit units of the overall mean. But this range on the logit scale translates into probabilities which range from essentially 0 to 1. Hence, the data suggest that some people have little chance and others very high chance of a normal reading given either treatment. Assuming a constant treatment effect for all persons, the odds of a normal response for a subject are estimated to be 6.0 = exp(1.8) times higher on the active drug than on the placebo. The last column of Table 9.3 presents the ratios of regression estimates obtained from the random effects model and from the marginal model. The three ratios are all close to (0.3466 + 1)1/2;,;: 3.1, the theoretical value discussed in Section 7.4 and in Zeger et ai. (1988). Example 9.3. The pre-post trial on schizophrenia As discussed in Chapter 7, a distinctive feature of random effects models is that natural heterogeneity across subjects is modelled directly through subject-specific parameters. This example serves to illustrate that sometimes random intercepts alone may not sufficiently capture the variation exhibited in the data. Table 9.4 presents results from two random effects models: Model 1 with random intercepts to capture variations in baseline risk among subjects, and Model 2 with random int~rcepts ~d slo~es, the latter for subject variations in changes of risk over tIme. As 111 Sectl?n 8..3, We consider three independent variables: treatment status (xd, tIme 1ll weeks from baseline (X2) and their interaction (xs)· For Modell, the estimate for /33 is -0.877 (SE = 0.349) suggesting that o~ average, the rate of change for the risk of having PANSS ~ 80 is conSIderably lowher l' I 'd0 I group . For example, t e lOr the risperidone group than the h a open
LOGISTIC REGRESSION FOR BINAR Y RESPONSES
RANDOM EFFECTS MODELS 182
. estimates and standard errors Table 9.4. RegresslO~' ffects models for the pre~ (in parenthesp,S) of ran ~m e porLt tr 'lal on sehizophrema. Model Variable Intereept Treatment (xI) Time (X2) XI' X2
G11 G I2
G22
Table 9.5. Regression est.imates and t d n rd err~rs (in parentheses) of random effects models ~ st.: . . f . or e ndoneslan study on respIratory III ectlOn (Sommer et at., 1984). .
t
Model
1
2
Variable
2.274 (0.236) 0.236 (0.690) -1.037 (0.234) -0.877 (0.349) 11.066
2.400 (0.610) 0.319 (0.740) -1.034 (0.430) -1.247 (0.561) 12.167 0.143 3.532
Intercept
G and G22 denote the variances of random intercepts 11 h . and slopes respectively; 0 12 denotes t e covanance between random intercepts and slopes.
Sex Height for age Seasonal cosine Seasonal sine Xerophthalmia Age
risk for each patient in the haloperidol group at week 5 reduces by 17% (0.17 = 1- (6/5)-1.037) compared to that at week 4. However, for a patient receiving risperidone of 6 mg, their risk at week 5 reduces by as much as 29% (0.29 = 1 - (6/5)-1.037-0.877) relative to that at week 4. The estimate (G 11 = 11.07) for the variance of random intercepts suggests a strong degree of heterogeneity for baseline risks. When random slopes are added to the model, ~3 increases in magnitude, from -0.877 to -1.247, representing a 42% inflation. This addition of random slopes is justified by the substantial estimate of their variance, 0 22 = 3.532, as shown in Table 9.4. This extra variation across subjt;.cts is also more accurately reflected in the ~tandard error estimates of the (J's in Model 2. For example, the s.e. for fh III Model 2 (0.561) is almost twice the size of that for /33 (0.349) in Modell. Example 9.4. Respiratory infection in Indonesian preschool children Fitting a random effects. model to the data from the Indonesian study allows ~sf to .address the questIOn of how an individual child's risk for respiratory III eetlOn. would chang: if their vitamin A status were to change. This is accomphhsh:d by allo-:mg each child to have a distinct intercept which repT b resents t elr propensity for i f t' from mod 1 I n ec Ion. a Ie 9.5 gives regression estimates e sana ogous to Models 2-4 in Tabl 8 7 H for correlation by includin r e . '. ere, we have accounted and ' a Gaussian distributl' n ?th ~m llltercepts WhICh are assumed to follow o WI vanance G. Note that the estimates of the r d around 0.7, statistically si nific 1 an. am effects standard deviation are g ant y different from zero but smaller than
183
Age 2
2 -2.2 (0.24) -0.51 (0.25 ) -0.044 (0.022) -0.61 (0.17) -0.17 (0.17) 0.54 (0.48) -0.031 (0.0079) -0.0011 (0.00040)
Age at entry Age at entry 2 Follow-up time Follow-up time 2 GI(2
0.72 (0.23)
-1.9 (0..30) -0.56 (0.26) -0.052 (0.022)
0.57 (0.48)
-0.054 (0.011) -0.0014 (0.00049) -0.20 (0.074) 0.014 (0.0048) 0.71 (0.24)
3
-2.4 (0.37) -0.55 (0.26) -0.049 (0.022) -0.56 (0.22) -0.019 (0.22) 0.69 (0.49)
-0.055 (0.011) -0.0014 (0.00050) -0.085 (0.10) 0.0069 (0.0068) 0.72 (0.24)
in the crossover trial of Example 9.1. Among children with linear predictor equal to the intercept, -2.2, in Model 1 (average age,. ~eight,. fem~e, vitamin A sufficient), about 95% would have a probabIlIty of Illf~ctl~n between 0.03 and 0.31. This still represents considerable hetero.genelty. III · dd of infection assOCiated WIth the propensity for infection. T hereIa t Ive a ~ (054) = 1 7 xerophthalmia (vitamin A deficiency) are estimated to be ~~:'fi 'e in Modell and are not significantly different from 1. The lac 0 S~;I. c~~~s of the effect is due to the small number of xerophthalmia cases ( ) III I
RANDOM EFFECTS MODELS
COUNTED RESPONSES
184
185
.'
I d t Finally the longitudinal age effect illustrative subset of the ofl~m~ a a. I'n Model 2 can be explained, to a . f . tory inlectlOn seen on the flsk 0 resplra I d as shown by fitting Model 3. large extent, by the seasona tren' mong subjects, the regression estimt Because there is l~ss.7ete:o~::e~~r~inal model coefficients in Table 8.7. ates seen above are slhml ar o. I nd random effects coefficients are close Again the ratios of t e margma a to (0.3460 + 1)1/2 as discussed in Section 7.4.
Example 9.5. Epileptic seizures Returning to Example 8,5 we first cons'd '
Conditional likelihood method
'd condl'tl'onal maximum likelihood ,estimation of the ranWe now consl er dom intercept log-linear model for count data. Specifically, we assume that conditional on 'Yi = /30 + Ui ,
(1) Yij follows a Poisson distribution such that
= 1'i + X~j{j + log(tij),
logE(Yij hi)
j = 1, ... , ni;
and
(2) Yil,"., Yin. are independent. Under these assumptions, the likelihood function for f3 and 1'1, .. , ,'Ym is completely specified and is proportional to
IJ {n. +f
n,
m
exp Ii LYij
,_I
+ {j' LYijXij
,
Yiilog( tij) -
J=1
f:
tii exp( 'Yi +
X~ f3)}. .
J
m
i=1
(
.
Y,. Yll"",Yin
ni (
i) II J=1
exp(x~j{3)/
Xi]2
=
tije
x''Jf'J .r.I
Yij
ni t L e=1 x' {j) ,ee
(9.4.2)
'f
ni
L: tif exp(X~e{j) e=1
represents the probabiIitY that gory j', j = 1 n. W each of the Yi. events will fall into 'cate, . , " , . e now Use th d' . . continue the analysis of t h ' e con ItlOnal hkelihood approach to . e seizure data.
e model
+ f33 x 'JIX'J2 . +I ( ) og til ,
{~
if progabl'd e group, . the ith . subject is assigned to the' If the zth subject is assigned to the placebo group,
{~
if j = 1,2,3, or 4, if j = O.
Here, Ii is the exp~cted baseline seizure count for the ith subject, i = 1, .. , , 59. The coeffiCient (32 represents the log ratio of the seizure rates post versus pre-randomization for the placebo group. Note that this is assumed to take the same value for each subject. Similarly, f32 + /33 represents the log ratio of rates for the treated group so that /33 is the treatment effect coefficient. Because Xii = Xi2 = Xi3 = Xi4 and t;o 8 = til + ... + ti 4, the conditional likelihood for {j reduces to
(y ) ( 59
X
The contribution in (9 4 2) fo b' " which .. r su Jed Z IS a multinomial probability in
7':ij = tii
=
(9.4,1)
By conditioning on y',. -- ",n. d" . . L..i=1 Yii as was one III SectIOn 9.2,1, we obtam the followmg conditional likelihood which depends on {3 only:
n .
Xijl
g y:~
i=1
th
where
28
J=1
3=1
er
logE(Y;j IT;) = Ii + f3 IX ijl + f32x2 . . 'J J = 0, 1, ... ,4; Z = 1, ... , 59 ,
9.4 Counted responses
9.4.1
I
exp((3) ) 1 + eXPC(32)
(
Yi
II Yi~
)
(
1=29
Yi. -YiO (
1 ) 1 + exp(/32)
a) )Yi.-YiO exp ((32 + ,v3 1 + exp((32 + (33)
(
Y,o
1
1 + exp(/32
)YiO
+ (33)
Thus, the conditional likelihood reduces to a comparison of two sets of binomial observations with 'sample sizes' 28 and 31. In the placebo group, each subject contributes a statistic YiO/Yi. for estimating the parameter 7fl = 1/{I +exp((32)}, which is the common probabili~y t~at an.in~ividual's seizure occurred before rather than after the randomizatIOn. SImIlarly, the common probability in the progabide group is 7T2 = 1~{1 + exp(/32 + (33)}. Thus, a negative value for (33 indicates that a relatively larger fractIOn of the total seizures in the treatment group occurred before rather than after randomization as compared to the placebo group. In other words, a negative (33 indicates that the treatment is effec~ive: . Table 9 6 gives the conditional maximum likelIhood estlm~tes of f32 and (33 and'their standard errors. With the full data set, ther~ IS mod.est . ., ffi' than the placebo In reducmg eVIdence that progablde IS more e ectlve . ' l' 0 10 ± 0 065). WIth pOSSible out ler the Occurrence of seizures ((33 - - , ' ,
A
_
RANDOM EFFECTS MODELS
COUNTED RESPONSES
186
187
f conditional likelihood analysis Table 9.6. Re>su Its o . (coefficient ± standard error) of seIzure data. Complete data Patient# 207 deleted (32 (33
Pearson 's X2
0.11 -0.10
± 0.047 ± 0,065
289.9
0.11 -0.30
±
(2) the Jii are independent gamma d . variance ,pJi2. ran am vanables with mean Ji and
Then, the unconditional distribution of Y. _. '. I) IS negative bmomial with
0.047
± 0.070
227.1
patient number 207 deleted, a stronger treatment effect is suggested, (/33 = 2 -0.30 ± 0.070). However, the calculation of the Pearson X statistic
E(Y;))
= /l
and
Var(Y: -) I)
= I'~ + -1./12
,/-,,..,.
(9,4.3)
The use of the negative binomial model d at b k Greenwood and Yule (1920) who mOdelled es ~ at least to the work of . over-dispersed accid t The SImplest extension of the negati b" - en counts. ve momlal model is to that the /li depend on covariates x- thro h - assume . . ' ug some parametric function TlIe most common IS the log-lmear model for which . (9,4.4)
*2
where 1h = 1/{I + exp(P2)} and = 1/{I + exp(P2 + P3)}, also reveals that the fitted model is grossly inadequate. For example, with 57 degrees of freedom in the full data, the fitted model is rejected at the 0.01 level. The same conclusion is reached when the outlying individual is set aside. An important implication of this observation is that the estimated standard errors for the elements of /3 may be too small. This may be due to the inadequacy of the assumption that the change in seizure rate is common to everyone within a treatment group. One way to address this possibility is to introduce a random effect Ui2 for the pre-post explanatory variable X2. However, the conditional likelih~Od meth~d would no longer be appropriate since all relevant information a out WIll. be ~onditioned away. We must instead use the random effects approac whIch IS the topic of the following sub-section.
(3t
9.4.2
Random effects models for counts
The Poisson distribution has a 1 . . but in biomedical ap pl' . . ~ng tradItIon as a model for count data, is implied by the p . IcatlOns It I~ rarely the case that Var(Y) = E(Y) as OlSSon assumptIOn T . II ' mean (Breslow 1984) A d' " yplca y, the vanance exceeds the , . s Iscussed III St' 9 3 over-dispersion can be ex I' db ec IOn .. 2 for binomial data, this . p ame Y assumin th t th ' genelty among the expected res 0 g a :re IS natural heteroassumed to follow a gamm d' p ~bses. across observatIOns, If the means are " a Istn utlOn the . I d' . . margma IstnbutlOn of the counts IS the negative binomi I d' t 'b: arises from the assumptions t~at IS n utlon. Specifically, this distribution (1) conditional on /I- tl . """ Ie response variabl v . WIth mean Iti; e I ij has a POIsson distribution
One i~portant limitation of this model for application to longitudinal data I~ t~at th: explanatory variables in the regression above do not vary wlthm. sub.Jects. Morton, (1987) proposed a solution to this problem. In the longltudmal context, If we once again let v_ 1,1, ... , Y.-,,,. deno t e th e counted responses from the ith subject, Morton (1987) assum'ed that (1) conditional on an independent unobserved variable f E(Y,-)-I f t-) . " ( ,ij (3) fi,J=I, expx ... ,ni;
(2) Var(Yij lEi) = ¢>E(Y;j If;); (3) E(fi)
= 1 and Var(Ei)
= 17 2 .
Note that assumptions (1), (2), and (3) imply that E(Y;j) = exp(x:l-1) = /lij and Var(Y;j) = ¢>/lij + (j2/ltj . Morton (1987) extended this approach to include more complicated nesting structures as well. He used a quasilikelihood estimation approach, which is similar to GEE, for estimating (3,,p, and 17 2 . An attractive feature of this model is that it is not necessary to specify the complete distribution of fi, only its first two moments. The model that is the focus of the remainder of this chapter adds the random effects on the same scale as the fixed effects as follows:
(1') logE(Y;j lUi) = X:j{3 + d~jUi; (2') given U i , the responses Y;l,"" Yin. are independent Poisson variables with mean E(Y;j !Ui);
(3') the U i are independent realizations from a distribution with density function f (U i; G), This second approach allows the contribution of the random ~ffect~ to vary within a subject, that is, d ij need not be constant for a gIven z. If this flexibility is needed, the second approach is to be prefer~ed, In fact, . 1 e of (1') WIth d- - = 1 the expression (1) is readily seen as a specm cas . IJ and U i = log(Ei)' On the other hand, in order to make mferences about
RANDOM EFFECTS MODELS
FURTHER READING
188
189
ecify a distribution for the U i· The following d ta P ~ d G.' one sregressIOn . Wit . h G aUSSIan . random fJ an . nee Il th se of poisson sub-section Illustrates e u . effects by re-analysing the seizure data. 9.4.3
Table 9.7. . Estimates and stand ar d errors C effects POIsson regression models fi III parentheses) for random without patient number 207. tted to the progabide data with and
poisson-Gaussian random effects models
Example 9.5. (continued) . t' e fit two models to the progabide data which differ only In t hIS sec Ion w . . on how the random effects are incorporated. Modell IS a log-lInear model with a random intercept. In Model 2, we add a second random effect for the pre/post-treatment indicator (X2) so that logE(l'ij lUi)
= 130 + 131 Xijl + 132 x ij2 + 133 x ijl x ;j2 + Uil
+ Xij2Ui2 + log(tij) , where Ui == (Vii, Ui2) is assumed to follow a Gaussian distribution with mean (0,0) and variance matrix G with elements
(g~~ g~~). The inclusion of Un allows us to address the concern raised at the end of ~ection 9.4.1, that there might be heterogeneity among subjects in the ratIO of the expected seizure counts before and after the randomization. Th~ degree of heterogeneity can be measured by the magnitude of G 22 , the ~arI~nce of Ui2 .. Both mo?els were fitted using the approximate maximum hkelihood algOrithm outlined in Section 9.2.2. !able 9.7 presents results from fitting Models 1 and 2 with and without patient num?er 207. As expected, the results from M~del 1 are in close WIth those.from th e cond't' agreement h' H I lOnal approach given in Section 9.4.1. owever, t IS model IS refuted b th t . . in Model 2 . th yes atlstlcal significance of G 22 which ' error 0.062. usmg e complete dat a we es t'Imate to be 0.24 with standard Focusing on the results £ M or odel 2 fitted to the complete data, subjects in the placeb o group have expect d . e seIzure rates after treatment which are estimat d t b (exp(~2) = exp(0.002) =e1.00~) ~o:o~hlY the .same as before treatment are reduced after treatment b . b e progabIde group, the seizure rates !Ier;ce, the treatment seems t~ ~ out 27% (1 - exp(0.002 - 0.31) = 0.27). IS (33 = -0.31 with a stand d ave a modest effect: the estimated effect the moment patient numb ar207error of 0.15. Finally, if we set aside for e r , is whoif had. u ~usua IIy h'Igh seIzure . then the eVI'd ence that progabide ~ _ rates, 3 d-' -0.34 ± 0.15. The analysis 't~ ectlve IS somewhat stronger, with an IS carried out in order t wdl out patient 207 is only exploratory, overall result s. p atrent . 0 un erstand thO " s influence on the number 207 h IS patIent . seiZure Counts an d perhaps h a s ' as been ident·ft 1 ed b ecause of unuSU al speCial medical problems.
9.5 Further reading Throughout this chapter, we have used the Gaussian distribution as a convenient model for the random effects. When the regression coefficients are of primary interest, the specific form of the random effects distribution is less important. However, when the random effects are themselves the focus, as in the CD4 + example in Chapter 5, inferences are more dependent on the assumptions about their distribution. Lange and Ryan (1989) suggest a graphical way to test the Gaussian assumption when the response variables are continuous. When the response variables are discrete, the same task becomes more difficult. Davidian and Gallant (1992) have recently developed a non-parametric approach to estimating the random effects distribution with non-linear models. The statistical literature on random effects GLMs has grown enormously in the last decade. Key papers in the biostatistics literature include: Laird and Ware (1982); Stiratelli et al. (1984); Gilmour et af. (1985); Schall (1990); Zeger and Karim (1991); Waclawiw and Liang (1993); Solomon and Cox (1992); Breslow and Clayton (1993); Drum and McCullagh (1993); Breslow and Lin (1995) and Lin and Breslow (1996). These papers also provide useful additional references.
GENERAL 191
for suitable functions frO, and
vC: == v( /le)A. .)
"'ij
(10.1.2)
,+"
where h and v are known link and vanance " f ' unctIOns d t . A . e er~\Iled from the specific form of the density function ab details on GLMs. ave. ppendlX A.5 gives additional
10
Transition models
In wor~s, the transition model expresses the . " as a functIOn of both t h ' conditional mean /lc e CQvanates :c.. ' and )of'th.e past respon f)
Yij- q' P ast responses or functions th f .' ses as additional explanatory variables W erleo are SImply treated . e assume t lat the past affects the present t h rough the sum of s terms each f h' h . 1 Th £ . , 0 W IC may depend on the .1- f . .Illustrate . q prior va ues. e lollowmg examples with d'ffi ·. I erent I'IlUl. unctions t h e range 0 f transition models which are available. Linear link - a ?inear regression with autoregressive errors for Gaussian data (Tsay, 1984) IS a Markov model. It has the form Ytj -I, ... ,
' h t r considers extensions of generalized linear models (GLMs) for ThlScape h I " 'b' the conditional distribution of eac response YiJ as an exp lClt descrJ JDg . W '11 t: function of past responses Yij _ 1, ... , Yil and covanates x ij . e WI. LO~uS on the CaBe where the observation times tij are equally spaced. To slmphfy notation we denote the history for Bubject i at visit j by'Hij = {Yik, k = 1, ... Ii : I}. As above, we will continue to condition on the past and present values of the covariates without explicitly listing them. The most useful transition models are Markov chains for which the conditional distribution of Yij given ?tij depends only on the q prior observations Yij-lt.", Yij-q' The integer q is referred to as the model order. Sections 10.1 and 10.2 provide a general treatment of Markov GLMs and fitting procedures. Section 10.3 deals with categorical data. Section lOA briefly discusses models for counted responses which are in an earlier stage of development than the corresponding models for categorical data. 10.1 General As d~8:ussed .in ~ection 7.3, a transition model specifies a GLM for the condltlO~~ dlstnbution of Vij given the past responses 'H,.. The form of ' ~J the conditIOnal GLM is f(Yij I'H ij ) == exp{[YiAj -1j;(Oij)]j¢ + C(Yij, ¢)},
I
J
fJ
Vij = :Ci/f3
+~ ""' ar(Y,)· -r -
v5 == Var(Yij l'Hij ) = '1/1" (Oij )¢.
We will consider transition models satisfy the equations where the conditional mean and variance
+ Z·
f)'
where the Zij are independent, mean-zero, Gaussian innovations. This is a transition model with h(J.lc.) == Jl.cf)' v(lJ....)c ) == 1, and f,r -- a r (y I..J - r I lJ Xij-r (3). Note that the present observation, Vij, is a linear function of:c and of the earlier deviations Vij-r - :C~j_r{3, r == 1, ... , q. 'J Logit link - an example of a logistic regression model for binary responses that comprises a first-order Markov chain (Cox, 1970; Korn and Whittemore, 1979; Zeger et ai., 1985) is o
logit Pr(Y,j == 111tij) = :cd {3 + aYij -I, previously given as equation (7.3.1). Here C
.
C
h(J.lij) == lOglt(Jtij) = log
J.l5 ' (1 -J.lijC)
and
fr('Hij, a) and
:c'f ). - r {3)
r=l
(10.1.1)
for. known functions 1j;(() ij) and ( ) variance are c Yij, ¢. The conditional mean and
/If· == E(Y. . I'Hij) == 1j; (()ij)
q
= arYij-rt
s
=q =
1.
A simple extension to a model of order q has the form q
logit Pr(Vij = 111ti j) = Xi/f3 q
+
L
QrYij-r'
r=1
The notation f3 indicates that the value and interpretation of the q regression coefficients changes with the Markov order, q.
,
TRANSITION MODELS
FITTING TRANSITION MODELS
me a log-linear model where Yi j Log-link - with count dat,a w,e c~n assZu and Qaqish (1988) discussed P 'sson dIstrIbutIOn. eger h • 11 . I _ {I g(y~. ) - Xij-l '(3} , were Yij == given 'Hij fo ows a 01. a first-order Markov cham wIth I - a 0 .)-1 ( ., d) 0 < d < 1. This leads to
In the linear model we assume that Yo. en '1..J . 'b ' If ~tl, ... , vI iq are also multivariate . 'J ",.venG Ilij .follows a Gaussian dlstn utlon. . weakly stationary th ausslan. and the h v ij IS ance struc t ure £or teL ' . covari. f . ' e marginal dlstnbutlOn . ) b I(Yij,.·., Y,q can e ully determmed from the cond't' I d' 'b . . h aut add"ltionaI unknown parameters H IlOna mode1 Wit f II .Istn utlOn . lIke. " , ence, u maxImum lIhood estlmatlOn can be used to fit Gaussian aut . oregresslve models . See Tsay (1984) and references therein for details. In the logistic and log-linear cases f(Y'1 y. ) I'S n t d t . d fr • , " .. " ' q a e ermme am the GL~ assumptlOn abou~ th: conditional model, and the full likelihood is unavaIlabl:. ~n alternative IS to estimate f3 and Q by maximizing the conditional lIkebhood
192
m"" y.,>
'"g ~ E(Y;; [1t,,) ~ exp(x,/13) (exP~::;~'13))
0
- 0 from being an absorbing state whereby when a > 0, we have The constant d prevents Yij-I . = 0 forces all fut ure responses t 0 bONate e '. YiJ-;1 1/9 , when the prevIOUS outcome, Yij-l, exceeds an Increased expect a t'on I ''''') I I , (,I) Wh < 0 a higher value at tij-I causes a ower va ue exp(roi .1-11-' ' en a ,
mod~l
at tWithin the linear regression model, the transition can be formulated with Ir = ar(Yij-r - roij-l'(3) so that E(~j! = .Xij (3 whatever the value of q. In the logistic and log-linear cases, It IS dlffic~1t to formulate models in such a way that f3 has the same meaning for dIfferent assumptions about the time dependence, When f3 is the scientific focus, the careful data analyst should examine the sensitivity of the substantive findings to the choice of time-dependence model. This issue is discussed below by way of example, Section 7.5 briefly discussed the method of conditional maximum likelihood for fitting the simplest logistic transition model. We now consider estimation in more detail. 10.2 Fitting transition models
As indicated in (7.5.3), in a first-order Markov model the contribution to the likelihood for the ith subject can be written as
II I (Yij IH ij ).
In a Markov model of order q, the conditional distribution of ~. is .)
l7tij ) ==
f(Yij IYij-I, ... , Yij-q),
so that the likelihood contribution for the ,;th b' t b ' su Jec ecomes n,
f(Yil"",Yiq)
II j=q+I
II I(YiQ+l,""
m
= II
Yini IYiI, ... , Yiq)
i=I
n,
II
i=lj=q+l
f(Yij IH ij ).
m
n,
L L
8
C
:~j vf.1-
1
The GLM (10.1.1) specifies anI th d' . the likelihood of th fir t Y e c?n ItJonal distribution f(Yi.1 l7-£i.1); . e s q ObservatIOn f(y·,1, ... ,Yiq ).IS not spec iiied dIrectly.
(10.2.2)
(Yi.1 - /-tm = 0,
where 6 = ((3 Q). This equation is the conditional analogue of th~ GLM . A ppend'IX A.. 5 The derivative'Il8/-tij/85 IS anaScore equation, discussed III £ ulate the
logous to Xij but it can depend on 0: and (3. We can Stl o~~ ws Let estimation procedure as an iterative weighted least squares as 0 dO c't Itij row 1 S l'i be the (ni - q)-vector of responses £or J. -- q + 1,. 't': n·• an 'th kth expectation given H ij . Let X; be an (ni -q) x (p+s)) ma(nrIX_Wql) x (n' _q) • / C k - 1 n· -q an, 1 8/-tiq+k/86 and Wi = dmg(l vik+q' - , ... , • .. (Yi _ C). Then, . F'IIIally, let Z·• = X.'t5.+ Z• onIt.X· using d·lagonal weighting matrIX. an updated ~ can be obtained by iteratively regressmg A
f(Y··IY·· ) 'J ,)-1"",Yij_q'
(10.2.1)
When maximizing (10.2.1) there are two distinct cases to consider. In the first, fr(H ij jQ,f3) = arlr(Hi.1) so that h(J.l5) == roi/f3 + L:=l ar fr(7-£ij). Here, h(/-tB) is a linear function of both f3 and Q == (ab"" as) so that estimation proceeds as in GLMs for independent data. We simply regress Yij on the (p + s )-dimensional vector of extended explanatory variables (Xij, It (H i.1 ), ... ,Is (H i.1 )). The second case occurs when the functions of past responses include both Q and (3. Examples are the linear and log-linear models discussed above. To derive an estimation algorithm for this case, note that the derivative of the log conditional likelihood or conditional score function has the form
i=lj=Q+l
j=2
f(Yij
m
SC (6) =
n,
Li (Yn, ... , Yin.) == I (Yil)
193
A
£ th conditional mean and variweights W. When the correct model is assumed or t' ~IY as m goes to infinity, ance, the solution ~ of (10.2.2) asympto IC al to the true value, t5, and follows a Gaussian distribution with mean equ
TRANSITION MODELS FOR CATE
TRANSITION MODELS 194
V" = ('f:,Xt'Wi x ;*)-1
(10.2.3)
,:1 • IT d ds on a and o. A consistent estimate, VS ' is obtained The varlance v8 epen fJ . a d by their estimates f3 and o. Hence a 95% confidence by rep1acmg fJ an a " . . a1 C (3 • (3A ± 2 where V.. is the element m the first row mterv lor 1 IS I V V 811 ' °u A
In the regression setting we model th t .. . f' ' e ransltlOn prob a b'l' . 0 covanates Xi' = (1 x'' I Itles as functlOns J "Jl,Xij2, ... ,Xi') Avery I ~ separate logistic regression for Pr(Yij := 1rV;_ = genera _model uses IS, we assume that J I Y'J)' Yij - 0,1. That
A
q;;-
and column of V". . .. . Ii the conditional mean is correctly speCIfied and the condItIOnal vanance is not we can still obtain consistent inferences about t5 by using the robust vari~nce from equation (A.6.1) in Appendix A, which here takes
and logit Pr(Yij = 11 Yij-I
(f,Xt'WiXt) (f:X;'WiViWi X:) (f Xt'Wi x ;)-1 -1
,:1
,=1
(10.2.4) A consistent estimate VR is obtained by replacing Vi = Var(Yi l1td in the equation above by its estimate, (l'i - Mic)(Yi - Mi C )', Interestingly, use of the robust variance will often give consistent confidence intervals for 6 even when the Markov assumption is violated. How!lver, in that situation, the interpretation of 1; is questionable since JL5 (IS) is not the conditional mean of Y:,'J given 'IJ, , ' t'J' 10.3 Transition models for categorical data This section discusses Mar kov ch" . am regreSSIOn models for categorIcal responses" observed at equ II d . b' a y space mtervals. We begin with logistic deIs lOr mary responses rna . mult'n . al d and then b nefly consider extensions to 1 o~m an ordered categorical outcomes. As dIscussed in Section 7 3 fi' . . characterized by th t " ., a. rst-order bmary Markov cham IS e ransltlOn matnx
(
11"00 11"10
11"01) 1I"1l '
logit Pr(Yij =
11 Yij-l
=
YiJ'-I)
== x',a + y"'J- IX',,,, 'JfJ o 'J .... ,
(10.3.1)
so ~hat f31 .= .f3o + o. Equation (10.3.1) expresses the two regressions as a smgle lOgiStIC model which includes as predictors the previous response Yij-I as well as the interaction of Yij-I and the explanatory variables. An advantage of the form in (10.3.1) is that we can now easily test whether simpler models fit the data equally well. For example, we can test whether 0:: = (ao,O) so that in (10.3.1) Yij-lXi/O: = O:OYij-l' This assumption implies that the covariates have the same effect on the response probability whether Yij-l = a or Yij-l = 1. Alternately, we can test whether a more limited subset of a is zero indicating that the associated covariates can be dropped from the model. Each of these alternatives is nested within the saturated model so that standard statistical methods for nested models can be applied. In many problems, a higher order Markov chain may be needed. The second-order model has transition matrix Yij Yij-2
0 0 1
1 where 7I"ab == Pr(y;" == blY:' _ probability that ' / _ 1 'hJ - 1 - a), a, b = 0,1. For example ?f01 is the L 'J W en the . ' that each row of a transition rna . prevIOUS resp~nse is Yij-l = 0. Note a) + Pr(y;'J' == 11 Y ) tnx sums to one smce Pr(Y:' = 0 I Y:' 1 = 'J-I = a == 1. As its '. '3 'Jname Imphes, the transition matrix
= 1) == X~jf3l,
where f30 and f31 may differ. In words, this model assumes tht th IX '11' a e euects . bl a f exp1ana.t ory vana es WI dIffer depending on the prevIous . response. A more conCIse form for the same model is
the form
,=1
195
records the probabilities of making e h f . one visit to the next. ac 0 the pOSSible transitions from
(p + 8) X (p + 8) variance matrix
VR;:;:;
GORICAL DATA
Yij-l
0 1 0 1
0
1
1rooo
11"001
11"010
11"011
11"100
11"101
11"110
7l'1l1
"V b)' for example 11"011 is the H ere, 11"abc = Pr(Yij = C IYij-2 = a, L ij-l , _ B al with probability that Yi J' = 1 given Yij-2 = 0 and Yij-l - 1. fiy;n ogy ate · could now t lour separ t h e regression models for a first-or d er ch aIll, we
TRANSITION MODELS FOR CATEGORICAL DATA
TRANSITION MODELS 196
~ ossible histories (Yij-2, Yij-d, ach of the lOur p ffi' t a f3
~
logistic regressions, one or e d (1 1) with regression cae Clen ~ 1-"00'. 01' an it is'agalll . more convenient to wnte a smgle name Iy (0 , 0) , (0 , 1) '.(1,0), I But {3 10' a nd (3 11' respectIve y. equation as follows logit
Pr(Yij =
11 Yij-2
--
--
Y'J-2,
Yo-I 'J
= X ij f3 + Y-')-- IX-'J-0'1 + y"'J- 2X 'J- -02 I
I
I
= Yij-I)
10.3.1
+ Y'j-I -- y-'J--2XiJ-03'
(10.3.2) .
a,z' .
~~~~~rr;.~s~~;~~:ene~ndst~is~ituationp' l:e :u:tg~~l~t:~~s:i~~t:et~r~~:~~~ models of different or er. exam , rOT
model which can be written in the form
= 11 Yij-3 = Yij-3, Yij-2 = Yij-2, Yij-l =
Yij-d
= X~jf3 + O'tYij-1 + 0'2Yij-2 + 0'3Yij-3 + 0'4Yij-lYij-2
+ 0'5Yij-lYij-3 + 0'6Yij-2Vij-3 + 0'7Yij-IYij-2Yij-3'
(10.3.3)
A second-order model can be used if the data are consistent with aa = a5 = <16 = <17 = 0; a first-order model is implied if aj = a for j = 2, ... ,7. As with any regression coefficients, the interpretation and value of 13 in (10.3.3) depends on the other explanatory variables in the model, in particular on which previous responses are included. When inferences about 13 are the scientific focus, it is essential to check their sensitivity to the assumed order of the Markov regression model. As discussed in Section 10.2, when the Markov model is correctly specified, the transition events are uncorrelated so that ordinary logistic regression can be used to estimate regression coefficients and their standard errors. However, there may be circumstances when we choose to model Pr(Yij I fij-I, ... , fij-q) even though it does not equal Pr(Yij IHij)' For example, suppose there is heterogeneity across people in the transition matrix due to unobserved factors, so that a reasonable model is Pr(fij
intercept U makes the t:ansitions for a person correlated. Correct inferences about the populatIOn-averaged coefficients can be d . th . . Section 7.5. rawn uSlllg e GEE approach descnbed III
Indonesian children's study example
I
r and Y-')- 1 , we obtamUTf3 00 = 13; 'a t values lor Yij-2 ld By plugging in t he dIlleren . and = f3 + 0'1 + 0'2 + 0'3. vve wou {301 = {3 + al; f310 = 13 + f3l l odel fits the data equally well so h t e parsImonIous m again hope t a a mar f the a _would be zero. that many of the com~onents 0 f (10 3'2) occurs when there are no interAn important speCIal case 0 .. d __ and the explanatory t esponses Y- '-I an y,)-2, h actions betwee~ t e pas r t' ;~he a _are zero except the intercept variables, that IS, when all elemen so aae'ct the probability of a positive . th revious responses 11' term. In thIS c:e, ffi et p f the explanatory variables are the same regardless
logit Pr(Yij
197 i
= 11 Y;J-I = Yij-I, U;) = (/30 + Ui) + x~}f3 + aYij-l,
where Ui '" -:r~O,0"2). We may still wish to estimate the populationaveraged transItIon matrix, Pr(Y;j IY;j-l = Yij-I). But here the random
We illustrate the Markov models for binary longitudinal data with the respiratory disease data from our subset of 1200 records from the Indonesian Children's Health Study. Ignoring dependence on covariates for the moment, the observed first-order transitions are as presented in Table 10.1. The table gives the number and frequency of transitions from the infection state at one visit to the next. These rates estimate the transition probabilities Pr(Yij IYij-t). Note that there are 855 transitions among the 1200 observations, since for example we did not observe the child's status prior to visit one. For children who did not have respiratory disease at the prior visit (Yij-I = 0), the frequency of respiratory infection was 7.7%. Among children who did have infection at the prior visit, the proportion was 13.5%, or 1.76 times as high. An important question is whether vitamin A deficiency, as indicated by the presence of the ocular disease xerophthalmia, is associated with a higher prevalence of respiratory infection. Sommer et al. (l9~4) have demonstrated this relationship in an analysis of the entire IndonesIan data set which includes over 20 000 records on 3500 children. In our subse~, we can form the cross-tabulation shown in Table 10.2 of xeropht~alm~a and respiratory infection at a given visit, using the 855 observatIOns III Table 10.1.
Table 10.1. Number (frequency) of transitions from respiratory disease status Y_ij-1 at visit j - 1 to disease status Y_ij at visit j for Indonesian Children's Health Study data.

                 Y_ij = 0        Y_ij = 1       Total
Y_ij-1 = 0       721 (0.923)     60 (0.077)     781 (1.0)
Y_ij-1 = 1       64 (0.865)      10 (0.135)     74 (1.0)
                                                855
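A transition table of this kind can be assembled directly from the raw binary series. The following minimal sketch is not from the book; the data layout and the function name are hypothetical, and the toy series are for illustration only. With the study data it would reproduce the counts and frequencies in Table 10.1 (for example, the ratio 0.135/0.077 = 1.76 noted above).

```python
import numpy as np

def transition_table(series_list):
    """Tabulate first-order transitions y_{ij-1} -> y_ij over a list of
    binary series (one array of 0/1 outcomes per child, in visit order)."""
    counts = np.zeros((2, 2), dtype=int)
    for y in series_list:
        for prev, curr in zip(y[:-1], y[1:]):
            counts[prev, curr] += 1
    # row-wise frequencies estimate Pr(Y_ij = curr | Y_ij-1 = prev)
    freqs = counts / counts.sum(axis=1, keepdims=True)
    return counts, freqs

# Toy example with three short series
counts, freqs = transition_table([np.array([0, 0, 1, 0]),
                                  np.array([0, 1, 1, 0]),
                                  np.array([0, 0, 0, 0])])
print(counts)
print(np.round(freqs, 3))
```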
Table 10.2. Cross-tabulation of respiratory disease Y_ij against xerophthalmia status x_ij for Indonesian children's health study data from visits 2 to 6.

              Y_ij = 0        Y_ij = 1      Total
x_ij = 0      748 (0.920)     65 (0.080)    813 (1.0)
x_ij = 1      37 (0.881)      5 (0.119)     42 (1.0)
                                            855

The frequency of respiratory infection is 1.49 = 0.119/0.080 times as high among children who are vitamin A deficient. But there is clearly correlation among repeated respiratory disease outcomes for a given child. We can control for this dependence by examining the effect of vitamin A deficiency separately for transitions starting with y_ij-1 = 0 or y_ij-1 = 1 as in Table 10.3. Among children free of infection at the prior visit, the frequency of respiratory disease is 1.44 = 0.108/0.075 times as high if the child has xerophthalmia. Among those who suffered infection at the prior visit, the xerophthalmia relative risk is 1.54 = 0.200/0.130. Hence, the xerophthalmia effect is similar for y_ij-1 = 0 or y_ij-1 = 1 even though y_ij-1 is a strong predictor of Y_ij.

Table 10.3. Cross-tabulation of current respiratory disease status Y_ij against xerophthalmia x_ij and previous respiratory disease status Y_ij-1 for the Indonesian children's study data from visits 2 to 7.

Y_ij-1 = 0:
              Y_ij = 0        Y_ij = 1      Total
x_ij = 0      688 (0.925)     56 (0.075)    744 (1.0)
x_ij = 1      33 (0.892)      4 (0.108)     37 (1.0)
                                            781

Y_ij-1 = 1:
              Y_ij = 0        Y_ij = 1      Total
x_ij = 0      60 (0.870)      9 (0.130)     69 (1.0)
x_ij = 1      4 (0.800)       1 (0.200)     5 (1.0)
                                            74

This exploratory analysis suggests a model

logit Pr(Y_ij = 1 | Y_ij-1 = y_ij-1) = x'_ij β + α y_ij-1.

Table 10.4. Logistic regression coefficients, standard errors (within round parentheses), and robust standard errors (within square parentheses) for several models fitted to the 855 respiratory disease transitions in the Indonesian children's health study data.

Model 1: Intercept -2.44 (0.13) [0.14]; Current xerophthalmia (1 = yes; 0 = no) 0.44 (0.50) [0.54].
Model 2: Intercept -2.51 (0.14) [0.14]; Current xerophthalmia 0.40 (0.55) [0.51]; Y_ij-1 0.61 (0.39) [0.41]; Y_ij-1 by xerophthalmia 0.11 (1.3) [1.1].
Model 3: Intercept -2.51 (0.14) [0.14]; Current xerophthalmia 0.42 (0.50) [0.53]; Y_ij-1 0.62 (0.37) [0.39].
Model 4: Intercept -2.85 (0.19) [0.18]; Current xerophthalmia 0.79 (0.58) [0.49]; Age - 36 (months) -0.024 (0.0077) [0.0070]; Season (1 = 2nd quarter; 0 = other) 1.23 (0.29) [0.29]; Y_ij-1 0.82 (0.47) [0.44]; Y_ij-1 by xerophthalmia -0.11 (1.4) [1.1]; Y_ij-1 by age 0.00063 (0.029) [0.024]; Y_ij-1 by season -1.24 (1.2) [1.1].
Model 5: Intercept -2.81 (0.18) [0.17]; Current xerophthalmia 0.78 (0.52) [0.53]; Age - 36 (months) -0.023 (0.0073) [0.0065]; Season 1.11 (0.28) [0.27]; Y_ij-1 0.62 (0.38) [0.40].

Table 10.4 presents logistic regression results for that model and a number of others. For each set of predictor variables, the table reports the
regression coefficient, the standard error from (10.2.3) reported by ordinary logistic regression procedures, and the robust standard error defined by (10.2.4).

The first model predicts the odds of infection using only xerophthalmia status and should reproduce Table 10.2. The frequency of infection among children without xerophthalmia in Table 10.2 is 8.0%, which equals exp(-2.44)/{1 + exp(-2.44)}; -2.44 is the intercept for Model 1 in Table 10.4. The log odds ratio calculated from Table 10.2 is

log{(0.119/0.881)/(0.080/0.920)} = 0.44,

which is the coefficient for xerophthalmia in Model 1 of Table 10.4. The advantage of the logistic regression formulation is that standard errors are readily calculated.

Model 2 allows the association between xerophthalmia and respiratory disease to differ among children with and without respiratory disease at the prior visit, and reproduces the transition rates from Table 10.3. For children without infection at the prior visit (y_ij-1 = 0), the log odds ratio for xerophthalmia calculated from Table 10.3 is log{(0.108/0.892)/(0.075/0.925)} = 0.40, which equals the coefficient for current xerophthalmia in Model 2. The log odds ratio for children with infection at the prior visit is 0.40 + 0.11 = 0.51, the sum of the xerophthalmia effect and the xerophthalmia-by-previous-infection interaction. A comparison of Tables 10.2 and 10.3 indicates that the association of xerophthalmia and respiratory infection is similar for children who did and did not have respiratory infection at the previous visit. This is confirmed by the xerophthalmia-by-previous-infection interaction term in Model 2, which is 0.11 with approximate, robust 95% confidence interval (-2.1, 2.3). In Model 3, the interaction has been dropped.

If the first-order Markov assumption is valid, that is, if Pr(Y_ij | H_ij) = Pr(Y_ij | Y_ij-1), then the standard errors estimated by ordinary logistic regression are valid. The robust standard errors have valid coverage in large samples even when the Markov assumption is incorrect. Hence a simple check of the sensitivity of inferences about a particular coefficient to the Markov assumption is whether the ordinary and robust standard errors are similar. They are similar for both coefficients in Model 1.

Models 1, 2, and 3 illustrate the use of logistic regression to fit simple Markov chains. The strength of this formulation is the ease of adding additional predictors to the model. This is illustrated in Models 4 and 5, where the child's age and an indicator of season (1 = 2nd quarter, 0 = other) have been added. In Model 4, we fit all interactions with y_ij-1, which is the same as fitting separate logistic regressions for the cases y_ij-1 = 0 and 1. None of the interactions with prior infection are important so they are dropped in Model 5.
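The fitting strategy just described (ordinary logistic regression on the transition records, with robust standard errors reported alongside the model-based ones) can be sketched as follows. This is an illustrative sketch rather than the analysis used in the book: the data frame `ichs` and its column names (id, visit, y, xerophthalmia) are hypothetical, and a GEE fit with an independence working correlation is used here simply as a convenient way to obtain the ordinary logistic point estimates together with sandwich standard errors.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_first_order_markov(df):
    """Fit logit Pr(Y_ij = 1 | y_ij-1) = x'_ij beta + alpha * y_ij-1 to
    the observed transitions, with robust (sandwich) standard errors.
    df is assumed to be in long format with columns: id, visit, y, xerophthalmia."""
    df = df.sort_values(["id", "visit"]).copy()
    df["y_lag"] = df.groupby("id")["y"].shift(1)        # previous response
    trans = df.dropna(subset=["y_lag"])                 # keep transition records only
    X = sm.add_constant(trans[["xerophthalmia", "y_lag"]])
    # Independence working correlation: point estimates match ordinary
    # logistic regression, while the reported standard errors are robust.
    model = sm.GEE(trans["y"], X, groups=trans["id"],
                   family=sm.families.Binomial(),
                   cov_struct=sm.cov_struct.Independence())
    return model.fit()

# Example call with a hypothetical data frame `ichs`:
# print(fit_first_order_markov(ichs).summary())
```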
Having controlled for age, season, and respiratory infection at the prior visit in Model 5, there is mild evidence in these data for an association between xerophthalmia and respiratory infection: the xerophthalmia coefficient is 0.78 with robust standard error 0.53. As with any regression, it is important to check the sensitivity of the scientific findings to the choice of model. With transition models, we must check whether the regression inferences about β change with the model for the time dependence. To illustrate, we add Y_ij-2 as a predictor to Model 5. Because we are using two prior visits as predictors, only data from visits 3 through 7 are relevant, giving a total of 591 transitions. Interestingly, the inclusion of Y_ij-2 reduces the influence of season and Y_ij-1, and increases the xerophthalmia coefficient to 1.73. Controlling for respiratory infection at the two prior visits nearly doubles the xerophthalmia coefficient. This example demonstrates an essential feature of transition models: explanatory variables (e.g. xerophthalmia) and previous responses are treated symmetrically as predictors of the current response. Hence, as the time-dependence model changes, so might inferences about the explanatory variables. Even when the time dependence is not of primary scientific interest, the sensitivity of the regression inferences must be checked by fitting a variety of time-dependence models.

10.3.2 Ordered categorical data
The ideas underlying logistic regression models for binary responses carry over to outcomes with more than two categories. In this sub-section, we consider models for ordered categorical data, and then briefly discuss nominal response models. The books by McCullagh and Nelder (1989, Chapter 5) and Agresti (1990, Chapter 9) provide expanded discussions for the case of independent observations. Agresti (1999) provides an overview of methods for both independent and clustered ordinal data. To formulate a transition model for ordered categorical data, let Y_ij indicate a response variable which can take C ordered categorical values, labelled 0, 1, ..., C - 1. An example of ordered data is a rating of health status as: poor, fair, good, or excellent. While the outcomes are ordered, any numerical scale that might be assigned would be arbitrary. The first-order transition matrix for Y_ij is defined by π_ab = Pr(Y_ij = b | Y_ij-1 = a) for a, b = 0, 1, ..., C - 1. As with binary data (C = 2), a saturated model of the transition matrix can be obtained by fitting a separate regression for each of the C possible values of Y_ij-1. That is, we model Pr(Y_ij = b | Y_ij-1 = a) separately for each a = 0, 1, ..., C - 1. With ordered categorical outcomes, we can use a proportional odds model (Snell, 1964 and McCullagh, 1980) which we now briefly review. Since the response categories for ordinal data are usually arbitrary, we would like a regression model whose coefficients have the same interpretation when we combine or split categories. This is achieved by working
with the cumulative probabilities Pr(Y ≤ a) rather than the cell probabilities, Pr(Y = a). Given the cumulative probabilities, we can derive the cell probabilities since Pr(Y ≤ a) = Pr(Y ≤ a - 1) + Pr(Y = a), a = 1, ..., C - 1. The proportional odds model for independent observations has the form

logit Pr(Y ≤ a) = log{Pr(Y ≤ a)/Pr(Y > a)} = θ_a + x'β,    (10.3.4)

where a = 0, ..., C - 2. Here and for the remainder of this section, we write the model intercepts as θ and do not include an intercept term in x. Taking x = 0, we see that Pr(Y ≤ a) = exp(θ_a)/{1 + exp(θ_a)}. Since Pr(Y ≤ a) is a non-decreasing function of a, we have θ_0 ≤ θ_1 ≤ ... ≤ θ_C-2. If θ_a = θ_a+1, then Pr(Y ≤ a) = Pr(Y ≤ a + 1) and categories a and a + 1 can therefore be collapsed. The regression parameters β have log odds ratio interpretations, since for any a the log odds ratio for the event Y ≤ a comparing two covariate values x_1 and x_2 is (x_1 - x_2)'β.

Following Clayton (1992), it is convenient to introduce the vector of variables Y* = (Y*_0, Y*_1, ..., Y*_C-2) defined by Y*_a = 1 if Y ≤ a and 0 otherwise. If C = 3, Y* = (Y*_0, Y*_1) takes values as shown in Table 10.5. The proportional odds model is simply a logistic regression for the Y*_a, since

log{Pr(Y ≤ a)/Pr(Y > a)} = logit Pr(Y*_a = 1) = θ_a + x'β.

Here, each Y*_a is allowed to have a different intercept but the proportional odds model requires that covariates have the same effect on each Y*_a.

Table 10.5. Definition of Y* variables for proportional odds modelling of ordered categorical data.

Y     Y*_0    Y*_1
0      1       1
1      0       1
2      0       0

Our first application of the proportional odds model to a Markov chain for ordered categorical responses is the saturated first-order model without covariates. The transition matrix is π_ab = Pr(Y_ij = b | Y_ij-1 = a), a, b = 0, 1, ..., C - 1. We model the cumulative probabilities,

Pr(Y_ij ≤ b | Y_ij-1 = a) = π_a0 + π_a1 + ... + π_ab,

assuming that

log{Pr(Y_ij ≤ b | Y_ij-1 = a)/Pr(Y_ij > b | Y_ij-1 = a)} = θ_ab

for a = 0, 1, ..., C - 1 and b = 0, 1, ..., C - 2. Now suppose that covariates x_ij are available and that they have a different effect on Y_ij for each previous state Y_ij-1. The model can be written as

log{Pr(Y_ij ≤ b | Y_ij-1 = a)/Pr(Y_ij > b | Y_ij-1 = a)} = θ_ab + x'_ij β_a    (10.3.5)

for a = 0, ..., C - 1; b = 0, ..., C - 2. As with binary responses, we can rewrite (10.3.5) as a single (although somewhat complicated) regression equation using the interactions between x_ij and the vector of derived variables Y*_ij-1 = (Y*_ij-1,0, ..., Y*_ij-1,C-2):

log{Pr(Y_ij ≤ b | Y_ij-1)/Pr(Y_ij > b | Y_ij-1)} = θ_b + Σ_a α_ab Y*_ij-1,a + x'_ij β + Σ_a (x'_ij γ_a) Y*_ij-1,a.    (10.3.6)

Comparing (10.3.5) and (10.3.6), we see that θ_C-1,b = θ_b and α_ab = θ_ab - θ_a+1,b, a = 0, 1, ..., C - 2; b = 0, 1, ..., C - 2. Similarly, β_C-1 = β and γ_a = β_a - β_a+1, a = 0, 1, ..., C - 2. With this formulation, we can test whether or not the effect of x_ij on Y_ij is the same for adjacent categories of Y_ij-1 by testing whether γ_a = 0. A simple special case of (10.3.6) is to assume that γ_0 = γ_1 = ... = γ_C-2 = 0 so the covariates x_ij have the same effect on Y_ij regardless of Y_ij-1. As with binary data we can fit models with different subsets of the interactions between Y_ij-1 and x_ij.

The transition ordinal regression model can be estimated using conditional maximum likelihood, that is, by conditioning on the first observation, Y_i1, for each person. Standard algorithms for fitting proportional odds models (McCullagh, 1980) can be used by adding the derived variables Y*_ij-1 and their interactions with x_ij as additional covariates. As an alternative, Clayton (1992) has proposed using GEE to fit simultaneous logistic regressions to Y*_ij0, ..., Y*_ij,C-2, again with Y*_ij-1 and its interactions with x_ij as covariates. For the examples studied by Clayton, the GEE estimates were almost fully efficient.

When the categorical response does not have a natural ordering, we must model the transition probabilities, π_ab = Pr(Y_ij = b | Y_ij-1 = a), a, b = 0, 1, ..., C - 1. McCullagh and Nelder (1989) and Agresti (1990) discuss a variety of logistic model formulations for independent responses.
As we have demonstrated for binary and ordered categorical responses, these can be extended to transition models by using indicator variables for the previous state and their interactions with covariates as additional explanatory variables.
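The pooled logistic-regression strategy described above can be sketched as follows. This is an illustrative sketch, not code from the book: the data frame columns (id, y, y_prev, x) are hypothetical, and only the simple special case in which x_ij has a common effect across previous states (all γ_a = 0) is fitted; interactions with the previous-state indicators could be appended to the design matrix in the same way.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def expand_cumulative(df, C):
    """Expand ordinal transition records into binary records for the derived
    variables Y*_b = 1{Y_ij <= b}, b = 0, ..., C-2.
    df is assumed to hold columns: id, y (current, 0..C-1), y_prev, x."""
    rows = []
    for _, r in df.iterrows():
        for b in range(C - 1):
            rec = {"id": r["id"], "x": r["x"],
                   "cut": b, "ystar": int(r["y"] <= b)}
            # derived indicators for the previous state, Y*_{ij-1,a}
            for a in range(C - 1):
                rec[f"prev_le_{a}"] = int(r["y_prev"] <= a)
            rows.append(rec)
    return pd.DataFrame(rows)

def fit_po_transition(df, C):
    long = expand_cumulative(df, C)
    # cut-point dummies play the role of the intercepts theta_b
    X = pd.get_dummies(long["cut"], prefix="theta", dtype=float)
    X = pd.concat([X, long[["x"] + [f"prev_le_{a}" for a in range(C - 1)]]], axis=1)
    model = sm.GEE(long["ystar"], X, groups=long["id"],
                   family=sm.families.Binomial(),
                   cov_struct=sm.cov_struct.Independence())
    return model.fit()
```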
10.4 Log-linear transition models for count data

In this section, we consider extensions of the log-linear model in which the conditional distribution of Y_ij given the past H_ij is Poisson with conditional expectation μ^c_ij that depends both on past outcomes y_i1, ..., y_ij-1 and on explanatory variables x_ij. We begin by reviewing possible models for the conditional mean μ^c_ij. In each case we restrict our attention to a first-order Markov chain.

1. μ^c_ij = exp(x'_ij β){1 + exp(-α_0 - α_1 y_ij-1)}, α_0, α_1 > 0. In this model, suggested by Wong (1986), β represents the influence of the explanatory variables when the previous response takes the value y_ij-1 = 0. When y_ij-1 > 0, the conditional expectation is decreased from its maximum value, exp(x'_ij β){1 + exp(-α_0)}, by an amount that depends on α_1. Hence, this model only allows a negative association between the prior and current responses. For a given α_0, the degree of negative correlation increases as α_1 increases. Note that the conditional expectation must vary between exp(x'_ij β) and twice this value, as a consequence of the constraints on α_0 and α_1.

2. μ^c_ij = exp(x'_ij β + α y_ij-1). This model appears sensible by analogy with the logistic model (7.3.1). But it has limited application for count data because when α > 0, the conditional expectation grows as an exponential function of time. In fact, when exp(x'_ij β) = μ, corresponding to no dependence on covariates, this assumption leads to a stationary process only when α < 0. Hence, the model can describe negative association but not positive association without growing exponentially over time. This is a time series analogue of the auto-Poisson process discussed for data on a two-dimensional lattice by Besag (1974).

3. μ^c_ij = exp[x'_ij β + α{log(y*_ij-1) - x'_ij-1 β}], where y*_ij-1 = max(y_ij-1, d), 0 < d < 1. This is the model introduced by Zeger and Qaqish (1988) and briefly discussed in Section 10.1. When α = 0 it reduces to an ordinary log-linear model. When α < 0, a prior response greater than its expectation decreases the expectation for the current response and there is negative correlation between y_ij-1 and Y_ij. When α > 0, there is positive correlation.
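The behaviour of the third model is easy to explore by simulation. The sketch below is not from the book: it assumes no covariates, so that exp(x'_ij β) = μ, with μ = 5 chosen arbitrarily for illustration, and simulates series like those shown later in Figure 10.1.

```python
import numpy as np

def simulate_markov_poisson(n, mu, alpha, d=0.3, seed=0):
    """Simulate the third conditional model above:
    Y_j | y_{j-1} ~ Poisson(mu_c_j), with
    mu_c_j = exp(log mu + alpha*(log y*_{j-1} - log mu)) = mu * (y*_{j-1}/mu)**alpha,
    where y*_{j-1} = max(y_{j-1}, d)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n, dtype=int)
    y[0] = rng.poisson(mu)
    for j in range(1, n):
        y_star = max(y[j - 1], d)
        cond_mean = mu * (y_star / mu) ** alpha
        y[j] = rng.poisson(cond_mean)
    return y

# One short realization for each value of alpha used in Figure 10.1
for alpha in (-0.8, -0.4, 0.0, 0.4, 0.8):
    print(alpha, simulate_markov_poisson(10, mu=5.0, alpha=alpha))
```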
For the remainder of this section, we focus on the third Poisson transition model above. It can arise through a simple physical mechanism called a size-dependent branching process. Suppose that exp(x'_ij β) = μ. Suppressing the index i for the moment, let Y_j represent the number of individuals in a population at generation j. Let Z_k(y_j-1), k = 1, ..., y_j-1, be the number of offspring for person k in generation j - 1. Then for y_j-1 > 0, the total size of the jth generation is

Y_j = Σ_{k=1}^{y_j-1} Z_k(y_j-1).

If y_j-1 = 0, we assume that the population is restarted with z_0 individuals. Now, if we assume that the random variables Z_k are independent Poisson with expectation (μ/y*_j-1)^{1-α}, then the population size will follow the transition model with μ^c_j = μ(y*_j-1/μ)^α. The assumption about the number of offspring per person represents a crowding effect. When the population is large, each individual tends to decrease their number of offspring. This leads to a stationary process.

Figure 10.1 displays five realizations of this transition model for different values of α. When α < 0, the sample paths oscillate back and forth about their long-term average level since a large outcome at one time decreases the conditional expectation of the next response. When α > 0, the process meanders, staying below the long-term average for extended periods. Notice that the sample paths have sharper peaks and broader valleys. This pattern is in contrast to Gaussian autoregressive sample paths for which the peaks and valleys have the same shape. In the Poisson model the conditional variance equals the conditional mean. When by chance we get a large observation, the conditional mean and variance of the next value are both large; that is, the process becomes unstable and quickly falls towards the long-term average. After a small outcome, the conditional mean and variance are small, so the process tends to be more stable. Hence, there are broader valleys and sharper peaks.

To illustrate the application of this transition model for counts, we have fitted it to the seizure data. A priori, there is clear evidence that this model is not appropriate for this data set. The correlations among repeated seizure counts for each person do not decay as the time between observations increases. For example, the correlations of the square root transformed seizure number at the first post-treatment period with those at the second, third, and fourth periods are 0.76, 0.70, and 0.78, respectively. The Markov model implies that these should decrease as an approximately exponential function of lag. Nevertheless, we might fit the first-order model if our goal was simply to predict the seizure rate in the next interval, given only the rate in the previous interval. Because there is only a single observation prior to randomization, we have assumed that the pre-randomization means are the same for the two treatment groups. Letting d = 0.3, we estimate the treatment effect (treatment-by-time interaction in Tables 8.11 and 9.6) to be -0.10, with a model-based standard error of 0.083. This standard error
Fig. 10.1. Realizations of the Markov-Poisson time series model: (a) α = -0.8; (b) α = -0.4; (c) α = 0.0; (d) α = 0.4; (e) α = 0.8.
is not valid because the model does not accurately capture the actual correlation structure in the data. We have also estimated a robust standard error, which is 0.30, much larger than 0.083, reflecting the fact that the correlation does not decay as expected for the transition model. The estimate of treatment effect is not sensitive to the constant d; it varies between -0.096 and -0.107 as d ranges from 0.1 to 1.0. The estimate of α for d = 0.3 is 0.79 and also is not very sensitive to the value of d.

10.5 Further reading

Markov models have been studied by probabilists and mathematical statisticians for several decades. Feller (1968) and Billingsley (1961) are seminal texts. But there has been little theoretical study of Markov regression
models except for the linear model (e.g. Tsay, 1984). This is partly because it is difficult to derive marginal distributions when the conditional distribution is assumed to follow a GLM regression. Regression models for binary Markov chains are in common use. Examples of applications can be found in Korn and Whittemore (1979), Stern and Coe (1984), and Zeger et al. (1985). The reader is referred to papers by Wong (1986), and Zeger and Qaqish (1988) for methods applicable to count data. There has also been very interesting work on Bayesian Markov regression models with measurement error. See, for example, the discussion paper by West et al. (1985) and the volume by Spall (1988).
11
Likelihood-based methods for categorical data
11.1 Introduction

In a study of the natural history of schizophrenia, first-episode patients had disease symptoms recorded monthly for up to 10 years following initial hospitalization (Thara et al., 1994). In studies of the health effects of air pollution, asthmatic children recorded the presence or absence of wheezing each day for approximately 60 days (Yu et al., 2000). To determine whether maternal employment correlates with paediatric care utilization and both maternal stress and childhood illness, daily measurements of maternal stress (yes, no) and childhood illness (yes, no) were recorded for 28 consecutive days (Alexander and Markowitz, 1986). Lunn et al. (2001) analyse ordinal allergic severity scores collected daily for 3 months on subjects assigned either to placebo or to one of three doses of active drug. In each of these examples the primary scientific question pertains to the association between specific covariates and a high-dimensional categorical response vector. 'Group-by-time' regression methods that characterize differences over time in the expected outcome among patient subgroups can be used for each of these examples. Although several analysis options exist for discrete longitudinal data, there are few likelihood-based methods that can accommodate such increasingly common long discrete series. Likelihood-based analysis of longitudinal data remains attractive since a properly specified model leads to efficient estimation and to valid summaries under missing at random (MAR) drop-out mechanisms as discussed in Chapter 13. A likelihood approach can be used to construct profile likelihood curves for key parameters, to compare nested models using likelihood ratio tests, and to compare non-nested models by considering penalized criteria such as AIC or BIC. However, to be used successfully we need to correctly specify the model form, and this requires both tailored exploratory methods to characterize means and covariances, and methods for model checking. Finally, the sensitivity of likelihood-based methods to the underlying model assumptions needs to be assessed.

11.1.1 Notation and definitions
In this section we distinguish between conditional and marginal regression coefficients. We define the marginal mean as the average response conditional on the covariates for subject i, X_i = (x_i1, ..., x_in_i): μ^M_ij = E(Y_ij | X_i). A marginal generalized linear regression model links the marginal mean to covariates using a marginal regression parameter, β^M: h(μ^M_ij) = x'_ij β^M. In contrast, we define a conditional mean as the average response conditional on the covariates X_i and additional variables A_ij: μ^C_ij = E(Y_ij | X_i, A_ij). A conditional generalized linear regression model links the conditional mean to both covariates and A_ij using a conditional regression parameter, β^C, and additional parameters, γ: h(μ^C_ij) = x'_ij β^C + γ'A_ij. We introduce this additional notation to help highlight some of the differences between marginal regression models discussed in Chapters 7 and 8, and random effects models discussed in Chapter 9, where A_ij represents the unobserved latent variable U_i, and transition models discussed in Chapter 10, where A_ij represents functions of the response history, H_ij. In this chapter we show how a correspondence between conditional models and their induced margins can be exploited to obtain likelihood-based inference for either conditional regression models or marginal regression models.
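The numerical link between a conditional and a marginal mean can be made concrete with a small sketch. The example below is not from the book; it assumes a logistic link with a Gaussian latent variable U ~ N(0, σ^2), and evaluates μ^M = E_U{expit(η^C + U)} by Gauss-Hermite quadrature (a tool discussed later in this chapter). The values η^C = 1.0 and σ = 2.0 are arbitrary illustrations of how the marginal probability is attenuated relative to the conditional one.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def marginal_mean(eta_c, sigma, K=20):
    """Marginal mean mu_M = E_U{ expit(eta_c + U) } for U ~ N(0, sigma^2),
    computed with K-point Gauss-Hermite quadrature."""
    z, w = np.polynomial.hermite.hermgauss(K)
    return np.sum(w * expit(eta_c + np.sqrt(2.0) * sigma * z)) / np.sqrt(np.pi)

# The conditional (subject-specific) probability exceeds the induced
# marginal (population-averaged) probability when sigma > 0.
print(expit(1.0), marginal_mean(1.0, sigma=2.0))
```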
11.2 Generalized linear mixed models Recall from Section 9,1 that a GLMM structures multiple .sources of measured and unmeasured variation using a single model equatIOn:
h{E(Y;j I Xi, Ui)} = X:j{3C
+ d:jUi ,
. and U· represents unmeasured Where Xij represents measured covarlates, t'h central generalized random effects. For longitudinal data there are ree
THODS FOR CATEGORICAL DATA 210
LIKELIHOOD-BASED ME Random intercepts
linear mixed models: h{E(Yij I xi,Vd} = X;jj3C I j3c h{E(Yij I Xi' Vi)} = xij h{E(Yij I Xi, Vi)} COV
+ UiO, + Ui O + U-I'
= x~/3c + Uij,
-- U- ) = (J2 p1t;j -t;k I, (U'J' .k
(11,2.2) 'J'
(11.2.3) (11.2.4)
11 2 1) correlation among observations collected on de,I ( ." In the first rna , ., I tt 'b table to a subject-specI'£1c '1evel' , or rand om'mteran mdlvldua are a n u . . f . me that there is no addItIOnal aspect 0 tIme that cept, These dat a as su . t th dependence model In (11.2.2) we assume that each subenters moe ' " jeet has an individual 'trend', or follo~s a random.lmear trajectory (on the scale of the linear predictor), For bmary data thIS model assumes that c _ h-I(""'_j3C + U-o + U-I . t .. ) which approaches 0 or 1 as tij -+ 00. f.l' - ""') • • 'J Fi~allYI (11.2.3) assumes only 'serial' correlation implying that obse~vations close together in time are more highly correlated than observatIOns far removed in time. Figure 11.1 displays simulated data, }lij = 0 or 1, generated from each of these models for j = 1,2, ... ,25 time points. The first column of plots shows data for 10 subject generated using random intercepts, while the second column uses random lines, and the third column uses a latent autoregressive process to induce dependence, Even though these data are generated under fairly strong and different random effects parameters (var(UiO ) = 1.5 2 for random intercepts, var(UiO ) = 0.2 2 for random slopes, and cov(Uij , Uik) = 2.5 2 . O,glj-kl for random autoregressive process), there are only subtle aspects of the data that distinguish the models, First, the latent random lines model in the second column shows subjects in rows 2 and 6 (from the top) that appear to change over time from all D's to alII's, and the subject in row 5 switches from alII's early to all D's late. This pattern is expected from random lines since lie. -+ 0 or 1 as " The data generated using autoregressive random""'J J mcreases, effects exhibit more alternating runs of 0' s or l' s th an t he random mtercept ' data reflect, . mg 't'Ion. F'mally, these data suggest the challenge ' ' the dserlaI assocla that bmary ata present - det t bl d' a ' ec a e luerences m the underlying dependence st ructure may ' only be we aklY apparent even with long categorical series, In Sect Ion 9.2,2 we ' t d d ' methods refer d m r? uce approXimate maximum likelihood on a Lap'lace are to.as p,enahzed quasi-likelihood (PQL) that are based pprmomatlOn to the ' I I' . and binomial data PQL margma Ikehhood. For count data can work s .. I Clayton (1993) also d urpnsmg y well, However, Breslow and that PQL approXimatlOns '. . ld po t ent ·all I Y severely biased e emonstrate t' t yle Sima es of both . . vanance components and regressIOn parameters when used ~ b' ' lor mary respo d Fu dIrected at removing th b' nse ata. rther research has been . e las of PQL b I' . approxImations (Breslow a d L' Y ~mp oymg hIgher order Laplace n m, 1995; Lm and Breslow, 1996; Goldstein
R
VVLfU~~.
(11.2.1)
,t
Random In ' l + slopes
Ql CIl
C
o
0. CIl Ql
a:
VWV ~ VlNVV\NVV\MA V\lA ~ flNlJJ\ ~ JlJIJl JLNL ~ VVJVL ANWMJ A L AJJUNl VVWV JLN\J lJJNl/\J JWJVlJ lJ\JYIA WVlAN nJJJ\JL YVLJl flIWlN jIJ\J'{V NVlfl{1
Fig. 11.1. Binary data simulated under GLMMs: Random inte~cepts == logit E(Yij I Xi, U;o) = fie + UiO; Random intercepts and sl.opes == IOgit E(Y;j I Xi, U i ) = fie + U;o + Ui1 • j; Random autoregressive == IOgIt E(Y;j I Xi, U;) = c ,. 101 (J + Uij where COV(Uij, U'k) = Gp J- .
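Series of the kind displayed in Figure 11.1 can be generated with a few lines of code. The sketch below is illustrative only and not from the book: the conditional intercept β^C = -0.5 is an arbitrary choice, while the variance parameters mirror those quoted for the figure (intercept SD 1.5, slope SD 0.2, autoregressive SD 2.5 with correlation 0.9^|j-k|).

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_glmm_binary(n_subjects=10, n_times=25, beta=-0.5,
                         kind="intercept", seed=1):
    """Simulate binary series under the three GLMM dependence structures
    illustrated in Figure 11.1."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_times)
    Y = np.empty((n_subjects, n_times), dtype=int)
    for i in range(n_subjects):
        if kind == "intercept":
            u = np.repeat(rng.normal(0.0, 1.5), n_times)
        elif kind == "slopes":
            u = rng.normal(0.0, 1.5) + rng.normal(0.0, 0.2) * t
        else:  # latent autoregressive process, corr(U_j, U_k) = rho**|j-k|
            rho, sd = 0.9, 2.5
            u = np.empty(n_times)
            u[0] = rng.normal(0.0, sd)
            for j in range(1, n_times):
                u[j] = rho * u[j - 1] + rng.normal(0.0, sd * np.sqrt(1 - rho**2))
        Y[i] = rng.binomial(1, expit(beta + u))
    return Y

print(simulate_glmm_binary(kind="slopes")[0])
```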
212
LIKELIHOOD-BASED METHODS
FOR CATEGORICAL DATA GENERALIZED LINEAR MIXED MODELS
, I i ntage of approximate methods is 1996) Thp smgu ar a( va h d , . ~'. It mative numerical met o· s which and Rash ash , t'· I ea.:'iC relative to a e t their compu a ,JOna ~ ,,' h I'k I'hood function using either quadrat. . . , ,tl m' ximlzmg tel e I focus on (IIree , y a th ds We review numcncal maXlITmm () me, 0 ,. M te C'lr1. 1 ure meth(J( s or on, . <. S t' 11 2 1 Modern Bayesian computing , 1 (MI) (aches m ec lOn, . . . likehhoQ( ~ apprJ . d d th model forllls that can I){~ considered p ~"uss Markov Chain Monte Carlo algorithms have greatly ex an e for analysifl, In Section 11.2. 2 we ISC" (MCMC) analYflis for GLMMs, 11.2.1
tl J[g ~ tl J ~
C
f(y" I X" U,; Ii
)] •
p
p
I X" U,
{f)Ogj(Yi j I Xi, Ui = G I/2Zk; {3C)}] J=I
.
Monahan and Stefanski (1992) discuss alternatives to Gauss-Hermite and show that, the maximal error for Gauss-Hermite quadrature is increasing as th~ vanance of the random effects increases, Specifically, let M ('fJ, a) = explt('r/ + I j . z)ljl(z)dz, where expit(x) = exp(x)/[l + exp(x)], and ¢(z) IS the standard normal density function. The function M ('fJ a) represents the ma:ginal m~an induced by a GLMM where logit(JP) ~ 'fJ + U, U rv N(O,1j ). Let M('r/ 0") 't( ) to M('r/ ) . L.,k~1 Wk 'expl 'fJ + Ij , Zk , the approximation M h ,0" usmg auss-Hermlte quadrature with K nodes For K = 20, ona an and Stephanski (1992) show that sup IM('fJ a = M(TJ a = 6 2) I, ~ 10- while suP'1 I M('r/, Ij = 4) _ M( ~ ~ -3 ' SpIegelman (1990) also discus . . 'fJ,0" - 4) I rv 10 ,Crouch and quadrature begins t o f aI, 'I s condItIons under which 20-point Gaussian Adaptive Gaussian quadratu . technique that centr th G ~e IS an alternative numerical integration es e ausslan ap , . d pro>omatlOn to the likelihood at the posterior mode of th e ran om effects, To derive the adaptive quadrature
!
G - "K
2) _
~ U;Il
C
)}]
~ U;Il j} C
C-'I'¢(u/C'I') do.
C-'!>
¢(U/GI/2) ]
~ fjJ [ex x
~ IT t Wk' [ex
f(y"
x ¢([u _ al/b)
flU,; C) dU,
[exp{t,IOgf(Y'j I X" U,; lic) }] . C-'I'¢(U,/C'I') dU,
,=) k=1
~ tJ J[ex {t, log ~ fjJ [exp {t,
!ogf(Y'j I X" U,
Maximum likelihood algorithms
, , I I'andom effects standard numerical integration For low-dlmenslOlla , , ' luate the lrkelrhood, solve score, equatIOns, eva to d metho dscan be use Gauss-HermIte quadratan d compu t e rna de l- based information matrices, , K ' ure uses a fixed set of K ordinates and weights (Zk' wkh=1 to approximate the likelihood function, Consider a single scalar random effect Ui rv N(O, G) and the likelihood given by equation (9.2.5):
L(6, Y)
213
we begin with the likelihood given ab d OVe an then 'ct ' linear transformation for the placeme t f th COllSl er an arbItrary e quadrature points: n 0 L(t5,y)
p
{t,
¢([a +
m K ~g(;Wk'
[
exp
log
b·
'¢([u - aJ/b) du
fry"~ I X" U. ~ ("
+b ,); lie) } C-'I'
1]
z]/GI/2) ¢(z) . b 'ljl(z) dz
{
n, } ~logj(YijIXi,Ui=(U+b'Zk);I3C) C- I/ 2
I 2
!].
x ¢([a + b . zkl/C / ) . ¢(Zk) b
This shows that we can use the Gauss-Hermite quadrature points (Wk , Zk)K_ after any linear transformation determined by (a,b) (as long k-l . as the integrand is also modified to contain one additional term a ratIO of normal densities). If the function that we are trying to in~egrate were of the form exp( - ~ (u - a)2 /b 2) then evaluation using the .Imear transformation a + b . Zk would yield an exact result. An ada?tlVe approa~h uses different values, ai and bi for each subject, that proVIde a ~u~ratlc . ' bsu"Ject s cont n'b u tl'on to the log likehhood, apprmomation to the zth ".10 f( .. / X, lj .. ,qc) - U 2 /(2G). In Section 9.2.2 we showed that ~J g YtJ " t , fJ t e d b ' th . . • _ -V-I ( . - X·{3 ), an i IS e Ui IS the postenor mode, ai = Ui - CD, if Zt b. D'. +C-I 1-1/2, where approximate posterior curvature, bi = [Lj D ij V(Ji'J) IJ Dij = af..L~)ab. Liu and Pierce (1994) dis~uss ~ow this adaptive quadrature is related to higher order Laplace ~pproXlmatlOn, f PQL fixed quadratPinheiro and Bates (1995) studied the acc~acy 0 . d' model Their ure, and adaptive quadrature for the non-hnear rruxe .
214
LfKELfHOOD-BASED M
ETHODS FOR CATEGORICAL DATA
, d t btain high accuracy with fixed quadrature results SUggfl~'lt that mbor er a °adrature points may be necessary (100 or er 0 f qu " 'd tl1re methods proved accurate usmg 20 pomts me,th 0 ds a large num ') hile adaptIVe qua r a ' , ' software more. , w th d re noW implemented III commerCial or fewer Quadrature me a s a , ' I) d ' ,, l' ST'AT'A (fixed quadrature for logistic-norma an SAS packages mcluc mg I d ' I· . ) A k limitation of quadrature met 10 . S IS t lat I'Ikeh-, (fixed and ac1aptlve . , d' , r.(q quadrature points where q IS the ImenSlOn of . . ey hood evaIuatlOn reqUIres .l' ' . Lr tV· D q larger than two, the computatIOnal burden can t he ran dam euec t· C'or . ' " 'c Il'ml'tation makes numerIcal mtegratlOn usmg quadratbecome severe, Thl" ure an excellent choice for random intercept model~, .for n~sted random effects, or for random lines, but prohibitive for multIdimenSIOnal random effect models such as time series or spatial data models, Monte Carlo ML algorithms have been developed by McCulloch (1997) and Booth and Hobert (2000). Three methods are described and compared in McCulloch (1997): Monte Carlo Expectation-Maximization (MCEM); Monte Carlo Newton-Raphson (MCNR); and Simulated Maximum Likelihood (SML). Hybrid approaches that first use MCEM or MCNR and then switch to SML are advocated, Booth and Hobert (2000) develop adaptations that aim to automate the Monte Carlo ML algorithms and guide the increase in the Monte Carlo sample sizes that is required for the stochastic algorithm to converge. Booth and Hobert (2000) show that efficient estimation can be realized with random intercept models and crossed random effect models, but suggest that the proposed methods may break down when integrals are of high dimension. The advantage of Monte Carlo ML ~ethods is. that the approximation error can be made arbitrarily small ~Imply by Increasing the size of the Monte Carlo samples. In contrast, to Improve:ccuracy with quadrature methods a new set of nodes and weights, (Zk,Wkh:l' must be calculated for a larger value of K, In summary" fixed adapt'Ive, and St och ' numencal . ,mtegratlOn . astlC met hods have been devel d d d op,e an, rna e commercially accessible for use with GLMM h ' s aVlllg low-dimenSional random effect distributions However, none of the numerical ML h d ' met 0 s have been made computationally practical for mod I 'th e aspect makes it ims WI' random effects distributions with q > 5. This pOSSI ble to use ML f GLMM ' dom effects, such as (11.2.3 re ' o~ . s, th~t have senal r.anlongitudinal data anal . Fu), g atly hmltmg applIcatIOn for categorIcal ySIS, rther det 'I d' al regar mg methods for integral approximation can be f d' E oun In vans and Swartz (1995), 11.2.2
Bayesian methods
Zeger and Karim (1991) d' I . f ' ISCUSS use of G'bb ' ,YSIS 0 discrete data usin a GLM 1. S samphng for Bayesian anaImplementation of a wid g I M. Gibbs sampling is one particular , d' . er c ass of meth d £ lor Istnbution known as MCMC 0 s or sampling from a poster, MCMc methods constitute a technical
GENERALIZED LINEAR MIXED M
ODELS
215
breakthrough for statistical est' , , " Imatlon, These m th
1l'(o,U I y) ex f(y I U,o), f(V I 0) . 1T'(0), We note that the marginal posterior distribution for the parameters 0 is obtained by integrating over the random effects V,
7['(0 I y) = ex
Iu Iu
1T'( 0, U I y) dU f(y I V,o), !(V '0) '1T'(6) dU
ex L(y I 0) '1T'(6), indicating that the only difference between the maximization of the posterior, 7['(0 , y), and the marginal likelihood, L(y I 6), is the additional impact of the prior 1l'(0), We review these basic aspects of Bayesian methods for two key reasons. First, it shows that if we can actually solve the harder problem of characterizing the joint posterior 1T'(0, V I y) then we can compute the marginal posterior for regression and variance component parameters o. Second, this shows the well-known result that the posterior distribution closely approximates the likelihood function when weakly informative priors are used, The MOMO methods can be used to generate samples from the joint posterior 7['(0, U I y). General algorithm details are thoroughly presented
216
LIKELIHOOD-BASED METHO
OS FOR CATEGORICAL DATA
' d Louis 1996). The GLMM has 1 1996 CarIm an, C elsewhere (Cilks et a., ' h that f(y I V,o) = f(y , U, f3 ), and a separation of parameters suc d't' al independence relationships allow These can I IOn f( V 10) = f (V I G) ' 'th' MCMC algorithms (Zeger and Karim , ' . d C'bb teps WI III , use of slmphfie ISS . 11l'res that the samples generated via CMC in practIce req , ' . , 1991). Use 0' f Mb assessee1 tor t: vergence to theIr statIonary dIstnbu_ can M t Carlo sample sizes have heen generated Markov challlS e tion, and that adequate t ~n e ummaries such as means, medians, and , I estimate pos erwr s to precIse y" P bl' lly available software (BUGS) allows applicatandard deVIatIOns, u lea , dId d (11 2 1)-(11 23) and for more complIcate carre ate ata s tion for models " ' , f B /MCM 'th II likelihood-based methods the use a ayes C problems, As WI a 'd d I f( I~) , f I d thoughtful specificatIOn of the ata rna e y (J, reqUIres care u an , . h I" IS t , at rea Istrcally ' contribution of MCMC machmery However, th e major " , , t d models can now be routinely used m practIce, allowmg a nch I comp lea e I 'bl' d' family of correlated data models not previous y acceSSI e usmg Irect likelihood methods, Gelfand and Smith (1990) describe sampling-based approaches for calculation of marginal posterior densities, and Gelfand et at, (1990) provide illustration of Bayesian inference based on Gibbs sampling, Zeger and Karim (1991) discuss regression analysis of categorical response data within the GLMM framework, and Albert and Chib (1993), Chib and Greenberg (1998) overview MCMC methods for binary and polytomous data using multivariate probit models, Sun et at, (2000) discuss GLMMs where random effects have a serial or spatial correlation structure, Finally, Chib and Carlin (1999) discuss specific computational issues related to the use of MCMC for longitudinal data analysis, 11.3 Marginalized models The ~~al of this section is to show that log-linear, random effects and tranSItIOn models can b d 'th 1" . " ,. e Use WI mu trvarrate categoncal data to eIther model a condItIonal regression coefficient as described in Chapters 9 and 10 or as a, basis for constructing a correlated data likelihood with a marginai regressIon s~ructure that is directly modelled. In margrnal regression m d I h parameters represent the choa e s, : e mean (or first moment) regression lence with binar t nge III expected response, such as preva" . , y au comes per un't h I C ange III a gIven predIctor WIthout conditioning on the t h ' , 0 er responses 0 l ' , X r any atent varIables, CorrelatIOns among elements of y . ' ,given . even'f d unobservable latent v 'bl " I reasonably attributed to share ana es or thr h comes, is aCcounted for b ' oug a dependence on the past outy a separate d d 1 ad vantages of a direct m . I epen ence model. There are severa argma approa h F' parameters, (3M is invarI'a t 'h c, Irst, the interpretation of mean ' , ' n Wit respect t mo d e.I Two data analy t 'h 0 speCIficatIon of the dependence s s WIt the same mean regression but different
MARGINALIZED MODELS 217
association models have exactly the same tar get of . , M . sense, the mean model is 'separable' fro th e;;tlmahon, {3 . In thIS the joint distribution. Second marg'ln m, . e rlsemalllder of the model for t' d' , a rna d e can b using semi-parametric methods such as ener r ~ es~mate eIther (GEE), or using li.keli?ood methods descr~bed ~~~::.eshmatlllg equations Often the motIvatIon for adopting a rand a . ' condi tlOns on I atent vanables U· or for use of a tr om'fellects model that . ' ~]' ansI Ion model that conditions I I' on t 1e past outcomes, IS SImply to account for . .' . corre atlOn among repeated measurements. In discussIllg the role of statl'st' 'I die . ., , lea mo e s, ox (1990) comments: It is important to distinguish the parts of the model th t I f i ' , . . ' . a (e me aspects of subject matter mterest, the pnmary aspects and the secondary ~'spoct th t . d'
. , . " . ~ c S . a. efficient methods of estimatIOn and assessment of precision. (page 171)
III
Icate
Especially in empirical models, it is desirable that parameters ( t t e.g. con ,ras 5, ' " regreSSIOn coeffiCients and the lIke) have an interpretation largely independent of the secondary features of the models used. (page 173) Therefore, if the primary objective of longitudinal analysis is to make inference regarding the mean response as a function of cova;iates and time, then a marginal mean model may be useful. In this section we describe how the correlation among observations can be charaderized by embedding the marginal mean structure within a complete multivariate probability model based on either log-linear, random effects, or transition model dependence assumptions. Marginalized models separate the model for systematic variation (mean) from the model for random variation (dependence) when using regression models for longitudinal data. In the first component of a marginalized model a generalized linear model is used for the marginal mean:
(11,3.1) However, the marginal mean only identifies one facet of the complete multivariate distribution for the response Y i , In order to complete model specification we need to characterize the dependence among re~eated . I'Ize d rna d eI we specify a second regresSIOn to observations. In a margma characterize correlation or random variation: dependence: h{E(Yij ,Xi, A ij )} = Liij(Xi)
+ 'i':jAij'
(11.3.2)
. , 'bl A·· that are used to structure Here we introduce addItIonal varra es 'J al th I'nk funed rements In geneI', e 1 dependence among the repeate meas~ alth h we usually choose tions in (11.3.1) and (11.3.2) can be dIfferent d ~~gendence parameters them to be equal so that mean parameters, an, A P _ {Y. ' k =1= j}, 'ble chOIce IS i] ,k' are on a common scale, O ne pOSSI t I all other response ' case the parameter lij m ' dI'cates how S rang Y I n thIS
218
I- IKELIHOOD-BASED METHOD
S FOR CATEGORICAL DATA
Y.. Although we assume the ent response, ') . . fi . f . bl es, Y.,k, predict t h e curr. al mean gIVen . by (11.3.1), speci catIOn a vana . lysis focus is on the margm . f I for describing within-subject corana. , (11 32) IS use U 1St" 11 32 the conditional mean ~n, 'the 'oint distribution of Y i . n ec IOn .. J d I ' that are based on (11.3.1) and t ·on and for identlfymg I re a I . I' d I -linear rna e s U we describe 'margma Ize ,og , Alternatively, we may let Ai) = i, a (11 32) using A ij = {l'ik. k =1= J}. , bles In this case we also need coll~~tion of random effects or I~tbentt. var~~ th~ random effects. Here 'Yij I tion distrl u IOn . d f b to describe the popu a h t haracterize the magllltu e 0 uno C I t ' I . mponents t a represents varIance co " d all within-person carre a Ion. n , h . tion whIch III uces Served or ran d am varIa . I' d latent variable moclels t at are ' d 'b 'margma Ize Section 11.3,3 we escn e ' A _ U. Finally we may consider ,., I h (11 3 1) d (11 32) usmg ij based on ." ,an=?i " ;here (11.3.2) now descr.ibes how strong y ~ e A ij = {l'ik' k < J} ') t t orne In SectIOn 11.3.4 we descnbe past responses pred~c.t the ~~:~:~t~~ care based on (11.3.1) and (11.3.2) 'V" characterizes how strongly the past 'marginalized translItlOn ~ using A ij = ?iij' n th IS case I I) t' edict the current outcome. marginalized models the parameter !::.ij (X i) represent,s , and dependence · f the marainal means , ItM parameters, 'Y~J' t f a unc IOn 0 b' ') • 'h h 'I such that the conditional model in (11.3,2) is consIstent WIt t ,e margma .. )] . Stated alternatIvely, when mean structure: It··M = E A., [E(Y.' 'J I X ., A ')
obs~:v:~~~A~ese
the conditional model in (1i~3.2) is averaged over the distribution of A ij the value D. .. (X -) is chosen such that the resulting marginal mean structure . properly ' ) . III ' duce d , Itij M -- E A [h- I {!::.', J (X,) LikelihoodIS ' + ""I,A·}], I 'J IJ ij based estimation methods for marginalized models need to recover !::.ij (X i) and evaluate the likelihood function. In each of the following subsections we first review the model assumptions and then overview ML estimation issues. The structural specification of a marginalized model parallels the components of a generalized linear model for univariate outcomes. McCullagh and NeIder (1989, p. 27) identify a random component that specifies the distribution of the response and therefore determines the likelihood, and a separate systematic component that characterizes the mean response as a function of covariates. Marginalized models for longitudinal data similarly separate the distributional assumptions in (11.3.2) from the regression assumptions in (11,3.1), 11.3.1
An example using the Gaussian linear model
Although in this chapter we focus on the use of marginalized models for categorical response data, the model formulation in terms of (11.3.1) and (11.3.2) could equally be adopted for continuous response data. We briefly r~ent the G~ussian version of three marginalized models to derive anayt1cal expressIOns for D.ij(X i ) and to show that each specific pair of
MARGINALIZED MODELS
219
models, given in terms of the marginal mean (1131) d d' . ItlOnal " an. a con d mean ( 11.3.2), characterizes both the mean and the ' h . covanance, an Wit the ~sumPtI~n of normal errors specifies the multivariate likelihood FIrst conSIder the conditionally Specified Gaussian model: .
Yij I Yik:
k
of. j = /lM + '"' (Jk(Y.k__ /lM) + Etj ') ~.) Ik k#j
=( /l,) M_ fJ ,. 10.1.) .)f..L. E(Yij
+ fJ'.y'i + Eij, 1)
I Xi,A ij ) = D.ij(X i ) +fJ~jAij,
where ()ij is the vector ((Jijl, (Jij2, , , . , (Ji)n ) with () .. :::: 0 and A. :::: Y* 'lJ, I) i' the response vector Y i with Yi) set to zero. We see that D.;)(X ) :::: i (It~J - ()~jJJ,tt) a function of the marginal mean, f..L~I, and the dependence parameters Oij. The covariance of Y i is (1 - 6 i )-IR.;, where 6, is an ni x ni matrix with rows Oij, and R.; is a diagonal matrix with diagonal elements equal to var( Eij). We see that in order to be a valid model we assume i ) is invertible and require the product (1 - 6 )-1 R.; to be symmet(1 i ric, positive definite. Models specified in this fashion are also discussed in the spatial data literature and compared to alternative model specification approaches (see for example Sections 6.3.2 and 6.3.3 of Cressie, 1993). Next consider an AR(p) formulation: t
a
p
Yij
I Yik: k < j
=
It~J +
L rij,k(Yij-k -
/l~-k) + Eij
k=1
M - lijf..Lij , *) = ( Itij
E(Yij
I Xi,A ij ) =
+ lij' H..IJ-I + E'l' ..
!::.ij(X i ) +'")'~jAij,
where 'Yij is the vector ('")'ij,l, )'ij,2,.", rij,p ) , f..Lij* -- (M Itij-I' /lM ij-2,"".ItM) ')-P , A ..J = H,J I where H ij - I = (Yij-1,Yij-2, .. "Yij-p)· Agam we 't · A .. ('x -.)' = (Jl11 _ ",,1.11.*,) a function of the marginal mean, o b a1n l . . J . ' J ' """J 11)r"J . Y 's f-LM and the association parameters lij' The covarIance of i .1 (] ~ r·) -1 D. (I - r ' )-1 where r i is the ni x ni lower triangular m~t~lX , .. "i " r i d to a d'posItIveI with rows defined by lij' Here we find that any i ea sb · Lor r Y i (R; is assumed lagona 'fi d' tothO e f.a h'on are semi-definite covariance rnat nx 1) Models specI e 111 IS as 1 h d' matrix with var( Eij) on t e lagona . Ch ter 3 of Diggle ap , discussed in the time series literature (see for example t d as 1990). Finally, the random effects model can be represen e
Yij , Ui = It~
+ d~jUi + Eij
IlM + d' .a -- ""'J 1)
E(Yij
I Xi, A ij ) =
l 2 /
ei + Eij,
~ij(Xi) + a~jAij,
DS FOR CATEGORICAL DATA 220
MARGINALIZED MODELS
LIKELIHOOD-BASED METHO
N(O,I). Here we simply have , d' a and A··J ;: lor . D'~· h . 1 • d covariance of Yi IS DiG i +'<'1 were where aij;: ij A (X.) I'M. and the mduce ~ij l "IJ' d· is the jth row of D i . h t multivariate normal model can be IJ I I onstrate t a, a " . These examp es (~em E(Y... I X) ;: J'l.~, combmed with condlIJ 1 J b h . . . rgmal means specified usmg rna A' .. ) here A·· can either e t e remammg t' s E(Y; I X IJ' W IJ . { y, k tional expec~a ,lOn, ~J..t. I.'} the past response vanables, ik:' < J.} , response vanabIes {Yik ' k r J , or a latent variable, Ui·
ei
l/ 2
L"
C. rv '01
11.3.2
Marginalized log-linear models ' h t al 1975) have been widely used for the Log linear models (B IS op e. ., 1 d b' t - . _ lassified discrete observations. Ba ance mary vec ors, '_ 1 2 m can be considered as a crossanalysIs of cr~ss)c L" (}'il,Yi2, .. ·,.lin Jor ~ - , , ... " d' d' S t' 822 . . f h ponent responses. As Iscusse III ec Ion .. , clasSificatIOn 0 ten com ., b b'l' . , a Iog-Imeal' mo del'IS constructed directly for the multlvanate pro a 1 Itles (O) log Pro (}'il,' .. , Yin);: (Ji
j
j
+
j
l: em1l'ijYikYiI + .. , + e~n)Yil"'"
Yin·
j
L
=
PrO i (Yil, Yi2, ... , Yij
(2)
= 1, ... , Yin I Xi)
Yik,kf-j
yielding mixtures of exponential functions of the canonical parameters Oi' In a log-~i~ear model, the natural (canonical) univariate regressions are for the condItIOnal expectations logit E(Yij I Yik : k ::=
tP) lJ
f=. j)
+ "e(2)y ~ k
ijk Ik
h{E(Yij
I Xi)} = X;j,8M, f=. j)}
= .:iij(X,)
"(3)
+ L., eijk1YikYil + ... + e~ k
n )
II
Yil.
l#j
Therefore, although 10 -line d . variate dependenc'l g f ar mo els are well suited for describing multIes or or mod 11' '. , e mg Jomt and conditional distributions,
(11.3.3)
+ (};jW'j'
(11.3.4)
where W ij represents Yik, pairwise products Yidil, and higher order products of {Yik: k =J- j}. The fact that (11.3.3) identifie.s the joint distribution of Y i with Oij unconstrained parameters subject only to symmetry con-
egk
e(n)) , Here the canonIcal parameter vector Oi = (Oi ,Oi , ... , i IS unconstrained and e~O) is a normalizing constant. Given covariates, Xi, it is possible to allow Oi to depend on Xi or to extend the log-linear model to describe log P(Y i , Xi) when Xi is also discrete. However, in either case the log-linear model results in complicated functions for the marginal expectations, E(Yij I Xi), because these are obtained as sums over the response variable joint distribution •
E(Yij I Xi)
We now consider the formulation of marginaliz d I I' ' . an d Lair . d (1993) , . models. F 1't zmaunce margmalized' .e I ogI meal' . . l' . . canOlllca og-lmear models to permit Ikehhood-based regressIOn estimat' f h . .Ion 0 t e margmal means . by transformmg the canonical parameter, (}, ::= (0(1) (}(2) (n) , h . d t , , , . • . • B, ) mto t I' mlxe parameter, (}* = (/.lM 0(2) (J(nJ) h M M ' 1 , l ""', , Were /.li = M M M ) . ( /'1.; I , J'l.i2' ... , J'l.in , With /lij ::= E(Yi j I Xi). In their approach the underlying log-linear model parameters ' ((}(2) B(n)) ara llised t d ·b. h . . " ... , i ' · 0 escn e t e covanance of the response vector while the. average . ble IS . . . . .resJ)on • se vana directly modelled via the marginal mean. We use the following pair of regression statements to characterize the marginalized log-linear model
10git{E(Yij I Xi, Yik: k
~ (J(l) y, +"" e(2) y, . y; + L..J ij ij L..J ijk IJ Ik
221
they do not directly facilitate multivariate g l' d r . modelling of the marginal means. enera lze mear regressIOn
ditions such as = e~~~, is known as the Hammersly-Clifford Theorem (Besag, 1974). The term ~ij(Xi) is not a free parameter since it is constrained to satisfy the marginal mean model (11.3.3), The parameters for this model are 13 M and {Oij }j=I' As with all marginal regression models the mean model (11.3.3) is separated from the dependence model (11.3.4) and the parameter 13M retains the same interpretation for any model assumptions used in (11.3.4). The log-linear association parameters in Oij indicate how strongly other outcomes, Yik k =J- j, predict Yij. This model does not exploit time asymmetry and conditions on both past and future outcomes through {Yik: k =J- j} in order to characterize depe~dence. With regard to estimation, Fitzmaurice and Laird (1993) showed how iterative proportional fitting (Deming and Stephan, 1940) can be used ~o transform from the mixed parameter to the canonical parameter .Oi l~ n order to evaluate the likelihood function. In this approach the 2 multI:arI. . vector p 0 (V V V ). ate probabIlIty .1 iI, .1 i2, ... , .1 in IS recovered for each subject, , A related use of log-linea:r models that also permits marginal regresSIOn models is presented by Lang and Agresti (1994), Glonek an~ ~c~ullagh (1995) and Lang et al. (1999). Each of these approaches IS hmlt~d to applic~tions with small or moderate cl~ter siz~s due t~ ~o:rl~;~)o~ demands. In addition, the methods of Fltzmaunce an d aIr.. . the canonical asSOCiatIOn paraeffectively limited to balanced d ata SIllce ch meters (OV>, . .. ,ofn)) must be separately modelled and estimated for ea c1uster size ni.
0:
ZZ2
LIKELIHOOD-BASED
METHODS FOR
CATEGORICAL DATA MARGINALIZED MODELS
. ed atent variable models Marginalzz l er (2000) have
discusse~
how the 9) and Heagerty and. ~eg observed heterogeneity can be for We refer to these models flexlb~hty 0 . marginal regreSSIOn m . uivalently as marginalized combmed with a . ble models, or eq as marginalized latent varIa Md' dom effects models. . the marginal mean J1ij an an asSOCIran . we assume interest In . . S systematic variation. We Once agaIn I th t characterIze . generalized linear mode a bservations is induced Via unobated ume that correlation among 0 t' s Y; . would be conditionally furt her ass d that observa IOn tJ ed IX t s. The model can then be expressed serv latent variables Uij, andam euec independent given these ra~ through the pair of regressIOns 11.3.3
He~g~r.ty (1f9~LMMS
character~zIng ~~el
h{E(Yij I Xi)} = XijI 13M , h{E(Yij I Xi,Ud} = ~ij(Xi)
(11.3.5)
+ Uij·
Il~
h-1(X~jf3M)
(11.3.7)
U i rvN{O,G(a)},
. where the parameter a represents vanance compone nts that .determine t (U·) This random effects specification includes random. mtercep s, cov • u... == Iu.·' o random hnes, Uij =: UiO + Ui l ' t ij, an d autoregressIVe random eti:cts. that if G 1/ 2 G 1 / 2 = G(a) and Ai is multivariate standard normal, then U i = G 1 / 2 Ai and (11.3.6) becomes
~~te
h{E(Yij I Xi,A i )} = ~ij(Xi)
difficult for multilevel models with cluster-level covariates since no direct matching of Uij is observed for these contrasts. See Graubard and Korn (1994) or Heagerty and Zeger (2000) for further discussion. Figure 11.2 represents the marginalized latent variable model. The inner dashed box indicates that the marginal regression model characterizes the average response conditional only on covariates. The outer dashed box indicates that the complete multivariate model assumes dependence is induced via the latent variables Uij. The marginalized model in (11.3.5)-(11.3.7) also permits conditional statements via the implicitly defined ~iJ (X;), recognizing their dependence on model assumptions. The parameter ~iJ(Xd is a function of both the marginal linear predictor T/( Xij) = X~j 13M and the random effects distribution Fa(Uij ), and is defined as the solution to the integral equation that links the marginal and conditional means
(11.3.6)
. t' we assume a distributional model for To complete the model speclfica Ion t such as the normal random .. u... '" Fa, indexed by a parame er a, u.IJ' tJ effects model
= E(1l5), =
Jh-I{~ij(X;) +
r - - - -- - - - - - - -- - - - - - - -
~
(11.3.8) Uij}dFa(Uij).
(11.3.9)
- - -- - - -
;
,
,
, ,
' '
,7=-Y=7=7-: :
i
'EJ ~ ~ G::J
+ G 1/ 2 A i ,
which has the general marginalized model form that we describe in (11.3:2). The formulation in (11.3.5)-(11.3.7) is an alternative to the clasSical GLMM (Breslow and Clayton, 1993) which directly parameterizes the conditional mean function, ~ij (X i) = X~jf3c. Recall from Chapter 7 that there is a critical distinction between the marginal parameter, 13M , ang the conditional parameter, f3c. The conditional regression coefficient 13. contrasts the expected response for different values of the measured covanates, 'JJij, for equivalent values of the latent variable U . The marginal ij coefficient does not attempt to control for the unobserved U . when chari acterizing averages. For example, a marginal gender contrast ~ompares the mean among men to the mean among women, while a conditional gender contrast compares the mean among men with Uij = U* to the mean among women who also have Uij = U*. Interpretation of f3c can be particularly
223
c±JG5cJjg Marginal regression
_ _ _ _ _ _ _ _ _ _ .J
GLMM
,,
_ _ _ .J
inalized latent variable mo~el. ~he Fig. 11.2. Diagram representmg a ma:f s ecifies a marginal generalIzed Iminner dashed box indicates that ,theMm~dd ~d from an underlying GLMM for ear model, h{E(Yij I Xi)} = Xij(3 , III uc L
-
-
-
E(Yij I Xi, Ui).
-
-
-
-
-
-
-
•
ODS FOR CATEGORICAL DATA
224
MARGINALIZED MODELS
LIKELIHOOD-BASED METH
2 X.) allowing explicit dependence on We assume that var(Vij) = (7 ( . ' 'xed model formulation, V ij :::::, . th' represents a ml . h ndom intercepts have a dIfferent oovariates Xi smcc 18 d th case were ra lIo + Uil 'Xij, an e d' the value of a cluster-level covarim'agnitude of variation deP2 en Idng on(U. t X = 1) = (7?, In the common X - 0) = (7 an var ,0 I ' ate, var(Uw I i -N{ 2 (OX.)) we can rewrite Vij = (7 (X ij) , €, where 0, (1 ". b 8 case where Uij '" €'" N(O, 1) and the integral equatIOn ecome
h-I{TJ(Xij)}
=
J
h-I{Llij(Xi)
+ (7(Xi) . OcP(O d~,
A.. , th t dard normal density function, Given 1](Xij) and (7(X i ), where 'I' IS e s an A (X) S H t , al tl'on can be numerically solved for L..lij i · ee eager y (X) h h - 1 't the mtegr equa (1999)fordetailsofthelinkagebetween1](xij)an?~i! ,w ~n .- Ogl, , l'mk fun ctl'on and mixing distnbutlOn combmatlOns ' the For certam , and margmal mean can be obtamed ' between conditional transformat Ion . ' analytically, For example, using a probit li~k functIOn a,nd ~aussIan random effects, U = I1(X) for N(O, 1), yIelds the relatIOnshIp
,e e'"
{1](X)}
= E [{Ll(X) + (1(X) '0] = {
Jl
~(X)
+ (12(X)
}
showing that the marginal linear predictor, 1](X), is a rescaling of the conditional linear predictor, ~(X), If the variance of the latent variable is independent of X, then the marginal and conditional model structures will be the same (i.e, linear, or additive in multiple covariates),however, if I1(X) depends on covariates, then the marginal and conditional models will have different functional forms. A key example where heterogeneity or over-dispersion is assumed to depend on covariates is in teratologic applications where the intra-litter correlation is a function of the dose, X (see Prentice, 1986 and Aerts and Claeskens, 1997 for examples using the betabino~ial ~odel), Similarly for count data, Gr6mping (1996) discusses the relatIOnshIp between the marginal and conditional mean for a log link with normal mixing distribution, where exp{1J(X)}
= E[exp{~(X) + u(X) ,OJ = exp{~(X) + ~u2(X)}.
Again, if u(X) = 170 then th 'I , " e margma and conditional models are nearly eqUIvalent, but If heterog 't . II . . d £r enel y IS a owed to depend on covariates as m mIXe euects models the f t' al ' unc Ion form of the marginal and conditional models will d'ffi p' function of Xl er, dexample, a mixed model that has ~(X) a linear function. may ea to a marginal model where 1](X) is a quadratic
t
225
. Figure 11.2 characterizes the marginalized random effects model. The mner dashed box .shows that the marcnnal regresslon . mod d' . " o' e i escnbes systematIc. varIatIOn m the response as a function of . t Th . oovanaes. e margmal . . regressIOn structure IS Just one facet of the multl'va . t d' t ' b ' f . , na e Is.n utlOn 0 y . whIch IS assumed to be of GLMM form. f By introducing the marginally specified model, we allow a choice as to whether the marginal mean structure or the , . condl'tl'on aI mean s t.ructure IS t~e focus of modelling when using a latent variable formulation, There. e~st~ a general correspondence between 7J(X) and ~(X) so that the dIstmctlOn becomes purely one of where simple regression structure is usefully assumed, and what summaries will be presented through the estimated regression coefficients, The choice between marginal or conditional regression models can now be determined by the scientific objectives of the analysis rather than by the availability of only conditional random effects regression models. Estimation for marginalized latent variable models is just as computationally demanding as estimation for GLMMs described in Section 11.2. The likelihood function has the same form as the GLMM likelihood detailed in Section 11.2.1. However, f(~j I Xi, Ud depends on ~ij(Xi) which is a non-linear function of both 13M and the variance components a, The parameter ~ij(Xi) can be recovered using numerical integration to solve the one-dimensional convolution equation (11.3.9). Algorithm details are described in Heagerty (1999) and Heagerty and Zeger (2000). Similar to the GLMM, ML estimation for the marginalized random effects model is currently limited to low-dimensional random effects distributions due to the intractability of the likelihood function,
11,3.4
Marginalized transition models
In Chapter 10 we describe how transition models focus on the d~tribution of ~j conditional on past outcomes Yij-I, Yij-2,":' and covanates ~~. These models are particularly attractive for categoncal data, th~t exhibIt , 2 ." serial dependence smce the coeffi' clent s a f Y;.IJ-1> Y;'J-' , mdicate how strongly the past outcomes predict the current response, However, the;e Id t t to condition on past outcomes 0 are situations where we wou n~ wan 1 t clinical trials ' late X· For examp e, mos make inference regard mg a covar " t fixed final . . f t tent on the response a a are interested m the Impact 0, rea m file over time. In this case v.. measured follow-up time, or on the entIre response pro v d't' on outcomes .I'j-I,.I'J-2,··" we would not want to c,on ~ Ion din the effect of treatment on after baseline when making mference regar 1 gconsidered as intermediate Vij since earlier outcomes should be proper Y variables and not controlled ,for.. f erial dependence that a transition The attractive charactefl~atlOn~ h s marginal regression structure by model provides can be combmed WIt a
S FOR CATEGORlCAL DATA 226
MARGINALIZED MODELS
LIKELIHOOD-BASED METHOD
,.
d I (Azzalini, 1994; Heagerty, 2002), adopting a marginalized tran,sl~o~h:ofir:t_order Markov chain models of In this section we first reVle alt ative but equivalent, model speciern, 'd I d d cribe an Azzalini (1994 ) an es .' f a marginal regreSSIOn mo e used . f the combInatIOn 0 'd t fication III terms 0 f th response on covanates, an a ranto characterize the dependence; e ture the serial dependence in the sition model (Chapter 10).' usel'ktol'hcaPd function. We then generalize the d .d ntlfy a I e I 00 response process an I e . . del to allow pth-order dependence. I , alized tranSItIOn mo first-order margIn d b' y Markov chain mode to accomuce Azzalini (1994) introd athIntaris common in longitudinal data. a 'al depen d ence h dimes that the current response variable modate t e sen mo e assu , der Markov A fi rst -or . I through the immediate prevIOUS response, . d ndent on the hIstory on Y '1' . 18 epe, ') = E(li' I Yij-d, The transition probaI)J ItIes Pij,O = E(Yij I Yik, k < J) d .. J = E(Yi' I Yi-I = 1) define the Markov proE(Y.' I Y. I = 0 an P'J,I lJ J A I" (1994) d' tl parameterize the marginal mean. zza cess'3but d'J0 not Irec y . Inl F' arameterizes the transition probabilities through two ass~mptlOns. I~S~, a ~arginal mean regression model is adopted which constraInS the tranSitIOn probabilities to satisfy
J.l~ = Pij,l 'J.l~-I + Pij,O' (1 - J.l~-I) ,
(11.3.10)
Second, the transition probabilities are structured through assumptions on the pairwise odds ratio ,T,., _ ~IJ -
Pij,J/(I- Pij,d Pij,O! (1 - pij,O
)'
(11.3.11)
227
regression model for how strong) Yo. . d' (2000) describe the dependence :odellu~;:gl~ts Y;dj'l~eagehrty and. ~eger t f C _ E(V . mo e lOr t e condItIOnal I ij I Xi, Yik, k < J) with logit link expec a IOn f1ij 10git{E(Yij
I Xi, ?tij)}
=:
t.ij(X;) + I'ij,1 . Yij-I,
(11.3,12)
where H· = {Yo . k < J'} d The log oddS rat',10 I'i' I . . I 'J I" ,k . . an I'iJ ' I =: log W. 'J IS sImp y a OglStlC regressIOn coefficient in the model th t d't' J, hX Y: a, con I Ions on bot i and ij-l· The parameter t.iJ(X;) equals logit(p·, ) and' d t _ . d' r' I b M 'J,O IS e er mme Imp. IClt y y (3 and I'ij.1 through the marginal regression equation and .equatlOn (11.3.12). Furthermore, a general regression model can be speCIfied for I'ij,l, I'iJ'1 I
= Z'. 1 .1°1 J,)
(11.3.13)
where the parameter QI determines how the dependence of Yo. on v.. IJ I 'J-I varies as a function of covariates, Z;j,I, For example, lij.1 = O'j, allows serial dependence to change over time, and "Iij, I =: 0'0 +0'1 Zi allows subjects for whom Zi = 1 to have a different serial correlation as compared to subjects for whom Zi = O. In general, Zij is a subset of X; since we assume that equation (11.3.12) denotes the conditional expectation of Yij given both Xi and Zij. In summary, the marginalized transition model separates the specification of the dependence of Yij on Xi (regression) and the dependence of Yij on the history Yij-I, Yij-2, .. , ,Yil (auto-correlation) to obtain a fully specified parametric model for longitudinal binary data. A first-order Yil given model assumes that Yij is conditionally independent of Yij-2, Yij -I' The transition model intercept, t. ij (X i), is determined such that both the marginal mean structure and the Markov dependence structure are simultaneously satisfied. Equations (11.3,12) and (11.3,13) indicate how the ~rst-,order dep.e~d ence model can naturally be extended to provide a margmalIzed tranSItIOn model of general order, p. We assume that lij depends on the history only through the previous P responses, Yij -I, , .. , lij -p' ~ pth-order ~ependence model, or MTM(p) can be specified through the paIr of regressIOns: 0
which quantifies the strength of serial correlation. The simplest dependence model 83sumes a time-homogeneous association, I}J ij = I}J 0, however, models that allow Wij to depend on covariates or to depend on time are also possible. The transition probabilities, and therefore the likelihood, can be recovered ~,a function ~f the m~rginal means, J.l~, and the odds ratios I}J ij' ~zzahll1 (1994) prOVides detaIls on the calculations required for ML estimatIon and. establishes the orthogonality of the marginal mean and the odds ratio parameter in the restricted case of a time-constant (scalar) dependence model.
h{E(Yij
H~a~erty
and Zeger (2000) view the approach of Azzalini (1994) as combmmg a marginal mea d I h . ' n mo e t at captures systematic variatIOn III t he response 83 a fun t' f . I that d 'b ' Cion 0 covanates, with a conditional mean mode Th fiestcn des senal dependence and identifies the joint distribution of Yi' e rs -or er marginalized t 't' first assu . ransl IOn model, or MTM(I), is specified by mmg a regressIon structure for the marginal mean E(¥;' I Xi), . ~smg a generalized linear model h( M) _ , M .' lJ IS specified by assu' M k' J.lij - 'X ij (3 . Next, senal dependence mmg a ar ov structure, or equivalently by assuming a
I Xi)} = m~j(3M
0"
(11.3.14) p
10git{E(lij I Xi, Hij)} = D.ij(Xi ) +
L lij,k 'Yij-k
(11.3.15)
k=1
and we can further assume that the dependence parameters follow regres-
0
sion structure Ok
"II. nJ,
~k , k= = Z ''J' k OA I,
1, 0 " ,po
(11.3,16)
LIKELIHOOD-BASED
228
CATEGORICAL DATA METHODS F OR . .
,) rginalized transitIOn model additive ma For example, a second-order ( I ' Yij-I + "It},Z . Yi}-Z, and "Ii},1 ~ C I a depend on the interaction ' assumes.. logit{/LtJ } == Aij(Xi) +h "ItJ /I..e can a s , , == Z'· 2az, Althoug r'J d I J: r simplicity of presentatIOn, Z lal, "ItJ,z 'J, add'tive mO e 10 , 2 we assume an I meter 13M descri bes changes III the avery,..t}, I ' y,. tJ-' 1}-1 the MTM(2) the mean para . ithout controlling for previous n , f covanates, w d d e as a functIOn 0 age relspons ' 1 1 3' a d'lagra m that represents ,a secon -or er onse variables. Figure ,IS h ' er dashed box indicates that the res P " model T e Inn h' b t h inalized transitIOn 'h ar
-_._---------_._-----~-----------------------------~
, ,
MARGINALIZED MODELS
229
constrained and must yield the proper maruinal t' M C . d ' , 0' expec atlon J.L .. when J.L .. IS average over the dlstnbution of the history Ro fi' '}l 'J 1\ (X) , r any mte va ues of Ok as /.Jotj i ranges from -00 to +00 the induced mar' I ' . ' gina mean monotonI dd d Ically Illcreases from 0 to 1. Therefore uiven any fin't ' , " ' O' I ·e va ue epen ence model and any probablhty dlstnbution for the history a ' 1\ __ (X.) 'd 'fi d h ' , Ullique U'J , can b e I entl e t at satisfies both the transition model d th 'I ' an e margma mean assumptIOns, One disadvantage to directly using a transition model with I d ' bes l 'IS t hat the information in (Y. pagged response varIa y'.tp ) I'S cond't' I I ,.,., I lOne upon, and therefore does not contribute to the assessment of covariate effects. With an MTM(P) approach the information in the initial responses regarding J.Lit, and thus 13 M , is included through lower order marginalized transition models involving E(Y;k I X iko Y;k-I, .. "Y;I) for k < p, For example, when using an MTM(2) the likelihood for the initial responses, (Yil, Y;z), is obtained by factorization into Pr(Y;1 I XiI) (determined entirely by JL;1), and Pr(Y;2 I X i2 , Y;I) which involves and the conditional model: logit E(Y;z I X iZ , Y;d = Ai2(X i ) + ;hz,1 'Yil' Note that 1'iz,l is distinct from the first-order coefficient rij, 1, j > 2, in the MTM(2) dependence model, logit(J.t5) = ~ij(Xd+"Ijj,l' Yij-l + rij,Z ' Yij-2, since this model involves Y;j-2 in addition to l'ti-l' Therefore, in applications we estimate separate lower order dependence parameters, To carry out estimation within this general class of models, note that the MTM(p) likelihood factors into the distribution of the first p response variables times the subsequent Bernoulli likelihood contributions with parameters J.lg for j = (p + 1)"", nj:
ttM
Pr(Yil, Yiz, ' , , ,Yin; I X i) = Pr(Yil, Yiz"", Yip I Xi) . Pr(Y;p+l, Y;P+2,""
n,
= Pr(Yil, Yiz, ... , Yip),
II
l'tn; J'liiP+l,X.)
Pr(Y;j I 'liij, X d
j=p+l
Marginal regression : --- - - -- - --. - - - ---. --- -----
L----
_ a
...J
Transition model
-------------------------
g FTih · .11,3. Diagram representing a second-order marginalized transition model. e mner dashed box indo t h I' ed Ica es t at the model specifies a marginal genera IZ I·mear model, h{E{Y,.. I X-)} - 'M, . 'tioD model E(Y. I X '1 '. - Xij{3 , mduced from an underlymg trans l ,
'J
.,Yikk<J).
,h t t with a model for Pr(Yil,"" Ytp ) The basic maximization algorlt m s ar s t' . The key . aJ' d t 'fon model assump IOns. using lower order margm lze , ran~1 I hat transition probabilities, to subsequent likelihood evaluatIOn IS t (.1M d fu tion of the parameters fJ an can be sequentiaJly recovered as a nc t 'to 1J9. directly but the 01, ... , a p • The dependence. ~arame~ers e:;r~~ al"~,. ,a ' We obtain p intercept ~ij(X,;} is an impliCIt functIOn 0
ttS,
230
LIKELIHOOD-BASED MET
HODS FOR CATEGORICAL DATA
EXAMPLES
test the assumption
Aij (X i) by solving the marginal constraint equation:
0'
pH
=. .
"Y1J,p+1
m
Pr(Yij
M 1:tij -
= 1 I Yij-I = Yj-I,""
}j-p
= '!Ij-p)
Y'j-J ,,,,,Y'J-p X
Pr(Yij-1
= Yj-J,""
Up + 1
Yij-P = y,j-p)'
In order to obtain a solution we require the initial state probability Pr(Y;1 = , v, = ,) from which all subsequent p-dimensional probabilities Y,I,"" L,p Y,p y: can be obtained by multiplying Pr(Yik = Yib"" ik+{p-I) = Yik+(P-I)) times jlik+p and then summing over Yik· Details for MTM(2) estimation are provided in Heagerty (2002). . , ' The computational complexity of MTM(p) hkehhood evaluatIOn for subject i is O(ni 2P ). Calculations required to compute and update the p-dimensional history are O(2P ) and each observation requires such calculations, Therefore, with a fixed dependence order the computational burden only increases linearly with the length of the observation series, ni· The order of alternative likelihood methods is generally much greater with the iterative proportional fitting algorithm of Fitzmaurice and Laird (1993) requiring exponentially increasing computational time, O(2 n ;). Azzalini (1994) suggested that the MTM(1) has general robustness properties but only established a consistency result for a restricted scenario. By viewing the MTM(I) as adopting a logistic regression for allows us t~ show t~at the MTM(I) is a special case ofthe mixed-paramet~rmodel of ~Itzma~l'lce and Laird (1993) with "f = VeCbij,l) the canonical log-linear mteract~on parameter. Appendix 1 of Heagerty (2002) provides details that show {3 and 0:1 a:~ orthogonal. The implication of orthogonality is that the ML. estimate . £ . " {3 rem' alllS consistent lor (3 M even if the dependence
J.if
model IS mcorrectly specified. Use of the MTM(I) ML t' t {3.M d a sandw' h . es Ima e an • IC .var~ance estimator (Huber, 1967; White 1982' Royall 1986) proVides ' "in, . . a lIkelthood mativat ed versIOn of GEE appropriate serial data SituatIOns since the point estimat {3.M . . errors can be obt' d . h e WIll be consistent and valid standard allle WIt out requ' .' modelling of "0/' , F mng correct Markov order or correct "J,I· or general pth-order models 13M ( not be orthogonal and .t ' and 01,,," op) may , conslS ent esti t' f requires appropriate de d rna Ion 0 mean regression parameters pen ence modellin One practical advantage of rna' .g. . . several simple proced rgmahzed tranSItIon modelling is that ures can be used t assumptions. First to est bl' h h 0 assess the dependence model ·· a l s t e a propna P ' t e order we can use direct t rausltIon models 'and . regress Y;,J on X d' approxImate score tests for dd ij an pnor responses. Second, f rom p to p + 1) take simpl anf a itiona I Iagged response (increased order e orms. For exampI ' e, usmg a MTM(p) we can
231
. = 0 WIth the statistic
n'i
= '" '" ~.~
Y:ij-(p+l) (Y:ij - {If),
''''I ]=p+2
J
·C· wh:re Ilij IS t h e fitted conditional mean obt . ThIS statistic only approximates th l'k r~med from the MTM(p) model. ignores BAil / oO'p+ I. The approxilnet I e. I ood s~:o~e ~tat.istic because it . I a e score statistIc I . t " . simp y evaluates the correlation b t hS In tutlve SlUce it . . e ween t e (p + 1) I d the condItIOnal residual obtained b fitt" ' . agge response and with the first p lagged responses. y mg a margmahzed transition model L'
11.3.5
"
-
Summary
In this section we have introduced marginalized models for cat . II gitudinal data. These models separate specification of th egoflca onM f h . . , e average response Ilij' .rom t e dependence among the repeated measurements. Depend~ ence IS characterize~. by a second regression model for MC = E(Y.' I Xi, A ij where ~ddl~lOnal variables in A ij are introduced'\o struc;~re ~orr~latJon. MargmalIzed models expand analysis options to allow flexible hkehhood-based estimation of marginal regression parameters.
!
11.4 Examples 11.4.1
Crossover data
We now revisit the two period crossover data (Example 1.5) using GLMMs, and marginalized models.
GLMMs. In Section 9.3.3 we analysed the 2 x 2 crossover data using a GLMM with a random intercept. In Table 11.1 we present estimates obtained using PQL, fixed quadrature, adaptive quadrature, and posterior summaries obtained using MCMC. For Bayesian analysis we use independent normal priors with standard deviation = 10 2 for the regression coefficients, and a gamma prior for the precision, I/G '" f(I, 1). Not surprisingly, we find that PQL greatly underestimates the random effects standard deviation, G I / 2 , relative to the numerical ML estimators. We find only minor differences between estimates obtained using fixed and adaptive quadrature. The posterior mean for G 1/ 2 is larger than the MLE by (5.82 - 4.84)/4.84 = 20%. This is typical when posterior distributions are skewed - the median of G I / 2 is 5.09 with a 95% posterior credible region (2.48, 13.48).
Marginalized models. In Section 8.2.3 we analysed the crossover dat.a with GEE, using the pairwise odds ratio to model within-subject cor~e~atl~n. In Table 11.2 we display estimates obtained using marginal quasI-lIkelihood
232
THODS FOR CATEGORICAL DATA LIKELIHOOD-B ASED ME
EXAMPLES 233
'k I'h d-based estimates for a GLMM analysis of Table 11.1. LI e I 00 . G H . ethods are based on 20 pomt auss- ermlte crossover data. Quad rat ure m quadrature. Fixed, quadrature
PQL
SE
Est.
SE
Est.
Bayes MCMC
Adaptive quadrature Est.
Mean
SE
SD
Conditional mean {3C 0.768 (0.396) Intercept 0.692 (0.420) Treatment -0.356 (0.419) Period
2.153 (1.031) 1.841 (0.928) -1.020 (0.836)
2.170 (1.098) 1.836 (0.900) -1.016 (0.795)
2.644 (1.684) 2.249 (1.322) -1.330 (1.160)
Variance components GI/ 2 1.448 (0.602)
4.969 (2.263)
4.843 (1.747)
5.824 (2.925)
Table 11.2. Likelihood-based estimates for marginalized model analysis of crossover data. Marginalized GLMM Marginalized transition
MQL Est.
BE
Est.
BE
Est.
BE
M
Marginal mean 13 Intercept 0.668 (0.356) Treatment 0.569 (0.379) Period -0.295 (0.378)
0.651 0.577 - 0.326
(0.275) (0.227) (0.222)
Variance components 1.239 (0.549)
5.439
(3.718)
QI/2 0<1
log-likelihood
We can also use either a marginalized 10g-1' transition model for analysis In the h Iilear model, or marginalized . . case were n· - 2 th t are Identical. For analysis we use ' ese wo approaches logit E(Y;2 I X
-68.15
-68.17
log-likelihood
for a standard normal rand a . om euect Ai In th' . the varIance component G1/2 h '. IS specIfication we see that · . " c aractenzes th . to-su bJect vanation in the 10 dds f e magllItude of subjectnot explained by the covaria~e~ X. 0 Trhesponse .tha~ is unmeasured, or . . -M " e margmahzed GLMM swn estimates, {3 in Table 11 2 . regres' · . are nearly Id t' I ~b tamed usmg GEE (Table 82) d h . en Ica to the estimates G I / 2 is similar to that obtai~ed :~h t e Ivar~ance component estimate, GLMM. a c asslcal conditionally specified
-68.11
0.674 0.569 - 0.295
(0.278) (0.231) (0.229)
3.562
(0.907)
-68.32
(MQL), and two different marginalized models R I f ' that penalized . l'k I'h . eca I rom SectIOn 9.2.2 qUasl- I e I ood (PQL) . h . . imate ML inference £ GLMM .IS a ~et od for obtammg approxs. In discussmg the d I t f PQL B reslow and Clayton or (1993) I eve opmen 0 , used for a marginal m a so develop estimating equations that can be ean assumed to b . d d' Clayton (1993) refer t th ' . em uce Via a GLMM. Breslow and . 0 ese estImatlllg e t' M relatIOnship to method d' . qua Ions as QL, and note the s Iscussed m Z t I ( latent variable model fo th eger ea. 1988). The marginalized r e crossover data assumes t h ' . structure and e margmal regreSSIOn
1':1} ,, '
= U ,(X-.) 2 , + 0'1' Y.iI, A.
where Ctl is the log odds ratio measuring the association between y. d Y;2 that !s not expl~ined by covariates. Again, w~ obtain ML estt~:~ for margmal regreSSIOn parameters that are similar to GEE t' t Th f t . . es Ima es. e es Ima e Ct1 = 3.562 mdlCates striking dependence, or concordance between Yi1 and Yi2. ' In smaller sample sizes it is often desirable to consider inference based on the likelihood function directly rather than the Wald statistic. In this data set we have a total of m = 67 subjects. Figure 11.4 shows a profile likelihood curve for the treatment parameter in the marginal regression model. The MLE, t3{'1 = 0.569, is indicated by a vertical dotted line. A dashed line shows the quadratic approximation to the log-likelihood that is the basis for Wald inference. The solid line is log Lp(.er) = log L{y;8(,B{'1)}, the log-likelihood maximized over (.ert , .e~ Ct1) for fixed values of ,B{'1. This profile curve can be used for interval estimation based on likelihood ratio test inversion since the difference between the maximum, logL{y;J(t3{'1)}, and logL p (,B{'1 = b) represents the likelihood ratio test of Ho : ,B{'1 = b. The horizontal dotted line is at (log L{y; 6(,Br)} - 3.84/2), and characterizes a confidence interval based on the critical value ofaX2(dj = 1). We find excellent agreement between the Wald and likelihood ratio-based inference for ,Bf'1 near zero but find differences for greater than 1.0 that lead to a slightly wider confidence interval.
,
.er
Summary. We have demonstrated that it is feasible to obt~in likelih~od based inference using either a conditionally specified-regre~slO~ coeffiCIent with a GLMM, or using marginalized models. The margmalized models adopt the dependence formulation of random effects ~odels, or of .Ioglinear models, but directly structure the induced margmal means VIa a regression model.
S FOR CATEGORICAL DATA LIKELIHOOD-BASED METHOD
234
EXAMPLES
-68
'\
'\ \
-69
\ \
\ -70
.
.........................
......................................... \
\ --
,'' ' '
\
1
\
:
\
i
1
-71
\ \
i i i i
i
\
\ \ \
\
i i
\ \
!
\
i 0.5
1.0
Tx coefficient
Fig. 11.4. Profile likelihood for the marginal treatment parameter using a marginalized log-linear madellikelihood (--) , and the quadratic approximation based on the MLE and the model-based information (------).
11.4.2
Madras schizophrenia data
Th~ :vradras Longitudinal Schizophrenia Study investigated the course of poslt.lVe .an~ negative psychiatric symptoms over the first year after initial hospItalIzatIOn for disease (Thara, 1994). Scientific interest is in factors that c?rre~ate with the course of illness. Our analysis addresses whether the longltudmal symptom prevalence differs across patient subgroups defined by . 11y address the primary question we age-at-onset and gende r. 't0 st a t'IstIca GL MMs and mar . al' d use d . gm Ize mo I'Is to estImate the interaction between . t Ime and age-at-onset and th . t . . . ' e In eractlOn between time and gender III . . a IagIstlc regressIOn mod I W . d e. e COmpare parameter estimates obtame using ML und d'!l' er luerent d e p e n d ' . d using GEE. ence assumptIOns, and estimates obtaIne The Madras Lon it d' I S tudy collected data on six common schizophrenia sym t g ~ Ina p oms. ymptoms were classified into positive symptoms
235
(hallucinations, delusions, thought disorder . affect, apathy Withdrawal) E h s) and negatIve symptoms (flat , . ac symptom was d d 0, ... , 11 during the first year followin ho . r.eco~ e every month j == The prevalence of each positi g sP.ltahzatlOn for schizophrenia. . . . ve symptom declInes from . at InItIal hospitalization to <2007' b h approxImately 70% 10 y t e end of the first _ h'l prevaIence of each negative symptom d ec es r ' . year, w I e the t llld from approxImately 40% initially to 10% after one year H . . eager y an Zeger (1998) I data usmg graphical methods to dis la h .'. . ana ysed these p y t e wlthm-subJect correlation as a function of the time la b t . g e ween measurements The . I ~~~d~penden~e structure suggests strong seriai corr:I:~i~:a:~;~:;l~l~s~~:: ~ oms an strong long-term correlation for the ne ative s m Y ptoms. SpeCIfically, to characterize the dependence between b' g th '. Illary outcomes we u~e I' paIrwIse log .odds ratio I'i.(j,k) defined by (8.2.2) in S~ction 8.2.2. In FIg. ~ 1.5 we plot estImates of the pairwise log odds ratio log{ . . } ver the t~me separation It!j. - tik I ~or each of the six symPtorns.'Yi~Jt~~is fi~~ we dIsplay both empIrIcal estImates obtained using crude 2 x 2 t bl and ~ sm~oth function of It ij - tik I based on regression spline me:ho~~ descnbed I~ H~agerty and Zeger (1998). For the symptom 'thoughts' we find th: paIrWIse log odds ratio between observations 1 month apart is approxl~ately 3.0, but this association decays to approximately 1.0 for observatIOns 5 months apart, and further decays to 0 for observations 10 months apart. Serial dependence models such as the MTM(P) appear appropriate for this symptom and the other two positive symptoms. For regression analysis we focus on the outcome 'thought disorders'. Not surprisingly~ for this specific symptom a large fraction of the N == 86 subjects are symptomatic at the time of hospitalization (56/86 == 65% at month = 0). The crude prevalence of thought disorders decreases during the first year with only 6/69 = 9% presenting symptoms at month == 11. To evaluate if the course of recovery differs for subjects with early ageat-onset or for women we fit logistic regression models with main effects for month (j = 0, ... ,11), age (1 = age-at-onset less than 20 years old, o = age-at-onset ~ 20 years old), gender (0 = male, 1 = female), and the interaction between time and the two subject-level covariates. Evaluation of the interaction terms determines whether the rate of recovery appears to differ by subgroup. The majority of subjects have complete data but 17/86 subjects have only partial follow-up that ranges from 1 to 11 months. Regression analysis of subject discontinuation (drop-out) suggests that subjects who currently have symptoms, Yij = 1, are at increased risk to drop-out at time j + 1 (odds ratio 1.716, p-value = 0.345). Women are also at increased risk to discontinue (odds ratio = 2.375, p-value = 0.11). The potential association between drop-out and the observed outcome data warrant co~ideration.of an analysis that is valid if the drop-out mechanism is MAR. WIthout specIal modification GEE is valid only if missing data are missing completely at
EXAMPLES (d)
Hallucinations
(a)
,
..
6
random (MCAR). See Chapter 13 for further discussion of missing data issues.
o'
6 o.
237
Flat affect
G~MMs.
We first consider analysis of the schizophrenia symptom data GLMMs. Our primary interest is in a group-by-time model that mcludes month, age, gender, and both month: age and month: gender product terms:
~smg
a: o
~ 2
o
'0
........ ~.... ..... .... -2
J.l5 = E(Yij I month;j, age;, gender;, V
L.....----.--.,----,c---.--~--J
4
2
6 delta
8
10
logit(J.l5) =
t
6
4
2
6 delta
(c)
. age;
+ (3f . month;j . gender; + Uij.
Table 11.3 presents estimates using a random intercept model, Vij == U;o. Similar to the crossover data we find that PQL underestimates the variance component, with (;1/2 = 1.827, versus (;1/2 = 2.222 using adaptive quadrature. Our Bayesian analysis adopts independent priors for the regression parameters, f3'j (normal with a standard deviation of 10 2 ), and specifies a gamma prior for the precision (G r-v f(2,2». The estimates obtained using MCMC are quite similar to the ML estimates. Since the conditionally specified GLMM uses a single regression equation that contains both covariates and random effects, the regression estimates /Jc need to be interpreted recognizing that U;j is controlled for. However, in the random intercept model each subject is assumed to have the same change in their log odds
10
8
),
f3~ + f3f . month;j + f3~ . age; + f3f . gender; + f3f . month;j
Apathy
(b)
ij
t
Withdrawal
6 ~
a: o
9
4
;
···Z- : :
I
...:
., ..;...j~....~:....~.._..~._} ... ~~.. _-~~._ ....;/ -
~ 2
;,
10., ~
•
0
0
2
_~._._•••.• _.
:
- :
4
6
8
10
delta t
Fig.
1l.~.
Serial dependence for the Madras schizophrenia data. Pairwise log ratIos are plotted against the time lag for each of six symptoms. Solid hnes are based on a natural spline model using knots at t = 3,5,7 months. The dashed lines ar p . t· 95~ h de . . e om WIse 10 confidence bands. Also shown are t ecru palTWlse log odds ratios and corresponding asymptotic 95% confidence limits. The '0' represents the crude zero-cell corrected log odds ratio computed from a 2 x 2 table of Y., versus Y. Th h e . . 'J ike - represent the confidence limits for t es pomt estimates.
~dds
I
PQL
0
~o •.•~ ~ :._._.~~.. _ :
Table 11.3. Likelihood-based estimates for a GLMM analysis of Madras schizophrenia data using random intercepts, Var(V;o) = G. Bayes MCMC
Adaptive quadrature*
Mean
SD
SE
Est.
SE
(0.388) (0.048) (0.570) (0.533) (0.087) (0.081)
1.087 -0.437 1.438 -0.931 -0.251 -0.086
(0.457) (0.055) (0.678) (0.632) (0.100) (0.090)
1.085 (0.477) -0.439 (0.057) 1.511 (0.683) -0.965 (0.649) -0.262 (0.100) -0.089 (0.092)
Variance components 1.827 (0.643)
2.222
(0.285)
2.259 (0.288)
Est.
Conditional mean f3c Intercept 1.005 Month -0.387 Age 1.180 -0.748 Gender Month.Age -0.204 Month· Gender -0.079 Gl/2
,
*The maximized log-likelihood using adaptive quadrature is
369.88.
238
LIKELIHOOD-BASED MET
HODS FOR CATEGORICAL DATA EXAMPLES
of symptoms over time: , . c) _ f3c + fJc . age; logit(t15+1) -IOglt(l1i) - I 4
239
subject-specific rate of recovery:
+ (35C . gender;
logit(115+d -logit(115) =
I3f + 13; . age. + I3f· gender, + U'I ,
celled when computing the w.ithin. d . t rcept U '0 IS can since the ran am III e 'm '. t f month.age measures the difference · t d'ff: . 8U bJec I er.ence.. The cae clen 0 g early age-at-onset subjects rel. h ate of recovery amon III t e common r f mong the late age-at-onset subjects · to common rate 0 recovery a at IVe. . . II . 'ficant using the random intercept model and appears statlstlCa Y slgm d on MLE). -0251/0.100:::: -2.51, p:::: 0.012, base . . -. d ' t cept assumption to allow eIther random mterWe relax the ran om III er a rnes) or to allow autocorrelated random euects. cepts and sIopes (ran dam I , . d I 11 . 4. In the random hnes rna e we assume · Table Results are presen t ed III
(z -
logit{E(Y;j I Xi, Ui)} :::: X;jj3C
+ UiO + Un' tij,
where tt ' :::: (j -1) represents the month of observation. This model assumes that ea~h patient is following their own linear recovery course, and has a
Table 11.4. Likelihood-based estimates for a GLMM analysis of Madras schizophrenia data using random intercepts and slopes, var(UiO ) = G u , var(Uid :::: GZ2 , corr(Uw•Uii) = R, and with random autocorrelated random effects cov(Uij,Uik):::: G· plJ-kl. Adaptive quadrature* Est. Conditional mean (3c 1.620 -0.620 1.616 -0.953 -0.212 -0.188
Intercept Month Age Gender Month·Age Month·Gender
Bayes MCMC
SE
Mean
SD
Mean
SD
(0.684) (0.129) (0.978) (0.922) (0.180) (0.175)
1.805 -0.709 1.801 -1.108 -0.223 -0.227
(0.780) (0.152) (1.107) (1.064) (0.224) (0.211)
2.162 -1.002 2.641 -0.976 -0.429 -0.369
(1.304) (0.266) (1.748) (1.731) (0.303) (0.297)
(0.555)
3.895 (0.640) -0.604 (0.140) 0.662 (0.114)
Variance components
d/11 2
R QI/2 22 QI/2
3.490 -0.691 0.534
(0.101)
6.322 (1.284) 0.856 (0.031)
p
'The maximized log lik I'h d - e I 00
.
USIng
adaptive quadrature is -345.94.
I'
!he coefficients of month·age and month. gende . In the average rate of recovery rath th d'~ now re~resent differences er ,an luerences In a co of recovery. The maximized log lik l'h d' mmon rate . e I 00 mcreases from - 369 88 £ th ran dom Intercept model to -345 94 £ th . ,or , e increase in model fit. Allowing h~tero;:nei; rr:~:~~ hne.s ~lO~el- a la~ge s~opes increases ~he standard error of the ti~e mai; :~~~te~~dll~::~~slOn tlO~ t~rm coefficlent estimates. While a random intercept model i~~i;:: a slgmficant month·age interaction the estimate u . . r d r d I . " . . smg a more appropnate a~ am mes mo e IS no longer significant (Z = -0.212/180 = -1.18, PMC-MO C·238 , based on MLE). Once again, the estimates obtained using are comparable to the MLEs. Finally, we consider the autocorrelated random effects model logit{E(li)· I Xi, U;)} =
x',(.IC 'JfJ
+ U')'
where cov(Uij , Uik ) = G· p1t,;-tikl. The random intercept model is actually a special case of the autocorrelated random effects model where p ::;; 1, or all Uij are perfectly correlated. The autocorrelated random effects model is prohibitive to fit using ML since the dimension of the random effects in this example is ni = 12. Bayesian estimation using MCMC is felUlible. Table 11.4 presents posterior means and standard deviations using vague independent priors for I3f, a uniform prior for p, and a gamma prior (a r(2, 2) prior) for G I / 2 that is truncated to lie on (0.01,20). The correlation parameter p has a posterior mean of 0.856 with a posterior standard deviation of 0.031. The posterior 95% credible region (0.792, 0.913) is relatively far from the value that defines the random intercepts model, p = 1. The standard deviation of the random serial process is estimated IUl 6.322 (posterior mean). This estimate is substantially larger than the estimated standard deviation in the random intercepts model, or the estimated standard deviation for UiO+Ui1 · tij in the random lines model which ranges from 3.490 when t ij = 0, to 5.702 when tij = 11 (based on MLEs). In general, the posterior point estimates are larger using the autocorrelated random effects model relative to the random intercept or random line models. For example, comparing posterior means for the coefficient of month· age we find the random lines estimate is 1-0.223+0.4291/1- 0.4291 = 48% smaller than the autocorrelated process model estimate. Regression parameter interpretation in this model is somewhat complicated for cluster-level covariates since the regression coefficients are defined in a conditional model that controls for Uij , a random effect which varies over both subjects and time (see Heagerty and Zeger (2000) for a detailed
240
LIKELIHOOD-BASED ME
THODS FOR CATEGORICAL DATA
. t ts this regression model specifies diseussion). However, for tIme con ras C . ( c) _ f3c + f3c, agei + genderi + (ViJ+l - Vi})' logit (P'iH I) - loglt P-ij - I 4 C , . ' r sub'ect i yields f3f + age; + 135 . genderi, Averagmg over tIme, }, 0 .J fthese parameters as the average rate of . 'd' g the interpretatIOn 0 agam provI m . t bgroups In summary, GLMMs can be specific covana e su . recovery among t' nd allow a variety of dependence models d t ompare groups over Ime a H use 0 c h d'~ t random effect assumptions. owever, as g to be considered throu hI eren in the dependence structure need to be see in this example, C anges· , . b' we . t' the estimated regressIOn coeffiCIents 0 tamed considered when mterpre mg using different heterogeneity models.
fJf .
EXAMPLES
Table 11.5. GEE estimates for a 10 . t' . nia symptoms. Model based d ~ .IC regressIOn analysis of schizophrean empmcal standard errors are presented.
GEE ~ independence
fJr .
, I'zzed made1s. We now use marginalized models for analysis of the Mamma symptom data. We specify:
= E(Yij I monthij , agej, genderi), logit(J.t~) = fJff + f3r . monthij + 13:;: . agei + 13~ . genderi + f3!t ,month;j . agei + 13~ . month;j 'gender;. l.l~
In the marginal model we essentially use the subscript i to identify a specHic covariate subgroup rather than an individual, and characterize the difference in the prevalence of symptoms over time for subgroups:
logit(J.t~+I) - 10git(J.tm
= I3r + {3!t ' agei + 13~ . genderi ·
In contrast to the GLMMs the interpretation of the parameters in the marginal mean do not depend on the specific correlation model that is chosen for analysis. Table 11.5 presents GEE estimates for the marginal mean model, '( J.tij M) Ioglt
I M. = Xij{3
!here is a non-significant suggestion that the rate of recovery (time slope) IS faster am~ng women" but recovery does not appear to depend on ageat-onset. Usmg GEE WIth a working AR(I) model yields a coefficient for the age. ~y month interaction of &~ = -0.101 with a Z statistic based on = 119 ,p- vaIue -- 0235 Th empmcal standard ' errors of -0.101/0089 ' . . . e gender by month mteraction is weakly suggestive with 13M = _0.150, Z = -0.150/0.089 = -169 5 . . ,p-vaIue = 0.091. However if a working mdependence m d I ' d ' , 0 e IS use we obtain f3~ = -0.113 with Z = -0.113/0.096 ::::; 118 . ,p-value = 0 238 Un£ rt I model ' ". 0 unate y, the choice of working dependence can Impact pOlnt est' t d' objective criter' Ima es an slgnificance levels, and without Ion we cannot formall h . . II valid estimators. y c oose among varIOUS asymptotlca Y
241
Variable Marginal mean Intercept Month Age Gender Month·Age Month· Gender
Coef. Mod. SE Emp. SE
GEE - AR(I)* Coef.
Mod. SE Emp. SE
13 M 0.643 -0.254 0.811 -0.388 -0.137 -0.113
(0.202) (0.038) (0.305) (0.286) (0.064) (0.063)
(0.305) (0.059) (0.493) (0.449) (0.094) (0.096)
0.553 -0.235 0.638 -0.161 -0.101 -0.150
(0.296) (0.053) (0.440) (0.412) (0.089) (0.090)
(0.291 ) (0.055) (0.461) (0.420) (0.085) (0.089)
*The estimated lag-l correlation is p = 0.590.
By using a likelihood-based method rather than a semi-parametric method we are able to compare alternative dependence models using likelihood ratios. Table 11.6 presents ML estimates adopting both first-order and second-order marginalized transition models. The simplest marginalized transition model is the first-order time homogeneous dependence model, 'Yij.l = G'1,D· Table 11.6 presents the mean regression and dependence parameter estimates for this model. Point estimates and standard errors for (3M are quite close to those obtained using GEE. The estimated first-order coefficient ch,D = 3.166 indicates that the odds of symptoms at month = t are exp(3.166) times greater among subjects who previously had symptoms, }j-l = 1, compared to subjects who previously did not have symptoms, }j-l = O. Recall that since the MTM(I) has {3M orthogonal to 0, the resulting ML estimates, (3, are consistent even if the dependence model is incorrectly specified. To evaluate the specification of the dependence order we can compute score tests using only the MTM(l) fit, or use likelihood ratio tests comparing second-order to first-order models. The score test of 'Yij,2 = Q2,0 = 0 obtained from model 1 is 1.428 with p-value = 0.232. Model 2 is a secondorder marginalized transition model with scalar first- and second-order coefficients. The first-order coefficient model includes the variable 'initial' an indicator variable for month = 1, allowing 01 to be used for both the second-order model, Yij I Yij-l, Yij-2, and the initial state, Yo'I I"'~ which is a purely first-order distribution. The second-order LiD, .. coefficient, &2,0 = 0.650, is significant based .an the W~d stat.1Stlc, Z = 0.650/0.295 = 2.20, p-value = 0.028. ComparIson of devlances ylelds, 6.D = 2 x (337.19-334.44) = 5.50, p-value = 0.064 on 2 degrees of freedom.
242
LIKELIHOOD-BASED METHODS
FOR CATEGORICAL DATA
SUMMARY AND FURTHER READING
Marginalized transition models fitted . ML . ' · usmg permit a thorough mo d e I- base d anaIYSlS of schizophrenia sympt U1 . . 'k l'h d ' . oms. vve use the maximIzed d log- lI e I 00 to establish an appropriate d . '. epen ence model, and then evaluate the eVIdence regardmg differences i th d' n e Isease Course across su bgroups d efi ned by age-at-onset and gender W fi d h . . h . e n t at a second-order tIme m. omogeneous dependence model is appr opna . t e. C,orrespondmg . . regressIOn estImates of the marginal mean structu . d' t h h . re III lca.e t at t e rate of recovery does not vary significantly among tile c0 vanae 't su bgroups.
.' tions and ML estimates using Table 11.6. Generalized estlmatmg. equa ' . ms marginalized transition models for sc!llzophrema sympto . MTM(2) MTM(l) SE Coef. SE Coef. SE Coef. Variable Marginal mean 13 Intercept Month Age Gender Month· Age Month· Gender
M
0.534 -0.236 0.650 -0.142 -0.112 -0.144
First-order coefficient 01 3.166 Intercept Initial Month Second-order coefficient 02 Intercept
log-likelihood
-337.19
(0.300) (0.054) (0.442) (0.413) (0.086) (0.083)
0.576 -0.238 0.588 -0.150 -0.101 -0.140
(0.301 ) (0.052) (0.439) (0.412) (0.086) (0.084)
0.568 -0.234 0.619 -0.161 -0.100 -0.149
(0.295) (0.054) (0.434) (0.407) (0.091) (0.089)
(0.228)
2.911 -0.260
(0.291 ) (0.633)
2.099 0.403 0.156
(0.559) (0.740) (0.096)
0.650
(0.295)
0.597
(0.293)
-334.44
Summary,
W~ have used both GLMMs and marginalized models to
comp~re covanate subgroups over time. In the GUv[M we interpret the
-332.93
We can allow the dependence to change over time using the second-order model, 'Yij.l = 0:1,0 + 0:1,1 . initial + O:l,amonth, and 'Yij,2 = 02.0, whieh allows the coefficient of l'ij-l to depend on time (we could also allow the second-order coefficient to depend on time). Model 3 yields a maximized log-likelihood of -332.93 and 01.3 = 0.156, Z = 0.156/0.096 = 1.625, and reduction in deviance of 6.D = 2 x (334.44 - 332.93) = 3.02 on one degree of freedom (p-value = 0.082) relative to Model 2. Model 3 also achieves the smallest AIC value among the first-order and second-order m?dels considered. A score test for third-order effects leads to 0.072, indieatl~g the adequacy of a second-order model. The observed time trend in se~lal dep~ndence is expected in situations where patients stabilize (either WIth or WIthout symptoms). Fi~ally, ha:ing settled on a dependence model we can assess whether the difference m recove t . . ry ra es comparmg early age-at-onset to late age. at-onset subjects ((3 ) and 4 ~ompanng women to men (f35) are significant. U. Slllg Model 3 ~e obtain (34 = -0.100, with Z = -0.100/091 = -1.10, p = 0.271 and (35 - -0 149 'th Z ' - ' , WI = -0.149/089 = -1.67 p = 0.094. Although both early age t t b' , -a -onse su Jects and women appear to have faster recovery rates subgr d'fF . nominal0.05level. oup I erences m recovery are not significant at the
243
I
I
I'
,
coeffiClent of month as an average within-subject change in the log odds of ~Yn:Ptoms for ~ubj,ects in the reference group (age = 0, gender = 0), whIle m the margmalized models we interpret the coefficient of month as the change in the log odds of symptoms over time for the reference group. Primary scientific interest is in the interaction terms which have parallel interpretations in terms of differences in average rates of recovery for conditional regression coefficients, f3c, or as differences in the changes in the prevalence log odds across patient subgroups. Using likelihood-based methods we can compare the maximized log-likelihood for the various models. The GLMMs yield -369.88 and -345.94 for the random intercept and random slope models, respectively. The marginalized transition models yield -337.19 for the MTM(l) and -332.93 for the time inhomogeneous MTM(2). The maximized log-likelihoods suggest that the serial models provide a better fit for this symptom. Based on Fig. 11.5 we anticipate that serial models will fit the positive symptoms (hallucinations, delusion, thoughts) while models for negative symptoms (flat affect, apathy, withdrawal) will require a random intercept to characterize long-term within-subject dependence.
11.5 Summary and further reading In this chapter we overview ML methods for the analysis of longitudinal categorical data, We focus on the common case of a binary response. . . ,m Chapter 9 0 f cand't' We extend the dlscusslOn I 10 nally specified GLMMs to include details regarding ML and Bayesian estimation me~hods. Idn ' h 'fy the margmal rna Section 11 3 we discuss marginalized mo d eIs wh IC um . d . ffi t models dlscusse m . . d els discussed in Chapter 8 WIth the ran om e ec ~ 10 Chapter 9 and with the transition models discussed m Ch~Pterl' d . tructure to be dIrect y assume Marginalized models allow a regresSlOn s . d ea'ects . d f 'ther a log-lmear, ran am 11' , for the marginal mean mduce rom el t 'tI'on mod. I . dom effects or ranSI or transition model. Alternative y, ran 'fi t' n as discussed in d't' I mean specr ca 10 . els can be fit using theIr con 1 IOna . d d ginal means obtained Chapters 9 and 10 and then estimates of III uce mar
244
LIKELIHOOD-BASED METHODS FO
R CATEGORICAL DATA
dT I) model (see Lindsey (2000) for by marginalizing the fitted (con II JO~ta atl'ons where the marginal struc'f del) n SI u details using a transl IOn mo . back f an indirect approach which fits . . t est the draw 0 tum is of prImary m. er d i d then computes marginal summaries, a conditionally specified mo e a~ umed for a conditional mean, J.l0, . h ession structure IS ass IS that w en regr . M t have the same simple regres. d . I means J.l .. may no then the mduce margma .'. lJ, a roach may not facilitate simple sion structure. Therefore, an mdlrect pp . I ft t . . d t'mates or tests for margma e ec s. covariate adJuste es I ' d . GEE and the efficiency of GEE Mar inal models can be fitte usmg .., g . . d b d It tives is discussed III Fltzmauflce et al. relative to hkehhoo - ase a ema I 'fi d . . (1995) d Heagerty (2002). A proper y specl e , an . 'fi (1993) ' Fltzmauflce ad ffi' t parameter estimates, but when mlsspeCl ed likelihood Ie s to e Clen .' . · d can lead to bIll.'le es t'Imat es. The bias of mlsspeclfied ML estImates for clustered data is discussed in Neuhaus et al. (1992), TenRave et al, (1999), and Heagerty and Kurland (2001). In this chapter we have focused on binary response data. Approaches £ I ngitudinal ordinal data are overviewed in Molenberghs and Lesaffre (~~9~). Heagerty and Zeger (1996) discuss methods based. on log-linear models, while Agresti and Lang (1993) and Redeker and GIbbons (1994) present random effects models. We have focused on GLMMs and marginalized models. Alternative estimation approaches for hierarchical random effects models are discussed in Lee and NeIder (1996, 2001). Alternative ML approaches that directly model marginal means are found in Molenberghs and Lesaffre (1994, 1999), and Glonek and McCullagh (1995).
12
Time-dependent covariates 12.1 Introduction One of the main scientific advantages of conducting a lona-itudinal t d b o' su Y . h b'l' IS tea I Ity to 0 serve the temporal order of key exposure and outcome events. Specifically, we can determine whether changes in a covariate precede changes in the outcome of interest. Such data provide crucial evidence for a causal role of the exposure (see Chapter 2 in Rothman and Greenland 1998). ' There are important analytical issues that arise with time-varying covariates in observational studies. First, it is necessary to correctly characterize the lag relationship between exposure and the disease outcome. For example, in a recent study of the health effects of air pollution the analysis investigated association between mortality on day t and the value of exposure measured on days t, t-l, and t-2 (Samet et al., 2000). Subject matter considerations are crucial since the lag time from exposure to health effect reflects the underlying biological latency. Also, the relevance of cumulative exposure or acute (recent) exposure depends on whether the etiologic mechanisms of exposure are transient or irreversible. Second, there is the issue of covariate endogeneity where the response at time t predicts the covariate value at times s > t. In this case we must decide upon meaningful targets of inference and must choose appropriate estimation methods. In this chapter we adopt the following notation and definitions. ~e assume a common set of discrete follow-up times, t = 1, 2, ... , T, With a well-defined final study measurement time T. Let Yit be the response on subject i at time t. Let Xit be a time varying covariate and Zi a . of baselme, . . . ' t covan'ates . F10r simplicity we assume or tlme-Illvarlan , collectIOn that only a single time varying covariate is considered for analYSIs. We also assume that Y. t and Xit are simultaneously measured, and that for I I t v, d' ctly with Xt. However, cross-sectional analyses we.would corre a e ~ it Ire . • . that only prevIOus covarIate for etiologic or causal analyses we assume . are potential causes of Yit. Thus, III terms · measurement s. X it-I, X tt-2,'" the ex osure directly after of causal ordermg we assume that Xit represents P lit rather than before.
TIME-DEPENDENT COVARIATES
246
AN EXAMPLE: THE MSCM STUDY
. X·,t need to be determined ' fl the covanate . . I in order h that relate a longItudma response Factors that In uence " to select appropriate analySIS approac .es I nalysis Kalbfleisch and Prentice . , te In surVIVa a . , to a time-dependent covafla . , t as 'the output of a stochastIC t mal covana e . ( 1980 p. 124) de fi ne an zn e . d' 'dual under study' in contrast to ' . t d by the III IVI Process that IS genera e 'fi ' th t 's not III uenc ed by the individual under study. an external covanate a J,. I't t e the term endogenous is typically . . h nometncs I era ur Similarly, t'In t e ecO• bles t hat are s t chastically determined by measured o , used to reler to vafla d b ation while exogenous van abies are factors within the system ~dn etrho sertv under study (Amemiya, 1985). . d b f ctors OutSI e e sys em determme y ametncs . I'Iterat ure precise definitions of covariate exogeneI th n e econo t f onditional independence, and statements ity involve both statemen soc x - X . X. X } as the his'H i (t) - { ,1,, '1,2, I, .. ,d fi,t H (t) _ ' Define of parameter separat IOn. Y tor of the covariate through time t, and SImI ar y. e ne i _ y,y y, , .. " Yit}. A covariate process is wIth ,respect to the { ,1, ,2 process 1'f the covariate at time t is condItIonally mdependent of outcome , I all preceding response measurements. We define endogenous SImp y as the opposite of exogenous. Formally the definitions are
exogen~~s
I 'Hi (t), 'Hf (t f(X it I 'Hi (t), 'Hf (t -
exogenous: f(X it endogenous:
1), Zi) = f(X it 1), Zi)
i f(X it
IHf (t IHf (t -
1), Zi), 1), Zi),
where f(x) represents a density function for continuous covariates and probability otherwise. This definition is not the same as that given by Engle et al. (1983) since we have not further discussed specific implications of this assumption, nor commented on the relationship of the process Xit to parameters of interest (our statement is referred to as 'Granger noncausality' by Engle et al. (1983)). Our definition is essentially equivalent to that given for statistical exogeneity by Hernan et at. (2001). One implication of the assumption of exogeneity is the factorization of the likelihood for (XiI! Yit}:
!IX"
Y, I Z,; 9)
~ [n fIl-;, 111;It - 1),111' It - 1), Z,; 9)] x =
[n
!(X"
1111' (t -
1), Z,; 9)]
.c y (9) x .c x (9).
If we further assume th t 9 - (9 9) , . . a - I , 2, where (h and 9 2 are vanatIOn Independent (i.e. (9 1 E ( 1 ) X (9 E ( ), and that 9 is the 2 1X parameter of mterest, then Engle et al. (1983) d fi2 th as strongly ex £ e ne e process ,t ogenous or the parameter 9 1 , One motivation for introducing
~arameters
.
247
the concept of strong exogeneity is that wh th '". I"k I'h d b d' leI 00 - ase mference regarding 8 can en d't'e assumption, IS satisfied ' • t' • 1 con 1 IOn loss,of and therefore analysis can proceed 'thon tXit h WIthout " IlllormatJOn, an explicit model for X·It" WI ou aVlOg to speCIfy implication of exogeneity is that the e t t" f A second important I h ' xpec a. Ion o vL it con d't' 1 JOna on t e entire covariate process (X X X) d d I th ' 11, t2,,"", iT WIll epen on y on e covanates prior to time t Rorm II 1 . exogenous " a y, w len a process IS E(Vit!XiI,Xi2"",Xit"",XtT,Z,)=E(Y.tIXI X t
t,
X· 12,· .. ,
tt-l
Z.) 1
l'
Exogeneity is actually a stronger statement since it implies that Y.. . d ·· II' d It IS con ItIona y III ependent of all future covariates
12.2 An e~ample: the MSCM study Alexander and Markowitz (1986) studied the relationship ?etwe~n n:aternaI 1 ' t' health care utilization, The mvestlgatlOn was d emp oyment an paed Ia nc . that have occurred motivated by the major social and demographIC changes 'd 'th ' 1950 on Iy 12% of marne women in the US since 1950. For example, m . WI t I e while ey preschool children worked outSI'de th e ho m , in 1980 tapprOXlma f the labour 'ld d the age of SIX were par 0 . 45% of mothers WIth ChI ren un er . d the effect of mothers' force. A significant body of research has examme
248
TIME-DEPENDENT COVARIATES
AN EXAMPLE'. THE MSCM STUDY
work on cogm't'lve ane1 ROC!.'al ''''pects of child development while only limited , h ' t' tIthe impact on paedJatflc care utIlIzatIOn, The pnor rf'.Reare mvcs ,1ga ,cc ' , . Mothers' Stre.Rs and Children's Morbidity Study (MSCM) enrolled N = 167 I b et,we,.Dn th" ages of 18 months and.'} years that attended prf'.Rr:h00 I ehI'l (ren , 't ae(ll'atrl'c clinic To be eligible for the study chIldren needed an mner-ci y P '" " " . 'th tlleIr . nlother conditIOns, At , , WI, to be IIvmg ' and free . of chromc , . . entry, . mothers provided demographic and background mformatlOn regardmg their family and their work outside the home, During 4 weeks of foll()w~up daily measures of maternal stress and child illness were recorded, A final data source included a medical record review to document health care utilization, We use these data to illustrate statistical issues that pertain to regression analysis with time-dependent covariates, The specific scientific questions that we consider include:
,<0
, .
.'
"
'
•
249
Illness
,
30 25
. . '
c:
~ Q;
D..
20 15 10 5 0 0
5
10
15
20
25
Day
1. Is there an association between maternal employment and stress? Stress
2. Is there an association between maternal employment and child illness?
30
3. Do the data provide evidence that maternal stress causes child illness? A total of 55 mothers were employed outside the home (55/167 = 33%). We will refer to mothers that work outside the home as 'employed', and will refer to mothers that do not work outside the home as 'non-employed'. The analysis data contains additional baseline covariates including self-reported maternal and child health, child race, maternal educational attainment marital status, and household size. Time-dependent measures for household i included daily ratings of child illness, lit, and maternal stress, Xit, during a 28-day follow-up period t = 1,2" , . ,28. In the first week of follow-up the prev~ence of maternal stress was 17% but declined to 12%, 12%, and 1?% m weeks 2 through 4. The prevalence of child illness similarly declined shghtly ~om 16% in the first week to 14%, 11 %, and 11% in the subsequent weeks. 12 ' 1 shows th ecrud e weekly stress and Illness ' f TFIgure ' prevalence for amfil les With employed mothers and for non-employed mothers. For illness we nd large day-to-day vana 't'Ion b ut observe a trend of slightly decreased prevaIence among children wh th for stress is mar I' ose rna ers are employed. The time course e camp lcated with I d h .., , emp oye mot ers Illitially havmg a higher rate of stres b t f stress than non_empsl uda terhweek 14 the employed mothers report less aye mot ers. To meaningfully address the " '11 asSOCiatiOn between maternal employment and either st~ess or I ness we need t 1£ . confounders. For exam I T bl a contra or several potentIal ing mothers were high Psceh' la e 12.1 shows that the majority of work00 grad uates wh'l 1 OJ' mothers were high school ad 1 e on y 4170 of non-employed gr uates, Also c d th ' ompare to the non-employed mot hers the employed . rna ers were m rk I sus 43%) and to be white (62 W are 1 e y to be married (58% ver/0 versus 37o/c) S' . causaI pathway that lead f o. lnce stress may be m the s rom employment t 01'II ness we do not adjust for
o
25 20 E Q)
.-..
Q;
15
•
.
D··
0
·•..J;1 D
0
•
•
•
--_Ii!.. _
....... D'"
Employed = 0
--_. Employed =1
•• 0 .......................
0
1
D .0
° ·~·~::·~-""~~lJ.:::::::.:-I:J.. ~-- ...... --.G--...a..tJ.._~....
D..
10
o
5
0
o
•• .---.--0 0 •
•
-0
---_a •
•
0 0
5
10
0
• 15
20
25
Day
Fig. 12.1. The prevalence of maternal stress and child illness in the MSCM study during the 28 days of follow-up for those families where the mother worked outside the home (employed = 1) and those families where the mother did not (employed = 0).
any of the daily stress indicators when evaluating the dependence of illness on employment, Similarly, we do not adjust for illness in the analysis of employment and stress. Therefore, the only time-dependent variable in our initial analyses is the study day (time) - a non-stochastic time-dependent covariate. Table 12,2 presents unadjusted and adjusted log odds ratios for the primary exposure variable, maternal employment, and both of the longitudinal outcomes, Using generalized estimating equation (GEE) with working independence the crude association that adjusts for a common temporal trend indicates that working mothers are slightly less likely to
TIME-DEPENDENT COVARIATES
AN EXAMPLE: THE MSCM STUDY
250
251
. ~or mothers who were . t mmarles Table 12.1. Covafla e su d those who were not. employed outside the home an d 0 1 Employe n = 112 Employed n = 55 (o/c)
(%)
Married 0= no 1 = yes Maternal health 1,2 = fair/poor 3 = good 4 = very good 5 = excellent Child health 1, 2 = fair/poor 3 = good 4 = very good 5 = excellent Race 0= white 1 = non-white Education o S high school 1 = HS graduate Household size o = less than 3 1 = 3 or more
0
42 58
57 43
9 33 47 11
17 34 32 17
7 7 55 31
5 16 46 33
62 38
37 63
16 84
59 41
38 62
31 69
have ill children (estimated odds ratio = exp(-0.12) = 0.89) but are nearly equivalent in their rates of reporting stress. Adjustment for covariates has a minor impact on the coefficient of employment in the analysis of illness and indicates a non-significant difference between employed and non-employed mothers. In the adjusted analysis of stress we find a different time pattern for employed and non-employed mothers with a significant group-by-time interaction. Therefore, using GEE we conclude that there is a significant decline in the rate of child illness over the 28 days of follow-up but that there is no significant difference between employed and non-employed mothers. For stress we find a difference in the rate of decline comparing employed and non-employed mothers with a negative but non-significant time (week) coefficient of -0.14 for non-employed mothers, and a time (week) coefficient of -0.14-0.20 = -0.34 for the employed mothers. The regression methods
Table 12.2, Logistic regression analysis of the . t' b assocla ,Ion etween and b~th longitudinal illness and stress using GEE with an Illd:pendence workmg correlation matrix. Time is modelled usin the vanable week = (day-14)/7. g
~mployment
Coef.
SE
Z
Coe£.
SE Z Coef. SE Z Illness Intercept -1.86 (0.11) -16.44 -0.50 (0.39) -1.26 -0.50 (0.39) Employed -1.26 -0.12 (0.17) - 0.69 -0.14 (0.17) -0.83 -0.15 (0.18) Week -0.83 -0.19 (0.05) - 3.59 -0.19 (0.05) -3.59 -0.19 (0.06) Married -3.06 0.55 (0.15) 3.69 0.55 (0.15) 3.69 Maternal health -0.06 (0.10) -0.57 -0.06 (0.10) -0.57 Child health -0.32 (0.09) -3.68 -0.32 (0.09) -3.68 Race 0.48 (0.16) 2.91 0.48 (0.16) 2.90 Education -0.01 (0.20) -0.04 -0.Q1 (0.20) -0.04 House size -0.75 (0.16) -4.84 -0.75 (0.16) -4.84 Week x employed -0-02 (0.12) -0.17 Stress Intercept -1.91 (0.10) -18.50 -0.13 (0.45) -0.29 -0.12 (0.45) -0.27 Employed -0.04 (0.20) - 0.20 -0.25 (0.19) -1.28 -0.28 (0.19) -1.42 Week -0.20 (0.05) - 4.37 -0.21 (0.05) -4.41 -0.14 (0.06) -2.49 Married 0.34 (0.16) 2.12 0.34 (0.16) 2.12 Maternal health -0.29 (0.10) -2.91 -0.29 (0.10) -2.91 Child health -0.26 (0.10) -2.57 -0.26 (0.10) -2.58 Race 0.21 (0.18) 1.17 0.21 (0.18) 1.18 Education 0.52 (0.18) 2.85 0.52 (0.18) 2.86 House size -0.46 (0.16) -2.78 -0.46 (0.16) -2.79 Week x employed -0.20 (0.10) -2.13
that we have introduced in Chapter 7 are well suited for ~alyses that focus on the characterization and comparison of groups over tIme. The final scientific question seeks to determine the casual.effect of stress h'ld 'll Figure 12.2 shows raw stress and illness series for 12 r~ on C I I ness. .. I t f variation in the reportmg domly selected famIlies. We find a arge amoun 0 t during foIlowI bject (#219) reports no s ress of stress. For examp e, one su d f t in the first week up while another subject (#156) r~ports 3 h.a~s 0 sk re:.d 3 days in week 1 day in the second week, 5 days III the t Ir .wee , four. Analysis of these data raises several questIOns: . l'IOn between stress on day t and 1. What is the cross-sectional assoCia
illness on day t? d on dav (t _ k) 'J 2. Does illness at day t depend on prior stress measure for k = 1,2, ... ? t al stress on day t? Does ~~~ predict maternal stress? 3. What are the factors that influ~COe child illness on day (t - k) for k - , , 1
;n;
TIME-DEPENDENT COVARIATES
252
IIIn:SSW-W N
Vq
!'"
14
7
!
"i ~.i
.
21
t'ln~s:-vJL N ~ ~~
IL......i i i.....
Str:ss N ,u4/:......
14
7
28
--A.JJ\-.
IIIn~SS N
21
-"
Y If' Stress: ..c Nlli 1
14
7
28
21
7
28
Y
N ooeu.,t,U,,,.UUleet',,,e
Stress *,11II,:
'.'
14
...
21
N
7
28
14
60
..
21
28
Day
SubJecl =102
SUbject = 42
SUbject = 94
n .
IIln:s~ N Y •• • Stress ,,:: ': /': N 01.
14
21
V
7
28
14
21
Day
Subject = 7
Subject = 96
Subject=34
, ,.. , ::.
~"'_il." ~
7
21
...
Day
I.
N ",
14
..
Day
IIIn~ss...A .. " Stress .'\
7
28
. . **,......,u........
14 Day
21
28
IlIn~ULy Stress
t t~~~. ~" r r'\ 14 Day
21
28
28
Illn:s~ N y
.. ' ,'.:' ':, ' . N '_4..,6 .~ i. i ~ •• 10, ... ~ • 7
Stochastic covariates: full and partly conditional means
12.3
y
Day
7
Xii is endogenous meaning that it is both a predictor of the outcome of interest, and is predicted by the outcome measured at earlier occasions. Certain authors refer to scenarios where the covariate influences the response, and the response influences the covariate as feeAlback (Zeger and Liang, 1991). In this situation the response at time t-l, l'iI- , may be both 1 an intermediate variable for the outcome at time t. lit, and a c~nfounder for the exposure at time t, Xii. This situation leads to a consideration of proper targets of inference and appropriate methods of estimation and is discussed in Section 12.5.
Illness
Day
IlIn~s~
28
Subject = 129
'" '1'
21
Day
lI'n:s~ N ........ .., .. ,.." ........ Y t Stress.; N Ai ".Jr
14
N·;. •·
Subject = 117
Subject =112
7
y
Illness
Day
Day
253
Subject = 156
Subject = 110
Subject =41
Stress~! N .l: \
STOCHASTIC COVARIATES
~
~
Stress! \
N.'
~
w
Jr
7
r
u ",,:
i\i1
!\
14
21
': . •• 28
Day
Fig. 12.2. A random sample of data from the MSCM study. The presence or absence of maternal stress and child illness is displayed for each day of follow-up.
With longitudinal data and a time-varying exposure or treatment there are several possible conditional expectations that may be of scientific interest and thus identify useful regression models. We distinguish between partly conditional and fully conditional regression models since the taxonomy identifies models whose parameters have different interpretations, and relates to assumptions required for valid use of covariance weighted est~ation, such as with linear mixed models, or with non-diagonal GEE working models. For example, if we are interested in the relationship between a response on day t and an exposure on the same day, then we can use E(Y;t I Xjd to characterize whether the average response varies as a function of concurrent exposure measurement. We may also hypothesize a la~ between exposure and ultimate impact on the response so focus our analySIS on E(Y;t I X~t-d, or more genera11y on E(Y.:>t I X .t-k ) for some value ofh k. ~Alternatively, d I the entire exposure history may predict o~tcome .and t dere o~ we m~ .; X X·) erhaps allowmg a Simple epen ence 0 .t E(lit I XiI, i2,.... , >t-I, P * = ~ Xis' Finally, we may target on the cumulative exposure, Xit L:s
i
0
The first question considers the marginal association between a longitudinal response variable and a time-dependent covariate. In Section 12.3 we discuss regression methods that can be used in this context. The second question deals with the analysis of a stochastic time-dependent covariate and the specification of covariate lags. This issue has received attention in time-series literature, but we only discuss methods for finite lag models since longitudinal series are typically short and methods for infinite series are not necessary. Finally, question three addresses whether the covariate
•
,
••
0
•
0
TIME-DEPENDENT COVARIATES
STOCHASTIC COVAR.lATES
254
. . endo enous the full covariate condimay depend on any or all When the cavan ate process IS 1/ I X, ~ = 1, 2, ... , , k th tional mean, E( I it . . ' . te is exogenous we now at when a covana f H Xi.' owever, 1/ X X, 2 ., Xii)' If we urther couariates • T) - E( I ; I it-I, ,t-,· , E(Yit IXi. 8 = 1,2, ... , 'tt posures predict the response then I the k most recen ex X, ) _ assume that on Y . ' . ll' E(Y.'t I X·.t-, I X , t -2,"" Xit-k, ... , 21 we obtain further slmphficatlO . Th' fore under specific model assumpere , d" I X't-k). E(Yit I X it-I, X it-2, .. :' 'd a finite covariate lag, the partly can ItlOna ) may equal the full covariate conditions such as exogenelty an X. mean E(YitIXit-I,Xit-2,'''' ,t-k
f)
tional mean. . Iy be interested in the cross-sectional may simp . , In. s~me Sl't uationsY; weand X it . In Section 12.3.1 we discuss estimatIOn asSOCIatIOn between. 'alt d I However many longitudinal studies are issues for cross-sectIOn ma e 6 . , h Ih , . th' act of prior exposure on current ea t stainterested III asse6smg e Imp ) 1 S t ' 124 tUB and thus focus on characterizing E(Yit I X it -l, : ' . , Xii . n ec .lOn . th ds using single or multiple lagged covanate val,. . we dISCUSS regressIOn me 0 'n general we may not believe that future exposure causally nes. Althaugh I . ' d .infl uences c ut r health ager estatus n , when the covanate processy IS en . . nous, the fact that [Xit I fir (t), fif (t - 1) 1 depends on. ~i (t). ImplIes t). ThIS l~entIfies one that E(Yit IXi. 8 = 1, 2, .. , ,T) 'I E(Yit I Xis S important scenario where the full covariate conditIOnal ~ean IS n~t equal to the scientifically desired partly conditional mean. SectIOn 12.5 dIscusses methods of analysis when a covariate is endogenous,
255
Pepe and Anderson (1994) h i ' r.l ] " ave c anfied th E [S{3 (fJ, W) = 0 It IS sufficient to assume at to ensure that
/-lit =: E(Yit I Xit)
:=:
E(Yit I XI X t,
.2, ... ,
X)
iT·
(12.3.1)
Furthermore, if this condition is not l' fi d in the cross-sectional association betw::~sXe but substantive il~terest is pendence GEE should be used othe . b' It and ¥it then workmg inde. rWlse lased regr' . obtam. We refer to (12.3.1) as the full . esslOn estImates may assumption. Although Pepe and AndCOvarta(t1e99COTl).dltlOna.l mean (FCCM) . erson 4 focused on th f GEE , t h e Issue that they raise is important f aliI '. . e use 0 methods including likelihood-based method~r 'hong~tudlnal data analysis · . d rna d e Is , sue as lInear and generalized Imear mlxe To understand the importance of the FCCM d' , . h con Ihon we consider t e sums represented by matrix multiplications in the t' t' fun' S{3({3, W): es Ima lng ctlOn
(12.3.2)
.<.
12.3.1
Estimation issues with cross-sectional models
In Chapter 4 we showed that the multivariate Gaussian likelihood equations take the form m
(12.3.3)
:Z:j .
where wijk = Wijk, and Wijk is the (i,k) element of the weight matrix Wi. In order to ensure that E[S{3«(3, W)] = 0 we can consider the expectation of each summand in (12.2.3):
E[xijwijk(Yik - /-lik)] = E {E[Xijwijk(Yik - /-lik) I Xii, X i2 ,··· ,XinJ}
LX~Wi(Yi - X;(3) = 0,
= E {xijwijdE[Yik I XiI, X i2 ,." ,Xin.] - Itik)}'
i==1
l where Wi = [Var(Y i IXi)r . We noted that the resulting weighted least squares solution to these equations enjoys a consistency robustness since the estimator using a general weight matrix Wi remains unbiased even when I Wi =f. [Var(Y i IXiW . Similarly, in Chapter 8 we introduced estimating equation methods where the regression estimator is defined as the solution to the estimating equation
S(3(f3, W) :::::
f; (8;;; )' Wi(Y m
i -
J.£i) = O.
Consistency of the w . ht d I '. elg e east squares estimator and the GEE regres~Ion estImator relies on the assumption that S «(3 b' d' that IS, has expectation equal to zero. (3, IS un lase ,
W)' ,
If the FCCM condition is satisfied then /-lik = E(Yik IXiI, X i2 , .•• , Xini ) and the estimating function is unbiased. However, if /-ltk = E(Y;k I Xik) '1= E(Yik I XiI, X i2 , . , . , X ini ) then the estimating function will likely be biased and result in inconsistent estimates for the cross-sectional mean structure. Finally, if a diagonal weight matrix is used then 8{3((3, W) simplifies to
S.(f'J, W)
~ t, [t,XijW,;;(}\j -"'j)]
(12.3.4)
and S ({3 W) will have zero expectation provided that Itij =. E(Y,j IXij). In thi: c~e the FCCM condition is not required for conSIstent crosssectional estimation.
TIME-DEPENDENT cOVARIATES
256
STOCHASTIC C'QVARlAl'ES
257
12.3.2
A simulation illustration
and the failure of methods that use . lated data under the following non-diagonal covariance weJghtmg we slmu mechanism: (12.3.5) ViI = 'Yo + 'YI Xil + 'Y2 X il-1 + bi + eilt ·
t To I')1 us f,rat e fhp , . FCCM 1ISIiUmp . ' ,Ion
Xii == pXU-I bI>, Cit, (JI'
tv
(12.3.6)
+ fil,
(12.3.7)
mutually independent mean zero.
This model represents the plausible scenario where a ti~e-dependent . t e hfl8 an a utoregressive structure and a response vanable .depends covarla on both current and lagged values of the covariate. The model yIelds the full conditional and cross-sectional mean models E(Yit I Xii,"', Xin ) = 'Yo E(YiII Xit)
+ 'YIXit + 'l'2 X it-l,
= /30 + {3I X it,
where (30 = 'Yo and /31 = 'Yl + P . 'Y2· The induced cross-sectional model remains linear in Xii. In many applications the cross-sectional association between X it and lit is of substantive interest. For example, in assessing the predictive potential of biomarkers for the detection of cancer, the accuracy of a marker is typically characterized by the cross-sectional sensitivity and specificity. Although alternative predictive models may be developed using longitudinal marker series, these models would not apply to the common clinical setting where only a single measurement is available. Pepe and Anderson (1994) conclude that using longitudinal data to estima~e a cross-sectional model requires that either the FCCM assumption be verIfied or that working independence GEE be used. To demonstrate the bias that manifests through covariance weighting we generated data' under models (12.3.5)-(12.3.7) with bi '" N(O 1) e't '" N(O 1) and E'O '" N(O 1) ' N ( 2 ' , J " , , , fa "" 0,1 - P ). Under these assumptions it can be shown that E[xit_Iw*, It(Yt-(3o-(3X)] h-, t I it
* = wit-l,t . '1'2 • (1 -
p2)
indicating the potential for b' 'f th . 'd' d las I e covanates are time varying (p =F 1), pie Icte by X ( . ./.- 0) . . . . used (w* .. . ./.- 0)' ,t-I 12 r ,and a non-dIagonal weight matnx IS ,t-J,t r . For a range of correlations ( - 0 9 0) . each of which conta' d d P - . - .1 we SImulated 1000 data sets me ata on m - 200 b' . tions per subJ'ect Th b su Jects WIth up to 10 observa. e num er of obs t' I' generated as a uni'corl d erva IOns LOr each subject, ni, was . l' n ran om vari bl b mg dat.a missing completel at a e etween 2 and 10, representY random for a final scheduled follow-up of
v.' l,t IS
Table 12.3. Average estimates r {3 . {10 + (3 1 X it ~ased on models (l2.;.5)~(~;.~~e l~ea.r model E(Y;, I Xid =: "f'J = 1 for different values of the co .. ) With 'Yo = O. 1'1 =: 1. and V8rlate auto-correlation.
~
0.9
~l = 1.9
P 131
=:
0.7 1.7
P==0.5 {3 I -
Independence Exchangeable AR(I)
1.90 1.73 1.73
1.70 1.51 1.36
1.5
1.50
p==O.3
/3 1
1.3
1.33
1.30 1.19
1.11
0,89
P == 0.1 1.1
131
T = 10: We est.ima~ed the cross-sectional regression cot>fficient GEE With workmg. mdependence, compound symmetric (excha(31
1.10 1.01 0.74
.
~::g
and AR(I) correlatlOn structures. Table 123 shows tl . I ' ngea ), . d' . . ie Sllnu alton results m Icatmg that GEE using either exchangeable or AR(I) I' tId . corre atlon strueures ea to biased :stimates of (31. For example, when p =: 0.7 the excha~gea~le GEE estimator is negatively biased with a mean of 1.51, and a relative bIas of (1.51-1.7)/1.7 =: -11% while the AR(I) GEE t' at . . '1 I ' ' es 1m or is SImI ar y negatively biased with a mean estimate of 1 36 and I t' b' ( ., a re a lve .las of 1.3~ -.1.7)/1. 7 = :-20%. These simulations illustrate that if regres. slon analysIs mvolves a time-dependent covariate then either the FCCM condition should be verified, or a working independence GEE estimator should be used. 12.3.3
MSCM data and cross-sectional analysis
The results of GEE analysis of child illness, Yit, and maternal stress, Xit, are presented in Table 12.4. The children of mothers who report stress on day = t are estimated to have an illness odds of exp(O.66) = 1.93 the odds of illness among children of mothers that do not report stress. Unless we can verify the FCCM assumption, the GEE exchangeable and GEE AR(l) estimates cannot be assumed valid. Table 12.4 shows that we obtain smaller estimated regression coefficients using non-diagonal covariance weighting schemes. In particular, using an AR(I) correlation yields a coefficient estimate (0.37 -0.66)/0.66 = 43% smaller than the working independence estimate. In the next section we evaluate the FCCM assumption and find that illness, Yit, is associated with lagged maternal stress, Xit-k for k = 1,2, ... ,7. In addition, stress appears strongly ~utocorrela~ed. Therefore we suspect that GEE estimators based on non-diagonal ~Jght matrices are biased. There are important limitations to the cross-sectlO~al summaries that we use independence estimating equations (lEE) to obtam. The cross-sectional association does not imply causation and it is equally plausible that stress causes illness or that illness causes stress. In order
TIME-DEPENDENT COVARIATES
LAGGED COVARIATES
258
• sis of child illness, Yit, and stress, Xii, T~ble .12.4. GEE. analy relation models. Time is modelled using usmg different workmg cor. I adJ'usts for employment, marital 1 14)/7 AnalysIs a so week = ((ay. '1 j h It I race education, and household , status, maternal and ChI ( ea 1, size (not shown). Working correlation Independence
Stress (Xit ) Week
Exchangeable
Est.
SE
Est.
0.66 -0.18
(0.14) (0.05)
0.52 -0.18
SE
(0.13) (0.05) P= 0.07
AR(I) Est. 0.37 -0.20
P=
SE (0.12) (0.05) 0.40
to infer cause we need to address the temporal ordering of exposure and outcome.
12.4 Lagged covariates
In many applications an ent' . rId' Ire covartate history X X a ) e ~n . consIdered as potentiall d' .. ' il, .2, ... , Xii is availchromc disease epidemiol 't . y pre Ichve of the response Yo r X' _ " ogy I IS common to use . . If· n
12.3.4
Summary
Analysis of stochastic time-dependent covariates requires consideration of the dependence of the response at time t, Iit, on both current, past, and future covariate values. We have shown that GEE with working independence can provide valid estimates for the cross-sectional association between Iit and Xit but that covariance weighted estimation can lead to bias. One solution is to consider specification of the regression model as fully conditional on all of the covariates. In our simulation example in Section 12.3.2 this would require inclusion of the necessary current and lagged covariates. However, in other situations there is feedback where the current :e~ponse influences future covariate values, and satisfying the FCCM condItIon would require conditioning on the future covariates. This may no.t be desirable and therefore alternative methods would need to be conSIdered . . . Pepe and An derson (19)' 94 dISCUSS the FCCM assumptlOns requkil~'ed .to Use GEE with general covariance weighting and offer GEE with war ' ch' . E ng dmdependence as a' sa~' e anaI YSIS Olce. Related work is presented ~isc mon al. (1997) and Pan et al. (2000). The FCCM condition is also d cova~~:~es o~rg(e;er~1ustered data analysis where separate 'within-cluster' different ' ffi' I) i), and 'between-cluster' covariates or j( may have coe clCnts In this h f ,., tion of X .. and th' . case t e ull conditional mean /-Lij is a funcI) e covanate values f 11 h . through Xi. See Palta et or a ot er observations, Xik, k =1-), further discussion. al. (1997) or Neuhaus and Kalbfleisch (1998) for
/t
259
A single lagged covariate
In cer~ain applic.ations there is a priori justification to consider the covariate at a smgle lag tIme k time units prior to the assessment of disease status. For example, many pharmacologic agents are quickly cleared from the body so may only yield short duration effects. In this case analysis can use any of the methods discussed in Chapters 7-10 provided the FCCM condition is satisfied or appropriate alternative methods are selected, such as GEE with working independence. It is perhaps more common that the appropriate lag is unknown and several different choices are considered. If regression methods are used for a single time t* then we can formulate a general model using a lagged covariate as
where Zi represents a collection of additional time-invariant covariates, and /-Lit. = E(Iit* I Xjt* -k, Zi)' In this model the coefficient {31 (k) explicitly depends on the choice of the lag, k. When interest is in the coefficient function (3I (k) and comparisons between the coefficient at different lags k and k*, then parily conditional methods described by Pepe et at. (1999) can be used. Such metho~s allow inference on (31 (k) by forming the observations (Yit~, X~t*-k' Zi) usmg multiple values of k and then using GEE with workmg mdependence. More . . general'lze d rmear model'. generally, consider the partly condItiOnal
h{E(Y;;t I Xis, Zi)}
= (30(t, s) + (31(t, s)· Xis + (3~(t, s)· Zi·
TIME-DEPENDENT COVARIATES
260
,
LAGGED COVARIATES
X. may depend on both the response
Here the coefficient of t~e co~anate I '~ertain applications we may 8.'lsume time t and/or the covana~e tl~e:, In (t -.9) and may restrict analysis to that (3(t,.9) is only a functIOn 0 / (~9;~) refer t'o this as a partly conditional pairs such that t > .9, Pepe et a: t' e is included as a predictor, rather . I 'ngle covanate 1m , {X. 8 < t}, or the entire covanate process model smce on? a BI, than the covariate history '8', 12 T} when modelhng lit, {Xis .9 == , "':' I ~ ~ the covariate functions (lj (t, s) the partly Given funetIOna orms d ' GEE with working independd" I del can be estimate usmg con ItlOna mo, panded data set containing (Yit, Xis, Zi) for ence by constructmg an ex t ' n 2 records per subject derived from ni 11 ' (t ) and may con am i " a pairs. ' .9 . GEE 11 s the sandwich variance estImator to vahdly ffi' f t' observatIOns, Usmg a ow d make inference on. the coe clent unc Ions. compute stand ard errors' an . 't' I models that use a SlUgle value, of the The partIy cond IlOna . covanate 1990) process are st rong Iy rel ated to measures of .crosB-correlatlOn' (Dlggle, . · d providing a generalIzed cross-asSOCIatIOn measure, an d can b e vlewe "'" . To recognize this, recall that the cross-correlatIOn p(s, t) == corr(Yit, ~iS) is related to (3ds,t), where E(litIXis) == (lo(s,t)x+ (ll(S,t)· Xis sl~ce (31(S,t) == p(s,t) , ai/a;, ai == /Var(Yit), and as = /~a:(Yis). SImilarly, when lit and Xit are binary, the logistic partly condItIOnal model specifies (31 (s, t) which is the pairwise log odds ratio (Heagerty and Zeger, 1998). Therefore, the partly conditional models provide a method for characterizing the association between two stochastic processes that uses the flexibility of a regression formulation to capture the temporal structure of association between continuous, discrete, or mixed variables.
[Zo(j), Zl(j), Z2(j) model Z (.) - 'I '
261
Z (j)]
Zj =
.. "
p
.
For example'
h
- J ' The regression model for " In t e polynomial With sums of the products Z ( ') , X.. J.L.it IS then a linear model •
1
J
1
J
It-J
as COvanates:
L
0:
h(J.Lit) = (30
+
L(3J' X it _
j
= (30
j=1
~ i30 + ~ "
[t.
+ ~ [Z' ]. ~
JI
X tt - J ,
J=l
Z,(j)
X,,_,] ,
P
= (lo
+
L 'IXi~,I'
1=0
aQ
12.4,2
Multiple lagged covariates
-
(.1
fJO
(3j = '0
+ (3l ' X it-I + (32' X
. Although distributed lag models permit parsimonious modelling of multIple ~agged measurements, the specification of both the number of lagged cova~lates and the degrees of freedom for the coefficient model need to be consldere~. Selection of the number of lagged covariates, L, or the order of the coeffiCIent model, p, may be determined using likelihood ratio tests for nested model~, or ~sing score or Wald tests (Godfrey and Poskitt, 1975) or through conSIderatIOn of a loss function (Amemiya and Morimune, 1974).
12.4.3
When interest is in using the entire covariate history {Xis s < t} as a predictor then methods that use multiple lagged covariates may be needed. The time series literature has considered models for both infinite and finite co~ariate lags. Since longitudinal data are typically short series we review a tillite lag proposal that useB a lower dimensional model for the coefficients of lagged covari~tes, In distributed lag models (Almon, 1965; Dhrymes, 1971) lagge~ coeffiCIents are assumed to follow a lower order smooth parametric functIon, For example, with a finite lag L a polynomial model with p < L can be used to obtain smooth regression coefficients: h{E(Yit IXis S < t )} -
where X~ = " L Z (') X . ,t,1 Wj=1 1 J ' it-j' In matrIX form we obtain (3 = Z' d h(JLJ = X i{3 = XiZ', = (X;)',. I, an
it - 2
+ .. ,+ (lL . Xit-L,
+ 'I . j + '2 . j2 + ... + IP . jP. r
Polynomial models and mit use of t d' d sp me models (linear, cubic, natural) all pers an ar softwa ' h ' be represented l' re smce t e distributed lag model can as a mear model with appropriate basis elements,
MSCM data and lagged covariates
We first consider estimation of the association between illness, Yit, aIld stress, X it- k , using a single lagged stress covariate, We specify a logistic regression that adjusted for baseline covariates, Zi: logit E(Yit I X it -
k,
Zi) = (lo(k)
+ (l1(k) . X it - k + 13;{k). Zi
and used GEE with working independence for inference, In Fig. 12.3 we display the point estimates and 95% confidence intervals for (3I(k), k = 1,2, ... ,7, based on separate fitted models for each value of k, Next we specify a parametric function for 130 (k) and (31 (k) and assume a constallt (32. Using natural splines with knots at t j = 4,8,12,16 we estimate a lag coefficient function, /31 (k), and pointwise standard errors using all possible pairs (Yit, Xis) such that t > s. Figure 12.3 shows the estimated coefficient function and reveals a decaying association that is not significalltly different from 0 after k = 9. To investigate whether maternal stress measured 00 t~e previous? days is predictive of current child illness we use logistic regresslOo controllmg for
LAGGED COVARIATES
TIME-DEPENDENT COVARIATES
262
263
Tabl~ 1~.5. Coefficients of lagged str . of chIld Illness }';t Estirn t ess, X;t-k, as predIctors '.. aes are fro 1" using GEE with working independence rn a o~lSt~c regression employment status, marital status rna and adJust1ng.for week, at baseline, race, education and h ' ~~na~, and chIld health , ouse 0 d sIze.
Lag coefficient function 0.8 0.6
j (!!
0.4
~
0.2
g ~
Coefficient model
.........
~ ~
. . . . ..
0.0
. .
-- ..
. . . . ~ .. ~
;"
...... ,~.: .
~
.
\\........
'0
. . .
.
,.'
. .
.
. .
Saturated parameters = 7
. . .
....... -- .....
-0.2
X it - 1 X it - 2 X it - 3 X it - 4 X it - 5 X it - 6 X it -7
-0.4 -0.6
o
5
10
15
20
25
Time lag (days)
Coefficients of lagged stress in logistic regression m~dels for illness . X £ k- 1 2 Shown IS a smooth lag that use a single lagged covarIate, it-k or - , ,.... . ., . function with pointwise 95% confidence intervals, and the mdlvldual estImates 3
Natural spline parameters = 4
St.ep funct.ion paramet.ers ::::: 3
Est.
SE
Est.
SE
Est.
SE
0.34 -0.05 0.18 0.25 0.22 0.19 0.25
(0.16) (0.15) (0.13) (0.13) (0.14) (0.14) (0.14)
0.24 0.14 0.11 0.17 0.25 0.26 0.21
(0.16) (0.12) (0.12) (0.09) (0.12) (0.11) (0.13)
0.32 0.13 0.13 0.13 0.23 0.23 0.23
(0.16) (0.10) (0.10) (0.10) (0.11) (0.11) (0.11)
Fig. 12. .
for k = 1,2, ... ,7. baseline covariates. To account for the potential correlation in the longitudinal response we rely on empirical standard errors from GEE with working independence for inference. Table 12.5 shows the fitted coefficients for. the lagged covariates Xit-j for j = 1,2, ... ,7. Using separate unconstram:d coefficients we obtain a significant positive coefficient for Xit-l but obtam a non-significant negative coefficient for X it -2' Coefficient estimates for X it - 3 through X it -7 vary between 0.18 and 0.25. Alternatively, we adopt a distributed lag model using a natural cubic spline model for the coefficients ;3j and using knots at j = 3 and j = 5 requires only four estimated parameters rather than seven. Figure 12.4 shows the fitted coefficients and 95% confidence intervals using this model, and fitted coefficients from a monotone model that assumes ;3j = 1'0 + 1'1 . (1/j). The constraints imposed by the spline models lead to less variation in the estimated coefficients of Xit-j. The model assumptions also lead to decreased standard errors f?r the fitted stress coefficients, ~j = zj;y. The monotone model yieldS ;3j = 0.19+0.03· (1/ j) and exhibits minimal variation in the estimates. One disa?vanta~eof the spline models is that the parameterization does not lead to dIrectly mterpretable parameters. The fitted values from an alternative ste~ functio~ m~del,;3j = I'O+l'l'(j ~ 2)+1'2·(j ~ 5) is shown in Table 12.5. ThIS ~odel mdlcates that lagged stress is positively correlated with illness and yIelds lag coefficient values of 0.32, 0.13, and 0.23. Testing ')'1 = 0
Models for lagged stress coefficients
0.6
0-
~
0.4
-~--- ~----- t---- t-:-:-!
III
'C
"8
0.2
-'0::';"::,-......
....................
OJ
0
= 'E .!!?
0.0
0
IE 1Il 0
u
• Saturated. df= 7
-Q.2
c Spline. df=4
v Monotone. df=2 -Q.4
2
3
. I d t S Fig. 12.4. Logistic regression analysIS of agge B res, child illness using distributed lag models.
7
6
5 4 Time lag (days) X'I
k
•- ,
as predictors of
TIME-DEPENDENT COVARIATES
264
. _ (132 = 133 = 134) versus HI: 131 01 ((32 = 13:j = 134) and evaluates Ho· .131 'fi- t It wl'th a Z value of -1.18. Similarly, 1'2 tests . Id a non-slgm can resu Yle S I 13 - (33 = /34 equals the common value of the h ther the common va ue 2 • we. 13 _ a _ 13 We fail to reject equahty of these coefficients later coeffiCIents 5 - 1-'6 - 7· . •
a 66) E ch of these models suggests an assOCIatIOn between (Z for 'Y2 IS. . . tha revious week and current ch'ld h I I'11 ness a Itough maternaI stress III e p ." the statistical significance of the fitted coeffiCients vanes dependmg on the A
•
specific model choice. . . · e choose to use GEE with an mdependence workmg correlaSIllce w . . d I l'k I'h tion model for estimation we cannot use the maxImize og- I e I ood or information criterion such as AIC or BIC to compare the adequacy of different distributed lag models. As an alternative we can assess the predictive accuracy of each model by deleting individual subjects, re-fitting the model, and comparing observed and fitted outcome vectors. We use the c-index, or area under the ROC curve, as a global summary of model accuracy (Harrell et al., 1984). The c-index is 64.1% for the saturated model with 7 degrees of freedom, and is 63.8% for the spline model (p = 4), and 64.2% for the monotone model (p = 2). Thus, these models provide nearly identical predictive accuracy with the monotone model only slightly favoured. Figure 12.5 shows fitted models using k = 1,2, ... , L lagged stress variables for different choices of L. We can read this plot from right to left to determine the first model that has a significant coefficient for the last
Multivariate models with different lags 1.0
• lag =1:7 • lag = 1:5
" lag =1:4 • lag = 1:3 c lag = 1:2
'§
III
"C "C 0
0.5
Cl
g
c:
1
Q)
'0
:E Q)
0.0
.....................
0 ()
-0.5
2
3
4 Time lag (days)
5
265
lag. In the model with L - 7 ha . . we see t t the nfid . co ence mterval for {37 crosses 0 mdlcating non-significance S"1 find that the confidence interval t: {3' ~ml arly, for L := 5 and L == 6 we lor L mtersects a Th I' resent Wald-based hypothesis test £ . ese eva uatlOns rep• s or nested mod Is d . WIth L = 4 where (34 is significant. Finall F' e, an we first reject the coefficient estimates as we cha thY' Ig. 12.5 shows the changes in . nge e order L of th h " e cavanate lag. In general, the value of the coefficient t: lor eac remamIng t . d . erm Increases as we decrease L and remove more distant I agge covanates.
12.4.4
Summary
In this section we first discussed estimation of th coeffi . elent, I3d k),.of a single covariate measured k time units prio t teh r O e response Yit Either separat e parameters for every value k can be est' t d . . h d . Imae usmg standard met 0 s, o~ ~ smooth covanate function can be estimated by adoptin a partly condl~lOnal regression model. Partly conditional models can be .:ed to charactenze the longest lag at which X· and Y. . . S' . ' . tt-k it remam associated. Imllarly, usmg multIple lagged covariates we discussed both saturated m~thods that allow a separate coefficient for each covariate lag, and distnbuted l~g mod~ls that adopt regression structure for the lag coefficients. M~dels With multIple lagged covariates can be useful to describe the associatIOn between the full covariate process and the response process or can be used to parsimoniously predict the response as a function of th~ complete covariate history.
12.5 Time-dependent confounders
o lag = 1:6
~
TIM&DEPENDENT CONFOUNDERS
6
7
Fi.g. 12.5. Coefficients of lagged stress' '. USIng L lagged covariates X. £ k In lOgistIC regression models for illness , ,t-k or := 1,2,. " , L for different choices of L.
Thaditional epidemiologic regression analysis considers a classification of variables that are related to both an exposure of interest and the outcome as either confounders or intermediate variables. A confounder is loosely defined as a variable that is associated with both the exposure of interest and the response, and which if ignored in the analysis will lead to biased exposure effect estimates. An intermediate variable is one that is in the causal pathway from exposure to outcome and an analysis of exposure should not control for such a variable since the effect of exposure mediated through the intermediate variable is lost. In longitudinal studies a variable can be both a confounder and an intermediate variable, leading to analytical challenges. For example, using data from an observational study of HIV infected patients we may hope to determine the magnitude of benefit (or harm) attributable to treatment with AZT on either patient survival or longitudinal measures such as CD4 count. However, we may find that the CD4 count at time t predicts both later CD4 counts and subse~uent treatment choices. In this case CD4 at time 5 < t is the response varIable for treatment received prior to 5, but is also a predictor of, and therefore a potential confounder for, treatment given at future times, t > 5.
time t on treatment received prior to time A naive regressIOn of CD4 a t g treated subjects. Such a finding D4 I I er mean C amon t may revea a ow £' h t a tients who are more sick are the ones may simply reflect the ac1t tha Pb ct'Ions that follow we first summarize . treatment. n t e su se . . that are ~Jven . then we discuss methods of estimatIOn and issues uBmg a Simple examhPle , SCM data. Although the methods that we ly these methods to t e M . . d d app., develo ed for general analysis With a tIme- epen ent ~~:~:~:d:;,v~nbte:i~ seetio:we focus ~n the special case of an en~ogen~us . covariate. A more general an d theoretical treatment can be found m Robms et al. (1999) and the references therein. . f
12.5.1
Feedback: response is an intermediate and a confounder
To clarify the issues that arise with time-dependent covariates consider a . I . f s tudy tJ'mes , t -- 1"2 with exposure and outcome slDgepalfo . . . measureV) Let Y. be a disease or symptom seventy mdIcator (1 = t ment s (Xt-I,.it· disease/symptoms present, 0 = disease/symptoms absent) and let X t = 1 if treatment is given and 0 otherwise. Assume that the exposure X t - I precedes Yi for t = 1, 2 and that YI either precedes or is simultaneously measured with X I. Figure 12.6 presents a directed graph that represents the sequential conditional models: logit E(YI IX o = xo)
= -0.5 - 0.5 . xo, logit E(X I IYI = YI, X o = xo) = -0.5 + 1.0 . YI,
(12.5.2) Xl,
(12.5.3)
where 'H.f = {Xo, X d and hf = {xo, xI}. These models specify a beneficial effect of treatment X o on the outcome at time 1 with a log odds ratio of -0.5 in the model [VI I X o]. However, the second model specifies that the treatment received at time 2 is strongly dependent on the outcome at time one. For either X o = 0 or X o = 1 if patients have a poor initial response (VI = 1) they are more likely to receive treatment at time 2 than if they responded w~ll (YI ~ .O~. Finally, the response at the second time is strongly correlated With the 1llltlal response and is influenced by treatment at time 2.
-G 1
,
and response
0 500
Xo
n
YI n
0
1
0
1
311
189
365
135
XI
0
1
n
194
117
0
Y2 n
1
500
1
0
142 52 96
E(Y2
1 Xo
= 1, Xl
E(Y2
1 Xo
= 1, Xl
E(Y2
1 Xo
= 0, Xl
E(Y2
/ Xo
= 0, XI
1
21
0 71 0
1
27 44
0
1
0
1
0
1
118
227
138
51
84
1
59 59
0
1
166 61
0
1
113 25
0
1
0
1
19 32 42 42
+ 42)/(138 + 84) = 0.30 = 0) = (61 + 32)/(227 + 51) = 0.33 = 1) = (21 + 59)/(117 + 118) = 0.34 = 0) = (52 + 44)/(194 + 71) = 0.36
= 1) = (25
Table 12.6 shows the expected counts for each treatment/outcome path if 500 subjects initially received treatment and 500 subjects did not. This table illustrates the benefit of treatment at time 1 on the first response, Yl , showing that only 134 subjects (27%) are expected to have symptoms if initially treated as compared to 189 subjects (38%) among those not treated. The apparent benefit of treatment is diminished by the second measurement time with 30% of patients showing symptoms among those receiving treatment at both times (Xo = Xl = 1) versus 36% among those not receiving treatment at either time (Xo = Xl = 0). The conditional distributions in (12.5.1)-(12.5.3) lead to a marginal distribution of Y 2 conditional on the treatment at times 1 and 2 represented by the regression structure: logit E(Y21 rif = hf) = -0.56 - 0.13 . Xo
-
0.10·
Xl -
0.04·
XO • Xl·
(12.5.4)
/1 G ·G
Fig. 12.6. Time-dependent covariate, X t -
Table 12.6. Expected counts when 500 subjects are initially treated, X o = 1, and 500 subjects are not treated, X o = 0, when treatment at time 2, XI, is predicted by the outcome at time 1, Y I , according to the model given by (12.5.1)-(12.5.3).
(12.5.1)
logit E(Y2 1 'H.f = hf, YI = YI) = -1.0 + 1.5· YI - 0.5·
5J
267
TIME-DEPENDENT CONFOUNDERS
TIME-DEPENDENT COVARIATES
266
¥t.
This model indicates a beneficial impact of treatment among those subjects that were observed to have treatment at time 1 and at time 2. Note that the marginal expectation is computed by taking averages Over the distribution of the intermediate outcome. Since the intermediate variable influences both the second outcome and treatment assignment we
268
TIME-DEPENDENT CONFOUNDERS
TTME-DEPENDENT COVARTATES
to obtain both Pr(Yzl Xo, XI) and Pr(X I I Xo): need to average over Y:I
Table 12.7. Regression of stress, X it , on illness, Yit-k k == 0,1, and previous stress, X it - k k == 1,2,3,4+ using GEE with working independence.
== Pr(Yz == 1/ Xo == XO, XI == XI)' Pr(Yz == 1, XI == 11 X o == 1) /12(1,1) == Pr(XI == 11 Xu == 1)
/lZ(XQ, XI)
Pr(Y == I,X 1 == IjX o == 1) == z
L
Pr(Yz == I/X I == I,Y == 1
Intercept Yl,X O
Yit Yit-I
== 1)
YI =0,1
X it - I X it - 2 X it - 3 Mean(X it - k , k ~ 4)
x Pr(X I == 11 YI == YI, X o == 1) x Pr(YI == YI I X o == 1), Pr(X I == llXo == 1) ==
L
Pr(XI == I/YI == YI,XO == 1)
Employed Married Maternal health Child health Race Education House size
YI=O,I
x Pr(YI ==
YI
IX o ==
1).
Suppose that the scientific goal is to determine the effect of treatment at both time 1 and time 2 on the final patient status, Y z· We may consider reporting the observed marginal means lJ,z(xo,xd. However, since YI is correlated with both Xl and Yz we recognize that these observed marginal effects do not account for the confounder YI and therefore do not reflect the causal effect of treatment. The marginal structure does reflect a small beneficial effect of treatment at time 1 in addition to time 2, with the coefficient of X o in (12.5.4) equal to -0.13. On the other hand, if analysis controlled for YI then we would obtain the conditional model (12.5.3) that generated the data. In this model which adjusts for YI there is no effect of Xo on the (conditional) mean of Y z since we have conditioned on the intermediate variable YI and blocked the effect of Xo. Therefore the analysis that adjusts for YI does not reflect the causal effect of both X and ~I since by conditioning on an intermediate variable it only charact~rizes direct effects, This simple illustration forces realization that with longitudinal exposures (treatments) and longitudinal outcomes a variable can be both a confo~nder and an intermediate variable. No standard regression methods can t en be used to obtain causal statements. 12.5,2
MSCM data and endogeneity
To determine if there is feedback in th M current child illness d' t e SCM data we evaluate whether pre IC s current and f t that in our analysis in Section 12 4 u ure maternal stress. Note Xii-I, X it - 2 '" and did t thO we only used lagged values of stress no use e current val X t d' " f ore, If Xit is associated with y. th ue i! 0 pre Ict Yit. There,t en we have eVIdence of endogeneity,
269
Est.
SE
Z
-1.88 0.50 0.08 0.92 0.31 0.34 1.74 -0.26 0.16 -0.19 -0.09 0.03 0.42 -0.16
(0.36) (0.17) (0.17) (0.15) (0.14) (0,14) (0.24) (0.13) (0.12) (0.07) (0.07) (0.12) (0.13) (0.12)
-5.28 2.96 0.46 6.26 2.15 2.42 7.27 -2.01 1.34 -2.83 -1.24 0.21 3,21 -1.28
or feedback, where the response at time t (Yid influences the covariate at future times (X it is the covariate for time t + 1). Table 12.7 presents results from a regression of Xii on Yit-k for k == 0,1, prior stress values and covariates. Using GEE with working independence we find a significant association between Xit and Yit even after controlling for prior stress variables. 12.5.3
Targets of inference
Robins et al, (1999) discuss the formal assumptions required to make causal inference from longitudinal data using the concept of counterfaetual outcomes (Neyman, 1923; Holland, 1986; Rubin, 1974, 1978). Define Y;~Xt) as the outcome for subject i at time t that would be observed if a given treatment regime Xt = (xo, Xl! ... , Xt-I) were followed. For example, Y;~O) represents the outcome that would be observed if subject i received no exposure/treatment at all times s < t while y;~I) represents the outcome that this same subject would have if exposed/treated during all times. For a given subject we can only observe a single outcome at time t for a single s~ecified treatment course, All other possible outcomes, Y;~Xt) for Xt ix t are not observed and are called counterfactual or potential outcomes. Defining potential outcomes facilitates definition of estimands that characterize the causal effect of treatment for both individuals and for
x;.
TIME-DEPENDENT CONFOUNDERS
TIME-DEPENDENT CoVARIATES
270
(I) yeO) epresents the causal effect of treata .- it b: t' response at time t. We cannot opulations. For example, Y P , t - 1 on the zth su Jec s ment though tIme , 'fi a t since we only observe one , h' b eet-specI c el1ec directly estImate t IS su J b' t Bra well-characterized study popupotential outcome for each su Jec. 0 I a t as 0 = E[y(l) _ y,~U)] = d fi e the average causa el1ec t ,t lation we can also e n d 1 observe one outcome per sub(I) [(0)] S' any stu y can on y E[Yt, ] - E ~t . IDce . h'ch potential outcome is observed, h' that determmes w I ject, the mec allls~ ~ 1 t' g the observed outcomes to the becomes critically Important or1 re ,a m randomized study the average . I t s For examp e, III a ., potentia ou come. . 'E(Y: I x - I ) is an unbIased estimate res onse among treated subJects, it t '.. ' p (I). th entire study populatIOn smce the assIgnof the mean of Yi! m e d related to the outcome or covariates. . d d t to treatment IS random an un men . d ' d t ' 1 with full compliance havmg m treate an m Thus m a ran omlze fla , . f th t control subjects the treatment assignment IS mdependent 0 e po en. (y(t) yeo») and therefore the observed treatment effect, tlal outcomes i t ' it . d . , I ' " Y. at = mwi it' 1(X,t. -- 1) - .!..mE" Y. t . l(Xt, = 0) is an unblase estImate of the causal effect Ot· In general, we aEsume that subjects i = 1,2, ... , m are rando~ly sampled from a population of subjects and that we seek to ~ake mference regarding the effect that a treatment applied to this populatIOn would h~ve on the outcome of interest. Let Zi be a vector of time-invariant baselme covariates. Define xtl (z) = E[y/(x tl I Z = z] as the average outcome that would be observed at time t within the sub-population defined by Z = z if all subjects followed the treatment path Xt. For example, in the MSCM we are interested in the effect of stress so that Xt = (X I ,X2 , ••• , Xt-d, a possible pattern of maternal stress exposures, and tt~ 1) (z) represents the prevalence of child illness among children with covariate Z = z if their mother reported stress each day through day t - 1. Define the causal effect of continuous exposure as aT (z) = tt¥) (z) - j.J,~) (z ), where T represents the end of study (or other specified time). Robins et al. (1999) formalized the definition of 'no unmeasured confounders' that is required in order to estimate causal effects from longitudinal data. Assume that for each time t the exposure X t is independent of the vector of future potential outcomes given the observed exposure history x through ti~e t - 1, 1t (t - 1) = (X o, Xl, .. " Xt-I), and the observed outcome. hIstory through time t, 1tY (t) = (YI, Y2, ... , Vi). We state this assumptIOn as
ttl
{( yeo) y(l»). S
's
_
}
,s-t+1,t+2, ... ,T ..lXtl1tx(t-1),1tY(t).
~ote that we are assuming that exposure tIme t and therefore can causally effect v I
(12.5.5)
X t is ascertained or selected at but n t~.T Th d t+k 0 It. e no unmeasure
271
confounder assumption is also referred to as the sequential randomization assumption. This assumption states that given information through time t, exposure at time X t does not predict the value of the future potential outcomes. This assumption would be violated, for example, if there existed an unobserved variable u that influenced the likelihood of treatment or exposure and predicted the potential outcomes. A physician who prescribes treatment to the sickest patients is one such mechanism or variable. Although for treated patients we hope that the observed outcome y(1) is larger than the unobserved outcome y(O) (assuming a larger value of Y is better), we expect that both y(1) and yeO) are lower for patients selected for treatment, X = 1, as compared to the potential outcomes [y(O), y(l)] for the more healthy patients that are assigned X = O. Pearl (2000) cites two main reasons that statisticians have been hesitant to adopt causal models. First, causal inference requires assumptions such as (12.5.5) that are not empirically verifiable. Second, causal statements require new notation. For example, in our simple example given by (12.5.1)-(12.5.3) we structure data such that E(Y21 X o = 0, Xl = 0) = 0.30 and E(Y21 X o = 1, Xl = 1) = 0.36. These conditional expectations represent the average of Y2 among subjects observed to receive certain treatment paths. However, causal statements refer to the effect of interventions in the entire population rather than among possibly select, observed subgroups. That is, although we may observe a subgroup that experiences a treatment course, we may not be interested in the mean response in this specific subgroup since it does not generalize to the mean that would be observed if the entire population experienced the same treatment course. Pearl (2000) uses notation such as E[Y2 do(Xo = 1, Xl = 1)] to denote the outcome for the entire population, or j.J,~1) in our notation. Here the notation do(X = 1) indicates the situation where X = 1 was enforced for the entire study population. Pearl's notation emphasizes the fact that we are interested in an average outcome after assignment of the covariate value rather than the average of outcomes in subgroups after simply observing the covariate status, Table 12.8 presents the outcomes YI and Y2 determined by the conditional distributions (12.5.1)-(12.5.3) when the covariate values are controlled rather than allowed to be influenced by Yi. In this case we obtain /.L(l) = E[Y2 / do(Xo = 1, Xl = 1)] = 0.402 and j.J,~O) = E[Y21 do(Xo 2= 0, Xl = 0)] = 0.267 giving a causal risk difference of <>2 == 0.267 - 0.402 = -0.135. We can also calculate the causal odds ratio as [0.267/(1 - 0.267)]/[0.402/(1 - 0.402)] = 0.542. The causal effects are actually larger than associational comparisons conveyed by the observed marginal means where we find E(Y2 / X o = 1, Xl = 1) E(Y2 1 X o = O,X I = 0) = 0.302 - 0.362 = -0.060, and the odds ~atio is 0.760. The observed mean difference reflects the fact that subjects 1
TIME-DEPENDENT COVARIATES
272
TIME-DEPENDENT CONFOUNDERS
Table 12.8. Expec t ed outcomes when treatment is controlled and the causal path leading from Yi to X2 is blocked. All subjects Xo
= Xl = 1
YI
n
Y2
n
0 0
1 0
0 0
0 0
1 0
0
1 0
1 0
0 0
0 0
0 0
1
0 0
1
0 0
1 0
0 0
Xl n
1 269
0 731
1 0
0 0
731
1
0 0
1 133
O'
0 598
1 269 1 0
0 134
1 134
= (133 + 134)/1000 = 0.267 All subjects Xo = Xl = 0 /leI)
Xo n YI n
Xl n ~
n /leo)
1
0 1000
0
1
0 622 0 622
1 0
0 1 0 1 455 167 0 0
1
0 0
378
0 378
1 0
0 0
There have been several novel approaches proposed for obtaining estimates of causal effects from longitudinal data with time-varying covariates. One approach is termed 'g-computation' and is described in detail for binary response data by Robins et al. (1999) or more generally in Robins (1986). In this approach, causal effect estimates are obtained from estimates of the observed response transition model structure under the necessary assumption of no unmeasured confounders, or equivalently sequential randomization (Robins, 1987). Recall that we are interested in the average response at time T after receiving treatment for all times prior to T, compared to the response at time T after receiving no treatment for all previous occasions. Note that we can decompose the likelihood for a binary response ¥il,"" ¥iT and a binary time-dependent covariate X iO ,.'" X iT - I into the telescoping sequence
0
T
1 0
.c =
II Pr[¥it' ?tnt -
1), H; (t - 1), Zi]
t=l
0 1 143 235
0 0
1 0
0 0
1 0
0 0
1 0
0 0
1 0
0 0
1
x Pr[Xit -
0
= (167 + 235)/1000 = 0.402
with a poor response at t' 1 . Ime were more lIkely to seek treatment and th th b . us e su group wIth (X - 1 X ) subJ'ects (i e Y l'k 1° - , 1 = 1 represent somewhat sicker '. I more ley to be 1) d . ~ compare to those subjects observed to follow (X _ follow (X o = X =0 1, XI = 0)- SImilarly, subjects observed to as compared to' sU~J' t bare more lIkely to be subjects with YI = 1 . ee s a served to follow (X - 0 X ) 1 = O. Note that III order to calculate the stochastic assignment of t t causal. effects we have substituted the . . . assignment meeh' rea ment IY;I, X]0, Wit. h a determllllstlC b at tIme 1,[X I alllSm ut h t 1 ~ve. no a tered the response models Pr(YI I X o) or Pr(Y Iy, X X) can identify the st~blel 'b ?l~i' I . This SImple example shows that if we such as the sequence of U1 In~ .blocks of the data generating process, then we may predict th bcohndl:lOnal distributions in (12.5.1)-(12.5.3) e e aVlOur of th e populatlOn ' . ' under alternatIve,
1)
1
l?tr (t -
1),
H; (t -
2), ZJ
This likelihood factors into the likelihood for the response transition model = Pr[Yit !Hr (t - 1), H; (t - 1), Zi] and the likelihood for the covariate transitions .cx = Pr[Xit - 1 IHr (t - 1), H; (t - 2), Zi]' Unknown parameters in .c y and .c x can be estimated using maximum likelihood as described in Chapter 10. Under the assumption of no unmeasured confounders given by (12.5.5) the causal effect of treatment can be identified from the observed data since (12.5.5) implies (Robins et ai., 1999, p. 690):
.cy
°
Estimation using g-computation
0
1
0 0
manipulated conditions and thereby provide estimates of causal effects (Pearl, 2000). 12.5.4
1 1000
0 0
Xo n
273
ni=!
ni=l
°- ,
Thus, we can use the distribution of the observed response at time t among subjects with an observed treatment path Xt and observed response history HY (t - 1) to estimate the conditional distribution of the counterfactual outcome yt(Xt). We obtain the distribution of the outcome at the final time T by using the conditional probabilities to obtain the joint probability for ~lXt) , .•. ,Yi~Xt) and then summing over all possible intermediate paths for
275
TIME-DEPENDENT CONFOUNDERS
TIME-DEPENDENT COVARIATES 274 (x,)
the first t - 1 outcomes, Yil
12.5.5
V(x,);
, ... , it-I
Table 12.9 shows estimated coefficients for a transition model with Yit
= Pr[Yi~x,) = 1/ Zi = Z], /1 (x,) "" P [V(x,) V(x,) = yt-I, ... , Yil Pr[y;~Xf) I Zi = z] = L.J r it ' ,t-I (x,)(z)
as the response variable and with baseline covariates Z" lagged illness, Y,t-k k = 1,2, and lagged maternal stress, X it - k k = 1,2,3, as predic-
I Z , -- Z ],
_
- YI
y,-[
IV = "" L.J II P r [Vex,) r is t
(
)
~ = Ys-I, ... , 1
l8
Vex,) il
-
.Z _
YI,
i -
]
Z ,
Y,-1 s=1
=L
t
IT Pr[Yisl 'Hi (8 - 1) = Ys-I'
Y t-l s=1
'H;
(8 -
1) =
Xs-I,
Zi = z],
. th first 8 elements of the treatment or exposure path of interh were X B IS e l I t ' . a special case of the g-compu t at'zona l algon'th m ' ca est, Xt. ThIS cu a IOn IS . ' . formula of Robins (1986), In our simple example thIs computatIOn IS
/l~I) =
L Pr (Y21 XI = 1, Yi = YI,X
O=
1)· Pr(YI
= Yll X o = 1)
yl
since Y1 is the only intermediate path. Finally, since we can use the observed data to estimate response transition probabilities, Pr[Yis IHr (s - 1) = 1 Z·~ = z] , we can use the observed data to estimate I Lt (8 -1) = x S-, Y8-1' '1.IX /l(x,)(z) and 8t (z) = /l(I)(Z) - /l(O) (z), In general, the g-computational formula can be evaluated using Monte Carlo to simulate response series Yt, ... , Yi for given treatment sequences and specified covariates, and then we can marginalize over Y1 , ... , Yt - I to obtain /l (X,) (z). Such calculations make it clear that in estimating the causal effect we are controlling the sequence of exposures, but are allowing the intermediate outcomes ~, 8 < t, to unfold and therefore we capture any indirect effects that are mediated by these earlier outcomes. In summary, if we collect adequate covariates Z·, such that we are willmg to assume no unmeasured confounders, then we can use the observed da;a to model the sequence of conditional distributions, Pr[Yit IHr (t - 1), 1-f. i (t - 1); Zi], and then use these to calculate probabilities for a final end-point under exposure paths of interest. By so doing we can provide treatment comparisons that are not confounded by exposure feedback. S.tandard alternatives may not provide a satisfactory solution in this setg . Obs~rved marginal associations are biased since they do not control lOr the pnor outcom h' h es w IC are exposure confounders. Transition modt eIs can control for' pnor ou comes but only capture the direct effects of exposure and no effe t d' t d h c s me la e t rough earlier changes in the outcome.
.
;m
MSCM data and g-computation
tors. Although the dominant serial association is first order (the coefficient of Yit-I equals 2.36), we do find significant second-order dependence. Our model includes Xit-I,Xit-2, and X it - 3 but only the coefficient of X,t-3 obtains significance. We performed several checks including assessing further lagged values of both illness and stress but found no substantial departures. Using this model to perform the g-computation involves choosing a final end-point time, identifying a covariate value of interest, Zj, and then generating Markov chains with stress controlled at 1 for all times, and controlled at 0 for all times. Using 28 days as the endpoint time, and focusing on Zi = z* with employed = 0, married = 0, maternal and child health = 4, race = 0, education = 0, and house size = 0, we obtain p,(l)(z*) = 0.189, and p,(O)(z*) = 0.095. This implies that for this subpopulation, continual maternal stress is associated with a J(z) = (0.189 - 0.095) = 0.094 increase in the prevalence of child illness. A causal log odds ratio comparing continual stress to no stress over 28 days in the subpopulation Zi = z* is estimated from p,(l)(z*) and p,(O)(z*) as 0.80. Using a marginal regression model and GEE gives the
Table 12.9. Regression of illness, Yit, on previous illness, Yit-k k = 1,2, and stress, X it - k k = 1,2,3 using GEE with an independence working correlation matrix.
Intercept
Yit-l Yit-2 X it - 1 X it -
2
X it - 3 Employed Married Maternal health Child health Race Education House size
Est.
SE
z
-1.83 2.36 0.33 0.24 -0.14 0.40 -0.09 0.44 0.01 -0.24 0.31 0.01 -0.53
(0.29) (0.16) (0.14) (0.14) (0.15) (0.13) (0.13) (0.12) (0.07) (0.06) (0.13) (0.14) (0.12)
-6.29 14.83 2.31 1.72 -0.93 3.21 -0.70 3.79 0.10 -3.90 2.50 0.06 -4.42
TIME-DEPENDENT CO VARIATES
TIME-DEPENDENT CONFOUNDERS
277
276
. ,] I dds ratio as 1.38 (based on the sum of , ling assoclatJOna og 0 • corresponc ' f f i '. t . Table 125) and the transition model in saturated model coe, clen ,s m f I 0,14 + 0.40) = 0.,'50. Here a dlf(~et effed 0 on Y . .' Ta)t Je12. 9 gives ", , . timate the causal effect which the marginal associatIOn appears to.oveles . th I'k 1'1· d th t t: h X - I increases e I e I 100 a· should be anticipated when JOt tl-k . vI' . tIle likelihood that X,t+k = 1. The MSCM Yit = 1, and I it = mcreases . ' . " Its are in contra'lt to our example given by (12.5.1)- (12.5.3) anaIYSIS feSU f I Y 0 our 'lkelihood that Y 1 = 1. Fma I] y, t Ile va I'ICl't where Xl decrear;cc] th c I . t . I' on the assumption of no unmeasured confounders, y x causaI estlma ,BS Ie les and on the model form used to estimate Pr[Yit I Hi ~t - 1), Hi (t. - 1~, Zi)One limitation to use of the g-computational algonthm for estimatIOn IS that no direct regression model parameter represents the ~~ll hypothesis of no causa] effect, and thus the approach does not faclhtate formal
structures the causal effect of exposure and identifies a single parameter, fh, that can be used to quantify and test the causal effect of exposure. Estimation for MSMs can be obtained using IPTW estimation. Recall that the key reason we cannot use standard methods such as GEE is due to the fact that the prior response, Y;t-k, is a confounder. However. we do not want estimates that control for the prior response since it is also an int~rmediate variable ..I~ the absence of confounding association implies causatIOn and thus obtammg a population where confounding was nonexistent would allow use of standard regression methods. Robins (1998, 1999) and Hernan et at. (2001) discuss how to use weights to construct a pseudo-population that has the causal relationship of interest but is free from confounding. Define the stabilized weights:
testing.
SWi(t)
(024 _
o
0
•
0
12.5.6
Estimation using inverse probability of treatment weights (IPTW)
One of the advantages of using the g-computational algorithm approach to obtaining estimates of causal effects is the fact that a model for the treatment (or exposure) is not required. Stated alternatively, the likelihood £,y contains the parameters necessary to calculate Ot (z) and the telescoping likelihood corresponding to Xit given by LX is ancillary. However, the g-computational algorithm is based on an indirect parameterization of b"t(z) and does not facilitate testing for exposure effects by setting a parameter equal to zero, or permit structuring of exposure effects in the presence of covariates Zi. An alternative approach to estimation based on marginal structural models (MSMs), introduced by Robins (1998) and discussed by Hern~n e~ al. (2001), does require a model for the covariate process, but permits direct regression modelling of causal effects. In this section we first ~escribe the basic model and then discuss methods of estimation using mverse probability of treatment weighted GEE. . Marginal structural models specify the average counterfactual outcome directly a:s a function of exposure and covariates. Define the average If the subpo pu Iat'IOn WI'th Z i = z experienced a treatment outcome . regIme XI:
IL;X,)(z)
=:
E[Yi~xtll Zi
= z].
We formulate a regres . d I" sian mo e lOr the counterfactual outcomes such as
h{IL;Xt)(z)}
= 130 + I3I X,t* + fJ2 z. r-l,'
2,
where, for example, Xit represents cumulative ex * or any other function of the covariat h' t ~osure, Xit = L:. < t Xis, e IS ory. ThiS model parsimoniously
=
II
Pr(Xis=xi.I7t;1(,(S-1)=hi~(S-1),Zi). s < tPr(Xis = Xis l7tr (s - 1) = h~ (s - 1), 7ti'!. (s - 1) = hi'!. (s - 1), Zi)'
These weights compare the probability of the treatment received through time t - 1 conditional only on knowledge of the treatment histories to the probability of the treatment received conditional on both treatment and response histories. The weights SWi(t) would be identically one if the covariate process was exogenous (by definition) and can therefore be viewed as a measure of endogeneity. In practice, the weights SWi(t) will need to be estimated by choosing and fitting models for the numerator, Pr(Xis l7tf (s - 1), Zi), and the denominator, Pr(Xis l7tns - 1), 7tf (s - 1), Z;). Correct modelling of the denominator is necessary for consistent parameter estimation, while the numerator can be any function of (7tf (s), Zi), the choice of which only impacts estimation efficiency not validity. One assumption necessary for use of weighted estimation is that the weights are bounded away from zero, that is: Pr(Xis I (s - 1),
Hr
7tf (8 - 1), Zi) 2': € > O. GEE with working independence and estimated weights SWi(t) can be used to obtain an estimate of the causal regression parameter {3. Weights are necessary to obtain causal estimates, and GEE is used simply to obtain a sandwich variance estimator that accounts for the repeated meaSures. It is interesting to note that the variance of /3 is smaller when weights are estimated, making the sandwich variance estimator conservative. Formal justification for the IPTW estimation is given in Robins (1999) and references therein. To illustrate that weights can be used to construct a pseudo-population where Yit is still an intermediate but no longer a confounder we return to the simple example given by (12.5.1)-(12.5.3) whose expected counts are shown in Table 12.6. In Table 12.10 we reproduce the expected counts for
TIME-DE PEN •
278
TIM&DEPENDENT CONFOUNDERS
DENT COVARIATES
obtain W t e-weight data and Table 12.10. Examp IeofusingIPT d MSM or that corresponds to (1251) ..causal effect estimates for a saturate (12.5.3).
Xo 1 1 1 1 1 1 1 1
~ o o
o o
o o
1 1 1 1
1 1
0
0 0 1
0 1 1
0 1 1
1
0 0
~
1 0 0 0 0
",(Xo=l,Xl=l) ",(XO=1,Xl=O) ",(XO=O,Xl=l) t.t(Xo=o,x 1 =O)
~
1 1 0 0
1 0 1
~
0 1 0 1 0 1 0 1 0 1 0
Expected count
Weight
41.9 41.9 31.6 19.2 25.2 112.8 61.2 166.3 58.8 58.8 44.4 26.9 21.4 96.1 52.1 141.6
0.712 0.712 1.474 1.474 1.174 1.174 0.894 0.894 0.755 0.755 1.404 1.404 1.245 1.245 0.851 0.851
Re-weighted count 29.8 29.8 46.6 28.3 29.6 132.5 54.7 148.7 44.4 44.4 62.3 37.8 26.7 119.6 44.4 120.6
== (29.8 + 29.6)/(29.8 + 29.8 + 29.6 + 132.5) = 0.268 == (46.6 + 54.7)/(46.6 + 28.3 + 54.7 + 148.7) = 0.364
= (44.4 + 26.7)/(44.4 + 44.4 + 26.7 + 119.6) = = (62.3 + 44.4)/(62.3 + 37.8 + 44.4 + 120.6) =
0
re-weighted population YI no longer is associated with Xl. For example, Pr{X l I Y1
= 1, X o = 1)
= (29.8
+ 29.8)/{29.8 + 29.8 + 46.6 + 28.3)
= 0.443,
Pr{X l I Y1 = 0, X o = 1) = (29.6
+ 132.5)/(29.6 + 132.5 + 54.7 + 148.7)
= 0.443,
showing that YI does not predict Xl. However, Y1 does still predict the final outcome and therefore remains an intermediate variable in the pseudopopulation. Therefore, in the pseudo-population YI is only an intermediate variable and not a confounder, and we can use standard summaries in the pseudo-population to obtain causal summaries.
12.5.7
MSCM data and marginal structural models using IPTW
To obtain MSM estimates for the MSCM data we first estimate SWi(t) based on models for the exposure process Xit. For the denominator we use the logistic regression model presented in Table 12.7 where Y it , ¥it-I, lagged stress, and baseline demographic covariates are used as predictors. For the numerator we use the same regression model except that illness measures are excluded (not shown). Using the estimated weights we then use working independence GEE to obtain estimates of the causal regression parameters and valid (conservative) standard errors. Table 12.11 displays the MSM estimates and standard errors. In this model we use the first three
0.302 0.402
as a function of X o, Yl , and Xl. We also compute the stabilized weights SW(2) = Pr(Xll Xo)/Pr(Xll Yl,XO). Notice that subjects with (Xo = 1, Yl = 1, Xl = 1) are down-weighted, SW(2) = 0.712, while subjects with (X o = I'Yl = O,X l = 1) are up-weighted, SW(2) = 1.174. This suggests that our observed population has an over representation of subjects with (X~ = 1, Yl = I,X 1 = 1) relative to our desired pseudo-population where Y1 IS not a confounder. Intuitively this reflects the fact that when Yl = 1, subjects are more likely to obtain Xl = 1 and in order to correct for this selection we up-weight those with Yl = and down-weight those with l Y == 1. To verify that the pseudo-population (re-weighted count) has the causal structure of interest we compute the observed means as a function of X o and X . 1" e . I, margma lZlllg over Yl . We find agreement (to rounding rror) WIth the g-computation results given in Table 12.8. Note that in the
Y2
279
Table 12.11. MSM estimation of the effect of stress, Xit-k k 2' 1, on illness, ¥it.
Intercept
X it - l X it - 2 X it - 3 Mean(Xit _ k , k 2' 4) Employed Married Maternal health Child health Race Education House size
Est.
SE
Z
-0.71 0.15 -0.19 0.18 0.71 -0.11 0.55 -0.13 -0.34 0.72 0.34 -0.80
(0.40) (0.14) (0.18) (0.15) (0.43) (0.21) (0.17) (0.10) (0.09) (0.21) (0.22) (0.18)
-1.77 1.03 -1.05 1.23 1.65 -0.54 3.16 -1.27 -3.80 3.46 1.57 -4.51
TIME-DEP END
280
ENT COVARIATES
r X' lor k -- 1, 2, 3 , and the average of l"lTlTed values of maternal stress, .t-k this model to estimate the casual "'eO , t a t - 4, We can use. d by Bumming t h e cae ffi clents . f 'ndicators prIOr 0 st ress I 28~day perlO effect of continual stress over a 0 18 + 0.71) :: 0.85. The MSM log odds - 0.19 + ' bt . ed using the g-computational the stress predictors: (0.15 h timate 0 am ratio is comparable to tees. b t:: 0.80 in Section 12.5.5). One , f, e covanate su se algorithm (estImate or on h' th a t we can test the causal null, H o: /31 + advantage of the MSM approac, IS d fEcI'ents and standard errors, We , th estImate coe (32 + (33 + (34 = 0 usmg e d lue of 0,046, Therefore, using the obtain a Z statistic of 1.998 a~ ~~a is of no effect of continuous stress MSM we reject the causal null ypo es on the likelihood of child illness.
12.5.8
Summary
. d d usal targets of inference for analysis of This section has mtro uce ca d h . data WI'th endogeno us covariates. We have demonstrate t at longitudmal . t d d methods cannot provide causal summaries, and have mtroduced san ar t'Ion an d MSM with IPTW as alternative approaches. We have g-computa not presented the detailed theory that underlies the cau~al methodology and refer the interested reader to Robins (1998, 1999), Robms et al. (1999), and Hernan et al. (2001) for further detail and additional references,
12.6 Summary and further reading In this chapter we have provided a taxonomy for covariate processes and a taxonomy for conditional means, or regression models. When a covariate process is exogenous analysis focuses on specifying a regression model in terms of cross-sectional associations, E(Yit IXit}, or in terms of lagged relationships, E(tit IX it - I , X it - 2 , , •• ). We have shown that biased estimation can result if GEE with non-diagonal weighting is used unless the model /lit = E(tit I Xit) for cross-sectional models, or tLit :: E(¥itIXit-I,Xit-I,,,,,Xit_k) for finite lag models, equals the FCCM, E(tit IXis S = 1,2,.", T). For an exogenous covariate process we need o,nly consider dependence of tit on 11.f (t - 1) to satisfy the FCCM conditIon. Partly conditional and distributed lag regression models can adopt a fie~ible relationship between past exposure and current outcomes and can be Implemented using standard software programs, For endogenous covariates we ~ave shown that a prior response variable can be both a confounder and an mt,erm~diate variable. This issue has motivated the development of ;ausal estlma~lon methods by Robins and co-workers. Although we have locused on a smgle tim d d f £ '. e- epen ent endogenous covariate the same issue 0 a conwundmg mtermed' t . bl ' , ate that' d' t' f la e vana e may arise via a time-dependent covanIS IS mct rom th t e response and the exposure. The methods tha
SUMMARY AND FURTHER READING
281
we overview have been developed for this more general scenario and are discussed in Robins (1986, 1987, 1998, 1999) and Robins et ai, (1999) for the case of longitudinal binary response data. Lauritzen (2000) provides an excellent overview of graphical models and causal inference. Finally, we have not discussed another class of methods known as g-estimation and structural nested models. Robins (1999) overviews structural nested models and compares them to both g-computation and MSMs.
CLASSIFICATION OF MISSING VALUE MECHANISMS
283
be that a sequence of atypically low (or high) values on a particular unit foreshadows its removal from the study. 13.2 Classification of missing value mechanisms
13 Missing values in longitudinal data
13.1 Introduction ., I . , the analysis of longitudinal data whenever one or Mlssmg va ues aflse III . h' th t d more of the sequences 0 f measu rements from units Wit m e s u yare I are not.f taken, . mcomp et e, 'III the se nse that intended measurements . . h are I'lable The emphasis IS Important: I we c · lost or are ot herwlse unava . .oose in ~dvance to take measurements every hour on one-half of the subjects and every two hours on the other half, the resulting d~ta could also be described as incomplete but there are no missing v~lues ~n the se?se t~at we use the term; we call such data unbalanced. This IS not Just playmg WIth words. Unbalanced data may raise technical difficulties - we have seen that some methods of analysis can only cope with data for which measurements are made at a common set of times on all units. Missing values raise the same technical difficulties, since of necessity they result in unbalanced data, but also deeper conceptual issues, since we have to ask why the values are missing, and more specifically whether their being missing has any bearing on the practical questions posed by the data. A simple (non-longitudinal) example makes the point explicitly. Suppose that we want to compare the mean concentrations of a hormone in blood samples taken from ten subjects in each of two groups: one a control, the other an experimental treatment intended to suppress production of the hormone. We are presented with ten assayed values of the hormone concen~ration from the control subjects, eight assayed values from the treated subjects, and are told that the values from the other two treated subjects are 'missing'. If we knew that this was because somebody dropped the test t~bes on the way to the assay lab, we would probably be happy to proceed With ~ t,:,o-sample t-test using the 18 non-missing values. If we knew that tbhel mlsslllg values were from subjects whose hormone concentrations fell e ow the sensitivity threshold f th '. . a1 o e assay would mask the ve fl t . ' Ignormg the missmg v ues fl d . .ry e ec we were lookmg to detect. Lest this example o en the sophisticated read f 't that more subtle versions 0 I are not uncomm n . Ier,I we. suggest . o III rea ongItudmal studies. For example, it may
Little and Rubin (1987), give a general treatment of statistical analysis with missing values, which includes a useful hierarchy of missing value mechanisms. Let Y' denote the complete set of measurements which would have been obtained were there no missing values, and partition this set into y' = (y(o), y(rn») with y(o) denoting the measurements actually obtained and Vim) the measurements which would have been available had they not been missing, for whatever cause. Finally, let R denote a set of indicator random variables, denoting which elements of y' fall into yeo) and which into y(rn). Now, a probability model for the missing value mechanism defines the probability distribution of R conditional on y' = (y(o), y(m)). Little and Rubin classify the missing value mechanism as • completely random if R is independent of both yeo) and y(m) • random if R is independent of y(m) • informative if R is dependent on y(m).
It turns out that for likelihood-based inference, the crucial distinction is between random and informative missing values. To see this, let f(y(o), y(m), r) denote the joint probability density function of (y(o), y(m), R) and use the standard factorization to express this as
f(y(o), y(m), r) = f(y(o), y(m))f(r I y(o), y(m)).
(13.2.1)
For a likelihood-based analysis, we need the joint pdf of the observable random variables, (y(o), R), which we obtain by integrating (13.2.1) to give
f(y(O), r) =
J
f(y(o), y(m))f(r I y(o), y(m))dy(m).
(13.2.2)
Now, if the missing value mechanism is random, f(rjy(o),y(m)) does not depend on y(m) and (13.2.2) becomes
f(y(o),r) = f(rly(o»)
J
f(y(o),y(m))dy(m)
= f(r I y(o))f(y(o)).
(13.2.3)
Finally, taking logarithms in (13.2.3), the log-likelihood function is
L = log f(r I yeo))
+ log f(y(o)),
(13.2.4)
which is maximized by separate maximization of the two terms on the right-hand side. Since the first term contains no information about the
284
distribution of Y
MISSING VALUES IN L ONG (0)
ITUDINAL DATA
, 't for the purpose of makinp; inferences ,we can Ignore I
I t th completely random and random missabout y(o). e d t without distinction as Because of the above reSll t, )0. h . ometlmes relerre a ing value mec ams~s, a:c St' t t~ remember that 'ignorabiJity' in this i norable However, It IS nnpor ,an , c . c g ';. f h rk I'I ood function 8B the ba.'ils lor ll1Ierence. " ' . , 'd .. '1 d sense relles on the use 0 tel e I I e alized estImatll1g equatIOns, as etlcnue For example, the met ha d a f gen r , I I . , . ' l ' d 1 under the stronger a.'iSUmptlOn t lut t Ie mlss111 SectIOn 8.2.3, IS va I on y 'h' 'I'k l'h d , h ' ' . mpletely random. Also, even WIt 111 a I e I 00 mg value mec amsm IS co . . h '. of a random mlSSll1g value mec alllsm. a.'l . based anaIySIS, th e t rea tent m ignorable makes several tacit assumptions, The first of these, a.'l emphasIzed by Little and Rubin, is that f(y(o)) and f(r I y(o)) are separately parameterized, which need not be the case; if there a:e parameters common to f(y(o)) and f(r I y(o)), ignoring the first term 111 (13.2.4) leads. to a loss of efficiency, Secondly, maximization of t.h~ secon,d t~rm ~n the nro~t~hand side of (13.2.4) implies that the uncondItIOnal dlstnbutlOn of Y IS t~e correct inferential focus. Again, this need not be the case; for example, III a clinical trial concerned with a life-threatening condition in which missin~ values identify patients who have died before the end of the study and y(o measures health status, it may be more sensible to make inferences about the distribution of time to survival and the conditional distribution of y(o) given survival rather than about the unconditional distribution of y(o). 13.3 Intermittent missing values and dropouts
An important distinction is whether missing values occur intermittently or as dropouts. Suppose that we intend to take a sequence of measurements, Y1,,·., Yn , on a particular unit. Missing values occur as dropouts if whenever Yj is missing, so are Yk for all k 2 j; otherwise we say that the missing :alues ar~ intermittent, In general, dealing with intermittent missing values IS more dlfficu.lt :han dealing with dropouts because of the wider variety of patterns o~ mIsslllg values which need to be accommodated. 'h . When , Illtermittent ml'ssl'n g vaIues aflse t rough a known censonng mechamsm ' for example if all I b I k , v a ues e ow a nown threshold are missing , the EM algoflthm (Dem pst l 1977) provIdes . ea., a possible theoretical' . er t framework (Laud 1988' H h 1999) W . ' ' ,ug es, ,hen mtermittent missing values do not aflse from censoring th f' kn' ' . ' e reason or theIr being missing is often own, slllce the subjects III qu t' . . th' . C " es IOn remam m the study and in some cases IS IllIormatlOn WIll make it reaso bl ' unrelated to the m na e to assume that the missingness is easurement process In h h . be analysed by any method which c . sue cases, t e resultmg data can thermore if the method f l ' a~ accommodate unbalanced data. Fur, 0 ana YSIS IS likel'h d b d' . be valid under the weak . I 00 - ase , the mferences wIll er assumptIOn that th ' . random, e mlssmg value mechanism is
INTERMITTENT MISSING VALUES AND DROPOUTS
285
In contrast, dropouts are frequently lost to any form of follow-up and we have to admit the possibility that they arise for reasons directly or indirectly connected to the measurement process. An example of dropout directly related to the measurement process arises in clinical trials, where ethical considerations may require a patient to be withdrawn from a trial on the ba.'lis of their observed measurement history. Murray and Findlay (1988) discuss this in the context of long-term trials of drugs to reduce blood pressure where 'if a patient's blood pressure is not adequately controlled then there are ethical problems associated with continuing the patient on their study medication'. Note that a trial protocol which specifies a set of circumstances under which a patient must be withdrawn from the trial on the basis of their observed measurement history defines a random, and therefore ignorable, dropout mechanism in Little and Rubin's sense. In contrast, the 'dropouts' in the data on protein content of milk samples arose because the cows in question calved after the beginning of the experiment (Cullis, 1994), and there may well be an indirect link between calving date and milk quality. When there is any kind of relationship between the measurement process and the dropout process, the interpretation of apparently simple trends in the mean response over time can be problematic, as in the following simulated example. We simulated two sets of data from a model in which the mean response was constant over time, and the random variation within units followed the uniform correlation model described in Section 4.2,1. In each data-set, up to ten responses, at unit time-intervals, were obtained from each of 100 subjects. Dropouts occurred at random, according to the following model: conditional on the observed responses up to and including time t - 1, the probability of dropout at time t is given by a logistic regression model, logit(pt)
= -1 - 2Y'_1'
In the first simulation, shown in Fig. 13.1, the correlation between any two responses on the same unit was p = 0,9, and the empirical mean responses, calculated at each time-point from those subjects who had not yet dropped out, show a steadily rising trend. A likelihood-based analysis which ignores the dropout process leads to the conclusion that the mean response is essentially constant over time whereas the empirical means suggest a clearly increasing time-trend. There is no contradiction in this apparent discrepancy: the likelihood-based analysis estimates the mean response which would have been observed had there been no dropouts, whereas the empirical means estimate the conditional mean response in the sub-population who have not dropped out by time t. Another way to explain the discrepancy between the likelihood-based and empirical estimates of the mean response is that the former recognizes the correlation in
SIMPLE SOLUTIONS AND THEIR LIMITATIONS
287
the data and, in effect, imputes the missing values for a particular subject taking into account the same subject's observed values. To confirm this, Fig. 13.1(b) shows the result of the second simulation, which uses the same model for measurements and dropouts except that the within-unit correlation is p = O. Both the empirical means and a likelihood-based inference now tell the same story - that there is no time-trend in the mean response. These examples may convince the reader, as they convince us, that it would be useful to have methods for analyzing data with a view to distinguishing amongst completely random, random and informative dropouts. Fortunately, the additional structure of data with dropouts makes this a more tractable problem than in the case of intermittent missing values. In the rest of this chapter, we concentrate on the analysis of data with dropouts. In Section 13.4 we briefly mention two simple, but in our view inadequate, solutions to the problem. In Section 13.5 we describe methods for testing whether dropouts are completely random. In Section 13.6 we describe an extension to the method of generalized estimating equations which gives consistent inferences under the assumption of random dropouts. In Section 13.7 we review various model-based approaches which can accommodate completely random, random or informative droputs, henceforth abbreviated to CRD, RD and ID respectively.
13.4 Simple solutions and their limitations 13.4.1
2
o
-2
2
4
6 Time
8
10
Fig. 13.1. Simulated realizatio f d I . uniform correlatio d d ns 0 a mo e wIth a constant mean response, n a1n , ran om dropouts: (a) within-unit correlation p = 0.9; (b) within-unl't corre a t Ion p == 0 0 .d . . . . .... '" ataj - - : empIrIcal mean response.
Last observation carried forward
As the name suggests, this method of dealing with dropouts consists of extrapolating the last observed measurement for the subject in question to the remainder of their intended time-sequence. A refinement of the method would be to estimate a time-trend, either for an individual subject or for a group of subjects allocated to a particular treatment, and to extrapolate not at a constant level, but relative to this estimated trend. Thus, if Yij is the last observed measurement on the ith subject, flj(t) is their estimated time-trend and Tij = Yij - fli(tj), the method would impute the missing values as Yik = [J,i(tk) + Tij for all k > j. Last observation carried forward is routinely used in the pharmaceutical industry, and elsewhere, in the analysis of randomized parallel group trials for which a primary objective is to test the null hypothesis of no difference between treatment groups. In that context, it can be argued that dropout, and the subsequent extrapolation of last observed values, i~ an inh~r~nt feature of the random outcome of the trial for a given subject. ValIdIty of the test for no difference between treatment groups is then imparted by the randomization without requiring explicit modelling assumptions. A further argument t~ justify last observation carried forward is that. if, for example, subjects are expected to show improvement .over the duratIOn of a longitudinal trial (i.e. treatment is beneficial), carrymg forward a last
288
MISSING VALUES IN LONGITUDINAL DATA
r
observation should result in a conserva Ive
assessment of any treatment
benefits. I 't' te in particular contexts, we do not 'h WhIlst t ese argum ents are egl , d Ima ~ ard as a general met h0 d . recommend la.,t observation carrIe orw 13.4.2
Complete case analysis . I ay 0 f deaI ·mg
. ' 'th dropouts is to discard all IllcomWI , Another very slmp.e ':" 'ousl wasteful of data when the dropout process Yt 's Perhaps more seriously, it has plete sequences. ThIS IS obvl . I d t the measuremen proces. , 18 unre ate. o. b"f the two processes are related, as the comthe potentIal to mtroduce las I d t be a random sample with respect to plete cases cannot then be assume 0 the distribution of the measurements Yij' . h In general we d0 no t reeomm end complete case analysIs. Perhaps. t Ie · '. hen the scientific questions of interest are genullle y onIy excep t IOn IS w . . f thO k' d confined to the sub-population of completers, but s~tuatlOns .0 ~s III would seem to be rather specialized. There are many msta~ces III whIch the questions of interest concern the mean, or other propertIes, of the measurement process conditional on completion, but t~is is not quite the same thing. In particular, if we accept that dropout IS a random eve~t, a~d if we are prepared to make modelling assumptions about the relatI?nshlP between the measurement process and the dropout process, then the mcomplete data from the subjects who happened to be dropouts in this particu~aT realization of the trial provide additional information about the propertIes of the underlying measurement process conditional on completion, This information would be lost in a complete case analysis.
13.5 Testing for completely random dropouts In this section, we assume that a complete set of measurements on a unit would be taken at times t j , j = 1, ... ,n, but that dropouts occur. Then, the available data on the ith of m units are Yi = (YiI, ... , Yini)' with ni S n and Yij taken at time t j . Units are allocated into a number of different treatment groups. Our objective is to test the hypothesis that the ~ropouts are completely random, that is, that the probability that a umt drops out at time t j is independent of the observed sequence of mea:'u~ements on that unit at times t l , ... , t j - I . We view such a test as a prehmm~ry screening device, and therefore wish to avoid any parametric assumptIOns about the process generating the measurement data. Note that our. defi.nition of completely random dropouts makes no reference to the pOSSIble mfiuence of explanatory variables on the dropout process. For example, suppose that in a study comparing two groups with different mean response profiles, the dropout rate is higher in the group with the higher mean response If we ignored th ld' e group structure the dropout process wou appear to be informative in the undifferentiat~d data, even if it were
TESTING FOR COMPLETELY RANDOM DROPOUTS
289
completely random within each group. We therefore recommend that for preliminary screening, the data should first be divided into homogeneous sub-groups. Let Pij denote the probability that the ith unit drops out at time tj' Under the assumption of completely random dropouts, the Pij may depend on time, treatment, or other explanatory variables but cannot depend on the observed measurements, Yi' The method developed in Diggle (1989) to test this assumption consists of applying separate tests at each time within each treatment group and analysing the result.ing sample of p-values for departure from the uniform distribution on (0,1). Combination of the separate p-values is necessary if the procedure is to have any practical value, because the individual tests are typically based on very few cases and therefore have low power. The individual tests are constructed as follows. For each of k = 1, ... , (n - 1), define a function, hdvl, ... , Y~')' We will discuss the choice of hd·) shortly. Now, within each group and for each time-point tk, k = 1, .. , ,n-l, identify the R k units which have n; ~ k and compute the set of scores hik = hk (Yil, ... , Yid, for i = 1, ... ,Rk. Within this set of R k scores, identify those rk scores which correspond to units with mj = k, that is, units which are about to drop out. If 1 :s Tk < Rk, test the hypothesis that the Tk scores so identified are a random sample from the 'population' of Rk scores previously identified. Finally, investigate whether the complete set of p-values observed by applying this procedure to each time within each treatment group behaves like a random sample from the uniform distribution on (0,1). The implicit assumption that the separate p-values are mutually independent is valid precisely because once a unit drops out it never returns. Our first decision in implementing this procedure is how to choose the functions h k (.). Our aim is to choose these so that extreme values of the scores, hik , constitute evidence against completely random dropouts. A sensible choice is a linear combination, k
hk(Yl, ... ,Yk) =
LWjYj.
(13.5.1)
j=I
As with any ad hoc procedure, the success of this will be influenced by the investigator's ability to pick a set of coefficients which reflect the actual dependence of the dropout probabilities on the observed measurement history. If dropout is suspected to be an immediate consequence of an abnormally low measurement we should choose Wk =, 1 and all other Wj = O. If it is suspected to be the culmination of a sustamed sequence ~f low measurements we would do better using equal weights, Wj = 1 for all J. The next decision is how to convert the scores, hik, into a test statistic and p-value. A natural test statistic is hk, the mean of the Tk scores
290
MISSING VALUES IN LON
GITUDINAL DATA
. _ k h'ch are about to drop out. 't wIth n WI. I' corresponding to those um s 'andom the approximate samphng ( ISI H-" _ R- ",ilk hk, and variance completely at r If d ropouts occur _ . ith mean k - 'k ~t= I t _ 2 lribution of hk is GauRSIan, w,' ' 1)-1 ",Rk (hk - Hd , ThiH iR , h 52 - (Rk ~t=l t 52(Rk - rk)!(rkRk), w em k rng theory. See, for example, kif lementary samp I d R refill t rom e . the presen t context of the 1'k a n k a standard . . . some ' ' . atl'on may then be poor. For an Cochran (1977), However, III "an approxlm will be small and t he GaUSSI I t ndomization distribution of each e exact test, we can evaluate the comPd era mpling If (Rk) is too large for th sis of ran om sa . 1'k hk under the nil.II Ihypof, e, I t' Jrocedure is to sample from the 'ble a terna Ive I ' this to be practlca , a ~asl lH te h- after each of s - 1 indepen, . d' t 'butlOn vve recompu . k randomizatIOn IS r l ' h en wI'thout replacement from the I t ' s of rk scores c os dent ran~~m se ec I~n and let x denote the rank of the original hk amongst set hik, Z - 1, ... , Rk, Th _ xis is the p-value of an exact, Monte the recomputed values. en, P C I test (Barnard 1963). I ar;he final stage' consists of analysing the resultin~ ~et o~ p-.va ~es. · I I ses such as a plot of the empmcal distrIbutIOn . 'f h d'f Informa I grap hIca ana Y , function of the p-values with different plotting symbols to Identl y t e l ferent treatment groups, can be useful here. Diggle (1989) also sugg~sts a formal test of departure from uniformity using the Kolmogorov-Smlrnov statistic; in practice, we will usually have arranged that a preponderance of small p-values will lead us to reject the hypothesis of co~pletelY r~n~o~ dropouts, and the appropriate for~ of the Kolmogorov-Smlrnov statIstIc IS its one-sided version, D+ = sup{F(p) -pl. Another technical problem now arises because each p-value derives from a discrete randomization distribution whereas the Kolmogorov-Smirnov statistic tests for departure from a continuous uniform distribution on the interval a :'S p :'S 1. We would not want to overemphasize the formal testing aspect of this procedure, preferring to regard it as a method of exploratory data analysis. Nevertheless, if we do want a formal test we can again use Barnard's Monte Carlo testing idea to give us an exact statement of significance. We simply rank the observed value of D+ amongst simulated values based on the appropriate set of discrete uniform distributions which hold under the null hypothesis.
Example 13.1. Dropouts in the milk protein data Recall that these data consist of up to 19 weekly measurements of protein content in milk samples taken from each of 79 cows. Also, the cows were ~located amongst three different treatments, representing different diets, III a completely randomized design. Of the 79 cows, 38 dropped out during the study. There were also 11 intermittent missing values. In Section 5.4 we fitted ~ model to these data in which, amongst other things, the mean ~e~~onse Ill. each treatment group was assumed to be constant after an Imbal settling-in period . We not ed that an apparent nse . m . the observed mean response near the end of the experiment was not supported by testing
TESTING FOR COMPLETELY RANDOM DROPOUTS
291
against an enlarged model, and speculated that this might be somehow connected to the dropout process. As a first stage in pursuing this question we now test whether the dropouts are completely random. ' Th: dropo.ut~ are confined to four of the last five weeks of the study, and occur m 12 dlstmct treatment-by-time combinations. To construct the 12 test statistics we use h k (YI , ... , Yk) = Yk, the latest observed measurement and implem.ent.a M~nt~ Ca~lo tes~ using s = 999 random selections frOl~ the randomIzatIOn dlstnbutIOn of h k for each treatment-by-time combination. The resulting p-values range from 0.001 to 0.254. On the basis of these results, w~ reject firmly the hypothesis of completely random dropouts. For example, I~ ~e us: th.e K~lmogorov-Smirnov statistic to test for uniformity of the empIrIcal dlstnbutlOn of the p-values, again via a Monte Carlo implementation with s = 999 to take account of the discreteness of the problem, we obtain a p-value of 0.001. Note that 0.001 is the smallest possible p-value for a Monte Carlo test with s = 999. Table 13.1 shows the 12 p-values cross-classified by dropout time and treatment. It is noticeable that the larger p-values predominate in the third treatment group and at the later dropout times, although the second of these features may simply be a result of the reduced power of the tests as the earlier dropouts lead to smaller sample sizes at later times. We obtained very similar results for the milk protein data when we used hdYl, ... , Yk) = 1'-1 L~=k-r+I Yj, for each of r = 2,3,4,5. The implication is that dropouts predominate amongst cows whose protein measurements in the preceding weeks are below average. Now recall the likelihood-based analysis of these data reported in Section 5.4. There, we accepted a constant mean response model against an alternative which allowed for a rising trend in the mean response, whereas the empirical mean responses suggested a rise towards the end of the experiment. We now have a possible explanation. The likelihood-based analysis is estimating f..ll(t), the mean response which would apply to a population with no dropouts, whereas the
Table 13.1. Attained significance levels of tests for completely random dropouts in the milk protein data. Treatment (diet) Dropout time (wks)
Barley
Mixed
Lupins
15 16 17 19
0.001 0.016 0.022 0.032
0.001 0.001 0.053 0.133
0.012 0.011 0.254 0.206
292
MISS ., 'ING VALUES IN LO
GENERALIZED ESTIMATING EQUATIONS
NGITUDINAL DATA
. th ean response of a uuit condiempirical means are C8timatmg pdt)!, 't~ ~ t Under completely random , h ' u dropped out Jy 1m, . , 'th independent measurements at tional on Its not UVIll" dropouts WI ler random dropouts Wit . II s(~n-. dropouts, or nu(1er nndom < . • (.) L (t) whereat; un( . different, tImes, IL] t - 12 , (t f= 2(t). This underlines the danger of ally correlated mea..'mrementfJ, I~l ) ! /lIe I'U the collo(luial sense. . I d nt~ a"l Ignora ) 1 regardmg ranc om r?po ' .t·OII between Digglc's (19~9) proced'd (IDOl) mt~outaconneci ' Rl out . po ': .11" ls At each time-point, we could use the ure and logistiC regreHslOn an, YSl . to'ry variable in a logistic regression . h ( Y ) as an exp ana .. function k YI,"" ~. fd t Thus if Pk is the probablhty that a model for the probabIlity a ropon. , unit will drop out, we a'lsume that (13.5.2) Then, conditional on the observed valuc~, hik, for all the units who ha~e not dropped out previously, the mean, h, o~ those about to drop out IS the appropriate statistic to test the hypotheSIS that (3 = 0 (Cox and Snell, 1989, Chapter 2). . . Clearly, it is possible to fit the logistic model (13.5.2), or extensIOns of It, using standard methodology for generalized linear models as suggested by Aitkin et ai. (1989, Chapter 3). For example, we can introduce an explicit dependence of Pk on time and/or the experimental treatment administered. Example 13.2. Protein content of milk samples (continued) We now give the parametric version of the analysis described in Example 13.1. Let Pgk denote the probability that a unit in the gth treatment group drops out at the kth time. We assume that (13.5.3) with hk(YI, .. ' ,Yk) == Yk· In this analysis, we consider only those four of the 19 time-points at which dropouts actually occurred, giving a total of 24 parameters. The residual deviances from this 24-parameter model and various simpler models are given in Table 13.2. Note that the analysis is based on 234 ,binary responses. These consist of 79 responses in week 15, the first OccasIOn on which any units drop out 59 in week 11 from the units which did not drop out in week 15, and simil~rly 50 and 46 from weeks 17 and 19; there were no dropouts in week 18. From Jines 1 to 5 in Table 13.2, we conclude first that there is a strong dependence of the dropout probability on the most recently observed meas~rer.ncnt; for example, the log-likelihood ratio statistic to test model 5 f f d W wlthm model 4 is 197.66 - 119.32 == 78 34 on 1 d also conclud th t th . egree 0 ree am. e e, a e nature of the dependence does not vary between t rcat ments or 1 t 3' " . times' none f r the residual d . ' b 0 m~s o. gives a slgmficant reduction 1ll eVlance y companson With line 4. The dependence of the
293
Table 13.2. Analysis of deviance for a logistic regression analysis of dropouts in the milk protein data. Model for log
{Pgk /
+ (Jgkhk + {3gh k Ogk + (3khk Ogk + (3hk
1. Ogk
2. 3. 4.
Ogk
5.0 g k
6. 7. 8.
O:g
Og (l
+ o~ + (3h k + (Jhk
+ (3h k
(1 -
Pgk)}
Residual deviance 111.97 116.33 118.63 119.32 197.66 124.16 131.28 139,04
df 210 219 218 221 222 227 230 232
dropout rate on treatment or time is investigated in lines 6 to 8 of the table where we test different assumptions about the Ogk parameters in the model. We now have some evidence that the dropout rate depends on treatment. Comparing lines 7 and 8 the log-likelihood ratio statistic is 7,76 on 2 degrees of freedom, corresponding to a p-value of 0.021. The evidence for a dependence on time is rather weak; from lines 6 and 7 the log likelihood ratio statistic is 7.12 on 3 degrees of freedom, p-value = 0.089. The parametric analysis in Example 13.2 allows a more detailed description of departures from completely random dropouts than was possible from the non-parametric analysis in Example 13.1. As always, the relative merits of non-parametric and parametric approaches involve a balance between the robustness and flexibility of the non-parametric approach and the greater sensitivity of the parametric approach when the assumed model is approximately correct. Note that in both approaches the implied alternative hypothesis is of random dropouts. This underlines an inherent limitation of any of the methods described in this section - the hypothesis of completely random dropout is seldom of intrinsic scientific interest, and rejection of it does not determine the subsequent strategy for analysis of the data. As we have seen, the distinction between random and informative dropout is at least as important from a practical point of view as is the distinction between completely random and random dropout. 13.6 Generalized estimating equations under a random missingness mechanism One of the attractions of the method of generalized estimating equations (GEE), as described in Section 8.2.3, is that for problems in which the questions of interest concern the relationship between the population-averaged mean response and a set of explanatory variables, the GEE method provides
MODELLING THE DROPOUT PROCESS
MISSING VALUES IN LONGITUDINAL DATA
294
., I assum ption that .the model for d the mInima . consistent inference un er 'fi d In particular, If the analysIs . eetly specl e . the mean response IS carr. . atrix of the response vector • J: for the variance m assumes a workmg Jorm b i t but under reasonably general · y may e as , , which is incorrect, effi Clene.. I r the basic form of the GEE .t 's retamed. J~oweve, . h conditions, conSIS ency I, pletely random, otherWIse t e ~ethod assumes that any dropouts. are. COlm f ting equatIOn IS os t . . h Ima· t'll wish to estimate, under minconsistency of tees · d opouts we may s I When data con t am r , 11'lch would have pff~vailed in the . the mean response w imal assumptIOns, ins et al. (1995) present an extension of the GEE absence of dropouts. Rob d t which preserves the property of d t 'th random ropou s, method to a a WI h response without requiring correct consistent inference about t e mean s ecification of the covariance structure. . . p The basic GEE method uses the estimating equatIOn (8.2.4), whIch we reproduce here as (13.6.1)
Recall that in (13.6.1), Y i denotes the vector of responses on the ith s~b ject, Iti the corresponding mean response, and (3 ~he vector of regreSSI?n parameters defining the mean response. Expressed mformaIly, the essentIal idea in Robins et al. (1995) is that if Pij denotes the probability that subject i has not dropped out by time t j , given that subject's observed measurement history YiI,· .. ,Yi,j-l and any relevant covariate information, then the observation Yij from this one subject is representative of all subjects with comparable measurement and covariate information who would have been observed had they not dropped out. Hence, to restore the unbiasedness of (13.6.1) for the complete population we need to weight the contribution of Yij by the inverse of Pij. This leads to the extended estimating equation,
S{3({3, a) ==
8 (8;;; )' m
Var(Yi)-I p-I (Yi - Iti) == 0,
i~ which .p is a diagonal. matr!x with non-zero
(13.6.2)
elements Pij. Robins et al. ( 995) gIve a careful dIscussIOn of the precise conditions under which (13.6.2) does indeed lead to consistent inferences about (3 when the p" are themsleves estimated from the data using an assumed random dropo~~ model. This extension to GEE r . . . bl . . eqUlres, lllevlta y, that we can consIstently estunate the dropout prob b'l"t' J: • a Illes lOr each subject given their observed m easur~ment history and any relevant covariates. This makes the method b est SUIted to large-scale t d' A b " · dropout model fas u h' les.h th rgua ly, It IS a tall order to fit a paramet rIc ' r w IC e data necessarily provide relatively
295
sparse information, in circumstances where the analysts are reluctant to commit themselves to a parametric model for the covariance structure. Scharfstein et al. (1999) extend this approach to the case of informative dropout and argue in favour of presenting a range of inferences for the quantities of interest based on different values of informative dropout parameters, rather than attempting to estimate aspects of the dropout process on which the data provide little or no information. The idea of weighting observed measurements in inverse proportion to the corresponding estimated dropout probabilities also appears in Heyting et al. (1992). 13.7 Modelling the dropout process
In this section we review a number of approaches to parametric modelling of longitudinal data with potentially informative dropouts, highlighting the practical implications of each approach and the distinctions amongst them in respect of the assumptions which they make about the underlying dropout mechanism. The formal development will be in terms of a continuous response variable. The extension of these ideas to discrete responses will be discussed briefly in Section 13.9. Following Diggle and Kenward (1994), we adopt the notational convention that for anyone subject, the complete set of intended measurements, Y* == (Yj*, ... , Y;), the observed measurements Y == (YI , ... , Yn ) and the dropout time D obey the relationship
Yj ==
{lj*: 0:
j < D, j:::: D.
We emphasize that in this notation a zero value for Y is simply a code for missingness, not a measured zero. Note also that 2 :S D :S n + 1, with D == n + 1, indicating that the subject in question has not dropped out.
13.7.1
Selection models
In a selection model, the joint distribution of Y* and D is factorized as the marginal distribution of Y* and the conditional distribution of D, given Y*; thus P(Y*,D) == P(Y*)P(DjY*). The terminology is due to Heckman (1976), and conveys the notion that dropouts are selected according to their measurement history. Selection models fit naturally into Little and Rubin's hierarchy, as follows: dropouts are completely random ifF(D I Y*) == P(D), that is, D and Y* are independent; dropouts are random if P(D I Y*) == P(D IYt, ... , YD- I ); otherwise, dropouts are informative. * Let () and ¢ denote the parameters of the sub-model for Y and of the sub-model for D conditional on Y*, respectively. We now derive the joint distribution of the observable random vector, Y, via the sequence of
MISSING VALVES IN LO N
296
MODELLING THE DROPOU'"
GTTVDINAL DATA
.1
J
Pk(Hkl y; ¢)f;(y I Hk; O)dy
di = n
L(O, ¢) = L 1 (O)
m
LIO) =
.L log{fdi_I (Yin, i==1
!k(y I Hk; 0,
(13.7.3)
(1371-1373) determine the joint distribution of Y, and hence . ., " d h EquatlOns the likelihood function for 0 and ¢. Suppressing the depen ence on t e parameters, the joint pdf of a complete sequence, Y, is
m
L 2 (¢)
f(y) = R(YI)
II h(Yk IH
L 3 (0, ¢)
k)
/>(Yk IH.) } Pr(Yd = 0 IHd, Yd- k of 0)
g
d-I [
]
{I - Pk(Hk, Yk)} Pr(Yd = 0 I Hd, Yd- I
=
L
log{Pr(D
= di IYin.
Recall th~t under RD, L 3 (0, ¢) depends only on ¢, and can therefore be absorbed mto L 2 (¢). Hence, under RD, (13.7.6)
whilst for an incomplete sequence with dropout at the dth time-point,
=1d-l (y)
Pk(Hk , Yik)}
i:di'S.n
(13.7.4)
IT
= .L .L log{1 -
and
k==2
~ {mYd
di - l
i==1 k==1
m
l' 0), (13.7.5)
where 1a- I (y) deno:es. the joint pdf of the first d - 1 elements of Y* and the ~roduct term wlthm square brackets is absent if d = 2 (H . ate that under either CRD or RD y, the (unobserved) I f h ' Pk k, Y; ¢) does not depend on va ue 0 t e measur t t . emen a tIme tk· It follows that this term can be brought t' of (13,7,2), which then red~~e:I~~ the integral sign on the right-hand side
.
+ L 2 (¢) + L 3 (O,¢),
where
(13.7,2)
and for y 1= 0,
fry)
297
represents the sequence of observed m f' easurements on th . h ' + 1 1'the umt does not drop out and d. ' . e zt umt, where otherwise, Then, the log-likelihood for (0 A..) , IdentIfies the dropout time ''P can be partitioned as
. H where Hk = (Y1 ,,··, Yk-d· Let conditional distributions for Yk .g~venl kl. riate pdf of Y~ given Hk and • . 0
PROCESS
Pr(Yk = 0 I Hk_ 1 , Yk- I of- 0) = Pk(Hki ¢),
smce the integrand is now a pdf d' contribution to the likell'h d an mtegrates to one. This implies that the f 00 separates i t t one or ¢' We now consider the fo f no wo c.omponents, one for 0 and a set of data consisting of m 'trm,O th~ resultmg likelihood function for um S, m whIch Yi = {.. ._ Y'J' J - 1, ... ,di - I}
and maximization of L{O, ¢) is equivalent to separate maximization of L 1 (O) and L~ (¢). Hence the dropouts, which only affect L 2 (¢), are ignorable for .lzk~hhood-based inference about 0. Similarly, L 2 (¢) in (13.7.6) is the log-lIkelIhood associated with the sub-model for D conditional on the observed elements of Y*, and it follows that the stochastic structure of the measurement process can be ignored for likelihood-based inference about ¢. In summary, under random dropout, the log-likelihood associated with a selection model separates into two components, one for the parameters of the measurement sub-model, the other for the parameters of the dropout sub-mOdel, and it is in this sense that the dropout process is ignorable for inferences about the measurement process, and vice versa. Equally, it is clear that this would not hold if there were parameters in common between the two sub-models, or a more general functional relationship b~tween 0 and ¢. Moreover, the implicit assumption that the relevant SCIentific questions are addressed by inference about the Y* process may no.t be reasonable in particular applications. For example, if dropouts occur PrImarily because subjects with a poor response are withdrawn from the stUdy on ethical grounds, it might be relevant to ask what the trial would have shown if these subjects had not been removed, and this is precisely hat is addressed by an analysis of the Y* process. On the other hand, If dropouts OCcur because some subjects experience an adverse reaction to
:v
MISSING VALUES IN LO
MODELLING THE DROPOUT PROCESS
NGlTUDINAL DATA
J: bt . . I nse) then .ITuerences a ou , ' oar chmca respo , from a P . f bJ'eets for whom such adverse t re atment (as dlstmd .. ulatlOn 0 su I Y' relate to a fictitIOuS p~p . ht be of more practical relevance to ana reactions do not occur, and It mig t' and the pattern of responses , " d ' f adverse reac IOns . I' I d se both the mCI ence 0 '. 'th 0 adverse reactIOn. T llS ea s Y I t' n ofsubJeets WI n c . , .• amongst thesub-popu a 10 d 1 hich we discuss m SectIOn 13.7.2. , f tt n mixture, rna . ' I for () am1 on to the Idea 0 pa ,er . J: e s, wtive the log-hkehhooc d out process IS llllorma , I When t he rap , ' I nalysis becomes more romp ex. I the statlstlca a c/J does not separate. ane " he need to evaluate the integral (13.7.2) From a technical pomt of vl~ewt' t th computation of the likelihood. More comp Ica es c t t' d I th relationship between an observed at each dropou Ime, t Ily the need to rna e e . fun damen a , b' bi concomitant (the measurement whIch , t (dropout) and an uno serva e even b d h d the subJ'cct not dropped out) typIcally leads would have been 0 serve a , , 'ffi I 'd t'ft b'l't of the model parameters, makmg It dl cu t, or even d S J: to poor J en I a I I Y , 'ble, t 0 vaI'd ImpoSSI I at e the assumed model from the observed ata. ee, lor example, Fitzmaurice et ai. (1996).
298
Example 13,3. Protein content of milk samples (continued) We now fit the Diggle and Kenward model to the milk protein data, our objectives being to establish whether the dropout process is informative, and if so to find out if this affects our earlier conclusions about the mean response profiles, For the mean response profiles we use a simple extension of the model fitted in Section 5.4, where we implicitly assumed random dropouts. Note that this model allows the possibility of an increase in mean response towards the end of the experiment, If flg(t) denotes the mean response at time t under diet g, we assume that
Inf Section 5.4 ' Example 5.1, we use d a rna del for the covanance . structure a the complete measurement process Y*(t) which included three distinct components of variation' a d ' t . 11 . ran am m ercept component between animals, a sena y correlated compo t 'thO , urement err nen WI In ammals, and an uncorrelated measor component Howe th ' between animal , v e r , e estImated component of variation this componentSt:~e;~~YF~:tl, and in what follows we have chosen to set of the dropouts occur in th 1 y, ~r the dropout process we note that all ast for the probability of dr e t ve weeks of the experiment, Writing Pk opou at time k h for k :s; 14, whereas for k > 15 ' we t erefore assume that Pk = 0
-
,
299
Table 13.3. Likelihood analysis of dropout mechanisms for the milk protein data. Dropout mechanism
2L max
ID RD ({31 == 0) CRD ({31 == {32 == 0)
2381.63 2369.32 2299.06
Table 13.3 gives values of twice the maximized log-likelihood for the full model and for the reduced models with cP1 == 0 (random dropouts) and 4>1 = 4>2 = 0 (completely random dropouts). Comparison of the RD and CRD lines in Table 13.3 confirms our earlier conclusion that dropouts are not completely random. More interestingly, there is overwhelming evidence in favour of informative dropouts: from the ID and RD lines, the likelihood ratio statistic to test the RD assumption is 12,31 on one degree of freedom, corresponding to a p-value of 0.0005, In principle, rejection of the RD assumption forces us to reassess our model for the underlying measurement process, Y·(t). However, it turns out that the maximum likelihood estimates of the parameters are virtually the same as under the RD assumption. This is not surprising, as most ofthe information about these parameters is contained in the 14 weeks of dropoutfree data, With regard to the possibility of an increase in the mean response towards the end of the experiment, the maximum likelihood estimates of {32 and {33 are both close to zero, and the likelihood ratio statistic to test {32 == {33 = 0 is 1.70 on two degrees of freedom, corresponding to a p-value of 0.43, In the case of the milk protein data, our reassessment of the drop~ut process has not led to any substantive changes in our inferences concernmg the mean response profiles for the underlying dropout-free process Y*(t). This is not always so. See Diggle and Kenward (1994) for exa~ples. Note also that from a scientific point of view, the analYSIS rep~r~ed here is suspect, because the 'dropouts' are an artefact of the .defi~ItIOn of time as being relative to calving date, coupled with the ter~llnat.lOn of the study on a fixed calendar date. This leads us on to ~ dISCUSSlO~ of pattern mixture models as an alternative way of representmg potentIally informative dropout mechanisms,
13.7,2
Pattern mixture models
P attern mixture models introduced by L1'ttl e (1993) , work .with the " factor' .IzatlOn , of the joint distribution ' * d D' t th marmnal dIstrIbutIOn of Y an moe 0'
MODELLING THE DROPOUT PROCESS MISSING VALUES IN L
ONGITUDINAL DATA
f Y* given D, thus P(Y*, D) = ·' I distributIOn a . . ,'hl t of D and the can dItlOna . oint of view, it IS always POSSI e 0 P(D)P(Y*' D), From a theoretIcal p . t e model and vice versa, as they d I as a pattern mIX ur . . I , .' f the same joint distrihutJOlI. n pracexpress a selection mO e .' t' . · factorIzatIOns 0 are simply aIternat Ive ifferent kinds of simplIfymg assump IOns, " the two approaches lead to d t Ice, I and hence to different ana yses. , sible rationale for pattern mix' oint of VIeW, a pos , From a rna deIImg P " d t time is somehow predestmed, th t each subject s ropOu . . 'es between dropout cohorts, ThIS ture rna deIS IS a and that the rneasuremendt process I:~:;y to apply very often although, as , I 't tation waul seem un I htera III erpre . ' th 'Ik protein data originally analysed by noted above one exceptIOn IS e ml th 'd t . d d (1994) using a selection model. Because e ropou Dlggle an enwdar . I to different cohorts the literal interpretation times' correspon precIse y . ' of a pattern mixture model is exactly rIght for these d~ta. The arguments in favour of pattern mixture ,mod~llmg are. us~ally o~ a of subjects m a 10ngItudmai tnal ' k'III d " First classification more pragmat IC . . . , , according to their dropout time prOVIdes an obvIous way of dIvIdmg the subjects into sub-groups after the event, and it is sensible to ask whether the response characteristics which are of primary interest do or do not vary between these sub-groups; indeed, separate inspection of sub-groups defined in this way is a very natural piece of exploratory analysis which many a statistician would carry out without formal reference to pattern mixture models. See, for example, Grieve (1994), Second, writing the joint distribution for Y* and D in its pattern mixture factorization brings out very clearly those aspects of the model which are assumption-driven rather than data-driven. As a simple example, consider a trial in which it is intended to take two measurements on each subject, Y* == (Yt, Yn, but that some subjects drop out after providing the first measurement, Let f(y Id) denote the conditional distribution of Y* given D == d, for d = 2 (dropouts) and d = 3 (non-dropouts). Quite gen~rally, f(y Id) == f(YI Id)f(Y2\ YI, d) but the data can provide no informatIon about f(Y2 \YI, 2) since, by definition D = 2 means that Y* is not observed. ' 2 Extensions of the k'nd f l ' tern ' t d 1 I 0 examp e gIven above demonstrate that patmIx ure mo e s cannot b 'd t'fi d . the conditional distributions e 1 en I e WIthout pl~ing restrictions on the use of complete . !(y I d): For example, LIttle (1993) discusses case mtsszng vanable restr' t' h' h assuming that for each d IC Ions, w IC correspond to < n + 1 and t :::: d, 300
.
K
f(Yt\YI'''.,Yt-I,d)=f(y
t
ly1, ... ,Yt-l,n+1).
At first sight pattern ' t hierarchy, Ho~ever Mo~~: urehmodels do not fit naturally into Rubin's , , erg s et al (1997) h h corresponds precisely to the th £ 11 '. s ow t at random dropout e 0 Owmg set of restrictions, which they
301
call available case missing value restrictions: f{Yt I YI" , . ,Yt-I, d) = f(Yt I YI,·,. , Yt-I, D > t).
This result implies that the hypothesis of random drop t b . . . , , ou cannot e tested Without makmg. addItIOnal assumptIOns to restrict the 1· f . c ass 0 a Iternat, , smCe the available case Inissl'ng val ue res t.ne . t'Ions ives under conSIderatIOn, . . , . cannot he verIfied empmcally, The identifiability associated , .problems " . . ' with in~ormat'Ive d ropout models,. ~nd th: ImpOSSIbIlIty of vahdatmg a random dropout assumption on e~lpI~Ical eVIdence a,lone, serve as clear warnings that the analysis of a 10ngItudmai data-set WIth dropouts needs to be undertaken with extreme caution .. How~ver, in the author's opinion this is no reason to adopt the superfiCIally SImpler strategy of automatically assuming that dropouts are ignorable.
Example 13.3, Protein content of milk samples (concluded) The first stage in a pattern mixture analysis of these data is to examine the data separately within each dropout cohort. The result is shown in Fig. 13.2, The respective cohort sizes are 41, 5, 4, 9 and 20, making it difficult to justify a detailed interpretation of the three intermediate cohorts. The two extreme cohorts produce sharply contrasting results. In the first, 19-week cohort the observed mean response profiles are well separated, approximately parallel and show a three-phase response, consisting of a sharp downward trend during the initial three weeks followed by a gentle rising trend which then levels off from around week 14. In the IS-week cohort, there is no clear separation amongst the three treatment groups and the trend is steadily downward over the 15 weeks, We find it hard to explain these results, and have not attempted a formal analysis, It is curious that recognition of the cohort effects in these data seems to make the interpretation more difficult,
13,7.3
Random effect models
Random effect models are extremely useful to the longitudinal data anal?,st. They formalize the intuitive idea that a subject's pattern of ,resp~nses 1~ a study is likely to depend on many characteristics of that subject, mcludmg some which are unobservable. These unobservable characteristics are then included in the model as random variables, that is, as random effects. It is therefore natural to formulate models in which a sUb!ect's propensity to drop out also depends on unobserved variables, that IS, on random effects, as in Wu and Carroll (1988) or Wu and Bailey (1989). In the present context, a simple formulation of a model of this kind woul~ ?e t~ pO,stul~te ' . TT ) a b Ivanate random effect, U = (UI , U2 an d t 0 mo del the jomt distnbutIOn
MISSING VALUES IN
802
MODELLING THE DROPOUT PROCESS
LONGITUDINAL DATA
303
hierarchy, the dropouts in (13.7.7) are completely random if U and U
4.6
1 are independent, whereas if U1 and U2 are dependent then in general the2
4.5
dropouts are informative. Strictly, the truth of this last statement depends on the precise formulation of It (y I UI) and f2(d IU2). For example, it would be within the letter, but clearly not the spirit, of (13.7.7) to set either fr(ylud = fl(y) or h(dlu2) :::: h(d), in which case the model would reduce trivially to One of completely random dropouts whatever the distribution of U.
4.0 4.0
':'.,.,;<:: , ,,--~
.
""
•
3.0
2.5
:.-,
,
""
.
,I
:5 3.6 3.0
"
L----;;:;--1'5-' II 10 15 o
2.5
L---:----:;~-ffi--' 6 10 15 o Time (weeks)
Tim. (w.ek.)
4.0
3.0
2.5
10
15
Contrasting assumptions: a graphical representation
In our opinion, debates about the relative merits of selection and pattern mixture models per se are unhelpful: models for dropout in longitudinal studies should be considered on their merits in the context of particular applications. In this spirit, Fig. 13.3 shows a s~hematic view o~ selecti.on, pattern mixture and random effects models whIch takes the pomt of VIew
4.5
6
13.7.4
100
L--_~-~------:::-----
o
5
nm. (w.ek.)
10
15
(a)
(b)
(c)
(d)
Time (weeks)
80
4.5
60
2.5
40
o
5
10 TIme (weeks)
15
Fig. 13.2. Observed mean response profiles for milk protein data, by dropout
cohort.
of Y· , D and V as
f(y, d, u) :::: h (y Iul)h(d IU2)f3(U).
(13.7.7)
In (13.7.7), the dependence between Y· and D is a by-produot of the Uland V 2, or to put it another Way, Y* 'andD are conditIOnally mdependent given U. In terms of Little flJuj".Rubin's
depende~c~ betwe~n
MISSING VALUES IN LONG!
TUDINAL DATA
A LONGITUDINAL TRIAL OF DRUG THERAPIES
h ther or not we recognIze effects are almost always ';ith u~ (:i: formulating a model for that random d I ) and that one conslderatlO h' k what kinds of causal Id be to t i l l . h 'our roo e s , t em I~ , I t ial with dropouts wou ts of random vanables: plausibly,exist U. Each ts Y dropout tImes measuremen, 'd ndence grap h for these three. ranuom . , Fi, 13,3 is a conditional III epe d e between two vertices mdrcates III 'b gles in which the absence of an et,g are conditionally independent varra , 'bl 'n ques IOn I. that the two random vana es I e most general kind of mode rs, r~p. the third (Whittaker, 1990), Th I ft anel whilst the remammg gIven h in the top- e P , , resented by the complete grap, I d deleted correspond to selectIOn, 'h of which has a Slllg e e ge , graphs, eae d effects models, 'd h k' d of thought experiment that Pattern mixture and ran om , ' t d d as an aJ to t e III P' Figure 13,3 IS III en e . d 'd' how to deal with dropouts. 19. the data analyst must conduct III eCI. IllgI'fy'ng assumptions are possible. 'I that any SImp 1 I 13,3(a) represents a dema I compelled to express a model , uld be more or ess Under this scenano, we wo ., d' t 'b t' ns for Y and U conditional on 11 fon of Jomt IS n u 10 , for the data as a'bl co ec I 13 , 3(b) invites interpretatIOn d' crete vaIues 0 f D , F'g 1. each of the pOSS.I ~ IS , d ff ts or latent subject-specific charas a causal cham m whIch ran om e ec Y £ t 't' U influence the properties of the measurement process, ,or ~ee::bj~~t i~ question, with propensity to drop out subsequently det~r mined by the realization of the measurement process. In contrast" F~g, 13,3(c) invites the interpretation that the s~bject-specific ch~racte~rs~lCs initially determine propensity to drop out, wrth a cO,nsequentlal vanatlOn in the measurement process between different, predestmed dropout cohorts, Finally, Fig, 13,3(d) suggests that measurement and dropout p~ocesses are a joint response to subject-specific characteristics; we could thmk of th~se characteristics as unidentified explanatory variables, discovery of whIch would convert the model to one in which Y and D were independent, that is, completely random dropout. Figure 13.3 also prompts consideration of whether selection modelling of the dropout process is appropriate if the measurement sub-model for Y includes measurement error. To be specific, suppose that the measurement sub-model is of the kind described in Section 5.3.2, in which the stochastic variation about the mean response has three components,
304
:e~~~~~~~;~i:ht
;ma:g~~~~:~ :~eets
tij = J.Lij + {d:jU i + Wi(t ij )} + Zij,
305
.
diag.~am
(13.7,8)
The br~cketing of ~wo of the terms on the right-hand side of (13.7,8) is to emphasIse that {dijU i + Wi (t ij )} models the deviation of an individual subject's resp~~se trajectory from the population average, whereas Zij represents additive measurement error, that is, the difference between the ?bserve~ response Yij and an unobserved true state of nature for the subject m qUest,rOD" In a model of this kind, if dropout is thought to be a response to the subJect s true underlying state, it might be more appealing to model the
conditional dropout probability as a function of this underlying state rather than as a function of the observed (or missing) Yij, and the conclusion would be that a random effects dropout model is more natural than a selection dropout model. Even when the above argument applies, a pragmatic reason for cosidering a selection model would be that if a random dropout assumption is sustainable the resulting analysis is straightforward, whereas non-trivial random effects dropout models always correspond to informative dropout because the random effects are, by definition, unobserved. 13.8
A longitudinal trial of drug therapies for schizophrenia
In this section we present an analysis, previously reported in Diggle (1998), of data from a randomized trial comparing different drug therapies for schizophrenia, as described in Example 1.7. Table 13.4 lists the data from the first 20 subjects in the placebo arm, Table 13.4. PANSS scores from a subset of the placebo arm of the longitudinal trial of drug therapies for schizophrenia, from weeks -1, 0, 1,2,4, 6 and 8, where -1 is pre-randomization and 0 is baseline, A dash signifies a missing value,
-1
0
1
2
84 68 108
112 56
112
117
79
87
104
77
69 96 72 106 96
44
89 95 60
79 72 94 104 102 119 94 91 73
113 111 118 89
78 64
76 90 105
84
8
58
70
68
60 84
98
88
98 88
111
99
84 53 75 116 94 153 66 57
95
113
83
108
76
101
53
55
64 86
95
108
98 52 144
47
72
116 90 57
71
64
72
65
103
95 97
110 97
107
102
123 100
113 90
63
77
57
115
121
103
104 91 64 84
88 122
6
4
113 48
80
121
90 52
62
A LONGITUDINAL TRIAL OF DRUG
MISSING VALUES IN LONGITUDINAL DATA
THERAPIES
307
306
' t only 253 are listed as completing the 23 t s a complete sequence of PANSS scores, RecaJl th at , 0 f theh 5 16 po. len 'ded d Ith gh a furt er provi stu y, a o.u d' t 'b t' of the stated reasons for dropout, whilst gives the IS rJ U IOn . each 0 f Tabl e 16 '. h b f dropouts and completers III the'SIX Table 1 7 gIves t e num ers 0 . ,. . . N t th t the most common reason for dropout IS madtreatment groupS. 0 e a . h I b , d th t the highest dropout rate occurs III t e pace 0 equate response, an a d ' 'd group, followed by the haloperidol group and the lowest ose flspen one
100
group.
As shown in Fig. 1.6, all six groups show a mean response profile decretlBing over time post-baseline, with slower apparent rates of decrease
towards the end of the study. Figure 13.4 shows the observed mean response as a function of time within sub-groups corresponding to the different dropout times, but averaged across all treatments. The mean response profile for the compIeters is qualitatively similar to all of the profiles shown in Fig. 1.6, albeit with a steeper initial decrease and a levelling out towards the end of the study. Within each of the other dropout cohorts a quite different picture emerges. In particular, in every case the mean score increases immediately prior to dropout. This is consistent with the information that most of the dropouts are due to an inadequate response to treatment. It also underlines the need for a cautious interpretation of the empirical mean response profiles. Figure 13.4 provides strong empirical evidence of a relationship between the measurement and dropout processes. Nevertheless, we shall initially analyse.the m~asurement data ignoring the dropouts, to provide a basis for comparison with an integrated analysis of measurements and dropouts. From now on, we ignore the pre-baseline measurements because these preceded the establishment of a stable drug regime for each subject Ch T~s phase of the analysis follows the general strategy described in I ap er 5. Wfie first convert the responses to residuals from an ordinary eas t squares t to a model h' h 'fi combination of tl'm d t w IC speC! es a separate mean value for each e an reatment W th . this residual process u d th . e. en estimate the variogram of only on the time-separ' n. er eTh8SsumPt~on that the variogram depends , at IOn u. e resultmg' . III Fig. 13.5, Its salient £eat vanogram estImate is shown 11 Ures are' a sub t t' I . smooth increase with u. . . s an 10. mtercept; a relatively , a maximum value b t . cess variance These feat su s antlally less than the pro, . ures suggest fitt' III Section 5.2.3 whilst th lUg a model of the kind described . .' e general shape f th . consistent WIth the Gau . . a e VarlOgram appears to be sSlan correlation function (5.2.8). Thus,
Q)
rn
c::
o
c. ~
90
c::
rc Q) E
80
70
o
2
4
8
6
Time (weeks)
Fig. 13.4. Observed mean responses for the schizophrenia trial data, by dropout cohort an averaged across all treatment groups.
and
2
Recall that the three variance components in this model are a , the variance of the serially correlated component, 7 2 = a 2 al' the measurement error component, and v 2 = cr2a2' the between subject component. As a model for the mean response profiles J.li(t), Fig. 13.6 suggests a collection of convex, non-parallel curves in the six treatment groups. The
MISSING VALVES IN LON
308
A LONGITUDINAL TRIAL OF DRUG THERAPIES
GITVDINAL DATA
400
300 .1200 ~ 100
oLO---~2---~4---~6;;----';8--~ Lag
Fig. 13.5. The estimated variogram for the schizophrenia trial data. The horizontal solid line is an estimate of the variance of the measurement process. The dlUlhed horizontal lines are rough initial estimates of the intercept and asymptote of the variogram.
400
300
~
.g 200 III
> 100 0 0
2
4
6
Lag
:i :3,6,.
8
g. The estimated variogram for the schizophrenia trial data (~_), oget er With a fitted parametric model ( ).
simplest model consistent with th'Is beh' . aVlour IS /L;(t)
'=
/L
+ 81; + (hi + "Ike:
k = 1, ... ,6,
(13.8.1)
where k '= k(i) denotes the treatment . allocated. A more sophistl' t d . group to which the ith subject is . a constraint that the meanca e non-lInear m0 d e,I perh aps'mcorporatmg . response should b hOrIzontal asymptote as t' . e monotone and approach a lme mcreases , m'Ight be preferable on biological
309
grounds and we shall return to this question in Ch t 14 B ., I d .. f h ap er . ut for a purely ernprnca escnptlOn 0 t e data a low-order p l ' 1 . 0 ynoIDla model should be adequate. Note also that, as dIscussed earlier' thO h . . . III IS C apter, the empirIcal man response traJectones are not estimating th ·(t) d 'd d I h' e /-it ,an should be conSI ere on y as.rou.g ~Ides to their general shape. The focus of SCIentIfic mterest is the set of me . '" an response curves, /l i (t). In partIcular, we wIsh to mvestJgate possible simpl'Ifi cawns t' f h . 0 t e assumed mean . .response . , model by testmg whether correspond'Ing . se t,s 0 f contrasts are ~Igmficantl.y dIfferent fro.m ~ero. As described in Section 5.3.3, two ways of Implementmg tests ~f t~IS kmd are to use the quadratic form To defined at (5.3.14) or the log-hkehhood ratio statistic W k defined at (5.3.15). For the current example, two different kinds of simplification are of potential interest. First, we can ask whether the quadratic terms are necessary, that is, whether or ~ot the six quadratic parameters II; in (13.8.1) are all zero. The quadratIc form statistic is To = 16.97 on 6 degrees of freedom, correponding to a p-value of 0.009, so we retain the quadratic terms in the model. Second, in Fig. 1.6 the differences amongst the mean response curves for the four risperidone treatments were relatively small. Also, when we fitted the model (13.8.1) we found that the estimated values of the risperidone parameters were not related in any consistent way to the corresponding dose levels. This suggests a possible simplification in which all four risperidone doses are considered as a single group. Within the overall model (13.8.1) the reduction from the original six to three groups is a linear hypothesis on 9 degrees of freedom (3 each for the constant, linear and quadratic effects). The quadratic form statistic is To = 16.38, corresponding to a p-value of 0.059. We therefore arrive at a model of the form (13.8.1), but with k = 1,2 and 3 corresponding to haloperidol, placebo and risperidone, respectively. The parameter estimates for this model are shown in Table 13.5. Note that the treatment code is 1 = haloperidol, 2 = placebo, 3 = risperidone, and that estimation of the mean response parameters incorporates the constraint that 61 = 0, that is, 62 and 63 represent the differences between the intercepts in the placebo and haloperidol groups, and in t~e the risp~ridone and haloperidol groups, respectively. The ran~om all~catlOn of patIents t~ treatment groups should result in estimates 62 and (h close to zero, an this is indeed the case, both estimates being less than two standard erro~s in magnitude. The evidence for curvature in the mean response profiles ~s not strong; the log-likelhood ratio test of the hypothesis that the quadr~tIc parameters are all zero is D = 8.72 on 3 degrees of freedom, correspondmg to a p-value of 0.033. How well does the model fit the data? Figure 13.6. shows . th~ cor. an d parame t ric maxImum hkelihoo d respondence between non-parametnc estimates of the variogram. The fit is reasonably good, in that. the valu~s of V (u) are closely modelled throughout the observed range of tIme-lags m
MISSING VALUES I
A LONGITUDINAL TRIAL OF DRUG THERAPIES
N LONGITUDINAL DATA
311
ent model fitted to d th measurem 5 Parameter estimates fo~ medropout a,.'lsumption and un er 13 Table . ., . I data , under ran 0 h nia t na mption the SChIZOp re . d ro pout assu informatIve SE Parameter Estimate
310
Mean response t2 , k !J>i(t) = f.l + 15k + 8k t + 'Yk .
= 1,2,3
!J>
01 02 03 81 B2 83 1'1 1'2 1'3
88.586 0 0.715 -0.946 -0.207 0.927 -2.267 -0.113 -0.129 0.106
0.956 0 1.352 0.552 0.563 0.589 0.275 0.081 0.088 0.039
95
90 \ \ \ \
\
.............
\
, Q)
UI
85
c
0 C-
\
"
\
'\, " \
l!?
"
"
"
",
\
,
\
C1l
....................."'\\...
"
\
c
....,.......
,
\
UI
.•....
\.,\
\
Q)
~
.....
\
\ \
80
\ \ \
Covariance structure 2)} ,(u) =a2{al+1-exp(-a3u Var(Y) = q2(1 + al + a2)
CJ2
al a2 a3
170.091 0.560 0.951 0.056
the data. Notice, however, that the estimate of 01 involves an extrapol~~ tion to zero time-lag which is heavily influenced by the assumed parametn form of the correlation function. . e Figure 13,7 shows the observed and fitted mean responses m t~ haloperidol, placebo and risperidone groups. On the fac~ of i~, t~e fit IS qualitatively wrong, but the diagram is not comparing hke wIth hke. As discussed in Section 13.7, the observed means are estimating the mean response at each observation time conditional on not having dropped out prior to the observation time in question, whereas the fitted means are actually estimating what the mean response would have been had there been no dropouts. Because, as Fig. 13.4 suggested and as we shall shortly confirm, dropout is asociated with an atypically high response, the non-dropout subpopulation becomes progressively more selective in favour of low responding subjects, leading to a correspondingly progressive reduction in the observed mean of the non-dropouts relative to the unconditional mean. Note in particular that the fitted mean in the placebo group is approximately constant, whereas a naive interpretation of the observed mean would have suggested a strong placebo effect. It is perhaps worth emphasizing that the kind of fitted means illustrated in Fig. 13.7 would be produced routinely by software packages
\ ~,
75
70
~
~ o
2
4 Time (weeks)
6
8
. t he pace I b0 (•••: •• ) , haloperidol (( - ) Fig. 13.7. Observed and fitted means III ared .. p . WIth fitted ts means ...... , and risperidone (------) treatment groups, com and - - - - - -, respectively) from an analYSIS Ignormg dropou .
. corre1at ed data.. They the correresult which include facilities for modellmg the are likely ' d 1 hich recogmzes 1of .a lIkelihood-based fit to a mo e tsw on t h e same subJ'ect but treats ation between repeated me~ureme~ nd Rubin sense. As discussed lllissing values as ignorable III the LIttle a t be appropriate to ' Section 13.7, estimates 0 f t h'IS k"III d , mayor may no In l' t 'on but in any event th . tIcular app ICa 1 , d e scientific questions pose III a par d and fitted means surely the qualitative discrepancy between the observe deserves further investigation,
A LONGITUDINAL TRIAL OF DRUG THERAPIES
MISSING VALUES TN LONGITUDINAL DATA
312
of measurements and dropouts. d t joint ana Iy. S IS' . . We therefore procce 0 a. 134' which the mean response WIthIn . .' In ase I'mmediately before dropout, T he empirical behaviour of FIg. ' h sharp mcre .' , each dropout cohort 8 ows ~ derance of 'inadequate response as coupled with the overwhelmmg prePtonmo'delli~g the probability of dropout £i dropout sugges s the stated reason or ':I e that is a selection model. In the . of the meEl8urec respons " . as a ,functIOn . 'm Ie 10 istic regression model for dropout, WIth the first Instance, we fit a SI p g lanatory variable. Thus, if Pi) denotes t asurement as an exp . most recen me . t' d ops out at the jth time-pOInt (so that the the probability that patIen z r . . jth and all subsequent measurements are mIssIng), then
logit(pij)
= ¢o + ¢lYi,j-l.
(13.8.2)
The parameter estimates for this simple dropout model are ¢o = -4.~17 and ~1 = 0.031. In particular, the positive estimate of 1>1 confirms that .hl~h responders are progressively selected out by the dropout process. W1thm this assumed dropout model, the log-likelihood ratio statistic to test the sub-model with (it = 0 is D = 103.3 on 1 degree of freedom, which is overwhelmingly significant. We therefore reject completely random dropout in favour of random dropout. At this stage, the earlier results for the measurement process obtained by ignoring the dropout process remain valid, as they rely only on the dropout process being either completely random or random. Within the random dropout framework, we now consider two possible extensions to the model: including a dependence on the previous measurement but onei and inclUding a dependence on the treatment allocation. Thus, we replace (13.8.2) by (13.8.3)
where k :::: k(i) denotes the treatment allocation for the ith subject ~ot.h exte~sions yield a significant improvement in the log-likelihood, ~ mdlcated III the first three lines of Table 13.6.
Table 13.6. Maximized log-likelihoods under different dropout models. logit(pij) Log-likelihood (30
+ (3!Yi ,j-I
+ (3IYi,j_1 + (32Yi,j-2 + (3IYi,j-l + (32Yi,j_2 (301< + "fYij + {3IYi.j-l + (32Yi,j
Finally, .we test ~he random dropout assumption by embedding (13.8.3) within the InformatIve dropout model logit(pij)
= /3ok + "YYij + 131Yi,J-1 + 132Yi.J-2.
(13.8.4)
FraIn lines 3 and 4 of Table 13.6 we compute the log-likelihood ratio statistic to test the ~ub-model of (13.8.4) with') = a as D = 7.4 on 1 degree of freedom ThIS corresponds to a p-value of 0.007, leading us to reject the random dropout assumption in favour of informative dropout. We emphasize at this point that rejection of random dropout is necessarily pre-conditioned by the particular modelling framework adopted, as a consequence of the Molenberghs et al. (1997) result. Nevertheless, the unequivocal rejection of random dropout within this modelling framework suggests that we should establish whether the conclusions regarding the measurement process are materially affected by whether or not we assume random dropout. Table 13.7 shows the estimates of the covariance parameters under the random and informative dropout models. Some of the numerical changes are substantial, but the values of the fitted variogram within the range of time-lags encompassed by the data are almost identical, as is demonstrated in Fig. 13.8. Of more direct practical importance in this example is the inference concerning the mean response. Under the random dropout assumption, a linear hypothesis concerning the mean response profiles can be tested using either a generalized likelihood ratio statistic, comparing the maximized log-likelihoods for the two models in question, or a quadratic form based on the approximate multivariate normal sampling dis~ribution. of the estimated mean parameters in the full model. Under the mformatIve dropout assumption, only the first of these methods is available, ~ecause the current methodology does not provide standard errors for the estImated treatment effects within the informative dropout model. Under the random dropout model (13.8.3), the generalized likelihood ratio statistic to test the hypothesis of no difference between the three mean . D = 42.32 on 6 d egrees 0 f freedom , whereas under the response profiles IS Table 13.7. Maximum likelihood estimates ~ covariance parameters under random dropout an informative dropout models. Parameter
(30
(301<
2
-20743.85 -20728.51 -20724.73 -20721.03
313
Dropout Random Informative
170.091 137.400
0.560 0.755
0.951 1.277
0.056 0.070
A LONGITUDINAL TRIAL OF DRUG THERAPIES
LONGITUDINAL DATA . MISSING VALUES IN ~ __ .._
314
315
95
400
E
300
i 200
90
'I:
~
, \
100
. \~~ .. ".
,
\
,,
OL------::---------;4~----66---~88- 2 o Lag
\
....... \
85
,
\
" Q)
en c
0 0-
,,
~
( 8 4) th corresponding statistic is D = informative dropout model 13.., e profiles differ significantly 35.02. The conclusion tha~ the mean response The estimated treatment
\
Q)
.,
.\'\
'. ....
\'
\\ ,
c
~
\
, \, \\,\
~
..........
\
en
the schizophrenia trial data ( - ) , Fig. 13.8. The estimated ~ariogra~ or 'ng random dropouts ( ) and together with fitted parametflc mode s assuml informative dropouts (......). f
...........
\
80
\
,
\
\ .....
,,
, ~
\ ...
-- --
75 ............
~=~~"::::f:,:,::~~::::;y~~:~~7::~~their estimat~dl'tan~M~
errors when we move from the random dropout model to the m orma :h dropo~t model. For each parameter, the absolute difference betwe~n .e estimates under informative and under random dropout assumptIOns IS less than the standard error of the estimate under the random dropout assumption. Finally to emphasize how the dependence between the measurement ., processes affects the interpretation of the 0 bserved mean and dropout response profiles, Fig. 13.9 compares the observed means in the three treatment groups with their simulated counterparts, calculated from a simulated realization of the fitted informative dropout model with 10 000 subjects in each treatment group. In contrast to Fig. 13.7, these fitted means should correspond to the observed means if the model fits the data well. The correspondence is reasonably close. The contrast between Figs. 13.7 and 13.9 encapsulates the important practical difference between an analysis which ignores the dropouts and one which takes account of them. By comparison, the distinction between the random dropout model (13.8.3) and the informative dropout model (13.8.4) is much less important. In summary, our conclusions for the schizophrenia trial data are the follOWing. Figure 13.9 demonstrates that all three treatment groups show a downward trend in the mean PANSS Score conditional on not having
"-
70
-- '.
~
~ o
2
4 Time (weeks)
6
8
) haloperidol (--) and risperiFig. 13.9. Observed means in the placeboJ";jOth simulated means conditional done (- - ~ - - -) treatment groups, compare and , respectively), from an on not yet having dropped out (...... , informative dropout model. b t en . . the mean PANSS score e we dropped out of the study; the reductlOn7~8 If we want to adjust for the baseline and 8 weeks is from 86.1 to . ~ fitted means are those shown selection effect of the dropouts, the releva~ANSS score is almost constant in Fig. 13.7. In this case, t~e fitted n:e~~wnward trend in each ~f the ~~ in the placebo group but stIll shows , hange between baseline an c 86 5 to 71.4. I n VIe . w of the f m active treatment groups. The ' 0 n - average . . . P is now ro . . d t weeks In the flspendone grou h It of an ma equa e response, fact that the majority of dropou t s are t e resu
DISCUSSION
. LONGITUDINAL DATA MISSING VALUES IN
317
." gP. PANSS ~l:Or('. WI' I bserved avUd ' " r . t this artificially deflates t,w (~, ' '1'37 Ilrp morr' appropnatr' 1lI( ,land tha, " t.h fitted curves In F II!;, " j s t Inn are those' III .f the treatnwIl., . , 11 u(' that ,c ". wou ( arg , 'I' .h mi('a! effN:t,ivene..ss 0 , " , 'hipves H lower mean .. t 'of tJw llOe e " . 'spendone , I e , , ea ,ors Fr ('I'tlwr IJOlllt of VIPW, rJ, l t.' 'II 'l!Id tlip estllnatpd F' 13.!J, om·,· . 'hout tile ,1'1<, ' ,Ii' score than weeks dose to tllf' 20 Yr, t ]wtwpcn basehne. ,L! .1' ''''II'Hl!J/'OV('lllenl, Vii Ith ' .J Win rCf UC lOn ., -t 't ing a e 111)( " m '~ ~hi('h is regarded a,s demons ',ra" . 'fi' ant variation in the dropout teflon, " then'! IS SIg11I C. , tI ' 1, 'tl 1 till' higllest rates III, ,\(. reganI o·tll ('. r]ro!JOut process, , .t .groups, WI. ' . , t' es between the three trratmen., '1 nL> O'roup. There is also slglllfra, . . I ' t in the I'lspenr 0 ' M i I 'b group and the owes ' , l ,ii' !rarrw1/Jork r'('plY'8('n,f(~r pace 0 . . h' thf'. sdf'.dzrm m.or (, m y , I'. icant evidence that, 11Jzt.m. h ' m is infonnatlve, Howpver, t lIS " 's b'/ equation (J,J, 8,4/) " the, dropout Imec ' ams . egarding the mean PANSS " score, n . II ff 't the cone mllons r ' , does not matena y}l ec ., , d ' andom dropout assumptIOn, , ' h th un er a I' by companson WIt ,e,analYSIS ,
316
P~NSS
IwlopeTlrl(~J ~I~I:~t~
COllles
;:n-
C)
13.9 Discussion
. ' owin literature on dealing with dropouts ll1 longltud, There , . thl's chapter is necessarily incomplete, ' I t ISd'efast-gr and t IIeg revIew III " , ma SUI S, ' f hat different perspective to ours, IS A useful introductIOn, rom a somew Little (1995). 'h t 1.' By An emerging consensus is that analysis of data WIt po ~n Ia ' lt informative dropouts necessarily involves assumptlOns W h'IChare dIfncu . ( 97)' or even impossible, to check from the observed da.ta. Copas and LI 19 reach a similar conclusion in their discussion of lllference based on nonrandom samples, This suggests that it would be unwise to rely on the precise conclusions of an analysis based on a particular informativ~ ~ropout model. This of course should not be taken as an excuse for aVOldmg the issue, but it may well be that the major practical value of informative dropout models is in sensitivity analysis, to provide some protection against the possibility that conclusions reached from a random dropout model are critically dependent on the validity of this stronger assumption. The discussion published alongside Scharfstein et ai. (2000) contains a range of views on how such sensitivity analyses might be conducted, One striking feature of longitudinal data with dropouts, which we first encountered with the simulated example in Section 13.3, is the frequent divergence of observed and fitted means when dropouts are ignored. This arises with likelihood-based methods of analysis when the dropout mechanism is not completely random and (as is almost universal in our experience) the data are correlated. In these circumstances the likelihood implicitly recognizes the progressive selectivity of the ~on-dropout sub-population and adjusts its mean estimates accordingly, In fact, this represents a rather general counterexample to the commonly held view that ordinary least squares regression, which ignores the correlation structure of
the data, usually gives acceptable estimates of the mean response function, As WP have emphasized in the earlier sections of this chapter, the difficulty with ordinary least squares in this context lies not so much in the method itsp]f (which is, of course, unbiased for the mean irrespective of the true correlation structure), but rather in a failure to specify precisely what is the rpCjuired estimand. Our discussion in this chapter has focused on linear, Gaussian measurPHlf'nt models, both for convenience of exposition and because it is the context in which most progress has been made with respect to the analysis of data with potentially informative dropouts. However, the ideas extend to discrete response models, as do the attendant complications of interpretation, Kenward et al, (1994) and Molenberghs et al. (1997) discuss selection models for discrete longitudinal data under random and informative dropout assumptions, respectively, Fitzmaurice et al. (1996) discuss both random and informative dropout mechanisms in conjunction with a binary response variable, with particular emphasis on the resulting problems of non-identifiability, Follman and Wu (1995) extend random effect models for dropout to discrete responses, " Generalized linear mixed models (Breslow and Clayton, 1993) proVIde an extension of linear Gaussian selection modelling ideas to discl:ete or categorical response as follows. Consider the linear model for a contllluous response variable, v. 1 1) --
1/. t"'1)
+ d'U i + Wi(t ij ) + Zij, 1)
, h' h .. - E[Y:·] U· is a vector of random effects for the ith subIII w lC Pl) 1)' 1 , . TtY (t)' serially correlated ject with associated explanatory vanables d i ), 1 IS a 'd d 'h b' t d Z· is a set of mutually 1ll epen random process for the zt Sll Jec an 1) d set ofh' be trivially re-expresse as a ent measurement errors, T IS can . U d the Wi(t) as conditional distributions for the Yij gIven the ,an
Yij IVi, Wi(t)
,...... N(j.Lij
/ + dijUi + uri (t .. ) VI
1)'
7 2)
,
(13.9,1)
2 . mutually independent conditional where 7 = Var( Zij) and the Y;j ale. . t eneralized linear mixed on the U i and the Wi (t). To turn thlS 1ll 0 ~ g ly replace the Gaussian . I esponse we Slmp h model for a discrete or categonca r .' d' t 'bution for example t e . . . . ropnate IS n , 't ble link function to transcondItIOnal m (13.9.1) by a mOIl" app. P . t ther WIth a SUI a f h OlSson for a count response, oge I W.(t), onto the whole ate f: h I' d' t I I " + d..U i + 1 1) 't lOrm t e mear pre lC or, 'flij - t""1) 1) uld then pOSl a m odel dropout we co · t real line. To allow for informa Ive , d'ng dropout probab'l I. h 1.. and correspon 1 . f for the relationshlp between t e 'fl ) . d'ctor '1'1 is a functIOn 0 1.1 e Imear pre 1 '/ h ' h h 1 f 1 for thinking about t e ities. Models of this kind, in w lC . . J:use u tive dropout behaVIOur. Unobserved random variables, may be very k' . h ld 'nduce llllorma f IUds of mechanisms WhlC cou 1 ., 1 '11 is a form 0 rand 0 m Modelling the probability of dropou t condltIOna on '/
NCITUDINAL DATA VALUES IN 1.0 J MT uSTNC ,-"
;J18
, ' 1'}.J. 7'} Brcause TJ is a fUIH"tion of I f I in SectIOn ,.J, " effects morldling fiB ( e ~ne( , " , j W(t) these models generally f'qlllitl' , , • nabJefl U itnf unobserved ran(Iom V.I " I'" 'rks about thr Iw(,d to cOlisHler • J . t lind ear leI lema ' to informatIve (ropou., , , .:fi' assllmlJtions about /.Iw dropout ," f. e1uslOnH to Spf,(! (, , . y 0 con" I !vI f (Ia'lnc'ntally the tWnHl,tlvlt , ' t f apr> yore un . , , the exawple of Murm(lchanJflm contmue) .. h,t 1 ut can in sowl' cases be J (19H8) [(·mmds us t .1 (ropo ray lind Fmd ay, d' ,., ents whether or not these include a dimdly related to observe trWiJ.~IlI(,m ' , t error component. ,. meWmrAllTlCTfl tl' I' C'Vl" dl'ucussion is only indirectly. re.,levant to tlw I,JrobMUl: I 0 ,Ill a) J . h' j'ff"' ' , ' 'tt t 'ssing values willch can be rat (r (I crcnt In Il~m8 raIsed by mt.erml , ,em , mJ. , character from dropouts, , . ,_ ' 'tt t missing valueR can arise through explJCltly stated cenSOrIng Int,erml. en .. 'I l' bl rules, For example, values outside a stated range m,ay be ,SImp y unr~ I~ e because of the limitations of the measuring techmques ,m use tll1S IS a feature of many bioassay techniques, Methods for handlIng censored data are Vf~ry well eRtabliRhed in survival analysis (e.g. Cox and Oakes, 1984; Lawless, 1982), and have been developed more recently for the types of correlated data structures which arise with longitudinal data (Laird, 1988; Hughes, 1999), When censoring is not an issue, it may be reasonable to assume that intermittent missing values are either completely random or random, in which case a likelihood-based analysis ignoring the missing values should give the relevant inferences, An exception would be in longitudinal clinical trials with voluntary compliance, where a patient may miss an appointment because they are feeling particularly unwell on the day, From a practical point of view, the fact that subjects with intermittent missing values remain in the study means that there should be more opportunity to ascertain the ~e!lBons for the missingness, and to take corrective action accordingly, than IS often the case with dropouts who are lost to follow-up. Perhaps the most important conclusion from this chapter is that if dropouts are, not completely random we need to think carefully about what the relevant mfel'e~c:s are, Do we want to know about the dropout process; or ,about the condItIonal distribution of the measurements given that the um\~~ n~t dropped out; or about the distribution that the measurements wou tlO,l.ow in the ,absen,ce of dropouts? Diggle and Kenward (1994) use d len model t.o lllvestJgate f h some 0 t e consequences of informative dropouts USI'ng - ltd d , Slmu a e ata th h d h mative dropouts d d' ey s owe t at wrongly treating inforas ran om ropouts int d b' . 1'0 uces las m~o p~ra~eter estimates, and that likelihood-based mative dropout processes f d methods can be used to IdentIfy mforrom ata sets of l' t' , a rea IS Ie SIze. A much more difficult problem is to I'dent'f' , I Ya ulllque model f 1 or any rea dataset, where all aspects of the model are u k , _ h n nown a prwi'/. and random or informative dropout ' as we ave suggested above, time-trends in the mean respon~:~cesses may be partially confounded with l'
0',"
14 Additional topics
"
The ?eld of lon~itudinal~ata analysis continues to develop. In this chapter, we gIve a short mtroductlOn to several topics, each of which can be pursued in greater detail through the literature cited,
14.1
Non-parametric modelling of the mean response
The case study on log-bodyweights of cows, discussed in Section 5.4, introduced the idea of describing the mean response profile non-parametrically_ There, the approach was to use a separate value for the mean response at each time-point, with no attempt to link the mean responses at different times. This assumes firstly that the complete mean response curve is not of direct interest, and secondly that the times of measurement are COIDIDon to all, or at least to a reasonable number, of the units, so as to give the required replication at each time, An instance in which neither of these assumptions holds is provided by the CD4+ data of Example 1.1, For these data, the CD4+ count as a function of time since seroconversion is of direct interest, and the sets of times of measurement are essentially unique to each person in the study. To handle this situation, we now consider how.to fit smooth non-parametric models to the mean response profile as a,fu~ctlO~ of time, whilst continuing to recognize the covariance structure wlt~m umts, In Section 3.3 we discussed the use of smooth, non-parametrIc curves as exploratory to~ls, drawing mainly on well-established methods for cros:sectional data. If we want to use these methods for confirmatory analys~ , we need to consider more carefully how the correlation structure of longIt· , , fh much to smooth the data, ud mal data impinges on conSIderatIOns 0 ow and how to make inferences, , tal treatTo develop ideas we assume that there are no experImen t d ' 'bl Th data can be represen e as ments or other explanatory vana es, e b f ure.. to.) '_ . '_ 1 m} where ni is the Dum er 0 meas {(Y1J' 'J , J - 1, ... , ni, z.- , ... , . ' _ ~m n' for the total number ments on the ith of m umts. We WrIte N - £...i=I ' of measurements. Our model for the data takes the form
320
NON-PARAMETRIC MODELLING OF THE MEAN RESPONSE
ADDITIONAL TOPICS
,_ re independent copies of a stationary where {ci(t), t E R} fan -: 1,.,:, m a 2 and correlation function p(u), and a (t)}, wIth varIance random process, {E th f nction of t t) is a smoo u " . ( J' non parametric estImate of J1(t) IS the mean response, Jl , , d' t 't'vely appea mg, ~ sImple, an mUll data with large weights given to measurements t' t ""0 implement this idea, we define a a weIghted aver~ge of theI ' t whICh are case 0 . .11 , at t Imes ijt ' () b mmetric , non-negative valued functIOn I< u to easy kerne 1f unc zan, I" t _ 0 and small values when Iu I is large. For taking large values c ose a u , . what 11rollows we use the GaussIan kernel, exampIe, III I«u) = exp( _u 2 /2),
(14.1.1)
, Sect'10n 3"3 Now , choose a positive number, h, and define weights as III (14.1.2)
Note that w~.(t) is large when tij is close to t and vice versa, and that the rate at ~iuch the w;'(t) decrease as Itij - t I increases is governed by the value of h, a small :aJue giving a rapid decrease, Define standardized weights,
so that 2:::'12::;:"1 Wij(t) == 1 for any value of t, Then, a non-parametric estimate of Jl( t) is
321
which controls the ?verall degree of smoothing and takes on the role played by the constant h m non-adaptive kernel estimation. While the choice of h has a direct and maJ'or impact th I' on e resu tmg estimate J1( t), the chOIce of ~ernel function is generally held to be of secondary importance. ~he GaussIan kernel (14,1.1) is intUitively sensible but not uniquely compellmg. Estimators of this kind were introduced by Priestley and Chao (1972) for independent data, and have been studied in the longitudinal data setting by Hart and Wehrly (1986), Muller (1988), Altman (1990), Rice and Silverman (1991), and Hart (1991). From our point of view, the most important question concerns the choice of h. In some applications, it will be sensible to choose h to have a particular substantive interpretation; for example, smoothing over a 'natural' time-window such as a day if we wish to eliminate ciradian variation. If we want to choose h automatically from the data, the main message from the above-cited work is that a good choice of h depends on the correlation structure of the data, and in particular that methods for choosing h based on a false assumption of independence between measurements can give very misleading results, Rice and Silverman (1991) give an elegant, cross-validatory prescription for choosing h which makes no assumptions about the underlying correlation structure. Their detailed results are for smoothing spline estimates of M(t), but the method is easily adaptable to kernel estimates. For ~ g.iven h, let p,(k>(t) be the estimate of Jl(t) obtained from (14.1.3) but OIruttmg the kth subject, thus A
'
ni Tn
p,(t) =
p,(k) = LLwi;)(t)Yij, i#j=1
ni
L L Wij(t)Yij' i==1 j==l
(14.1.3)
A useful refinement of (14.1.3) is an adaptive kernel estimator in which wthe re~lace ~he cons~ant, h, by a function of t such that h(t) is s~all when ere IS a high densIty of data I t t d' ' h f case a ,an VIce versa. This is consistent wI'th th e vIew t at or data h' h h' h . times th b d w IC are Ig ly replIcated at a fixed set of , e 0 serve average at tim t' ate of Jl(t) without an t. e IS a reasonable non-parametric estimis, in effect setting / ~m~o~Illg ov,:r a range o~ neighbouring times, that ative behaviour can be h' so, USlUg a functIOn h(t) with this qualit. s own more gener 11 t . Jl(t). See, for example, Silverman 19 a y 0 Improve the estImates of kernel estimate is approxim t eI ( ~4) who demonstrates that a variable lent to a smoothing spline. In the remainder of this section w:c Y of h, The essential ideas ~pply ollSlIter Ihn detail the case of a constant value a so 0 t e case f d t' L where, typically, the function h(t) is i d 0 a ap Ive r;.ernel estimation n exed by a scalar quantity, b say,
::w:a
where wiJ>(t) = wij(t)/ {Eiik E;~l wij(t)}. The~, the Rice and . . . the qu antlty Silverman prescription chooses h to mlrumlze
S(h) =
ff
2
{Yij - P,(i)(tij)}
•
(14.1.4)
i=1 j=l . that it is estimating the mean The rationale for minimizing (14.1.4 ) IS h d ' points t·· up to , 'J , across t e eSlgn d ( ) square error of pet) for It t average h 'I1 this write an additive constant which does not depend on . 0 see ,
E [{y;; -
jl(;) (t;;)}']
~ E [ ({Yl' - ,,(t;,)) + {,,(t;,) - fie;) (~j)} )']
NON-PARAMETRIC MODELLING OF THE M ADDTTIONAL TOPICS
322
. d .d . to the following three terms: and expand the rl,e;ht-han SI e In
+ 2E [{1J. lJ
, ,)}2] E [{Yij - IJ. (t 'J
+ E [{/l( tij) -
- p(tij)} i/l(tiJ) - il(' 1(t'J)}]
() 2] .
P, '}
he first ofthe three terms in this expression is eq~al to Var(YiJ) and Now, t d I n " while the second is zero because E( Yij) = J.t( tij) and does not epene 0 ., • . llij is independent of p,(i)(tij), by constructIOn. Thus, E [hiij - jJ.(i)(tij)}2] = Var(Yij)
+ MSE(i)(tiJ , h),
(14.1.5)
where MSE(i)(t, h) is the mean square error of ('l(i)(t) for J.l(t). Substitution of (14.1.5) into (14.1.4) gives the result.. . . . One interesting thing about the Rice and SIlverman preSCrIptIOn IS that it does not explicitly involve the covariance structure of the data. However, this structure is accommodated implicitly by the device of leaving out all observations from a single subject in defining the criterion S(h), whereas the standard method of cross-validation would leave out single observations to define a criterion
Direct computation of (14.1.4) would be very time-consuming with large n~mbers of subjects. An easier computation, again adapted from Rice and SIlverman (1991), is the following. To simplify the notation, we temporarily suppress the dependence of Wij(t) on t and write w' = ~ni W" so that L.."J=l 'J' 4."i=1 Wi == 1. Note that
"m
"
YI),,-tJ(i)(t)_ .... .) -
== Yij ==
{/l.' (tij)
Yij -
{fi,(t
-
{Yij -
X
ij ) -
A(tij )}
{t J=I
~ ~
-
t
WijYij
} / (1 - Wi)
WijYij}
{I + wd(1 - Wi)}
J=l
+ {w;/ (1 -
WijYij!Wi -
Wi)}
A(ti j )}
•
(14.1.6)
Using (14.1.6) to compute S(h) 'd avO! s the need £ l' . th em 1eave-one-out estimates, A(i)(t). Fu or exp lCI: computation of rther, computatIOnal savings can
EAN RESPONSE
323
be made by collecting the time-points t,· int d . ' 'J' 0 a re uced set say t r _ 1, ... , p, and computmg only the p estimates //(t ) C h 'I . '" C , t'" ,. lor eac va ue of h . To rna k e mlerences about M(t), we need to know th . . e c,ovanance structure f the data. Let V be the N x N bl k-d' a _ m OC lagonal covanance matrix of the data, where N - Li=I ni· For fixed h, the estimate r/(t) d fi db ' ne y (14 .1 ,3) is a lmear com b'mat'IOn 0 f the data-vector y say' (.t),.. _ e()' _ 1 J1 . - w t y. Now, for p values t,., r - '.... let I-t be the vector with rth element' (t ) and W the p x N matnx WIth rth row w(t,.). Then, J1 ,.,
,.p,
A
'
it == Wy,
•
(14.1.7)
and it has an approximate multivariate Gaussian sampling distribution with
E(jL)
= WE(y)
and Var(ft) = WVW'.
(14.1.8)
Note that in general, {L(t) is a biased estimator for M(t), but that the bias will be small if h is small so that the weights, Wij, decay quickly to zero with increasing Itij - t I. Conversely, the variance of jl(t) typically increases as h decreases. The need to balance bias against variance is characteristic of non-parametric smoothing problems, and explains the use of a mean-squared-error criterion for choosing h automatically from the data. In practice, a sensible choice of h is one which tends to zero as the number of subjects increases, in which case the bias becomes negligible, and we can use (14.1.8) to attach standard errors to the estimates jl(t). The covariance structure of the data can be estimated from the timesequences of residuals, rij = Yij - P(tij). In particular, th.e empirical variogram of the rij can be used to formulate an appropr~ate model. Likelihood-based estimation of parameters in the assumed covarIance struc. al pom . t 0 f view , the ture is somewhat problematical. From apractIc computations will be extensive if the number of subjects is large and t~ere is little replication of the observation times, tij, which is pre~isely the SItuation in which non-parametric smoothing is most likely reqUIred. Also, the ab sence of a parametric model for the mean pu t s the inference on a rather . shaky foundation. Simple curve-fitting (moment) estimates of covarIance parameters may be preferable.. 1418) for the A final point to note is that expreSSIOns (14.1.7) and ( 'd' th value m . ' f me that both Vane ean vector and varIance matnx 0 JL assu h iF t of estimating whilst of h are specified without reference to the data. Tee. ec. 1 1S arge, Parameters in V will be small when th e nu mber of umts all hen JL(t) is the effect of calculating h from the data should also b~ sm llw nderstood a smooth function although the theoretical issues are ess we u A
NON-PARAMETRIC MODELLING OF THE MEAN RESPONSE
ADDITIONAL TOPICS 324
. . detailed discussion of these and other here. The references CIted above gIve issues. I . can be extended to incorporate .d how the ana ySls . We now comll er arametric analysis, Wf> can simply . I t tments For a non-p b' ts from each treatment group. From expenmenta ,rea .. th d separately to su Jec app Iy 1,he me, 0 f' .t' referable to use a common value of the . t' int a VIeW I IS P l' • • an mte~pre Ive po ,Wh' the data are from an experiment companng smoothmg constant, hI.. en uared-erro r criterion (14.1.4) for choosing several treatments, t e mean-sq . . . ,t . . f t 'butions from wlthm each treatrnen gIOUp. h consists of a sum 0 can n . . . f h • 1, 1, • estimated from the emplflcal vanogram a t e Covanance s rue ure IS . . . I t reatment . reSIduals pool ec across ,. groups , assummg a common covanance . . structure in all groups. Another possibility is to model treatment contrasts ~arame~ncal~y usmg a linear model. This gives the following semi-parametnc specIficatlOn of a model for the complete set of data,
(14.1.9) where reij is a p-element vector of covariates. This formulation also covers situations in which explanatory variables other than indicators of treatment group are relevant, as will be the case in many observational studies. For the extended model (14.1.9), the kernel method for estimating J1.(t) can be combined iteratively with a generalized least squares calculation for {3, as follows:
1. Given the cur:ent estimate, (3, calculate residuals, rij = Yij - X~j{3, and use these III place of Yij to calculate a kernel estimate, p,(t). 2. <;?iven p., calculate residuals, rij = Yij - fl(t), and update the estimate {3 using generalized least squares,
325
Example 14.1. Estimation of the population mean CD4+ curve For the CD4+ data, there are N = 2376 observations of CD4+ II b ' . ce num ers on m = 369. men mfected WIth the HIV virus Recall th t t" . . . a Ime IS measure d in years WIth the ongm at the date of seroconversion h' h'IS k nown . . " , W IC approximately for . each mdlvldual. As in our earll'er , paramettiC ' ' analYSIS of these data, we mclu~e the follOWing explanatory variables in the linear part of the model: smokmg (packs per da!); recreational drug use (yes/no); numbers of sexual partners; and de~resslve symptoms as measured by the CESD scale. For the non-parametnc part of the model, we use a mildly adaptive kernel estimator, with h(t) = b{f(t)}-025, where f(t) is a crude estimate of the local density of observations times, t ,J , and b is chosen by cross-validation, as described above. . Figure 14.1 shows the data with the estimate fi(t) and pointwise confidence limits calculated as plus and minus two standard errors. The standard errors were calculated from a parametric model for the covariance structure of the kind introduced in Section 5.2.3. We assume that the variance of each measurement is 7 2 + a 2 + 1/2 and that the variogram within each unit is ,(u) = 7 2 + a 2 {1- exp( -au)}. Figure 14.2 shows the empirical variogram and a parametric fit obtained by an ad hoc procedure, which gave estimates f2 = 14.1, 0- 2 = 16.1, f)2 = 6.9, and eX = 0.22. Figure 14.1 suggests that the mean number of CD4+ cells is approximately constant at close to 1000 cells prior to seroconversion. Within the first six months after seroconversion, the mean drops to around 700. Subsequently, the rate of loss is much slower. For example, it takes nearly three years before the mean number reaches 500, the level at which, at the
2500
where ?C is the N. >< p matrix with rows X~j' V is the assumed blo.ck-diagonal covarIance matrix of the data and r is the vector of reSIduals, rij.
3. Repeat steps (1) and (2) to convergence. This algorithm is an exa I f th Hastie and Tibshirani 19~6 eo. e back-fi~ting ~lgorithm described by there is a n f (d' ). Typically, few IteratlOns are required unless ear-con oun mg betw th r of the model. This might well ha ee: . e mear and non-~arametric parts nomial time-trend 'In the l' pp n If, for example, we mcluded a polymear part Furth d t '1 . a discussion of the asym t t' ' . er e al s of the algOrIthm, and given in Zeger and Diggl~(~~~4P)roPdertles of the resulting estimators, are an Moyeed and Diggle (1994).
Q;
.c
~
c 'ijl 1500
o
+
2!i ()
-2
o Years since seroconversion
. t and pointwise confidence Fig. 14.1. CD4+ cell counts with kernel estlma e limits for the mean response profile.
NON-LINEAR REGRESSION MODELLING ADDITIONAL TOPICS 326
327
variance (72, whilst the mean response function 11(.) , I' , '''''' ,IS a non- mear function of explanatory varIables Xi measured on the ith sub' t d Jec an parameters . . 1 I (3 For examp Ie, an exponentIal growth model wI'th ' . a smg e exp anatory variable x would speCIfy
40
30
Some non-li~ear models can be con~erted to a superficially linear form by transformatlOn, For example, the SImple exponential growth model above can be expressed as
y(u) 20
10
0
3
2
0
4
5
u
Fig. 14.2. CD4+ cell counts: observed and fitted variograms *: sample variogram - - -: sample variance ._; fitted model.
time the data were collected, it was recommended that prophylactic AZT therapy should begin (Volberding et al., 1990), As with Example 5.1 on the protein contents of milk samples, the interpretation of the fitted mean response is complicated by the possibility that subjects who become very ill may drop out of the study, and these subjects may also have unusually low OD4+ counts,
14.1.1 Further reading
Non- and semi-parametric methods for longitudinal data are areas of current research activity. See Brumback and Rice (1998), Wang (1998), and Zhang et al. (1998) for methods based on smoothing splines. Lin and Carroll (2000) develop methods based on local polynomial kernel regression, and Lin and Ying (2001) consider methods for irregularly spaced longitudinal data.

14.2 Non-linear regression modelling
In this section, we consider how to extend the framework of Chapter 5 to non-linear models for the mean response. Davidian and Giltinan (1995) give a much more detailed account of this topic. Another useful review, from a somewhat different perspective, is Glasbey (1988). The cross-sectional form of the non-linear regression model is
Y_i = μ(x_i; β) + Z_i,   i = 1, ..., n,   (14.2.1)

where the Z_i are mutually independent deviations from the mean response and are assumed to be normally distributed with mean zero and common
where μ*(·) = log μ(·) and β*_j = log β_j. However, it would not then be consistent with the original, non-linear formulation simply to add an assumed zero-mean, constant variance error term to define the linear regression model,
Y*_i = μ*(x_i; β*) + Z_i,   i = 1, ..., n,
as this would be equivalent to assuming a multiplicative, rather than additive, error term on the unlogged scale. Moreover, there are many forms of non-linear regression model for which no transformation of μ(·), β and/or x can yield an equivalent linear model. Methods for fitting the model (14.2.1) to cross-sectional data are described in Bates and Watts (1988). In the longitudinal setting, we make the notational extension to
Y_ij = μ(x_ij; β) + Z_ij,   j = 1, ..., n_i;  i = 1, ..., m,   (14.2.2)
where, as usual, i indexes subjects and j occasions within subjects, and consider two generalizations of the cross-sectional model:
1. Correlated error structures: the sequence of deviations Z_ij, j = 1, ..., n_i, within the ith subject may be correlated;
2. Non-linear random effects: one or more of the regression parameters β may be modelled as subject-specific stochastic perturbations of a population average value; thus in (14.2.2) β is replaced by a random vector B_i, realised independently for each subject from a distribution (usually assumed to be multivariate Gaussian) with mean β and variance matrix V_β.
Recall that in the linear case considered in Chapter 5 we could treat these two cases as one, because in that context the assumption of randomly varying subject-specific parameters leaves the population mean response unchanged and affects only the form of the covariance structure of the Y_ij.
For non-linear models, as for the generalized linear models discussed in Chapter 7, the two cases have different implications, both for statistical analysis and for the interpretation of the model parameters.

14.2.1 Correlated errors
The model of interest is (14.2.2) with a parametric specification for the covariance structure of the Z_ij. Let Z_i = (Z_i1, ..., Z_in_i) denote the vector of Z_ij associated with the ith subject, and t_i = (t_i1, ..., t_in_i) the corresponding vector of measurement times. We assume that Z_i follows a multivariate Gaussian distribution with mean zero and variance matrix
V_i(t_i; α).
For exploratory analysis, we can use the general approach described in Chapter 3 to identify suitable parametric families for the mean function μ(·) and the covariance matrices V_i(t_i; α). However, often the reason for adopting a non-linear model will be that a particular form for μ(·) is suggested by the scientific context in which the data arise. For example, in pharmacokinetic studies, μ(·) is often obtained as the solution to a differential equation model of the underlying biochemical processes involved (Gibaldi and Perrier, 1982). The form of the covariance matrices V_i(t_i; α) may also be derived from an explicit stochastic model, but this appears to be rare in practice. More commonly, the correlation structure is chosen empirically to provide a reasonable fit to the data, its role being to ensure approximately valid inferences about μ(·). If this is the case, it would be appropriate to develop the model for the V_i(·) from the variogram of the ordinary (non-linear) least squares residuals after fitting the chosen parametric model for μ(·). Once parametric forms have been chosen for the mean and covariance structure, the log-likelihood function for the complete set of model parameters (β, α) follows as
L(α, β) = Σ_{i=1}^m L_i(α, β),   (14.2.3)

where

-2 L_i(α, β) = log |V_i(t_i; α)| + (y_i - μ_i)' V_i(t_i; α)^{-1} (y_i - μ_i),   (14.2.4)
y_i is the n_i-element vector with jth element y_ij and μ_i is the n_i-element vector with jth element μ(x_ij; β). Likelihood-based inference follows according to the same general principles as for the linear models discussed in Chapter 5 although, in the non-linear setting, there is not the same opportunity to exploit the existence of an explicit solution for β conditional on α.
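As a concrete illustration of how (14.2.3) and (14.2.4) can be evaluated, the following Python sketch computes -2 times the Gaussian log-likelihood for user-supplied mean and covariance functions. The exponential mean and exponential-correlation covariance shown at the end are placeholders chosen for the example, not forms prescribed by the text.

```python
import numpy as np

def neg2_loglik_subject(beta, alpha, x, t, y, mu, V):
    """-2 * L_i(alpha, beta) as in (14.2.4) for one subject."""
    mu_i = mu(x, beta)                    # n_i-vector of means mu(x_ij; beta)
    V_i = V(t, alpha)                     # n_i x n_i covariance matrix V_i(t_i; alpha)
    resid = y - mu_i
    sign, logdet = np.linalg.slogdet(V_i)
    return logdet + resid @ np.linalg.solve(V_i, resid)

def neg2_loglik(beta, alpha, subjects, mu, V):
    """-2 * L(alpha, beta): sum of the subject contributions in (14.2.3)."""
    return sum(neg2_loglik_subject(beta, alpha, x, t, y, mu, V)
               for (x, t, y) in subjects)

# Illustrative choices (assumptions): exponential growth mean and
# exponential-decay correlation with a measurement-error nugget.
mu_exp = lambda x, beta: beta[0] * np.exp(beta[1] * x)

def V_exp(t, alpha):
    sigma2, phi, tau2 = alpha
    R = sigma2 * np.exp(-phi * np.abs(t[:, None] - t[None, :]))
    return R + tau2 * np.eye(len(t))
```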
14.2.2 Non-linear random effects
In this second generalization of the cross-sectional non-linear regression model, equation (14.2.2) is assumed to hold conditionally on the regression parameters β, which are then allowed to vary randomly between subjects. Thus, for the ith subject the parameter β is replaced by B_i, where the B_i for different subjects are mutually independent multivariate Gaussian random variables with common mean β and variance matrix V_β. Let b = (b_1, ..., b_m) denote the set of realized values of B_1, ..., B_m. Then, the likelihood function for the model parameters α, β and V_β is obtained by taking the expectation of the conditional likelihood given b with respect to the assumed multivariate Gaussian distribution of b. Thus, if ℓ_i(α, b) = exp L_i(α, b), where L_i(·) is the contribution to the conditional log-likelihood from a single subject, as defined by (14.2.4), then the overall log-likelihood for the random effects model applied to data on m subjects is
L(α, β, V_β) = Σ_{i=1}^m log ∫ ℓ_i(α, b) f(b; β, V_β) db,   (14.2.5)
where f(·) denotes the multivariate Gaussian density with mean β and variance matrix V_β. Exact likelihood calculations for this model therefore require repeated numerical evaluation of an integral whose dimension is equal to p, the number of elements of β. While this task is computationally feasible for typical values of p, it would require a specially written program for each particular class of non-linear models. Lindstrom and Bates (1990) therefore propose an approximate method of evaluation in which the contribution to the conditional likelihood, ℓ_i(α, b), is replaced by a multivariate Gaussian density whose expectation is linear in b. This allows explicit evaluation of the integral terms. The resulting algorithm provides a computationally fast, albeit approximate, method for a wide class of non-linear models and assumed covariance structures for the error terms Z_ij. The method is implemented in the Splus function nlme().
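To illustrate the integration problem that exact likelihood evaluation poses, here is a sketch of one possible approach, Gauss-Hermite quadrature, for the simplest case of a single scalar random effect. This is not the Lindstrom and Bates (1990) approximation nor the nlme() implementation; the function cond_loglik is assumed to return the conditional log-likelihood L_i(α, b) of (14.2.4) for a given value b of the random effect.

```python
import numpy as np

def log_integral_one_subject(cond_loglik, beta, v_beta, n_nodes=20):
    """Approximate log INT exp{L_i(alpha, b)} f(b; beta, v_beta) db for scalar b."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    b = beta + np.sqrt(v_beta) * nodes          # transform standard-normal nodes
    logf = np.array([cond_loglik(bj) for bj in b])
    # log-sum-exp for numerical stability
    m = logf.max()
    return m + np.log(np.sum(weights * np.exp(logf - m)) / np.sqrt(2.0 * np.pi))
```

The overall log-likelihood (14.2.5) is then the sum of such terms over subjects; for a p-dimensional random effect the quadrature grid grows exponentially in p, which is why approximate methods are attractive in practice.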
14.3 Joint modelling of longitudinal measurements and recurrent events
. h' d'gm which either treats Untl!. now we have worked mostly WIt lD a para 1 . b
times of ~easurements as fixed by the study design or, m lan dO tservath~ · t times are unre a t e 0 t lanaI setting assumes that the measuremen . t' phenomenon ~f interest and from a statistical modellmg per~pec Ive c~ examp e, th erefore be treated as if' they had been fi xed'lD advance:1 Thus,d lor ta the data . d V. '1 (1990) epl epsy a , III Our earlier discussions of the Thall an al . fix d t' 'ntervals b f ents lD e lme-I \V~re presented as counts of the num ers 0 eVl Id have been to treat the PrIor to analysis. An alternative approach wou
actual times of the individual seizures as the response variable. Often, the aggregation of individual event-times into interval-counts will be a reasonable data-analytic strategy, but it represents a discarding of potential information. Indeed, a major branch of statistical methodology has developed in its own right to deal with such data, which are variously known as point process data, event history data or recurrent event data. See, for example, Andersen et al. (1993). Survival analysis, also a major branch of statistical methodology in its own right, refers to the special case in which the outcome of interest is the time of occurrence of a single,
We assume that the data-format for a single subject is a measurement sequence Y_j: j = 1, ..., n at times t_j: j = 1, ..., n, and a counting process {N(u): 0 ≤ u ≤ d} which identifies event-times within the interval (0, d). Events which occur after time d are therefore censored. One aim is to provide a bivariate modelling framework which includes the standard methods of choice for separate, univariate analyses of the measurement and event processes. For the measurement process, we therefore assume that
terminating event.
In this section, we consider briefly how we might analyse data in which each subject in a longitudinal study provides both a sequence of measurements at a set of fixed times and a set of random times at which events of substantive interest occur. The resulting methods are potentially relevant to many kinds of investigation, including the following examples. In AIDS research, many studies are concerned with using a longitudinally measured biomarker such as CD4 cell count or estimated viral load to predict the time to onset of clinical AIDS. In longitudinal clinical trials, inferences about mean response profiles may need to be adjusted to take account of informative dropout; in Chapter 13, we discussed this area under the implicit assumption that dropout occurs at one of a number of prespecified measurement times, but in some applications dropout will be a well-defined event occurring at a precisely recorded time. In psychiatric studies, it may be of interest to model jointly the times of onset of acute episodes and the longitudinal evolution of a general measure of psychiatric disturbance such as the Positive and Negative Syndrome Scale (PANSS) used in the case study of Section 13.8. Models and methods for dealing with data of this kind have become widely studied in recent years. Hogan and Laird (1997a) give an excellent review. Other contributions include Pawitan and Self (1993), Tsiatis et al. (1995), Faucett and Thomas (1996), Lavalley and De Gruttola (1996), Hogan and Laird (1997b), Wulfsohn and Tsiatis (1997), Finkelstein and Schoenfeld (1999), Henderson et al. (2000) and Xu and Zeger (2001). Note that, according to the particular context, the focus for inference may be on modelling the distribution of the time to a terminal event conditional on a longitudinal measurement sequence, on adjusting inference about a longitudinal measurement sequence to allow for informative dropout, or on modelling the joint evolution of a measurement and an event-time process. Much of the literature cited above is motivated by specific applications, and makes extensive use of random effects, or more generally underlying latent stochastic processes, to induce association between the measurement and event-time processes. Here, we give an outline of the general formulation given in Henderson et al. (2000) and note some unresolved issues concerning inference for the resulting class of models.
Y_j = μ(t_j) + W_1(t_j) + Z_j,   (14.3.1)
where the mean response μ(t) is described by a linear model, Z_j is a measurement error, normally distributed with mean zero and variance τ², and W_1(t) is a Gaussian stochastic process which can be decomposed into a random effect term and a stationary term, say

W_1(t) = d_1(t)'U_1 + V_1(t),
as discussed in Chapter 5. For the event process, we adopt a semi-parametric proportional hazards formulation as in the seminal work of Cox (1972), with a second latent stochastic process providing a time-dependent frailty term. Thus, we model the intensity of events as
λ(t) = λ_0(t) exp{α(t) + W_2(t)},   (14.3.2)
where λ_0(t) is a non-parametric baseline intensity, α(t) is a linear model which describes the proportional effects of explanatory variables measured at time t, and the Gaussian process W_2(t) has a decomposition comparable to that of W_1(t), namely
W_2(t) = d_2(t)'U_2 + V_2(t).
Association between the measurement and event-time processes is induced by postulating a joint multivariate Gaussian distribution for the two random effect vectors U_1 and U_2, and a non-zero cross-covariance structure, γ_12(u) = Cov{W_1(t), W_2(t - u)}, for the bivariate process W(t).
Inference in this model is not entirely straightforward. Using Y and N to denote the measurement and event data respectively, and W_2 to denote the complete path of W_2(t) for 0 ≤ t ≤ d, we can express the likelihood contribution from a single subject in the form
f(θ) = f_1(θ; Y) E_{W_2|Y}[f_2(θ; N | W_2)].   (14.3.3)
In (14.3.3), the term f_1(θ; Y) is of the standard form corresponding to the multivariate Gaussian distribution of Y, as discussed in Chapters 4 and 5.
The second term is generally more complicated. It reduces to the standard form for a proportional hazards model with frailty if the component latent processes W_1(·) and W_2(·) are independent, although this case is of limited interest here. A simple method of estimation is to use standard methods to analyse the measurement data, including evaluation of the minimum mean square predictor for W_2(·), then to analyse the event-time data using predicted values of W_2(t) in place of the true, unknown values. It will usually be preferable to base inference on the full likelihood function (14.3.3). Faucett and Thomas (1996) and Xu and Zeger (2001) use Bayesian methods, implemented via Markov chain Monte Carlo. Wulfsohn and Tsiatis (1997) obtain maximum likelihood estimators using an EM algorithm, in the special case when W_2(t) = γ W_1(t) and W_1(t) = d_1(t)'U_1. Henderson et al. (2000) extend the method of Wulfsohn and Tsiatis to a wider class of models and note the identifiability problems which can arise when W_2(t) is allowed to be time-varying in conjunction with a non-parametric specification of the baseline intensity λ_0(t).
Some authors have questioned the wisdom of relying on the assumption of Gaussian random effect or latent process models, on the grounds that the resulting inferences can be sensitive to assumptions which cannot easily be checked from the available data. This echoes the concerns expressed in Chapter 13 about reliance on specific informative dropout models to adjust inferences about the mean response in incomplete longitudinal measurement data. These issues are raised, for example, in the discussion of Scharfstein et al. (1999).
developed latent variable models and generalized estimating equations for regression analysis with multivariate responses. We illustrate the approach to multivariate longitudinal data by considering a simple pre-post design in which a vector response is observed at baseline and then again after a period of treatment, for two groups, one receiving a placebo, the other a new treatment. The basic model takes the form
Y_ijk = β_0k + β_1k Post_j + γ_k Post_j * Trt_i + ε_ijk,   (14.4.1)
where Y_ijk is the response for item k at time t_j for person i, Post_j is the indicator of the post-treatment time and Trt_i indicates the treatment group to which the ith person belongs (0 = placebo; 1 = active treatment). If we let Y_ij = (Y_ij1, Y_ij2, ..., Y_ij30) be the 30-dimensional vector response, then the model can be written as
Y_ij = β_0 + β_1 Post_j + γ Post_j * Trt_i + ε_ij,
where β_0 = (β_01, ..., β_030) is the vector of expected responses at baseline for each of the 30 items, β_1 = (β_11, ..., β_130) is the vector of changes from baseline for the placebo group and γ = (γ_1, γ_2, ..., γ_30) comprises the 30 item-specific differences γ_k in the average change from baseline between the treatment and placebo groups. Here, γ is the parameter vector of interest. In the classic linear model, we would further assume that ε_i = (ε_i1', ε_i2')' is a 60 × 1 vector of mean zero residuals with 60 × 60 covariance matrix
14.4 Multivariate longitudinal data
There is a trend in biomedical and public health research toward outcomes of increasing complexity. This book primarily considers complexities that arise with repeated measures of a scalar outcome. But it is also common to observe multivariate outcomes repeatedly. For example, we measure the severity of schizophrenic symptoms with the PANSS, which comprises 30 different symptom reports, producing a 30 × 1 response vector at each time. In neuroimaging, we record repeated images on an individual, each of which is a high-dimensional vector of voxel intensities. A final example is time series studies of gene expression arrays, where the outcome is a 10,000-dimensional vector of estimates of mRNA levels. In this section, we briefly discuss extensions of longitudinal data analysis methods appropriate for multivariate responses. We focus on the PANSS example with a 30-dimensional outcome. We also only consider linear models; non-linear models are set up similarly. Early work on this topic was by O'Brien (1984) and Pocock et al. (1987), who focused on statistical tests. Dupuis Sammel and Ryan (1996) and Gray and Brookmeyer (1998, 2000) have more recently
V = ( V_11   V_12
      V_12'  V_22 ),
where V_11 = Var(ε_i1), V_22 = Var(ε_i2), and V_12 = Cov(ε_i1, ε_i2). The goal of statistical inference is to estimate the regression coefficients as efficiently as possible. Suppose we observe Y_i = (Y_i1', Y_i2')' for each of m persons and let X_i* be the three (predictors) by two (times) design matrix for each item, with columns Int, Post_j and Post_j × Trt_i:
X_i* = ( 1  0  0
         1  1  Trt_i ).
Let X_i = X_i* ⊗ I_30 be the Kronecker product, with X_i* on the diagonal and 0 elsewhere. Finally, collect the regression coefficients into β = {(β_0k, β_1k, γ_k), k = 1, ..., 30}. Then, we can write the model for a single person as
Y_i   =   X_i      β    +   ε_i,        ε_i ~ N(0, V),
(60 × 1) (60 × 90) (90 × 1)  (60 × 1)
Suppose V is known. Then, by the Gauss-Markov theorem, the minimum variance unbiased estimate of β (and the MLE under the Gaussian assumption) is given by

β̂ = ( Σ_{i=1}^m X_i' V^{-1} X_i )^{-1} Σ_{i=1}^m X_i' V^{-1} Y_i,

and we then have β̂ ~ N(β, (Σ_{i=1}^m X_i' V^{-1} X_i)^{-1}).
Figure 14.3(a) shows estimates of γ_k for the 30 items along with approximate 95% confidence intervals for a data set of PANSS scores for m = 174 schizophrenic patients who participated in a clinical trial comparing risperidone (6 mg) to placebo. We can see that for every item, the estimate of γ_k is negative, indicating less severe symptoms for patients receiving risperidone. To obtain these estimates, we used the REML estimate of V as discussed in Section 4.5.
The challenge in the analysis of multivariate longitudinal data, particularly with a high-dimensional response, is to choose a sensible approximation to V. Note that, even in this simple pre-post design, V is 60 × 60 and has (60 choose 2) = 1770 parameters. We are likely to lose any gains in efficiency that are available from joint longitudinal modelling of the 30 items by weighting the data with V̂^{-1} if V is poorly estimated. One simple approach is to weight the data by W = diag(σ̂_1², ..., σ̂_30²)^{-1}, where σ̂_k² is an estimate of Var(ε_ijk). The resulting estimator

β̂_W = ( Σ_{i=1}^m X_i' W X_i )^{-1} Σ_{i=1}^m X_i' W Y_i

is approximately normal with mean β and variance which can be estimated by the empirical variance formula

Var̂(β̂_W) = A^{-1} { Σ_{i=1}^m X_i' W (Y_i - X_i β̂_W)(Y_i - X_i β̂_W)' W X_i } A^{-1},   where A = Σ_{i=1}^m X_i' W X_i.
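A minimal numpy sketch of the two calculations just described, the weighted estimator β̂_W and its empirical ('sandwich') variance, is given below. The lists X_list and Y_list are assumed to hold the 60 × 90 design matrices and 60 × 1 response vectors for the m subjects, and W is whatever 60 × 60 working weight matrix has been chosen; none of this is code from the original analysis.

```python
import numpy as np

def weighted_estimator(X_list, Y_list, W):
    """beta_hat_W = (sum_i X_i' W X_i)^{-1} sum_i X_i' W Y_i."""
    A = sum(X.T @ W @ X for X in X_list)
    b = sum(X.T @ W @ Y for X, Y in zip(X_list, Y_list))
    return np.linalg.solve(A, b)

def empirical_variance(X_list, Y_list, W, beta_hat):
    """Sandwich estimate A^{-1} [sum_i X_i' W r_i r_i' W X_i] A^{-1}."""
    A = sum(X.T @ W @ X for X in X_list)
    M = np.zeros_like(A)
    for X, Y in zip(X_list, Y_list):
        r = Y - X @ beta_hat          # residual vector for subject i
        u = X.T @ W @ r
        M += np.outer(u, u)
    A_inv = np.linalg.inv(A)
    return A_inv @ M @ A_inv
```

The same two functions give the Gauss-Markov estimator and its variance when W is replaced by V^{-1}, which is one way to see why a poorly estimated V can erode the efficiency gains of joint modelling.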
Here, we are accounting for the possibly different variances of the items but ignoring correlation among items at the same or different times. A better strategy might be to consider a hierarchical random effects model with random effects not only for individuals but also for items. Returning to the basic model, we might assume, for the first level, that

Y_ijk = x_ij*' β_ik + ε_ijk,
where β_ik = (β_0ik, β_1ik, γ_ik)'. In a second level, we assume, for example, that

β_ik  =  β  +  δ_k  +  b_i,      (each term 3 × 1)

where β comprises the population and item-average regression coefficients, δ_k is the deviation of the coefficients for item k from β, and b_i is the deviation of subject i's coefficients from β. Note that this particular second-level model assumes there are no interactions of item and subject; the subject deviation is the same for all items. To complete the specification, we can assume δ_k and b_i are independent, mean zero Gaussian variables with variances D_δ and D_b, respectively. In some applications, we might allow only a subset of the coefficients to have random effects, so that D_δ or D_b might be degenerate. This multilevel formulation provides a lower-dimensional parameterization of the variance matrix Var(Y) = V. To see the specifics, we write
Y_i   =   X_i      β_i   +   ε_i,
(60 × 1) (60 × 90) (90 × 1)  (60 × 1)

where β_i collects the subject-specific coefficients {β_ik}. Writing 1_30 for a 30 × 1 vector of ones and δ = (δ_1', δ_2', ..., δ_30')', the second-level model gives β_i = β ⊗ 1_30 + δ + b_i ⊗ 1_30, so that

Y_i = X_i(β ⊗ 1_30 + δ + b_i ⊗ 1_30) + ε_i,

where Var(Y_i) = X_i(D_δ ⊗ I_30 + D_b ⊗ 1_30 1_30')X_i' + V and V = Var(ε_i). If we assume V = diag(σ_1², ..., σ_30², σ_1², ..., σ_30²), then the only correlation among items at the same or different times derives from shared random effects. This model reduces the parameters necessary to specify V from (60 choose 2) = 1770 down to (3 choose 2) + (3 choose 2) + 30 = 36.
We can estimate the fixed effects β and the covariance parameters by restricted maximum likelihood. Also of interest are the empirical Bayes estimates of β_k = β + δ_k, k = 1, ..., 30, the population average coefficients for item k. These can be estimated by a simpler approximate method as follows. First, obtain the maximum likelihood estimates β̂_k and V_k = Var(β̂_k | β_k), k = 1, ..., 30. If we assume the β_k are independent with mean β and variance D_δ, then β̂_k ~ N(β, D_δ + V_k) and the empirical Bayes estimate is given by
β̃_k = (D_δ + V_k)^{-1}[D_δ β̂_k + V_k β̂].
Figure 14.3(b) shows these estimates of the treatment effect for the 30 PANSS items. Comparing these results to those from Fig. 14.3(a), one can see the value of borrowing strength across items.
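The approximate empirical Bayes calculation for a single item is a weighted combination of the item-specific estimate and the overall mean, with weights determined by D_δ and V_k. The sketch below illustrates it with made-up numbers; none of the values are taken from the PANSS analysis.

```python
import numpy as np

def empirical_bayes(beta_hat_k, V_k, beta_bar, D_delta):
    """beta_tilde_k = (D_delta + V_k)^{-1} (D_delta beta_hat_k + V_k beta_bar)."""
    A = D_delta + V_k
    return np.linalg.solve(A, D_delta @ beta_hat_k + V_k @ beta_bar)

# Hypothetical 3 x 1 coefficient vector (intercept, Post, Post*Trt) for one item.
beta_hat_k = np.array([3.9, -0.4, -0.9])
V_k = 0.05 * np.eye(3)          # sampling variance of beta_hat_k
beta_bar = np.array([4.0, -0.5, -0.7])   # estimated overall mean of the beta_k
D_delta = 0.02 * np.eye(3)      # between-item variance
beta_tilde_k = empirical_bayes(beta_hat_k, V_k, beta_bar, D_delta)
```

When V_k is large relative to D_δ the item estimate is pulled strongly towards the overall mean, which is the sense in which strength is borrowed across items.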
Appendix: Statistical background

A.1 Introduction
This appendix provides a brief review of some basic statistical concepts used throughout the book. Readers should find Sections A.2 and A.3 useful for the material that is presented in Chapters 4, 5, 6 and 13, which deal with methods for analysing data with a continuous response variable. These four chapters also make extensive use of the method of maximum likelihood, which is the subject of Section A.4. Sections A.5 and A.6 outline the basic concepts of generalized linear models, which provide a unified methodology for the analysis of data with continuous or discrete responses. This material is most relevant for Chapters 7-11.
A.2 The linear model and the method of least squares
In many scientific studies, the main goal is to predict or describe an outcome or response variable in terms of other variables, which we will refer to as predictors, covariates or explanatory variables. The explanatory variables can either be fixed in advance, such as the treatment assignment in an experiment, or uncontrolled, such as smoking status in an observational study. Regression analysis has been used since the early nineteenth century to describe the relationship between the expectation of a response variable, Y, and a set of explanatory variables, x_j: j = 1, ..., p. The linear regression model assumes that the response variable and the explanatory variables are related through

Y_i = Σ_{j=1}^p x_ij β_j + ε_i,   i = 1, ..., m,
where Y_i is the response for the ith of m subjects and x_ij is the value of the jth explanatory variable for the ith subject. Usually, x_i1 = 1 for every subject so that β_1 is the intercept of this regression model. The ε_i are random variables which are assumed to be uncorrelated with each other, and to have E(ε_i) = 0 and Var(ε_i) = σ². This implies that the first two moments of Y_i are E(Y_i) = x_i'β and Var(Y_i) = σ², where x_i and β are p-element vectors. Note that these statements involve no distributional
assumption about Y_1, ..., Y_m, although a common assumption is that the joint distribution of Y_1, ..., Y_m is multivariate Gaussian. Models of this kind have been widely used in both experimental and observational studies. In fact, linear models include as special cases:
(1) the analysis of variance, in which the x_j are dummy variables, used to indicate the allocation of experimental units to treatments;
(2) multiple regression, in which the x_j are quantitative variables;
(3) the analysis of covariance, in which the x_j are a mixture of continuous and dummy variables.
Each regression coefficient, β_j, describes the change in the expected value of the response variable, Y, per unit change of its corresponding explanatory variable, x_j, all other variables held fixed.
The method of least squares (Legendre, 1805; Gauss, 1809) is a long-standing method for estimating the vector of regression coefficients, β. The idea is to find an estimate, β̂ say, which minimizes the sum of squares

RSS = Σ_{i=1}^m (Y_i - x_i'β)².

This procedure is formally equivalent to solving ∂RSS/∂β = 0, which gives rise to the estimating equation Σ_{i=1}^m x_i(Y_i - x_i'β) = 0. An alternative, and more familiar, form of the least-squares estimate β̂ is obtained by defining an m-element vector Y = (Y_1, ..., Y_m) and an m by p matrix X with ijth element x_ij. Then,

β̂ = (X'X)^{-1} X'Y.

The least-squares estimate β̂ enjoys many desirable statistical properties. Firstly, it is an unbiased estimator of β, that is E(β̂) = β. Its variance matrix is

Var(β̂) = σ²(X'X)^{-1}.

Secondly, for any vector a of known coefficients, if we let φ = a'β then φ̂ = a'β̂ has the smallest possible variance amongst all unbiased estimators for φ which are linear combinations of the Y_i. This optimality property of least-squares estimation is known as the Gauss-Markov Theorem. The constant variance, σ², of the ε_i is usually unknown but can be estimated by

σ̂² = Σ_{i=1}^m (Y_i - x_i'β̂)²/(m - p).

Many books, including Seber (1977) and Draper and Smith (1981), give more detailed discussions of least-squares estimation.
A.3 Multivariate Gaussian theory
This section reviews some of the important results for multivariate Gaussian observations. For more detailed treatment of multivariate Gaussian theory see, for example, Graybill (1976) or Rao (1973). A random vector Y = (Y_1, ..., Y_n) is said to follow a multivariate Gaussian distribution if its probability density is of the form

f(y; μ, V) = (2π)^{-n/2} |V|^{-1/2} exp{-(y - μ)'V^{-1}(y - μ)/2},

where -∞ < y_j < ∞, j = 1, ..., n. As in the univariate case, this distribution is fully specified by its first two moments, μ = E(Y) and V = Var(Y). A convenient shorthand notation is Y ~ MVN(μ, V). The following properties of the multivariate Gaussian distribution are used extensively in the book:
1. Each Y_j has a univariate Gaussian distribution.
2. More generally, if Z_1 = (Y_1, ..., Y_n1) with n1 < n, then Z_1 also follows a multivariate Gaussian distribution with mean μ_1 = (μ_1, ..., μ_n1) and covariance matrix V_11, which is the upper left n1 by n1 submatrix of V.
3. If, additionally, Z_2 = (Y_{n1+1}, ..., Y_n), then the conditional distribution of Z_1 given Z_2 = z_2 is multivariate Gaussian. Its conditional mean vector is

μ_1 + V_12 V_22^{-1}(z_2 - μ_2)

and its conditional variance matrix is

V_11 - V_12 V_22^{-1} V_12',

where μ_2 = (μ_{n1+1}, ..., μ_n) and V is partitioned as

V = ( V_11   V_12
      V_12'  V_22 ).
4. If B is an m × n matrix of rank m < n, then BY is also distributed as a multivariate Gaussian, with mean vector Bμ and variance matrix BVB'.
5. The random variable U = (Y - μ)'V^{-1}(Y - μ) has a chi-squared distribution with n degrees of freedom, which we write as U ~ χ²_n.

A.4 Likelihood inference
Likelihood inference is based on a specification of the probability or probability density for the observed data, y. This expression, f(y; θ), is indexed by a vector of unknown parameters, θ. Once the data are observed, the only quantities in f(·) that are unknown to the investigators are the parameters θ. Then, the likelihood function for θ is the function
L(θ | y) = f(y; θ).

Note that the likelihood is interpreted as a function of θ, with y held fixed at its observed value. The maximum likelihood estimate of θ is the value, θ̂, which maximizes the likelihood function or, equivalently, its logarithm. That is, for any value of θ,

L(θ | y) ≤ L(θ̂ | y).

According to the likelihood principle, θ̂ is then regarded as the value of θ which is most strongly supported by the observed data. In practice, θ̂ is obtained either by direct maximisation of log L, or by solving the set of equations

S(θ) = ∂ log L/∂θ = 0.   (A.4.1)

The function S(θ) is known as the score function for θ. Very often, numerical methods are required to evaluate the maximum likelihood estimate. Popular methods include the Nelder and Mead (1965) simplex algorithm for direct maximisation of log L, or Newton-Raphson iteration for solution of the score equations (A.4.1). The maximum likelihood estimate is known to enjoy many optimality properties in large samples. In particular, under mild regularity conditions, θ̂ is asymptotically unbiased, and asymptotically efficient in the sense that the elements of θ are estimated with the smallest possible asymptotic variances of any asymptotically unbiased estimators. The asymptotic variance matrix of θ̂ is given by the expression

V = {-E(∂² log L/∂θ²)}^{-1}.

The matrix V^{-1} is also known as the Fisher information matrix for θ.

Example A.1. Consider Y_1 and Y_2 to be two independent binomial observations with sample sizes and probabilities (n_1, p_1) and (n_2, p_2), respectively. A typical example for which this setting is appropriate is a clinical trial where two treatments are being compared. Here, Y_i denotes the number of patients responding negatively to treatment i, to which n_i subjects were assigned, and p_i is the corresponding probability of negative response, for i = 1, 2. It is convenient to transform the parameters (p_1, p_2) to (θ_1, θ_2), where

θ_1 = log{p_1(1 - p_2)/(p_2(1 - p_1))}   and   θ_2 = log{p_2/(1 - p_2)}.

This leads to a likelihood function for θ = (θ_1, θ_2) of the form
L(θ | y_1, y_2) ∝ p_1^{y_1}(1 - p_1)^{n_1 - y_1} p_2^{y_2}(1 - p_2)^{n_2 - y_2}
              = {p_1/(1 - p_1)}^{y_1} {p_2/(1 - p_2)}^{y_2} (1 - p_1)^{n_1} (1 - p_2)^{n_2}
              = exp{θ_1 y_1 + θ_2(y_1 + y_2) - n_1 log(1 + e^{θ_1 + θ_2}) - n_2 log(1 + e^{θ_2})}.
The parameter θ_1 is called the log-odds ratio. A zero value for θ_1 denotes equality of p_1 and p_2. The maximum likelihood estimate, θ̂, can be derived as the solution of the pair of equations

y_1 - n_1 exp(θ_1 + θ_2)/{1 + exp(θ_1 + θ_2)} = y_1 - n_1 p_1 = 0,

and

y_1 + y_2 - n_1 exp(θ_1 + θ_2)/{1 + exp(θ_1 + θ_2)} - n_2 exp(θ_2)/{1 + exp(θ_2)} = y_1 + y_2 - n_1 p_1 - n_2 p_2 = 0.

This gives

θ̂_1 = log{y_1(n_2 - y_2)/(y_2(n_1 - y_1))},   θ̂_2 = log{y_2/(n_2 - y_2)}.
Fisher's information matrix for θ can be obtained by straightforward algebra, and is
V^{-1} = ( n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}²    n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}²
           n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}²    n_1 e^{θ_1+θ_2}/{1 + e^{θ_1+θ_2}}² + n_2 e^{θ_2}/{1 + e^{θ_2}}² ).

The asymptotic variance of θ̂_1 is the upper left entry of V, namely

[n_1 exp(θ_1 + θ_2)/{1 + exp(θ_1 + θ_2)}²]^{-1} + [n_2 exp(θ_2)/{1 + exp(θ_2)}²]^{-1},
which can be estimated consistently by

1/y_1 + 1/(n_1 - y_1) + 1/y_2 + 1/(n_2 - y_2).
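In code, the closed-form estimate of the log-odds ratio and its estimated standard error are immediate; the counts in the usage line are hypothetical.

```python
import numpy as np

def log_odds_ratio(y1, n1, y2, n2):
    """theta1_hat and its estimated standard error for two binomials."""
    theta1_hat = np.log(y1 * (n2 - y2) / (y2 * (n1 - y1)))
    var_hat = 1 / y1 + 1 / (n1 - y1) + 1 / y2 + 1 / (n2 - y2)
    return theta1_hat, np.sqrt(var_hat)

theta1_hat, se = log_odds_ratio(y1=15, n1=50, y2=30, n2=50)   # hypothetical counts
approx_ci = (theta1_hat - 2 * se, theta1_hat + 2 * se)
```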
The word 'asymptotic' in this example means that both n_1 and n_2 are large.
Likelihood inference proceeds by fitting a series of sub-models which are nested. This means that each sub-model in the sequence is contained within the previous one. In Example A.1, an interesting hypothesis, or sub-model, to test is that θ_1 = 0, corresponding to equality of p_1 and p_2. The difference between this sub-model and the full model with no restriction on θ_1 can be examined by calculating the likelihood ratio test statistic, which is defined as
G = 2{log L(θ̂ | y) - log L(θ̂_0 | y)},

where θ̂_0 and θ̂ are the maximum likelihood estimates of θ under the null hypothesis or sub-model, and the unrestricted model, respectively. Assuming that the sub-model is correct, the sampling distribution of G is approximately chi-squared, with number of degrees of freedom equal to the difference between the numbers of parameters specified under the sub-model and the unrestricted model. An alternative testing procedure is to examine the score statistic, S(θ), as in (A.4.1). The score test statistic is
S(θ̂_0)V(θ̂_0)S'(θ̂_0),

whose null sampling distribution is also chi-squared, with the same degrees of freedom as for the likelihood ratio test statistic. In either case the sub-model is rejected in favour of the unrestricted model if the test statistic is too large.
Example A.1. (continued) Suppose that we want to test whether the probabilities p_1 and p_2 from two treatments are identical. This is equivalent to testing the sub-model θ_1 = 0. Note that the value of θ_2 is unspecified by the null hypothesis and therefore has to be estimated. The algebraic form of the likelihood ratio test statistic G is complicated, and we do not give it here, although in applications it can easily be evaluated numerically. The score statistic has the simple form
(Y_1 - E_1)²/E_1 + (Y_2 - E_2)²/E_2,

where E_i = n_i(y_1 + y_2)/(n_1 + n_2) is the expected value for Y_i under the null model that the two groups are the same. This statistic is also known as the Pearson's chi-squared test statistic. The number of degrees of freedom in this example is one, because the sub-model has one parameter, whereas the unrestricted model has two.
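The score statistic in the form given above is equally simple to compute. The sketch below follows that two-term expression and refers it to a chi-squared distribution on one degree of freedom; again the counts are hypothetical.

```python
from scipy import stats

def pearson_score_test(y1, n1, y2, n2):
    """Score test of theta1 = 0 (equal probabilities), in the form given in the text."""
    p0 = (y1 + y2) / (n1 + n2)          # common probability under the null
    e1, e2 = n1 * p0, n2 * p0           # expected counts E_i
    statistic = (y1 - e1) ** 2 / e1 + (y2 - e2) ** 2 / e2
    p_value = stats.chi2.sf(statistic, df=1)
    return statistic, p_value

stat, p = pearson_score_test(y1=15, n1=50, y2=30, n2=50)
```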
A.5 Generalized linear models
Regression models for independent discrete and continuous responses have been unified under the class of generalized linear models, or GLMs (McCullagh and Nelder, 1989), thus providing a common body of statistical methodology for different types of response. Here, we review its salient features. We begin by considering two particular GLMs, logistic and Poisson regression models, and then discuss the general class. Because GLMs apply to independent responses, we focus on the cross-sectional situation as in Section A.2, with a single response Y_i and a vector x_i of p explanatory variables associated with each of m experimental units. The objective is to describe the dependence of the mean response, μ_i = E(Y_i), on the explanatory variables.
A.5.1 Logistic regression
This model has been used extensively for dichotomous response variables such as the presence or absence of a disease. The logistic model assumes that the logarithm of the odds of a positive response is a linear function of explanatory variables, so that
log{Pr(Y_i = 1)/Pr(Y_i = 0)} = log{μ_i/(1 - μ_i)} = x_i'β.
Figure A.1 shows plots of Pr(Y = 1) against a single explanatory variable x for several values of β. A major distinction between the logistic regression model and the linear model in Section A.2 is that the linearity applies to a transformation of the expectation of Y_i, in this case the log odds transformation, rather than to the expectation itself. Thus, the regression coefficients β represent the change of the log odds of the response variable per unit change of x. Another feature of the dichotomous response variable is that the variance of Y_i is completely determined by its mean, μ_i. Specifically, Var(Y_i) = μ_i(1 - μ_i).
This is to be contrasted with the linear model, where Var(Y_i) is usually assumed to be a constant, σ², which is independent of the mean.
Fig. A.1. The logistic model, p(x) = exp(βx)/{1 + exp(βx)}. -: β = -0.5; ···: β = 1; - - -: β = 2.
A.5.2 Poisson regression
Poisson regression, or log-linear, models are applicable to problems in which the response variable represents the number of events occurring in a fixed period of time. One instance is the number of seizures in a given time-interval, as in Example 1.6. Because of the discrete and non-negative nature of count data, a reasonable assumption is that the logarithm of the expected count is a linear function of explanatory variables, so that
log E(Y_i) = x_i'β.

Here, the regression coefficient for a particular explanatory variable can be interpreted as the logarithm of the ratio of expected counts before and after a one unit increase in that explanatory variable, with all other explanatory variables held constant. The term 'Poisson' refers to the distribution for counts derived by Poisson (1837),

p(y) = exp(-μ)μ^y/y!,   y = 0, 1, ...

As with logistic regression, the assumption that Y_i follows a Poisson distribution implies that the variance of Y_i is determined by its mean. In this case, the mean and variance are the same:

Var(Y_i) = E(Y_i) = exp(x_i'β).

A.5.3 The general class
Linear, logistic and Poisson regression models are all special cases of generalized linear models, which share the following features. First, the mean response, μ_i = E(Y_i), is assumed to be related to a vector of covariates, x_i, through

h(μ_i) = x_i'β.

For logistic regression, h(μ_i) = log{μ_i/(1 - μ_i)}; in Poisson regression, h(μ_i) = log(μ_i). The function h(·) is called the link function. Second, the variance of Y_i is a specified function of its mean, μ_i, namely

Var(Y_i) = V_i = φ v(μ_i).

In this expression, the known function v(·) is referred to as the variance function; the scaling factor, φ, is a known constant for some members of the GLM family, whereas in others it is an additional parameter to be estimated. Third, each class of GLMs corresponds to a member of the exponential family of distributions, with a likelihood function of the form

f(y_i) = exp[{y_i θ_i - ψ(θ_i)}/φ + c(y_i, φ)].   (A.5.1)

The parameter θ_i is known as the natural parameter, and is related to μ_i through μ_i = ∂ψ(θ_i)/∂θ_i. For example, the Poisson distribution is a special case of the exponential family, with

θ_i = log μ_i,   ψ(θ_i) = exp(θ_i),   c(y_i, φ) = -log(y_i!),   φ = 1.

Other distributions within this family include the Gaussian or Normal distribution, the binomial distribution and the two-parameter gamma distribution.
In any GLM, the regression coefficients, β, can be estimated by solving the same estimating equation,

S(β) = Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} {y_i - μ_i(β)} = 0,   (A.5.2)

where V_i = Var(Y_i). Note that S(β) is the derivative of the logarithm of the likelihood function. The solution β̂, which is the maximum likelihood estimate, can be obtained by iteratively reweighted least squares; see McCullagh and Nelder (1989) for a detailed discussion.
Finally, in large samples, β̂ follows a Gaussian distribution with mean β and variance

V = φ ( Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} (∂μ_i/∂β) )^{-1}.   (A.5.3)

This variance can be estimated by V̂, which is obtained by replacing β with β̂ in the expression (A.5.3).
A.6 Quasi-likelihood
One important property of the GLM family is that the score function, S(β), depends only on the mean and the variance of the Y_i. Wedderburn (1974) was the first to point out that the estimating equation (A.5.2) can therefore be used to estimate the regression coefficients for any choice of link and variance functions, whether or not they correspond to a particular member of the exponential family. The name quasi-score function was coined for S(β) in (A.5.2), since its integral with respect to β can be thought of as a 'quasi-likelihood' even if it does not constitute a proper likelihood function. This suggests an approach to statistical modelling in which we make assumptions about the link and variance functions without attempting to specify the entire distribution of Y_i. This is desirable, since we often do not understand the precise details of the probabilistic mechanisms by which data were generated. McCullagh (1983) showed that the solution, β̂, of the quasi-score function has a sampling distribution which, in large samples, is approximately Gaussian with mean β and variance given by equation (A.5.3).
Example A.2. Let Y_1, ..., Y_m be independent counts whose expectations are modelled as

log E(Y_i) = x_i'β,   i = 1, ..., m.

In biomedical studies, frequently the variance of Y_i is greater than E(Y_i), the variance expression induced by the Poisson assumption. This phenomenon is known as over-dispersion. One way to account for this is to assume that Var(Y_i) = φE(Y_i), where φ is a non-negative scalar parameter. Note that for the Poisson distribution φ = 1; if we allow φ > 1, we no longer have a distribution from the exponential family. However, if we define β̂ as the solution to

Σ_{i=1}^m x_i{Y_i - exp(x_i'β)} = 0,

then a simple calculation gives the asymptotic variance matrix of β̂ as

φ ( Σ_{i=1}^m exp(x_i'β) x_i x_i' )^{-1}.

Thus, by comparison with (A.5.3), the variance of β̂ is inflated by a factor of φ. Clearly, ignoring over-dispersion in the analysis would lead to under-estimation of standard errors, and consequent over-statement of significance in hypothesis testing. In the above example, φE(Y_i) is but one of many possible choices for the variance formula which would take account of over-dispersion in count data. Fortunately, the solution, β̂, is a consistent estimate of β as long as h(μ_i) = x_i'β, whether or not the variance function is correctly specified. This robustness property holds because the expectation of S(β) remains zero so long as E(Y_i) = μ_i(β). However, the asymptotic variance matrix of β̂ has the form

V_2 = V { Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} Var(Y_i) V_i^{-1} (∂μ_i/∂β) } V.

Note that V_2 is identical to V in (A.5.3) only if Var(Y_i) = V_i. When this assumption is in doubt, confidence limits for β can be based on the estimated variance matrix

V̂_2 = V̂ ( Σ_{i=1}^m (∂μ_i/∂β)' V_i^{-1} {Y_i - μ_i(β)}² V_i^{-1} (∂μ_i/∂β) ) V̂,   (A.6.1)

evaluated at β̂. We call V̂ a model-based variance estimate of β̂ and V̂_2 a robust variance estimate, in that V̂_2 is consistent regardless of whether the specification of Var(Y_i) is correct. An alternative to the variance function V_i = φμ_i is the form induced by the Poisson-gamma distribution (Breslow, 1984), V_i = μ_i(1 + μ_iφ).
Example A.2. (continued) With a limited amount of data available, it is difficult to choose empirically between these two variance functions (Breslow and McCullagh, 1993). The availability of the robust variance estimate, V̂_2, helps to alleviate the concern regarding the choice of variance formula in larger samples.
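A small simulation sketch contrasting the model-based and robust variance estimates for the intercept-only log-linear model discussed below; the negative binomial data-generating mechanism is purely an illustrative choice of over-dispersed counts, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
y = rng.negative_binomial(n=2, p=2 / 7, size=m)    # over-dispersed counts, mean 5

mu_hat = y.mean()                                  # beta0_hat = log(ybar)

# Model-based variance of beta0_hat under the Poisson working variance Var(Y_i) = mu
v_model = 1 / (m * mu_hat)

# Robust (sandwich) variance: sum (y_i - ybar)^2 / (m * ybar)^2
v_robust = np.sum((y - mu_hat) ** 2) / (m * mu_hat) ** 2
```

With over-dispersed data the robust estimate is typically noticeably larger than the model-based one, which is the under-estimation of standard errors referred to above.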
It is interesting to note that in the special case where μ_i = μ and hence log μ_i = β_0, the estimate V̂_2 reduces to

Σ_{i=1}^m (Y_i - Ȳ)²/(mȲ)²,
the sampling variance of β̂_0 = log Ȳ (Royall, 1986).

Bibliography
Aerts, M. ~nd. Claeskens, G. (1997). Local polynomial estimation in multiparameter lIkelIhood models. Journal of the American Statisti I A . t· 92, 1536-45. ca ssocza wn, Afsarinejad, K. (1983). Balanced repeated measurements designs. Bio etrik 70, 199-204. m a, Agresti, A. (1990). Categorical data analysis. John Wiley, New York. Agresti, A. and Lang, J. (1993). A proportional odds model with subject-specific effects for repeated ordered categorical responses. Biometrika, 80, 527-34. Agresti, A. (1999). Modelling ordered categorical data: recent advances and future challenges. Statistics in Medicine, 18, 2191-207. Aitkin, M., Anderson, D., Francis, B., and Hinde, J. (1989). Statistical modelling in GLIM. Oxford University Press, Oxford. Albert, J.H. and Chib, S. (1993). Bayesian analysis of binary and polytomous data. Journal of the American Statistical Association, 88, 669-79. Alexander, C.S. and Markowitz, R. (1986). Maternal employment and use of pediatric clinic services. Medical Care, 24(2), 134--47. Almon, S. (1965). The distributed lag between capital appropriations and expenditures. Econometrica, 33, 178-96. Altman, N.S. (1990). Kernel smoothing of data with correlated errors. Journal of the American Statistical Association, 85, 749-59. Amemiya, T. (1985). Advanced econometrics. Harvard University Press, Cambridge Massachusetts. Amemiya, T. and Morimune, K. (1974). Selecting the optimal order o.f ~olyno mial in the Almon distributed lag. The Review of Economics and Statzstzcs, 56, 378-86. Andersen, P.K., Borgan, 0., Gill, R.D., and Keiding, N. (1993). Statistical models based on counting processes. Springer-Verlag, New York. Anderson, J.A. (1984). Regression and orde~ed categorical variables (with Discussion). JournoJ, d{ the Royal Statistical Soczety, B, 46, 1-30.
Anderson, D.A. and Aitkin, M. (1985). Variance component models with binary response: interviewer variability. Journal of the Royal Statistical Society, B, 47, 203-10.
Atkinson, A.C. (1985). Plots, transformations and regression: an introduction to graphical methods of diagnostic regression analysis. Oxford University Press, Oxford.
Azzalini, A. (1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika, 81, 767-75.
Bahadur, R.R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies in item analysis and prediction (ed. H. Solomon), pp. 158-68. Stanford Mathematical Studies in the Social Sciences VI, Stanford University Press, Stanford, California.
Breslow, N.E. and Day, N.E. (1980). Statistical methods in cancer research, Volume J. lARC Scientific Publications No. 32. Lyon. Breslow, N.E. and Lin, X. (1995). Bias correction in generalized linear mixed models with a single component of dispersion. Biometrika, 82, 81-91. Brumback, B.A. and Rice, J.A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (C/R: p976-994). Journal of the American Statistical Association, 93, 961-76. Carey, V.C. (1992). Regression analysis for large binary clusters. Unpublished PhD thesis, Department of Biostatistics, The Johns Hopkins University, Baltimore, Maryland. Carey, V.C., Zeger, S.L., and Diggle, PoOL (1993), Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517-26.
Barnard, G.A. (1963). Contributions to the discussion of Professor Bartlett's paper. Journal of the Royal Stati.~tical Soezety, B, 25, 294.
Carlin, B.P. and Louis, T.A. (1996), Bayes and empirical Bayes methods for data analysis, Chapman and Hall, London.
Bartholomew, D.J. (1987). Latent variable models and factor analysis. Oxford University Press, New York.
Chambers, J.M. and Hastie, T.J. (1992). Stati8tical models in S. Wadsworth and Brooks-Cole, Pacific Grove.
Bates, D.M. and Watts, D.C. (1988). Nonlinear regression analysis and its Applications. Wiley, New York.
Chambers, J.M., Cleveland, W.S., Kleiner, 8., and Thkey, P.A. (1983). Graphical methods for data analysis. Wadsworth, Belmont, California.
Becker, R.A., Chambers, J.M., and Wilks, A.R. (1988). The new S language. Wadsworth and Brooks-Cole, Pacific Crove.
Chib, S. and Carlin, B. (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and Computing, 9, 17-26.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, B, 36, 192-236.
Chib, S. and Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347-61.
Billingsley, P. (1961). Statistical inference for Markov processes. University of Chicago Press, Chicago, Illinois.
Clayton, D.G. (1974). Some odds ratio statistics for the analysis of ordered categorical data. Biometrika, 61, 525-31.
Bishop, S.H. and Jones, B. (1984). A review of higher-order cross-over designs. Journal of Applied Statistics, 11, 29-50.
Clayton D.G. (1992). Repeated ordinal measurements: a generalis~ esti~t, . ing equation approach. Med~cal Researc h Counct'I Biostatistics Umt Technacal Reports, Cambridge, England.
Bishop,Y.M.M., Fienberg, S.E., and Holland, P.W. (1975). Discrete multivariate analysis: theory and proctice. MIT Press, Cambridge, Massachussetts. Bloomfield, P. and Watson, G.S. (1975). The inefficiency of least squares. Biometrika, 62, 121-28.
~oo~h, J.C, and Hobert, J.P. (1999). Maximizing generalized linear mixed model hkeh.ho.ods with an automated Monte Carlo EM algorithm. Journal of the Royal Stattstlcal Society, B, 61, 265-85. Box, G.P, ~nd Jenkins, C.M. (1970). Time series analysis _ forecasting and control (revised edn). Holden-Day, San Francisco, California. Breslow, N.E. (1984). Extra-Poisson variation in log linear models. Applied Statlstlcs, 33, 38-44.
rBreslow,' N.E. d
and Clayton D G (1993) A . ' . . . pprmomate inference in generalized mear mlxe models. Journal of the American Statistical Association, 88, 125-34.
Cleveland, W.S. (1979). Robust locally, w~ighted r~gr~ion a~~9~;~othing scatterplots. Journal of the American Statzstzcal ASSOCIatIOn, 14, Cochran, W.G. (1977). Sampling techniques. John Wiley, New York. y data Biometrics, 46, d I fi b' Conaway, M.R. (1990). A random effects mo e or mar . 317-28. d 'ifi ce in regre.ssion. Cook, D. and Weisberg, S. (1982). Residuals an an uen Chapman and Hall, London. amples (with d Copas, J.B. and Li, H.G. (1997). Inference for ~n-;:5~~5~ Discussion). Journal of the Royal Statistical Soctety, " . 06 Second supplement published 182? Courcier. Reissued with a supplement, 18 . 929 576-9 in A source book In A portion of the appendix was translated, 1 ,pp.
BIBLIOGRAPHY BIBLIOGRAPHY
352 b H A Ruger and H.M. Walker, McGraw , DESmith ed. trans. y . . y: k mathematzcs, . d' 1959 in 2 volumes, Dover, New or. H'Il New York' reprmte ' d t Chapman and Hall, London. I " Cox, D.R. (1970). Analysis of bmary a a. .' . d i d life tables (with dIl'ICUsslOn). Journal of Cox, D.R. (1972). RegressIOn mo e S an _ the Royal Statistical Soczety, B, 74, 187 200. . " , ' · t fstical analysIs. Statzstzcal sczence, 5, I d Cox, D.R. (1990). Role of rna e SIllS a I J.
169-74.
353
Diggle, P.J. (1990). Time series: a biostatistical introduction. Oxford University Press, Oxford.
•
.
Cox, D.R. and MIller, Wiley, New York.
H D (1965). The theory of stochastic processes. John .,
1984). Analysis of survival data. Chapman and Hall, Cox, D.R. an d 0 a kes, D . ( London. J (1989) Analysis of binary data. Chapman and Hall, II E .. Cox, .R . an d Sne, . London.
n
Diggl e , P.J. and Kenward, M.G. (1994). Informative dropout in long't d' I d t . ( . h d' .) A . I U ma a a analySIS WIt Iscusslon. pphed Statistics, 43, 49-73. Diggle, P.J. (1998). Dealing with missing values in longitudinal studies. In Advances zn the statzstzcal analysis of medical data (ed. B.S. Everitt and G. Dunn), pp. 203-28. Edward Arnold, London. Diggle, P.J. and Verbyla, A. (1998). Nonparametric estimation of covariance structure in longitudinal data. Biometrics, 54, 401-15. Draper, N. and Smith, H. (1981). Applied regression analysis (2nd edn), Wiley, New York. Drum, M.L. and McCullagh, P. (1993). REML estimation with exact covariance in the logistic mixed model. Biometrics, 49, 677-89.
Cressie, N.A.C. (1993). Statistics for spatial data. Wiley, New York.
Dupuis Sammel, M. and Ryan, L.M. (1996). Latent variable models with fixed effects. Biometrics, 52, 650-63.
Crouch, A.C. and Spiegelman, E. (1990). The evaluation of integrals of the form J f(t) exp(_t 2 ) dt: application to logistic-normal models. Journal of the American Statistical Association, 85, 464-69.
Emond, M.J., Ritz, J., and Oakes, D. (1997). Bias in GEE estimates from misspecified models for longitudinal data. Communications in Statistics, 26, 15-32.
Cullls, B.R. (1994). Contribution to the Discussion of the paper by Diggle and Kenward. Applied Statistics, 43, 79-80.
Engle, R.F., Hendry, D.F., and Richard, J.-F. (1983). Exogeneity, Econometrica, 51, 277-304.
Cul1is, B.R. and McGilchrist, C.A. (1990). A model for the analysis of growth data from designed experiments. Biometrics, 46, 131-42.
Evans, J.L. and Roberts, E.A. (1979). Analysis of sequential observations with applications to experiments on grazing animals and perennial plants. Biometrics, 35,687-93.
Davidian, M. and Gallant, A.R. (1992). The nonlinear mixed effects model with a smooth random effects density. Department of Statistics Technical Report, North Carolina State University, Campus Box 8203, Raleigh, North Carolina 27695. Davidian, M. and Giltinan, D,M. (1995). Nonlinear mixed effects models for repeated measurement data. Chapman and Hall, London. Deming, W.E. and Stephan, F.F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 11, 427-44.
~empster, A.P., L~ird, N.M., and Rubin, D.B. (1977). Maximum likelihood from mcomplete data B, 39, 1-38.
VIa
the EM algorithm. Journal of the Royal Statistical Society,
Dhrymes ' P "J (1971) . D'zs t n 'b ut ed Iags: problems of estimation and formulation. Holden-Day, San Francisco, DBiggle't P,J · 4(1988). An approach to the analysis of repeated measures. lOme ncs, 4 ,959-71. Diggle, P.J. (1989). Testing for rand d . Biometrics, 45, 1255-58. om ropouts 1ll repeated measurement data.
Evans, M. and Swartz, T. (1995). Methods for approximating int~~als instatistics with special emphasis on Bayesian integration problems. Statzstzcal Sctence, 10,254-72. Faucett, C.L. and Thomas, D.C. (1996). Simultaneously model.lin g censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine, 15, 1663-86. Fearn, T. (1977). A two-stage model for growth curves which leads to Rao's covariance-adjusted estimates. Biometrika, 64, 141-43. 'l' th d its applications (3rd Feller, W. (1968). An introduction to probabz zty eory an edn). John Wiley, New York. d D A (1999) Combining mortality and . . . . .' Moo" 18 1341-54. Finkelstein, D.M. and Schoenfel, longitudinal measures in clinical trials. Statzstzcs m zcme, , . . d pendence estimating equaFitzmaurice, G.M. (1995). A caveat c~ncern~ng III e309-17 . tions with multivariate binary data. Bzometncs, 51,
BIBLIOGRAPHY BIBLIOGRAPHY
354 and Clifford, P. (1996). Logistic regression Fitzmaurice, G.M., Heath, A.F.; . . Journal 01 the Royal Slalzstzcal models for binary panel data wIth attntlOn.
Society, A, 159, 249 63.
. . • J N M (1993). A hkehhood-based method for ., " "k 80 141 .51. Fitzmaurice, G.M. and Lam, analysing longitudinal binary responses. Bwmetrz a, , " . . M 1 n t itzky A.G. (HJ9:3). RegresSIon models Fitzmaurice, G.M., LaIrd, N. ., all( .0 n ," 8 284 99 for discrete longitudinal responses. Statz.9tzcal Sczence" .
355
Godambe, V.P. (1960). An optimum property f l ' " " A if" 0 regu ar maxImum likelihood estImatIOn. nna so Mathemattcal Statistics, 31, 1208-12. Godfrey, L.G. and Poskitt, D.S. (1975). Testing the r t· t" f . l 1 esrIe IOns 0 the Almon lag technIque. Journa 0 the American Statistical Assoc,a ; t"lOn, 70 ,105-8. Goldfarb, N. (1960). An introduction to longitudinal stat1..stical analysis: the method of repeated observations from a fixed sample. Free Press of Glencoe, Illinois.
M (1995) An approximate generalized linear model with Follman, D. and Wu,. '.. " .. , 51 15168 random effects for informative mlssmg data. Bwmetrzcs" .
Goldstein, H. (1979). The design and analysis of longitudinal studies: their role in the measurement of change. Academic Press, London.
. k S J (1992). Repeated measures in clinical trials: anaF'r1son L J and Pococ, ., . S t" t" " , ... t t' tics and its implication for deSIgn. ta zs zcs m lysis using mean summary s a IS • Medicine, 11, 1685--1704. . LJ d Pock S.J. (1997). Linearly divergent treatment effects in F'r1son, ... an oc , . . t t' t' clinical trials with repeated measures: efficient analYSIS usmg summary s a IS ICS. Statistics in Medicine, 16, 2855-72.
Goldstein, H. (1986). Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika, 73, 43-56.
Gabriel, K.R. (1962). Ante-dependence analysis of an ordered set of variables. Annals of Mathematical Statistics, 33, 201-12.
Gauss, C.F. (1809). Theoria motus corporum celestium. Hamburg: Perthes et Besser. Translated, 1857, as Theory of motion of the heavenly bodies moving about the sun in conic sections, trans. C.H. Davis. Little, Brown, Boston. Reprinted, 1963; Dover, New York. French translation of the portion on least squares, pp. 111-34 in Gauss, 1855.
Gelfand, A.E. and Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Gelfand, A.E., Hills, S.E., Racine-Poon, A., and Smith, A.F.M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 972-85.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995). Bayesian data analysis. Chapman and Hall, London.
Goldstein, H. (1995). Multilevel statistical models (2nd edn). Edward Arnold, London.
Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, A, 159, 505-13.
Gourieroux, C., Monfort, A., and Trognon, A. (1984). Pseudo-maximum likelihood methods: theory. Econometrica, 52, 681-700.
Graubard, B.I. and Korn, E.L. (1994). Regression analysis with clustered data. Statistics in Medicine, 13, 509-22.
Gray, S.M. and Brookmeyer, R. (1998). Estimating a treatment effect from multidimensional longitudinal data. Biometrics, 54, 976-88.
Gray, S. and Brookmeyer, R. (2000). Multidimensional longitudinal data: estimating a treatment effect from continuous, discrete or time to event response variables. Journal of the American Statistical Association, 95, 396-406.
Graybill, F. (1976). Theory and application of the linear model. Wadsworth, California.
Gibaldi, M. and Perrier, D. (1982). Pharmacokinetics. Marcel Dekker, New York.
Greenwood, M. and Yule, G.U. (1920). An enquiry into the nature of frequency distributions to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society, Series A, 83, 255-79.
Gilks, W., Richardson, S., and Spiegelhalter, D. (1996). Markov chain Monte Carlo in practice. Chapman and Hall, London.
Grieve, A.P. (1994). Contribution to the Discussion of the paper by Diggle and Kenward. Applied Statistics, 43, 74-6.
Gilmour, A.R., Anderson, R.D., and Rae, A.L. (1985). The analysis of binomial data by a generalized linear mixed model. Biometrika, 72, 593-99.
Griffiths, D.A. (1973). Maximum likelihood estimation for the beta-binomial distribution, and an application to the household distribution of the total number of cases of a disease. Biometrics, 29, 637-48.
Glasbey, C.A. (1988). Examples of regression with serially correlated errors. The Statistician, 37, 277-92.
Glonek, G.F.V. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, B, 57, 533-46.
Gromping, U. (1996). A note on fitting a marginal model to mixed effects loglinear regression data via GEE. Biometrics, 52, 280-5.
Guo, S.W. and Lin, D.Y. (1994). Regression analysis of multivariate grouped survival data. Biometrics, 50, 632-39.
Hall, D.B. and Severini, T.A. (1998). Extended generalized estimating equations for clustered data. Journal of the American Statistical Association, 93, 1365-75.
Härdle, W. (1990). Applied nonparametric regression. Cambridge University Press, New York.
Harrell, F.E., Lee, K.L., Califf, R.M., Pryor, D.B., and Rosati, R.A. (1984). Regression modelling strategies for improved prognostic prediction. Statistics in Medicine, 3, 143-52.
Hart, J.D. (1991). Kernel regression estimation with time series errors. Journal of the Royal Statistical Society, B, 53, 173-87.
Hart, J.D. and Wehrly, T.E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81, 1080-88.
Harville, D. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61, 383-85.
Heckman, J.J. and Singer, B. (1985). Longitudinal analysis of labour market data. Cambridge University Press, Cambridge.
Hedayat, A. and Afsarinejad, K. (1975). Repeated measures designs, I. In A survey of statistical design and linear models (ed. J.N. Srivastava). North-Holland, Amsterdam.
Hedayat, A. and Afsarinejad, K. (1978). Repeated measures designs, II. Annals of Statistics, 6, 619-28.
Hedeker, D. and Gibbons, R. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics, 50, 933-44.
Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and recurrent events. Biostatistics, 1, 465-80.
Hernan, M.A., Brumback, B., and Robins, J.M. (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96, 440-8.
Harville, D. (1977). Maximum likelihood estimation of variance components and related problems. Journal of the American Statistical Association, 72, 320-40.
Heyting, A., Tolboom, J.T.B.M., and Essers, J.G.A. (1992). Statistical handling of dropouts in longitudinal clinical trials. Statistics in Medicine, 11, 2043-62.
Hastie, T.J. and Tibshirani, R.J. (1990). Generalized additive models. Chapman and Hall, New York.
Hogan, J.W. and Laird, N.M. (1997a). Model-based approaches to analysing incomplete longitudinal and failure-time data. Statistics in Medicine, 16, 259-72.
Hausman, J.A. (1978). Specification tests in econometrics. Econometrica, 46, 1251-72.
Hogan, J.W. and Laird, N.M. (1997b). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16, 239-57.
Heagerty, P.J. (1999). Marginally specified logistic-normal models for longitudinal binary data. Biometrics, 55, 247-57.
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945-61.
Heagerty, P.J. (2002). Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics 58 (to appear).
Huber, P.J. (1967). The behaviour of maximum likelihood estimators under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, LeCam, L.M. and Neyman, J. editors, University of California Press, pp. 221-33.
Heagerty, P.J. and Kurland, B.F. (2001). Misspecified maximum likelihood estimates and generalized linear mixed models. Biometrika, 88, 973-85.
Heagerty, P.J. and Zeger, S.L. (1996). Marginal regression models for clustered ordinal measurements. Journal of the American Statistical Association, 91, 1024-36.
Heagerty, P.J. and Zeger, S.L. (1998). Lorelogram: a regression approach to exploring dependence in longitudinal categorical responses. Journal of the American Statistical Association, 93, 150-62.
Heagerty, P.J. and Zeger, S.L. (2000). Marginalized multilevel models and likelihood inference. Statistical Science, 15, 1-26.
Heagerty, P.J. and Zeger, S.L. (2000). Multivariate continuation ratio models: connections and caveats. Biometrics, 56, 719-32.
Heckman, J.J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimation method for such models. Annals of Economic and Social Measurement, 5, 475-92.
Hughes, J.P. (1999). Mixed effects models with censored data with application to HIV RNA levels. Biometrics, 55, 625-9.
Jones, B. and Kenward, M.G. (1987). Modelling binary data from a three-point cross-over trial. Statistics in Medicine, 6, 555-64.
Jones, B. and Kenward, M.G. (1989). Design and analysis of cross-over trials. Chapman and Hall, London.
Jones, M.C. and Rice, J.A. (1992). Displaying the important features of large collections of similar curves. The American Statistician, 46, 140-45.
Jones, R.H. (1993). Longitudinal data with serial correlation: a state-space approach. Chapman and Hall, London.
Jones, R.H. and Ackerson, L.M. (1990). Serial correlation in unequally spaced longitudinal data. Biometrika, 77, 721-31.
Jones, R.H. and Boadi-Boateng, F. (1991). Unequally spaced longitudinal data with serial correlation. Biometrics, 47, 161-75.
Journel, A.G. and Huijbregts, C.J. (1978). Mining geostatistics. Academic Press, New York.
Jowett, G.H. (1952). The accuracy of systematic sampling from conveyor belts. Applied Statistics, 1, 50-9.
Kalbfleisch, J.D. and Prentice, R.L. (1980). The statistical analysis of failure time data. John Wiley, New York.
Lauritzen, S.L. (2000). Causal inference from graphical models. Research Report R-99-2021, Department of Mathematical Sciences, Aalborg University.
Lavalley, M.P. and De Gruttola, V. (1996). Models for empirical Bayes estimators of longitudinal CD4 counts. Statistics in Medicine, 15, 2289-305.
Lawless, J.F. (1982). Statistical models and methods for lifetime data. John Wiley, New York.
Lee, Y. and Nelder, J.A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society, Series B, 58, 619-78.
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 34-45.
Lee, Y. and Nelder, J.A. (2001). Hierarchical generalised linear models: a synthesis of generalised linear models, random-effects models and structured dispersions. Biometrika, 88, 987-1006.
Karim, M.R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Unpublished PhD thesis from the Johns Hopkins University Department of Biostatistics, Baltimore, Maryland.
Legendre, A.M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. John Wiley, New York.
Kaslow, R.A., Ostrow, D.G., Detels, R. et al. (1987). The Multicenter AIDS Cohort Study: rationale, organization and selected characteristics of the participants. American Journal of Epidemiology, 126, 310-18.
Kaufmann, H. (1987). Regression models for nonstationary categorical time series: asymptotic estimation theory. Annals of Statistics, 15, 863-71.
Kenward, M.G. (1987). A method for comparing profiles of repeated measurements. Applied Statistics, 36, 296-308.
Kenward, M.G., Lesaffre, E., and Molenberghs, G. (1994). An application of maximum likelihood and estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random. Biometrics, 50, 945-53.
Korn, E.L. and Whittemore, A.S. (1979). Methods for analyzing panel studies of acute health effects of air pollution. Biometrics, 35, 795-802.
Laird, N.M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305-15.
Laird, N.M. and Wang, F. (1990). Estimating rates of change in randomized clinical trials. Controlled Clinical Trials, 11, 405-19.
Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-74.
Lang, J.B. and Agresti, A. (1994). Simultaneously modeling joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association, 89, 625-32.
Lang, J.B., McDonald, J.W., and Smith, P.W.F. (1999). Association-marginal modeling of multivariate categorical responses: a maximum likelihood approach. Journal of the American Statistical Association, 94, 1161-71.
Lange, N. and Ryan, L. (1989). Assessing normality in random effects models. Annals of Statistics, 17, 624-42.
Lepper, A.W.D. (1989). Effects of altered dietary iron intake in Mycobacterium paratuberculosis-infected dairy cattle: sequential observations on growth, iron and copper metabolism and development of paratuberculosis. Res. Vet. Sci., 46, 289-96.
Liang, K.-Y. and McCullagh, P. (1993). Case studies in binary dispersion. Biometrics, 49, 623-30.
Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
Liang, K.-Y. and Zeger, S.L. (2000). Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya, B, 62, 134-48.
Liang, K.-Y., Zeger, S.L., and Qaqish, B. (1992). Multivariate regression analyses for categorical data (with Discussion). Journal of the Royal Statistical Society, B, 54, 3-40.
Lin, D.Y. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association, 96, 103-26.
Lin, X. and Breslow, N.E. (1996). Bias correction in generalized linear mixed models with multiple components of dispersion. Journal of the American Statistical Association, 91, 1007-16.
Lin, X. and Carroll, R.J. (2000). Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association, 95, 520-34.
Lindstrom, M.J. and Bates, D.M. (1990). Nonlinear mixed effects models for repeated measures data. Biometrics, 46, 673-87.
. " t'mg equations for correlated binary data: . . S (1989) Generalized estlma D LipSitZ,.. f ciation Technical report, epartment using the odds ratio as a measure 0 as~o Ith' of Biostatistics, Harvard School of Public Hea .
Molenberghs, G. and Lesaffre, E. (1994). Marginal modelling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association, 89, 633-44.
Lipsitz, S., Laird, N., and Harrington, D. (1991). Generalized estimating equations for correlated binary data: using odds ratios as a measure of association. Biometrika, 78, 153-60.
Molenberghs, G. and Lesaffre, E. (1999). Marginal modelling of multivariate categorical data. Statistics in Medicine, 18, 2237-55.
Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-34.
Little, R.J.A. (1995). Modelling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-21.
Little, R.J.A. and Rubin, D.B. (1987). Statistical analysis with missing data. John Wiley, New York.
Little, R.J. and Rubin, D.B. (2000). Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annual Review in Public Health, 21, 121-45.
Liu, G. and Liang, K.-Y. (1997). Sample size calculations for studies with correlated observations. Biometrics, 53, 937-47.
Liu, Q. and Pierce, D.A. (1994). A note on Gauss-Hermite quadrature. Biometrika, 81, 624-9.
Lindsey, J.K. (2000). Obtaining marginal estimates from conditional categorical repeated measurements models with missing data. Statistics in Medicine, 19, 801-9.
Molenberghs, G., Michiels, B., Kenward, M.G., and Diggle, P.J. (1997). Missing data mechanisms and pattern-mixture models. Statistica Neerlandica, 52, 153-61.
Monahan, J.F. and Stefanski, L.A. (1992). Normal scale mixture approximations to F(x) and computation of the logistic-normal integral. In Handbook of the logistic distribution (ed. N. Balakrishnan), pp. 529-40. Marcel Dekker, New York.
Morton, R. (1987). A generalized linear model with nested strata of extra-Poisson variation. Biometrika, 74, 247-57.
Mosteller, F. and Tukey, J.W. (1977). Data analysis and regression: a second course in statistics. Addison-Wesley, Reading, Massachusetts.
Moyeed, R.A. and Diggle, P.J. (1994). Rates of convergence in semi-parametric modelling of longitudinal data. Australian Journal of Statistics, 36, 75-93.
Müller, M.G. (1988). Nonparametric regression analysis of longitudinal data. Lecture Notes in Statistics, 41. Springer-Verlag, Berlin.
Munoz, A., Carey, V., Schouten, J.P., Segal, M., and Rosner, B. (1992). A parametric family of correlation structures for the analysis of longitudinal data. Biometrics, 48, 733-42.
Lunn, D.J., Wakefield, J., and Racine-Poon, A. (2001). Cumulative logit models for ordinal data, a case study involving allergic rhinitis severity scores. Statistics in Medicine, 20, 2261-85.
Murray, G.D. and Findlay, J.G. (1988). Correcting for the bias caused by dropouts in hypertension trials. Statistics in Medicine, 7, 941-46.
Mason, W.E. and Fienberg, S.E. (eds) (1985). Cohort analysis in social research: beyond the identification problem. Springer-Verlag, New York.
Nelder, J.A. and Mead, R. (1965). A simplex method for function minimisation. Computer Journal, 7, 303-13.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, B, 42, 109-42.
Neuhaus, J.M., Hauck, W.W., and Kalbfleisch, J.D. (1992). The effects of mixture distribution misspecification when fitting mixed-effects logistic models. Biometrika, 79, 755-62.
McCullagh, P. (1983). Quasi-likelihood functions. Annals of Statistics, 11, 59-67.
McCullagh, P. and Nelder, J.A. (1989). Generalized linear models. Chapman and Hall, New York.
McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92, 162-70.
Mead, R. and Curnow, R.N. (1983). Statistical methods in agriculture and experimental biology. Chapman and Hall, London.
Molenberghs, G., Kenward, M.G., and Lesaffre, E. (1997). The analysis of longitudinal ordinal data with informative dropout. Biometrika, 84, 33-44.
Neuhaus, J.M. and Jewell, N.P. (1990). Some comments on Rosner's multiple logistic model for clustered data. Biometrics, 46, 523-34.
Neuhaus, J.M. and Kalbfleisch, J.D. (1998). Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics, 54, 638-45.
Neuhaus, J.M., Kalbfleisch, J.D., and Hauck, W.W. (1991). A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review, 59, 25-36.
Neyman, J. (1923). On the application of probability theory to agricultural experiments: essay on principles, section 9. Translated in Statistical Science, 1990, 5, 65-80.
Neyman, J. and Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1-32.
O'Brien, P.C. (1984). Procedures for comparing samples with multiple endpoints. Biometrics, 40, 1079-87.
Paik, M.C. (1992). Parametric variance function estimation for non-normal repeated measurement data. Biometrics, 48, 18-30.
Palta, M., Lin, C.-Y., and Chao, W. (1997). Effect of confounding and other misspecification in models for longitudinal data. In Modelling longitudinal and spatially correlated data: methods, applications and future directions (Springer Lecture Notes in Statistics, Volume 122), 77-87.
Pan, W., Louis, T.A., and Connett, J.E. (2000). A note on marginal linear regression with correlated response data. American Statistician, 54, 191-5.
Pantula, S.G. and Pollock, K.H. (1985). Nested analysis of variance with autocorrelated errors. Biometrics, 41, 909-20.
Patterson, H.D. (1951). Change-over trials. Journal of the Royal Statistical Society, B, 13, 256-71. Patterson, H.D. and Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-54. Pawitan, Y. and Self, S. (1993). Modelling disease marker processes in AIDS. Journal of the American Statistical Association, 88, 719-26. Pearl, J. (2000). Causal inference in the health sciences: a conceptual introduction. Contributions to Health Services and Outcomes Research Methodology, Technical report R-282 , Department of Computer Science, University of California, Los Angeles. Pepe, M.S. and Anderson, G.A. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communication in Statistics - Simulation, 23(4), 939-51. Pepe, M.S. and Couper, D. (1997). Modeling partly conditional means with longitudinal data. Journal of the American Statistical Association, 92, 991-8. Pepe, M.S., Heagerty, P.J., and Whitaker, R. (1999). Prediction using partly conditional time-varying coefficients regression models. Biometrics, 55, 944-50. Pierce, D.A. and Sands, B.R. (1975). Extra-Bernoulli variation in binary data. Technical Report 46, Department of Statistics, Oregon State University.
Pinheiro, J.C. and Bates, D.M. (1995). Approximations to the log-likelihood function in the non-linear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12-35.
Plewis, I. (1985). Analysing change: measurement and explanation using longitudinal data. John Wiley, New York.
Pocock, S.J., Geller, N.L., and Tsiatis, A.A. (1987). The analysis of multiple endpoints in clinical trials. Biometrics, 43, 487-98.
Poisson, S.D. (1837). Recherches sur la Probabilite des Jugements en Matiere Criminelle et en Matiere Civile, precedees des Regles Generales du Calcul des Probabilites. Bachelier, Imprimeur-Libraire pour les Mathematiques, la Physique, etc., Paris.
Pourahmadi, M. (1999). Joint mean-covariance models with application to longitudinal data: unconstrained parameterisation. Biometrika, 86, 677-90.
Prentice, R.L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321-27.
Prentice, R.L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-48.
Prentice, R.L. and Zhao, L.P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics, 47, 825-39.
Priestley, M.B. and Chao, M.T. (1972). Non-parametric function fitting. Journal of the Royal Statistical Society, B, 34, 384-92.
Rao, C.R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika, 52, 447-58.
Rao, C.R. (1973). Linear statistical inference and its applications (2nd edn). John Wiley, New York.
Ratkowsky, D.A. (1983). Non-linear regression modelling. Marcel Dekker, New York.
Rice, J.A. and Silverman, B.W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, B, 53, 233-43.
Ridout, M. (1991). Testing for random dropouts in repeated measurement data. Biometrics, 47, 1617-21.
Robins, J.M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393-512.
Robins, J.M. (1987). Addendum to 'A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect.' Computers and Mathematics with Applications, 14, 923-45.
Robins, J.M. (1998). Correction for non-compliance in equivalent trials. Statistics in Medicine, 17, 269-302.
Robins, J.M. (1999). Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology: the environment and clinical trials (ed. M.E. Halloran and D. Berry). IMA Volume 116, Springer-Verlag, New York.
Robins, J.M., Greenland, S., and Hu, F.-C. (1999). Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome (with Discussion). Journal of the American Statistical Association, 94, 687-712.
Robins, J.M., Rotnitzky, A., and Zhao, L.P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106-21.
Rosner, B. (1984). Multivariate methods in ophthalmology with application to other paired-data situations. Biometrics, 40, 1025-35.
Rosner, B. (1989). Multivariate methods for clustered binary data with more than one level of nesting. Journal of the American Statistical Association, 84, 373-80.
Rothman, K.J. and Greenland, S. (1998). Modern epidemiology. Lippincott-Raven.
Rowell, J.G. and Walters, D.E. (1976). Analysing data with repeated observations on each experimental unit. Journal of Agricultural Science, 87, 423-32.
Royall, R.M. (1986). Model robust inference using maximum likelihood estimators. International Statistical Review, 54, 221-26.
Rubin, D.B. (1974). Estimating causal effects of treatment in randomized and non-randomized studies. Journal of Educational Psychology, 66, 688-701.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-92.
Rubin, D.B. (1978). Bayesian inference for causal effects: the role of randomization. Annals of Statistics, 6, 34-58.
Samet, J.M., Dominici, F., Curriero, F.C., Coursac, I., and Zeger, S.L. (2000). Fine particulate air pollution and mortality in 20 US cities, 1987-1994. New England Journal of Medicine, 343(24), 1798-9.
Sandland, R.L. and McGilchrist, C.A. (1979). Stochastic growth curve analysis. Biometrics, 35, 255-71.
Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika, 78, 719-27.
Scharfstein, D.O., Rotnitzky, A., and Robins, J.M. (1999). Adjusting for nonignorable dropout using semiparametric non-response models (with Discussion). Journal of the American Statistical Association, 94, 1096-1120.
Seber, G.A.F. (1977). Linear regression analysis. John Wiley, New York.
Self, S. and Mauritsen, R. (1988). Power/sample size calculations for generalized linear models. Biometrics, 44, 79-86.
Senn, S.J. (1992). Crossover trials in clinical research. John Wiley, Chichester.
Sheiner, L.B., Beal, S.L., and Dunne, A. (1997). Analysis of nonrandomly censored ordered categorical longitudinal data from analgesic trials (with Discussion). Journal of the American Statistical Association, 92, 1235-55.
Shih, J. (1998). Modeling multivariate discrete failure time data. Biometrics, 54, 1115-28.
Silverman, B.W. (1984). Spline smoothing: the equivalent variable kernel method. Annals of Statistics, 12, 898-916.
Silverman, B.W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with Discussion). Journal of the Royal Statistical Society, B, 47, 1-52.
Skellam, J.G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society, B, 10, 257-61.
Snedecor, G.W. and Cochran, W.G. (1989). Statistical methods (8th edn). Iowa State University Press, Ames, Iowa.
Snell, E.J. (1964). A scaling procedure for ordered categorical data. Biometrics, 40, 592-607.
Solomon, P.J. and Cox, D.R. (1992). Nonlinear components of variance models. Biometrika, 79, 1-11.
Sommer, A. (1982). Nutritional blindness. Oxford University Press, New York.
Sommer, A., Katz, J., and Tarwotjo, I. (1984). Increased risk of respiratory infection and diarrhea in children with pre-existing mild vitamin A deficiency. American Journal of Clinical Nutrition, 40, 1090-95.
Spall, J.C. (1988). Bayesian analysis of time series and dynamic models. Marcel Dekker, New York.
Stanek, E.J. (1988). Choosing a pre-test-post-test analysis. American Statistician, 42, 178-83.
Stefanski, L.A. and Carroll, R.J. (1985). Covariate measurement error in logistic regression. Annals of Statistics, 13, 1335-51.
Stern, R.D. and Coe, R. (1984). A model fitting analysis of daily rainfall data. Journal of the Royal Statistical Society, A, 147, 1-34.
Stiratelli, R., Laird, N., and Ware, J.H. (1984). Random effects models for serial observations with binary responses. Biometrics, 40, 961-71.
Stram, D.O., Wei, L.J., and Ware, J.H. (1988). Analysis of repeated ordered categorical outcomes with possibly missing observations and time-dependent covariates. Journal of the American Statistical Association, 83, 631-37.
Sun, D., Speckman, P.L., and Tsutakawa, R.K. (2000). Random effects in generalized linear mixed models (GLMMs). In Generalized linear models: a Bayesian perspective (ed. D. Dey, S. Ghosh, and B. Mallick), pp. 23-39. Marcel Dekker, New York.
TenHave, T.R., Kunselman, A.R., and Tran, L. (1999). A comparison of mixed effects logistic regression models for binary response data with two nested levels of clustering. Statistics in Medicine, 18, 947-60.
TenHave, T.R. and Uttal, D.H. (1994). Subject-specific and population-averaged continuation ratio logit models for multiple discrete survival profiles. Applied Statistics, 43, 371-84.
Volberding, P.A., Lagakos, S.W., Koch, M.A. et al. (1990). Zidovudine in asymptomatic human immunodeficiency virus infection. The New England Journal of Medicine, 322, 941-9.
Waclawiw, M.A. and Liang, K.-Y. (1993). Prediction of random effects in the generalized linear model. Journal of the American Statistical Association, 88, 171-78.
Wakefield, J. (1996). The Bayesian analysis of population pharmacokinetic models. Journal of the American Statistical Association, 91, 62-75.
Wang, Y. (1998). Smoothing spline models with correlated random errors. Journal of the American Statistical Association, 93, 341-48.
Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46, 657-71.
Ware, J.H. (1985). Linear models for the analysis of longitudinal studies. The American Statistician, 39, 95-101.
Thara, R., Henrietta, M., Joseph, A., Rajkumar, S., and Eaton, W. (1994). Ten year course of schizophrenia - the Madras Longitudinal study. Acta Psychiatrica Scandinavica, 90, 329-36.
Ware, J.H., Lipsitz, S., and Speizer, F.E. (1988). Issues in the analysis of repeated categorical outcomes. Statistics in Medicine, 7, 95-107.
Tsay, R. (1984). Regression models with time series errors. Journal of the American Statistical Association, 79, 118-24.
Ware, J.H., Dockery, D., Louis, T.A. et al. (1990). Longitudinal and cross-sectional estimates of pulmonary function decline in never-smoking adults. American Journal of Epidemiology, 32, 685-700.
Tsiatis, A.A., De Gruttola, V., and Wulfsohn, M.S. (1995). Modelling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association, 90, 27-37.
Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss-Newton method. Biometrika, 61, 439-47.
Tufte, E.R. (1983). The visual display of quantitative information. Graphics Press, Cheshire, Connecticut.
West, M., Harrison, P.J., and Migon, H.S. (1985). Dynamic generalized linear models and Bayesian forecasting (with Discussion). Journal of the American Statistical Association, 80, 73-97.
Tufte, E.R. (1990). Envisioning information. Graphics Press, Cheshire, Connecticut.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-25.
Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley, Reading, Massachusetts.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. John Wiley, New York.
Tunnicliffe Wilson, G. (1989). On the use of marginal likelihood in time series model estimation. Journal of the Royal Statistical Society, B, 51, 15-27.
Velleman, P.F. and Hoaglin, D.C. (1981). Applications, basics and computing of exploratory data analysis. Duxbury Press, Boston, Massachusetts.
Verbeke, G. and Molenberghs, G. (2000). Linear mixed models for longitudinal data. Springer, New York.
Verbyla, A.P. (1986). Conditioning in the growth curve model. Biometrika, 73, 475-83.
Verbyla, A.P. and Cullis, B.R. (1990). Modelling in repeated measures experiments. Applied Statistics, 39, 341-56.
Verbyla, A.P. and Venables, W.N. (1988). An extension of the growth curve model. Biometrika, 75, 129-38.
Williams, E.J. (1949). Experimental designs balanced for the estimation of residual effects of treatments. Australian Journal of Scientific Research, 2, 149-68.
Williams, D.A. (1975). The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics, 31, 949-52.
Williams, D.A. (1982). Extra-binomial variation in logistic linear models. Applied Statistics, 31, 144-48.
Winer, B.J. (1977). Statistical principles in experimental design (2nd edn). McGraw-Hill, New York.
Wishart, J. (1938). Growth-rate determinations in nutrition studies with the bacon pig, and their analysis. Biometrika, 30, 16-28.
Wong, W.H. (1986). Theory of partial likelihood. Annals of Statistics, 14, 88-123.
Wu, M.C. and Bailey, K.R. (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics, 45, 939-55.
Wu, M.C. and Carroll, R.J. (1988). Estimation and comparison of changes in the presence of right censoring by modeling the censoring process. Biometrics, 44, 175-88.
Wulfsohn, M.S. and Tsiatis, A.A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics, 53, 330-39.
Xu, J. and Zeger, S.L. (2001). Joint analysis of longitudinal data comprising repeated measures and times to events. Applied Statistics, 50, 375-87.
Yu, O., Sheppard, L., Lumley, T., Koenig, J.Q., and Shapiro, G. (2000). Effects of ambient carbon monoxide and atmospheric particles on asthma symptoms, results from the CAMP air pollution ancillary study. Environmental Health Perspectives, 12, 1-10.
Yule, G.U. (1927). On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philosophical Transactions of the Royal Society of London, A, 226, 267-98.
Zeger, S.L. and Diggle, P.J. (1994). Semi-parametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics, 50, 689-99.
Zeger, S.L. and Karim, M.R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association, 86, 79-86.
Zeger, S.L. and Liang, K.-Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42, 121-30.
Zeger, S.L. and Liang, K.-Y. (1991). Feedback models for discrete and continuous time series. Statistica Sinica, 1, 51-64.
Zeger, S.L. and Liang, K.-Y. (1992). An overview of methods for the analysis of longitudinal data. Statistics in Medicine, 11, 1825-39.
Zeger, S.L., Liang, K.-Y., and Albert, P.S. (1988). Models for longitudinal data: a generalized estimating equation approach. Biometrics, 44, 1049-60.
Zeger, S.L., Liang, K.-Y., and Self, S.G. (1985). The analysis of binary longitudinal data with time-independent covariates. Biometrika, 72, 31-8.
Zeger, S.L. and Qaqish, B. (1988). Markov regression models for time series: a quasi-likelihood approach. Biometrics, 44, 1019-31.
Zhang, D., Lin, X., Raz, J., and Sowers, M.F. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association, 93, 710-19.
Zhao, L.P. and Prentice, R.L. (1990). Correlated binary regression using a generalized quadratic model. Biometrika, 77, 642-48.

Index
Note: Figures and Tables are indicated by italic page numbers.
adaptive quadrature technique 212-13 examples of use 232, 238 age effect 1, 157 in example 157-9, 159 AIDS research 3, 330 see also CD4+ cell numbers data alternating logistic regressions (ALRs) 147 see also generalized estimating equations analysis of variance (ANOVA) methods 114-25 advantages 125 limitations 114 split-plot ANOVA 56, 123-5 example of use 124-5 time-by-time ANOVA 115-16, 125 example of use 116, 118 limitations 115, 125 ante-dependence models 87-9, 115 approximate maximum likelihood methods 175, 210 advantage 212 autocorrelated/autoregressive random effects model 210, 211 in example 239 autocorrelation function 46, 48 for CD4+ data 48, 49 for exponential model 57, 84 autoregressive models 56-7, 87-8, 134 available case missing value restrictions 300 back-fitting algorithm 324 Bahadur representation 144 bandwidth, curve-fitting 45
Bayesian methods for generalized linear mixed models 214-16 examples of use 232, 238 beta-binomial distribution 178-9 uses 179 bias 22-4 bibliography 349--68 binary data, simulation under generalized linear mixed models 210, 211 binary responses logistic regression models 175-84 conditional likelihood approach 175-8 with Gaussian random effects 180-4 random effects models 178-80 log-linear models 142-3 marginal models 143--6 examples 148--60 sample size calculations 30-1 boxcar window 43 boxplots 10, 12 BUGS software 216 calf intestinal parasites experiment data 111 time-by-time ANOVA applied 116, 118 canonical parameters, in log-linear models 143, 153 carry-over effect experimental assessment of 7, 151-3 ignored 149 categorical data generalized linear mixed models for 209-16 examples 231,232,237-40
conditional generalized linear regression model 209 conditional likelihood advantages of approach 177-8 for generalized linear mixed models 171-2 maximization in transition model 138,
categorical data (cont,) £ 208-44 Iikelihood-ba.Oled methods or , Il'zed margms. , models for 216-31 examples 231-3, 240·-3 4 ordered, transition models for 201·transition models for 194-204 examples 197-201 193 " for random intercept logistIC regressIOn categorical responses, aBsociation model 175-8 among 52-3 random intercept log-linear model for causal estimation methods 273-4 count data 184-6 causal models 271 see also maximum likelihood causal targets of inference 269-73 conditional maximum likelihood CD4+ cell numbers data 3-4 estimation correlation in 46-8 and depressive symptoms score 39-41 random effects model for count data 184-6 estimation of population mean generalized linear mixed curve 325-6 graphical representation 9, 35-9 model 171-2 for transition models 138, 192-3, 203 marginal analysis 18 parametric modelling 108-10 conditional means 13, 191, 209 full covariate conditional mean 253 prediction of individual trajectories 110, 112-13 likelihood-based estimates 292, 238 random effects models 18, 130 partly conditional mean 253 time-dependent covariates 247 conditional models 153, 190 variograms 50, 51, 326 conditional modes 174 cerebrovascular deficiency treatment conditional odds ratios 144, 146-7 trial 148 confirmatory analysis 33 conditional likelihood estimation 177 confounders data 148 meaning of term 265 marginal model used 148-50, 181 time-dependent 265-80 maximum likelihood estimation 180-1 connected-line graphs 35, 35, 36, 37 random effects model used 181 alternative plots 37-8 chi-squared distributions 342 continuous responses, sample size chi-squared test statistic, in cow-weight calculations 28-30 study 107 correlated errors c-index 264 general linear model with 55-9 clinica.l trials non-linear models with 327,328 dropouts in 19, 285 correlation as prospective studies 2 among repeated observations 28 see also epileptic seizure .. ,; in longitudinal data 46-52 schizophrenia clinical trial consequences of ignoring 19 cohort effects 1 correlation matrix 24-5, 46 complete case analysis 288 for CD4+ data 46, 48 complete case missing variable correlation models restrictions 300 exponential 56-7 complete data score functions 173 uniform 55-6 completely random dropouts count data testing for 288-90 examples 160-1 in examples 290-3 generalized estimating equations completely random missing values 283, for 162-3 284 log-linear transition models for 204-6
marginal model for 137, 160-5 over-dispersed 161, 186 parametric modelling for 160-2 random effects model for 137, 186-8 counted responses marginal model used 160-5 random effects model used 184-9 counterfactual outcomes 269 regression model for 276-7 covariance structure modelling of 81-113, 323, 324 reasons for 79-80 covariate endogeneity 245, 246 covariates 337 external 246 internal 246 lagged 259-65 stochastic 253-8 time-dependent 245-81 cow weight data 103-4 parametric modelling 104-8 crossover trials 148 examples 7,9, 10,148-53,176-7 further reading recommended 31-2, 168 GLMMs compared with marginalized models 231-3 marginal models applied 148-53 random effects models applied 176-7 relative efficiency of OLS estimation 63 time-dependent covariates in 247 see also cerebrovascular deficiency treatment trial; dysmenorrhoeal pain treatment trial cross-sectional analysis, example 257-8 cross-sectional association in data 251, 254 cross-sectional data, fitting of non-linear regression model to 327 cross-sectional models correlated error structures 327, 328 estimation issues for 254-5, 256 non-linear random effects 327, 329 cross-sectional studies bias in 22-4 compared with longitudinal studies 1, 16-17,22-31,41 ,159-60 cross-valida-tion 45 crowding effect 205 cubic smoothing spline 44 curve-fitting methods 41-5
data score functions 173 derived variables 17, 116-23 examples of use 119-23 design considerations 22-32 bias 22-4 efficiency 24-6 further reading recommended 31-2 sample size calculations 25-31 diagnostics, for models 98 Diggle-Kenward model 295 fitted to milk protein data 298-9 informative dropouts investigated by 298, 318 distributed lag models 260-1 in examples 262, 269 dropout process modelling of 295-8 in examples 298-9, 301 graphical representation of various models 303-5 pattern mixture models 299-301,304 random effects models 301-3, 304, 305 selection models 295-8, 304-5 dropouts 284-7 in clinical trials 13, 285 completely random, testing for 288-90 divergence of fitted and observed means when ignored 911,316 in milk protein study 285 random 285 reasons for 12, 285, 299 ways of dealing with 287-8 complete case analysis 288 last_observation-earried-forward method 287-8 dysmenorrhoeal pain treatment trial 7, 9 data 10, 151 , ' GLMMs compared with margmallzed models 231-3 marginal model used 150-3 random intercept model fitted 177 efficiency, longitudinal co~pared with cross-sectional studIes 24-6 EM algorithm 173-4, 284, 332 " I Bayes estimates 112 emplrlC3 endogeneity, in example 268-9 endogenous variables 246',253 epileptic seizure clinical trIal 10 boxplots of data 12
372
epileptic seizure clinical trial (cont,) data 11 summary statistics 163 marginal model 163-.5 Poisson model used 161-2 random effects model 185-6, 188, 189 estimation stage of model-fitting process 95-7 event history data 330 exogeneity 245, 246-7 exogenous variables 246 explanatory variables 337 exploratory data analysis (EDA) 33-53, 19B-9,32B exponential correlation model 56-7, 84, B9 compared with Gaussian correlation model B7 efficiency of OL9 estimator in 61-2 variograms for 85 external covariates 246 extra-binomial variation 178
feedback, covariate-response 253, 258, 266-B first-order autoregressive models 56-7 87-8 ' first-order marginalized transition modeI/MTM(l) 226-7 230 241 in example 241, 242 ' , Fisher's information matrix 340 in example 341 fixed quadrature technique 212 examples of use 232 formulation of parametric model 94-5 F-statisti~ 115, 120, 122-3, 123-4, 125 full covanate conditional mean 253 full covariate conditional mean (FCCM) assumption 255 f~rther reading recommended 258 sImulation illustration 256-7 Gauss ~ Herml't e quadrature 212 213 GaussIan adapt' , , lve quadrature 212-13 GaussIan assumptions further reading recommended 189 general linear model 55 maximum likelihood estimation ,under 64-5, 180, 181 GaussIan correlation model 86
compared with exponential correlation model 87 variograms for 86 Gaussian kernel 43, 320, :321 Gaussian linear model, marginalized models using 218-20 Gaussian random effects logistic models with 180-4 Poisson regression with 188 Gauss-Markov Theorem 334, 339 g-computation, estimation of causal effects by 273-4 advantages 276 in example 275-6 generalized estimating equations (GEEs) 138-40, 146-7 advantages 293-4 for count data 162-3 example 163-5 and dropouts 293-5 further reading recommended 167, 168 for logistic regression models 146-7 203,240, 241 ' in examples 149, 150, 154 for random missingness mechanism 293-5 and stochastic covariates 257, 258, 258 and time-dependent covariates 249-50, 251 generalized linear mixed models (GLMMs) 209-16 Bayesian methods 214-16 and conditional likelihood 171-2 and dropouts 317 examples of use 231, 232, 237-40 maximum likelihood estimation for 172-5,212-14 generalized linear models (GLMs) 343-6 contrasting approaches 131-7 extensions 126-40 marginal models 126-8, 141-68 random effects models 128-30 169-89 ' tra?sition models 130--1, 190--207 genenc features 345--6 inferences 137-40 gen~rallinear model (GLM) 54-80 WIth correlated errors 55-9 ex~onential correlation model 56-7 umform correlation model 55-6 geostatistics 49 Gibbs sampling 174, 180,214
Granger non-causality 246 graphical representation of longitudinal data 6-7, 12,34-41 further reading recommended 53 guidelines 33 growth curve model 92 Hammersly-Clifford Theorem 21 hat notation 60 hierarchical random effects model 334, 336 holly leaves, effect of pH on drying rate 120-3 human immune deficiency virus (HIV) 3 see also CD4+ cell numbers data
es~imation of causal effects using 277-9
III example 279-80 iterative proportional fitting 221
joint modelling, of longitudinal measurements and recurrent events 329-32 joint probability density functions 88-9 kernel estimation 42--3 compared with other curve-fitting techniques 42, 45 in examples 42,43, 325 kernel function 320 Kolmogorov-Smirnov statistic 290, 291
ignorable missing values 284 independence estimating equations (lEEs) 257 lag 46 individual trajectories lagged covariates 259--65 prediction of 110-13 example 261-5 example using CD4+ data 112-13 multiple lagged covariates 260-1 Indonesian Children's Health Study single lagged covariate 259--60 (ICHS) 4, 5 last observation carried forward 287-8 marginal model used 17-1B, 127, 132, latent variable models 129 135-6, 141, 156-60 marginalized models 222-5 random effects model used 18 129 least-squares estimation 338-9 130, 132-3, 182--4 " further reading recommended 339 time-dependent covariates 247 optimality property 33B-9 transition model used 18, 130--1, 133, least-squares estimator 33B 197-201 bias in 23--4 inference(s) variance of 63 about generalized linear models 137-40 weighted 59--64, 70 about model parameters 94, 97-8 likelihood-based methods informative dropout mechanisms 295,316 for categorical data 208-44 in example 313-16 for generalized linear mixed models 209-16 representation by pattern mixture for marginalized models 216-31 models 299-301 informative dropouts for non-linear models 328 likelihood functions 138, 171, 173, 340 consequences 318 likelihood inference 340--3 investigation of 298, 318, 330 examples 341-3 informative missing values 80, 283 intercept likelihood ratio testing 98, 342 likelihood ratio test statistic 342 of linear regression model 337 linear links 191 random intercept models 90-1, 170, linear models 337-8 210,211 and least-squares estimation intermediate variable, meaning of method 338-9 term 265 marginal modelling approach 132 intermittent missing values 284, 2B7, 318 random effects model 132-3 internal covariates 246 transition model 133-4 inverse probability of treatment weights linear regression model 337 (IPTW)
374 link functions 191-2, 345 logistic regression models 343, 344 and dropouts 292 generalized estimating equations for 146-7 example 251 and lagged covariates 261-5 marginal-modelling approach 127, 135-6, 146-7 and Markov chain 191 random effects modelling approach 134-5, 175-80 examples 176-7,180-4 logit links 191 log likelihood ratio (test) statistic 98, 309 log-linear models 142-3, 344 canonical parameters in 143, 153 marginalized models 220-1 marginal-modelling approach 137, 143-6, 162, 164-5 random effects modelling approach 137 log-linear transition models, for count data 204-6 log-links 191 log odds ratios 52, 129, 147, 341 in examples 200, 235, 236 standard error (in example) 148-9 longitudinal data association among categorical responses 52-3 collection of 1-2 correlation structure 46-52 consequences of ignoring 19 curve smoothing for 41-5 defining feature 2 example data sets 3-15 calf intestinal parasites experiment 117 CD4+ cell numbers data 3-4 cow weight data 108 dys~enorrhoeal pain treatment tflal 7, 9, 10 epileptic seizure clinical trial 10 11, 12 ' Indonesian children's health stud 4,5 y milk protein data 5-7 8 9 pig. weight data 34 ' , s~hlzophrenia clinical trial 10-13 14 SItka spruce growth data 4-5 6' gener~1 linear models for 54-80' ,7 graphIcal representation 6-7 1" 34 , >G, -41
INDEX further reading recommended 53 guidelines .'33 missing values in 282-318 longitudinal data analysis approaches 17-20 marginal analysis 17-18 random effects model 18 transition model 18 two-stage/derived variable analysis 17 classification of problems 20 confirmatory analysis 33 exploratory data analysis 33-53 longitudinal studies 1-3 advantages 1,16-17,22,245 compared with cross-sectional studies 1, 16-17,22-31 efficiency 24-6 lorelogram 34, 52-3 further reading recommended 53 lowess smoothing 41,44 compared with other curve-fitting methods 42, 45 examples 3, 36, 4 a
Madras Longitudinal Schizophrenia Study 234-7 analysis using marginalized models 240-3 marginal analysis 18 marginal generalized linear regression model 209 marginalized latent variable models 222-5, 232 maximum likelihood estimation for 225 marginalized log-linear models 220-1 233 marginalized models ' for categorical data 216-31 examples of use 231-3,240-3 example using Gaussian linear model 218-20 marginalized random effects models 223, 225 222, marginalized transition models 225-31 advantages 230-1 in examples 233,241-3 fir~t-order/MTM(l) 226-7, 230, 241 III example 241, 242 se~ond-order/MTM(2) 228 III example 242
INDEX 375 marginal mean response 17 marginal means definition 209 likel~hood-based estimates 232, 242 log-lInear model for 143-6 marginal models 17-18, 126-8, 141-68 advantages of direct approach 216-17 assumptions 126-7 examples of use 17-18,127,132, 135-6, 148-60 further reading recommended 167-8 and likelihood 138 marginal odds ratios 145, 147 marginal quasi-likelihood (MQL) methods 232 marginal structural models (MSMs) 276 advantage(s) 280 estimation using IPTW 277-9 in example 279-80 Markov Chain Monte Carlo (MCMC) methods 214-16, 332 in examples 232,238 Markov chains 131, 190 Markov models 87, 190-206 further reading recommended 206-7 see also transition models Markov-Poisson time series model 204-5 realization of 206 maximum likelihood algorithms 212 maximum likelihood estimation 64-5 Compared with REML estimation 69, 95 for generalized linear mixed models 212-14 in parametric modelling 98 for random effects models 137-8, 172-5 restricted 66-9 for transition models 138, 192-3 see also conditional likelihood; generalized estimating equations maximum likelihood estimator 60, 64, 340 variance 60 MCEM method see Monte Carlo Expectation-Maximization method MCMC methods see Markov Chain Monte Carlo methods MCNR method see Monte Carlo Newton-Raphson method mean response non-parametric modelling of 319-26 parametric modelling of 105-7
mean response profile(s) for calf intestinal parasites experiment 118 for cow weight data 106 defined in ANOVA 114 for mil~ protein data 99, 100, 102,302 for schIzophrenia trial data 14 307 309,311,315 " measurement error and random effects 91-3 and serial correlation 89-90 and random intercept 90-1 as Source of random variation 83 measurement variation 28 micro/macro data-representation strategy 37 milk protein data 5-7 8 9 dropouts in 290-1 ' , reasons for 285 testing for completely random dropouts 291-3 mean response profiles 99, 100 102 302
'
,
parametric model fitted 99-103 pattern mixture analysis of 301, 302 variogram 50, 52, 99 missing value mechanisms classification of 283-4 completely random 283, 284 random 283, 284 missing values 282-318 effects 282 ignorable 284 informative 80, 283 intermittent 284, 287, 318 and parametric modelling 80 model-based variance 347 model-fitting 93-8 diagnostic stage 98 estimation stage 95-7 formulation stage 94-5 inference stage 97-8 moments of response 138 Monte Carlo Expectation-Maximization (MCEM) method 214 Monte Carlo maximum likelihood algorithms 214 Monte Carlo Newton-Raphson (MCNR) method 214 Monte Carlo test(s), for completely random dropouts 290, 291
376 Mothers' Stress and Children's Morhidity (MSCM) Study 247-53 cross-sectional analysis 257-8 and endogeneity 268-9 g_computation 275-6 and lagged covariates 261-5 marginal structural models using IPTW 279-80 sample of data 252 Multicenter AIDS Cohort Study (MACS) 3 CESD (depressive symptoms) scores 39-40, 41 objective(s) 3-4 8ee also CD4+ cell numbers data multiple lagged covariates 260-1 multivariate Gaussian theory 339-40 multivariate longitudinal data 332-6 examples 332
natural parameter 345 negative-binomial distribution 161, 186-7 NeIder-Mead simplex algorithm 340 nested sub-models 342 Newton-Raphson iteration 340 non-linear random effects, in crosB-sectional models 327, 329 non-linear regression model 326-7 fitting to crosB-sectional data 327 non-linear regression modelling 326-9 non-parametric curve-fitting techniques 41-5 see also kernel estimation; lowess; smoothing spline non-parametric modelling of mean response 319--26 notation 15-16 causal models 271 conditional generalized linear model 209 dropout models 295 mar~inal generalized linear model 209 mIDClmum likelihood estimator 60 multivariate Gaussian distribution 339-40 non-linear regression model 326-7 parametric models 83-4 time-dependent covariates 245 no- unmeasured-confounders assumption 27Q-.1 273 numerical integration m~thods 212-14
INDEX odds ratio, in marginal model 127, 128 ordered categorical data 201-4 proportional odd modelling of 201-3 ordering statistic, data representation using 38 ordinary least squares (OLS) estimation and ignoring correlation in data 19 naive use 63 errors arising 63-4 in nonlinear regression modelling 119 relative efficiency in crossover example 63 in exponential correlation model 61-2 in linear regression example 62 in uniform correlation model 60-1 in robust estimation of standard errors 70, 75, 76 and sample variogram 50, 52 outliers, and curve fitting 44-5 over-dispersed count data, models for 161, 186-7 over-dispersion 162, 178, 346 ozone pollution effect on tree growth 4-5 see also Sitka spruce growth data
panel studies 2 parametric modelling 81-113 for count data 16~2 example applications 99-110 CD4+ data 108-10 cow weight data 103-8 milk protein data 99-103 fitting model to data 93-8 further reading recommended 113 notation 83-4 pure serial correlation model 84-9 random effects + measurement error model 91-3 random intercept + serial correlation + measurement error model 90-1 serial correlation + measurement error model 89-90 and sources of random variation 82-3 partly conditional mean 253 partly conditional models 259-60 pattern mixture dropout models 299-301 graphical representation 908 304 Pearson chi-squared statistic 186 Pearson's chi-squared test statistic 343
INDEX penalized Quasi-likelihood (PQL) methods 175, 210, 232 example of use 282 period 1 pig weight data 34 graphical representation 34-5, 35, 36 robust estimation of standard errors for 76-9 point process data 330 Poisson distribution 161, 186, 344 Poisson-gamma distribution 347 Poisson-Gaussian random effects models 188-9 Poisson regression models 344 population growth 205 Positive And Negative Syndrome Scale (PANSS) measure 11, 153, 330, 332 subset of placebo data 305 treatment effects 334, 335 potential outcomes 269-70 power of statistical test 28 predictive squared error (PSE) 45 predictors 337 principal components analysis, in data representation 38 probability density functions 88-9 proportional odds model 201-2 application to Markov chain 202-3 prospective collection of longitudinal data 1,2
quadratic form (test) statistic 97, 309
quadrature methods 212-14
  limitations 214
quasi-likelihood methods 232, 346-8
  in example 347-8
  see also marginal quasi-likelihood (MQL) methods; penalized quasi-likelihood (PQL) methods
quasi-score function 346
random dropout mechanism 285
random effects + measurement error models 91-3
random effects dropout models 301-3
  in example 312-14
  graphical representation 303, 304, 305
random effects models 18, 82, 128-30, 169-89
  assumptions 170-1
  basic premise 129, 169
  examples of use 18, 129, 130, 132-3
  fitting using maximum likelihood method 137-8
  further reading recommended 189
  hierarchical 334, 336
  marginalized 222, 223, 225
  multi-level 93
  and two-stage least-squares estimation 57-9
random intercept models 90-1, 170, 210, 211
  in example 239
random intercept + random slope (random line) models 210, 211, 238
  in example 238-9
random intercept + serial correlation + measurement error model 90-1
random missingness mechanism 283
  generalized estimating equations under 293-5
random missing values 283, 284
random variation
  separation from systematic variation 217, 218
  sources 82-3
  two-level models 93
reading ability/age example 1, 2, 16
recurrent event data 330
recurrent events, joint modelling with longitudinal measurements 329-32
regression analysis 337
regression models
  notation 15
  see also linear ...; non-linear regression model
relative growth rate (RGR) 92
repeated measures ANOVA 123-5
  see also split-plot ANOVA approach
repeated observations
  correlation among 28
  number per person 28
respiratory disease/infection, in Indonesian children 4, 131-6, 156-60, 182-4
restricted maximum likelihood (REML) estimation 66-9
  compared with maximum likelihood estimation 69, 95
  in parametric modelling 96, 99, 100
  in robust estimation of standard errors 70-1, 73-4, 79
retrospective collection of longitudinal data 1-2
Rice-Silverman prescription 321, 322
robust estimation of standard errors 70-80
  examples 73-9
robust variance 194, 347
roughness penalty 44
sample size calculations 25-31
  binary responses 30-1
  continuous responses 28-30
  for marginal models 165-7
  parameters required 25-8
sample variogram(s) 49
  examples 51, 90, 105, 107
SAS software 180, 214
saturated models 50, 65
  graphical representation 303
  limitations 65
  robust estimation of standard errors in 70-1, 73
scatterplots 33, 40
  and correlation structure 46, 47
  examples 36, 38-43, 45, 47
schizophrenia clinical trial 10-13
  dropouts in 12, 13, 306-16
  marginal model used 153-6
  mean response profiles 14, 307, 311, 315
  multivariate data 332, 334, 335
  PANSS measure 11, 153, 330, 332
    subset of placebo data 305
    treatment effects 334, 335
  random effects model used 181-2
  variograms 308, 314
schizophrenia study (Madras Longitudinal Study) 234-7
  analysis of data 237-43
score equations 173-4, 340
score function 340
score test statistic(s) 241, 242, 342
second-order marginalized transition model/MTM(2) 228
  in example 241-3
selection dropout models 295-8
  in example 312-16
  graphical representation 303, 304-5
semi-parametric modelling 324
sensitivity analysis, and informative dropout models 316
serial correlation 82
  plus measurement error 89-90
    and random intercept 90-1
  pure 84-9
  as source of random variation 82
Simulated Maximum Likelihood (SML) method 214
single lagged covariate 259-60
Sitka spruce growth data 4-5, 6, 7
  derived variables used 119-20
  robust estimation of standard errors for 73-6, 77
  split-plot ANOVA applied 124-5
size-dependent branching process 204-5
smallest meaningful difference 27
smoothing spline 44, 320
  compared with other curve-fitting techniques 42
smoothing techniques 33-4, 41-5, 319
  further reading recommended 41, 53
spline 44
  see also smoothing spline
split-plot ANOVA approach 56, 123-5
  example of use 124-5
split-plot model 92, 123, 124
stabilized weights 277
  in example 278
standard errors
  robust estimation of 70-80
    examples 73-9
standardized residuals, in graphical representation 35, 96
STATA software 214
stochastic covariates 253-8
strong exogeneity 246-7
structural nested models, further reading recommended 281
survival analysis 2
systematic variation, separation from random variation 217, 218
time-by-time ANOVA 115-16, 125
  example of use 116, 118
  limitations 115-16
time-dependent confounders 265-80
time-dependent covariates 245-81
time series analysis 2
tracking 35
trajectories see individual trajectories
transition matrix 194, 195
transition models 18, 130-1, 190-207
  for categorical data 194-204
    examples 197-201
  for count data 204-6
  examples of use 18, 130-1, 133, 197-201
  fitting of 138, 192-4
  marginalized 225-31
  for ordered categorical data 201-4
  see also Markov models
transition ordinal regression model 203
tree growth data see Sitka spruce growth data
Tufte's micro/macro data representation strategy 37
two-level models of random variation 93
two-stage analysis 17
two-stage least-squares estimation 57-9
type I error rate 26-7
unbalanced data 282
uniform correlation model 55-6, 285
variance functions 345
variograms 34, 48-50
  autocorrelation function estimated from 50
  in examples 51, 52, 308, 314, 326
  for exponential correlation model 85
  further reading recommended 53
  for Gaussian correlation model 86
  for parametric models 102, 105, 107
  for random intercepts + serial correlation + measurement error model 91
  for serial correlation models 84-7
  for stochastic process 48, 82
  see also sample variogram
vitamin A deficiency
  causes and effects 4, 197
  see also Indonesian Children's Health Study
Wald statistic 233, 241
weighted average 320
weighted least-squares estimation 59-64
working variance matrix 70
  choice not critical 76
  in examples 76, 78
xerophthalmia 4, 197
  see also Indonesian Children's Health Study