Modeling Intraindividual Variability with Repeated Measures Data:
Methods and Applications
MULTIVARIATE APPLICATIONS BOOK SERIES
The Multivariate Applications book series was developed to encourage the use of rigorous methodology in the study of meaningful scientific issues and to describe the applications in easy-to-understand language. The series is sponsored by the Society of Multivariate Experimental Psychology and welcomes methodological applications from a variety of disciplines, such as psychology, public health, sociology, education, and business. The main goal is to provide descriptions of applications of complex statistical methods to the understanding of significant social or behavioral issues. The descriptions are to be accessible to an intelligent, non-technically oriented readership (e.g., non-methodological researchers, teachers, students, government personnel, practitioners, and other professionals). Books can be single authored, multiple authored, or edited volumes. The ideal book for this series would take one of several approaches: (1) demonstrate the application of several multivariate methods to a single, major area of research; (2) describe a multivariate procedure or framework that could be applied to a number of research areas; or (3) present a variety of perspectives on a controversial topic of interest to applied multivariate researchers.

There are currently 7 books in the series:

What If There Were No Significant Tests?, co-edited by L. Harlow, S. Mulaik, and J. Steiger (1997).
Structural Equation Modeling With LISREL, PRELIS, and SIMPLIS: Basic Concepts, Applications, and Programming, by B. Byrne (1998).
Multivariate Applications in Substance Use Research, co-edited by J. Rose, L. Chassin, C. Presson, and S. Sherman (2000).
Item Response Theory for Psychologists, co-authored by S. Embretson and S. Reise (2000).
Structural Equation Modeling with AMOS, by B. Byrne (2001).
Conducting Meta-Analysis Using SAS, co-authored by W. Arthur, Jr., W. Bennett, Jr., and A. I. Huffcutt (2001).
Modeling Intraindividual Variability with Repeated Measures Data: Methods and Applications, co-edited by D. S. Moskowitz and S. L. Hershberger (2002).

Anyone wishing to propose a book should address the following: (1) title; (2) author(s); (3) timeline, including planned completion date; (4) brief overview of the focus of the book, including a table of contents and a sample chapter (or more); (5) mention of any competing publications in this area; and (6) mention of possible audiences for the proposed book. More information can be obtained from the editor, Lisa Harlow, at: Department of Psychology, University of Rhode Island, 10 Chafee Road, Suite 8, Kingston, RI 02881-0808; Phone: 401-874-4242; Fax: 401-874-5562; or e-mail:
[email protected]. Information can also be obtained from one of the advisory board members: Leona Aiken (Arizona State University), Gwyneth Boodoo (Educational Testing Service), Susan Embretson (University of Kansas), Michael Neale (Virginia Commonwealth University), Bill Revelle (Northwestern University), and Steve West (Arizona State University).
Modeling Intraindividual Variability with Repeated Measures Data:
Methods and Applications
Edited by
D. S. Moskowitz
McGill University

Scott L. Hershberger
California State University, Long Beach
2002
LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey    London
The final camera copy for this work was prepared by the editors and therefore the publisher takes no responsibility for consistency or correctness of typographical style. However, this arrangement helps to make publication of this kind of scholarship possible.
Copyright © 2002 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of the book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without the prior written permission of the publisher. Lawrence Erlbaum Associates, Inc., Publishers, 10 Industrial Avenue, Mahwah, New Jersey 07430
Cover design by Kathryn Houghtaling Lacey
ISBN 0-8058-3125-8
Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability. Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
List of Contributors

David A. Kenny, Department of Psychology, University of Connecticut, Babbidge Road Unit 1020, Storrs, CT 06269-1020.
Niall Bolger, Department of Psychology, New York University, 6 Washington Place, Room 752, New York, NY 10003.
Deborah A. Kashy, Department of Psychology, Texas A&M University, College Station, TX 77843-4235.
Stephen W. Raudenbush, School of Education, University of Michigan, 610 E. University, Ann Arbor, MI 48109-1159.
Patrick J. Curran, Department of Psychology, University of North Carolina, Chapel Hill, NC 27599-3270.
Andrea M. Hussong, Department of Psychology, University of North Carolina, Chapel Hill, NC 27599-3270.
J. O. Ramsay, Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, Montreal, Quebec, Canada, H3A 1B1.
Dennis Wallace, Department of Preventive Medicine, University of Kansas Medical Center, 4004 Robinson Hall, 3901 Rainbow Blvd., Kansas City, KS 66160.
Samuel B. Green, Department of Psychology in Education, Arizona State University, 308G Payne Hall, Tempe, AZ 85287-0611.
Judith D. Singer, Graduate School of Education, Harvard University, Roy E. Larsen Hall, Appian Way, Cambridge, MA 02138.
Terry E. Duncan, Oregon Research Institute, 1715 Franklin Blvd., Eugene, OR 97403-1983.
Susan C. Duncan, Oregon Research Institute, 1715 Franklin Blvd., Eugene, OR 97403-1983.
Fuzhong Li, Oregon Research Institute, 1715 Franklin Blvd., Eugene, OR 97403-1983.
Lisa A. Strycker, Oregon Research Institute, 1715 Franklin Blvd., Eugene, OR 97403-1983.
Steven Hillmer, School of Business, University of Kansas, 203 Summerfield Hall, Lawrence, KS 66044-2003.
John R. Nesselroade, Department of Psychology, The University of Virginia, 102 Gilmer Hall, P.O. Box 400400, Charlottesville, VA 22904-4400.
John J. McArdle, Department of Psychology, The University of Virginia, 102 Gilmer Hall, P.O. Box 400400, Charlottesville, VA 22904-4400.
Steven H. Aggen, Department of Psychiatry, Virginia Commonwealth University, P.O. Box 980710, Richmond, VA 23286-0440.
Jonathan M. Meyers, Department of Psychology, The University of Virginia, 102 Gilmer Hall, P.O. Box 400400, Charlottesville, VA 22904-4400.
Contents

Preface    ix

1. Traditional Methods for Estimating Multilevel Models
   David A. Kenny, Niall Bolger, and Deborah A. Kashy    1

2. Alternative Covariance Structures for Polynomial Models of Individual Growth and Change
   Stephen W. Raudenbush    25

3. Structural Equation Modeling of Repeated Measures Data: Latent Curve Analysis
   Patrick J. Curran and Andrea M. Hussong    59

4. Multilevel Modeling of Longitudinal and Functional Data
   J. O. Ramsay    87

5. Analysis of Repeated Measures Designs with Linear Mixed Models
   Dennis Wallace and Samuel B. Green    103

6. Fitting Individual Growth Models Using SAS PROC MIXED
   Judith D. Singer    135

7. Multilevel Modeling of Longitudinal and Functional Data
   Terry E. Duncan, Susan C. Duncan, Fuzhong Li, and Lisa A. Strycker    171

8. Time Series Regressions
   Steven Hillmer    203

9. Dynamic Factor Analysis Models for Representing Process in Multivariate Time-Series
   John R. Nesselroade, John J. McArdle, Steven H. Aggen, and Jonathan M. Meyers    233

Author Index    267

Subject Index    273
Preface

This volume began as a nightmare. Once upon a time, life for social and behavioral scientists was (relatively) simple. When a research design called for repeated measures data, the data were analyzed with repeated measures analysis of variance. The BMDP 2V module was frequently the package of choice for the calculations. Life today is more complicated. There are many more choices. Does the researcher need to model behavior at the level of the individual as well as at the level of the group? Should the researcher use the familiar and well-understood least-squares criterion? Should the researcher turn to the maximum likelihood criterion for assessing the overall fit of a model? Is it possible, and is it desirable, to represent the repeated measures data within structural equation modeling? So the nightmare began as (shall we be dishonest and say) one night of deliberations among these choices. The thought then arose that it would be useful to have the statistical experts writing in the same volume about the possibilities and some of the dimensions that are pertinent to making these choices. Hence the origin of the present volume.

The issue of the analysis of repeated measures data has commonly been examined within the context of the study of change, particularly with respect to longitudinal data (cf. Collins & Horn, 1991; Gottman, 1995). This volume contains three chapters whose primary focus is on the study of growth over several years' time (Raudenbush, chapter 2; Curran & Hussong, chapter 3; Duncan, Duncan, Li, & Strycker, chapter 7). Studies of change typically imply the expectation that variation, movement in scores, is generally unidirectional: generally up or generally down. Not all repeated measures data are concerned with change, and change is only one aspect of the variability that occurs within individuals.

To illustrate, consider an example from the study of social behavior. Personality, social, and organizational psychologists are often interested in the effects of situations on behavior: to what extent are individuals' behaviors consistent across sets of situations, and to what extent does the behavior of individuals change as a function of the situation? For example, the focus might be on how people's dominant and submissive behaviors change as a function of being in a subordinate, co-equal, or supervisory work role. There might also be interest in whether people's responses to these
situations vary as a function of their level on personality characteristics. Some people, let's say extraverts, may change more in their behavior than other individuals in responding to these different situations. This could be studied in the laboratory: individuals participate in situations in which they are placed in a subordinate role, a co-equal role, and a supervisory role, and their responses are recorded. This would be an example of a balanced design. All participants would participate in three situations. These data can be analyzed with the familiar technique of repeated measures analysis of variance. We might introduce the personality variable of extraversion to examine the interaction between individual differences and situation. However, there is considerable error variance in a measure based on a one-occasion assessment (Epstein, 1979; Moskowitz & Schwarz, 1982). Measurements of the individual in each situation on several occasions would improve the quality of measurement. This is possible but difficult in the laboratory, so sometimes researchers make use of naturalistic techniques for collecting this kind of data (see Kenny, Bolger, & Kashy, chapter 1; also see Moskowitz, Suh, & Desaulniers, 1994).

Whether the researcher remains in the laboratory or uses a naturalistic methodology, the researcher is confronted with decisions about how to handle the data. The multiple measures for each situation could be aggregated (averaged) to provide a single measurement in each situation for each individual. If this is done within the context of the laboratory, this provides a balanced design with better measures. Unfortunately, this strategy throws away information. Some people would have less variability in their measures than other people. It may be of interest to know who has more variability in their responses to such situations as being the boss or being the supervisee.

As an alternative to the laboratory context, the researcher might use a naturalistic data collection method such as event contingent recording (Moskowitz, 1986; Wheeler & Reis, 1991). In event contingent recording, participants are given standardized forms and asked to record their behavior after being in certain kinds of events, such as all interpersonal events at work. The form could request information about characteristics of the situation as well as the person's behavior, so interpersonal events can be categorized into situations with a boss, situations with a co-equal, and situations with a supervisee. This method has appeal because it provides records of behavior in real life rather than responses to possibly artificial situations in the laboratory. However, the structure of such data presents data analytic decisions. Individuals will report differing numbers of events. Individuals will report differing numbers of events in different kinds of situations. Individuals may report events corresponding to some of the targeted situations (e.g., the subordinate and co-equal situations) but not to other targeted situations, such as having the supervisory responsibilities of the boss situation.

The data structure could be simplified by aggregating across events
referring to the same kind of situation to obtain one measure per situation and only including people in the sample who reported events in all three kinds of situations. The simplification of the data structure would provide a balanced design, and consequently the familiar data analytic techniques of repeated measures analysis of variance and repeated measures analysis of covariance could be used. However, such simplification would also eliminate information. The simplification would (1) not take into account variability in people's responses across events of the same type of situation; (2) throw away that portion of a sample that has "missing data," that is, individuals whose data do not include the representation of all kinds of specified events; and (3) disregard the time ordering of events.

Once one becomes involved with recording multiple assessments of individuals' behavior and affect responses, the variability of people's responses across events becomes salient and compels modeling. For example, diurnal and weekly rhythms have been demonstrated for affect and behavior (Brown & Moskowitz, 1998; Larsen, 1987; Watson, Wiese, Vaidya, & Tellegen, 1999). Behavior and affect co-occur over time in ways that cannot be identified from static assessments of these variables (Moskowitz & Cote, 1995). Similarity and dissimilarity among measures or items from occasion to occasion may be of interest (see Nesselroade, McArdle, Aggen, & Meyers, chapter 9). The shape of variation can be of considerable importance, such as the shape of change in response to stress or psychotherapy or recovery from illness (e.g., Bolger & Zuckerman, 1995; also see Ramsay, chapter 4). The time ordering of events can be used to make inferences about antecedent-consequent relations (see Hillmer, chapter 8).

So the focus of this volume is the examination of how individuals behave across time and to what degree this behavior changes, fluctuates, is stable, or is not stable. We call this change in individual behavior "intraindividual variability." Intraindividual variability can be contrasted with "interindividual variability." The latter describes individual differences among different people; the former describes differences within a single person. Although most behavioral and social scientists believe that behavior does differ from one occasion to the next, sophisticated techniques for exploring intraindividual variability have been underutilized. Several factors have contributed to the reluctance of analysts to utilize these techniques. One factor is their newness, many of them having only been developed within the last few years. A second factor is the perceived difficulty of implementing these techniques; descriptions tend to be highly technical and inaccessible to nonmathematically trained researchers. A third factor is the unavailability of computer programs to do the analyses, a situation that has recently been much improved with the release of new computer programs.

The primary goal of this volume is to make accessible to a wide audience of researchers and scholars the latest techniques used to assess intraindividual variability. The chapters of this volume represent a group of distinguished experts who have written on a range of available techniques. The
emphasis is generally at an introductory level; the experts have minimized mathematical detail and provided concrete empirical examples applying the techniques.

The volume opens with a chapter by David Kenny, Niall Bolger, and Deborah Kashy, who contrast several procedures for the analysis of repeated measures data. They note two problems with using traditional analysis of variance (ANOVA) procedures for analyzing many contemporary designs using repeated measures data. The first is that research participants often will not have the same number of data points. The second is that the predictor variable generally does not have the same distribution across measurement points for all research participants. They approach the analysis of intraindividual variability within the context of multilevel analyses, in which research participants are independent units and the repeated observations for each individual are not assumed to be independent. They illustrate that a strength of alternative procedures to ANOVA is that they more readily permit the evaluation of random effects, which reflect the extent of variability among individuals, in addition to fixed effects. They compare features of three alternative procedures for modeling the group of research participants and the variability within the group of research participants: a two-step ordinary least-squares regression procedure, a weighted least-squares variation of multiple regression, and a procedure based on a maximum likelihood criterion.

Stephen Raudenbush compares advantages of the hierarchical linear model (a multilevel model), structural equation modeling, and the generalized multivariate linear model in the analysis of repeated measures data. He argues for the flexibility of the hierarchical linear model (HLM). HLM permits the inclusion of all available data, allows unequal spacing of time points across participants, can incorporate a variety of ways of characterizing change in the data such as rate of change and rate of acceleration, and can provide for the clustering of individuals within groups such as schools or organizations. He then combines ideas from the standard hierarchical linear model and the multivariate model to produce a hierarchical multivariate model that allows for different distributions within persons of randomly missing data and time-varying covariates, permits the testing of a variety of covariance structures, and examines the sensitivity of inferences about change to alternative specifications of the covariance structure. The procedure discussed permits the examination of whether alternative models are consistent with key inferences about the shape of change.

Patrick Curran and Andrea Hussong describe how repeated measures data can be represented in structural equation models. They discuss the advantages and disadvantages of two kinds of structural equation models for representing longitudinal change: the autoregressive crosslagged panel model and the latent curve analysis model. They emphasize the latent curve approach, an approach that first estimates growth factors underlying observed measures and then uses the growth factors in subsequent analyses. Latent curve analysis provides two key advances over autoregressive
crosslagged panel models. The first is the capability to model data sets with more than two time points. The second is the capability to provide estimates of the extent of variability among individuals, both the extent of variability in starting points and in rates of change. An applied example concerning the development of antisocial behavior and reading proficiency is used to illustrate the latent curve analysis model. The example illustrates that predictors of behavior at single time points (e.g., initial status) are different from the predictors of the shape of change over time. They also use the example to illustrate several options for incorporating nonlinear forms of growth in structural equation models.

James Ramsay provides a commentary on issues connecting the chapters by Raudenbush, Curran and Hussong, and Kenny, Bolger, and Kashy. He makes several points relative to the study of longitudinal data, considering the implications of missing data; the number of points necessary to define characteristics of growth curves such as level, slopes, and bumps; and the possibility that the curves for individuals are not registered, such that the curves for individuals may show a similar shape but reflect different timings of events. His chapter further extends the discussion of repeated measurements to the case where there are many measurements and makes the point that such data can be represented by a sample of curves using a set of techniques referred to as functional data analysis. His chapter ends on a note of caution, reminding the reader that moving to the more complex models that are sometimes presented in this book has costs that need to be considered. For example, the maximum-likelihood procedures are sensitive to misspecification of the variance-covariance structure. Moreover, adding random coefficient parameters uses up degrees of freedom, leading to a loss of power and potentially unstable estimates of fixed effects. Thus, the cautious researcher who has a moderate sample size may prefer to keep the model simple, such as by remaining with a least-squares-based regression procedure (cf. Kenny, Bolger, & Kashy, chapter 1).

There is considerable complexity in the analysis of models that make use of random as well as fixed effects (see chapters 1 and 2). The chapters by Judith Singer and by Dennis Wallace and Samuel Green present detailed descriptions of how to analyze and interpret such models using a commonly available package, the PROC MIXED procedure from SAS. Dennis Wallace and Samuel Green's chapter provides extensive information about how to estimate fixed and random effects. They provide detailed explanations of the meaning of the underlying statistics, such as maximum-likelihood and restricted maximum-likelihood methods, and an introduction to some of the structures that may be found in the variance-covariance matrices. They provide an outline of recommended steps for estimating models incorporating fixed and random effects. These steps are illustrated using an example from a longitudinal study of the effect of two treatment interventions for reducing substance abuse among homeless individuals; the illustration includes an examination of whether the effectiveness of the treatment programs varies as a function of changing levels of
depression. Judith Singer's discussion provides practical advice for all stages of the analysis, including data preparation and writing computer code. She illustrates a process that is sometimes mysterious for the novice researcher in this area. Models for the representation of individuals' variability across time are sometimes presented as single equations at multiple levels (Bryk & Raudenbush, 1992) and sometimes by single equations that specify multiple sources of variation (cf. Goldstein, 1995). She demonstrates how separate equations can be written at multiple levels and then elements can be substituted in to arrive at a representation in a single equation. The presentation is situated in the context of individual growth models; the presentation can also be extended and applied to cases with repeated measures data that are not unidirectional, as described in the extended example presented earlier and in the chapter by Kenny, Bolger, and Kashy.

Stephen Raudenbush comments that the use of structural equation modeling has not typically been extended to the case where individuals are clustered. Terry Duncan, Susan Duncan, Fuzhong Li, and Lisa Strycker take the step of providing such an extension. They provide an introduction to representing multilevel models in structural equation models using an example from an analysis of change in adolescents' use of alcohol. They compare the strengths and weaknesses of three approaches for modeling longitudinal data that are clustered and unbalanced. One method, a full information hierarchical linear model (HLM), is familiar from the chapter by Raudenbush. A second method, a limited information multilevel latent growth model (MLGM), is an extension of the latent growth modeling that was presented in the chapter by Curran and Hussong. The third approach is based on full information maximum likelihood (FIML) latent growth modeling, using an extension of a factor-of-curves model that has not previously been discussed in the book. They provide examples of programming in both HLM (Bryk, Raudenbush, & Congdon, 1996) and Mplus (Muthén & Muthén, 1998).

Steven Hillmer provides a basic introduction to using time series models to predict intraindividual change. In a time series model, data points for the same variable are arranged sequentially in time, and a basic goal is to identify a model that best represents the sequencing of these data. Hillmer reviews the differences between the main kinds of models that might be used. He contrasts two classes of models: stationary models, in which the joint probability of any set of observations is unaffected by a shift backward or forward in the time series, and nonstationary models, in which parts of the series behave similarly although not identically to other parts of the series. He reviews the steps of building a time series model, providing extensive graphical material for understanding the issues that might arise. The chapter includes an example of interrupted time series data in which an intervention occurs during the course of a time series and the effect of the intervention is estimated. The extended example provided is drawn from the business literature on sales. Time series analyses can also be applied
to the modeling of variability within a person when sufficient data points have been collected.

John Nesselroade, John McArdle, Steven Aggen, and Jonathan Meyers provide an introduction to dynamic factor analysis. Dynamic factor analysis permits the examination of the similarity and dissimilarity of data from occasion to occasion. They introduce the topic by describing P-technique factor analysis, which uses the common factor model to model the covariation of multiple variables measured across time for a single individual. They note problems with this model in the representation of process changes over time, such as the representation of effects that dissipate or strengthen over time. They present two models that allow for time-related dependencies and illustrate the application of these two dynamic factor analysis methods using reports of daily moods. The necessary LISREL code for conducting these analyses is included.

The initial organization for this volume was done within the context of two symposia presented at the 1997 meeting of the American Psychological Association. We thank Lisa Harlow, the editor of the Erlbaum Multivariate Applications Series, for suggesting that we prepare a volume based on these symposia. We also thank James Ramsay, Yoshio Takane, and David Zuroff for comments on drafts of these chapters. We are also grateful to Chantale Bousquet and Serge Arsenault for their preparation of the text in LaTeX. Preparation of this volume was partially supported by funds from the Social Sciences and Humanities Research Council of Canada.

We hope that the volume provides readers with a sense of the range of reasonable options for analyzing repeated measures data and stimulates new questions and more interest in repeated measures designs that extend beyond the context of longitudinal data. Pleasant dreams...

Debbie S. Moskowitz
Scott L. Hershberger
REFERENCES

Bolger, N., & Zuckerman, A. (1995). A framework for studying personality in the stress process. Journal of Personality and Social Psychology, 69, 890-902.

Brown, K. W., & Moskowitz, D. S. (1998). Dynamic stability of behavior: The rhythms of our interpersonal lives. Journal of Personality, 66, 105-134.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Bryk, A. S., Raudenbush, S. W., & Congdon, R. T. (1996). HLM: Hierarchical linear and nonlinear modeling with the HLM/2L and HLM/3L programs. Chicago, IL: Scientific Software International.

Collins, L., & Horn, J. (Eds.). (1991). Best methods for the analysis of change: Recent advances, unanswered questions, future directions. Washington, DC: American Psychological Association.

Epstein, S. (1979). The stability of behavior: I. On predicting most of the people much of the time. Journal of Personality and Social Psychology, 37, 1097-1126.

Goldstein, H. (1995). Multilevel statistical models (2nd ed.). New York: Halstead Press.

Gottman, J. M. (1995). The analysis of change. Hillsdale, NJ: Erlbaum.

Larsen, R. J. (1987). The stability of mood variability: A spectral analysis approach to daily mood assessments. Journal of Personality and Social Psychology, 52, 1195-1204.

Moskowitz, D. S. (1986). Comparison of self-reports, reports by knowledgeable informants and behavioral observation data. Journal of Personality, 54, 101-124.

Moskowitz, D. S., & Cote, S. (1995). Do interpersonal traits predict affect? A comparison of three models. Journal of Personality and Social Psychology, 69, 915-924.

Moskowitz, D. S., & Schwarz, J. C. (1982). The comparative validity of behavioral count scores and knowledgeable informants' rating scores. Journal of Personality and Social Psychology, 42, 518-528.

Moskowitz, D. S., Suh, E. J., & Desaulniers, J. (1994). Situational influences on gender differences in agency and communion. Journal of Personality and Social Psychology, 66, 753-761.

Muthén, L. K., & Muthén, B. O. (1998). Mplus user's guide. Los Angeles: Muthén & Muthén.

Watson, D., Wiese, D., Vaidya, J., & Tellegen, A. (1999). The two general activation systems of affect: Structural findings, evolutionary considerations, and psychobiological evidence. Journal of Personality and Social Psychology, 76, 820-838.

Wheeler, L., & Reis, H. T. (1991). Self-recording of everyday life events: Origins, types, and uses. Journal of Personality, 59, 339-354.
Chapter 1
Traditional Methods for Estimating Multilevel Models

David A. Kenny
University of Connecticut
Niall Bolger
New York University
Deborah A. Kashy
Texas A&M University

Researchers often collect multiple observations from many individuals. For example, in research examining the relationship between stress and mood, a research participant may complete measures of both these variables every day for several weeks, and so daily measures are grouped within participants. In relationship research, a respondent may report on characteristics of his or her interactions with a number of different friends. In developmental research, individuals may be measured at many different times as they develop. In cognition research, reaction times may be observed for multiple stimuli. These types of data structures have been analyzed using standard analysis of variance (ANOVA) methods for repeated measures designs. The most important limitation of the ANOVA approach is that it requires balanced data. So, in the previous examples, each person would be required to have the same number of repeated observations. For example, in the stress and mood study, everyone might have to participate for exactly 14 days, and in the relationships study each respondent might report
on interactions with exactly four friends. It is often the case, however, that data structures generated by repeated observations are not balanced, either because of missing observations from some participants or, more fundamentally, because of the nature of the research design. If, for instance, researchers were interested in learning about naturally occurring interactions with friends, they might have individuals describe their interactions with each person whom they consider to be a friend. For individuals who have few friends, there would be very few observations, whereas for other individuals there would be many.

An additional factor can make the design unbalanced even if the number of observations per person is equal. For the design to be balanced, the distribution of each predictor variable must be the same for each person. So, if the predictor variable were categorical, there would need to be the same number of observations within each category for each person. If the predictor variable were continuous, then its distribution would have to be exactly the same for each person. This is possible, but improbable. For example, in a study of stress and mood, it is unlikely that the distribution of perceived stress over the 14 days would be the same for each person in the study.

In this chapter we introduce the technique of multilevel modeling as a means of overcoming these limitations of repeated measures ANOVA. The multilevel approach, also commonly referred to as hierarchical linear modeling, provides a very general strategy for analyzing these data structures and can easily handle unbalanced designs and designs with continuous predictor variables. In introducing multilevel modeling, we focus our attention on traditional estimation procedures (ordinary least squares and weighted least squares) that, with balanced data, produce results identical to those derived from ANOVA techniques. We also introduce nontraditional estimation methods that are used more extensively in subsequent chapters.

We begin by introducing a research question on how gender of interaction partner affects interaction intimacy. We follow this by presenting an artificial, balanced data set on this topic and provide a brief overview of the standard ANOVA approach to analyzing such a data set. We then introduce a real data set in which the data are not balanced, and we consider an alternative to the ANOVA model, the multilevel model. Finally, we compare the least-squares estimation approaches described in this chapter to the maximum likelihood estimation approaches discussed in other sections of this book.
STANDARD ANOVA ANALYSIS FOR BALANCED DATA

Consider a hypothetical Rochester Interaction Record (RIR; Reis & Wheeler, 1991) study of the effects of gender on levels of intimacy in social interaction. The RIR is a social interaction diary that requires persons to complete
a set of measures, including the interaction partner's gender and interaction intimacy, for every interaction that he or she has over a fixed interval. In our study, each of 80 subjects¹ (40 of each gender) interacts with six partners, three men and three women. The study permits the investigation of the degree to which the gender of an interaction partner predicts the level of perceived intimacy in interactions with that partner. One can also test whether this relationship varies for men versus women; that is, women may have more intimate interactions with male partners, whereas men have more intimate interactions with female partners.

Using conventional ANOVA to analyze the data from this study would result in a source table similar to that presented in Table 1.1. In the table, partner gender is symbolized as X, subject gender is denoted as Z, and S represents subjects. Listed in the table are the sources of variance, their degrees of freedom, and the error terms for the F tests (the denominator of the F ratio) that evaluate whether each effect differs significantly from zero. The multilevel modeling terms that correspond to each effect are presented in the last column of the table. These terms are introduced later in the chapter.

It is helpful to have an understanding of the different sources of variance. The between-subject variation in Table 1.1 refers to the variation in the 80 means derived by averaging each subject's intimacy ratings over the six partners. This between-subject variation can be partitioned into three sources: the grand mean, subject gender (Z), and subject within gender (S/Z). The mean term represents how different the grand mean is from zero, and the subject gender variation measures whether men or women report more intimacy across their interactions. The third source of variation results from differences between subjects within gender: Within the group of males and females, do some people report more or less intimacy in their interactions?

The within-subject variation refers to differences among partners for each subject: Do people differ in how intimate they see their interactions with their six partners? The partner gender effect (X) refers to whether interactions with male versus female partners are more intimate. The partner gender by subject gender interaction (X by Z) refers to whether same or opposite gender interactions are seen as more intimate. The partner gender by subject interaction (X by S/Z) is the variation in the effect of gender of partner for each subject (i.e., to what degree does the mean of female partners minus the mean of male partners vary from subject to subject). Finally, there is variation due to partner (P/XS/Z), and the issue is how much the intimacy ratings of interactions with partners differ from one another controlling for partner gender. Each person reports about three male and three female partners, and this source of variance measures how much variation there is in intimacy across interactions with partners who are of the same gender.

¹We use subject to refer to the research participants so that subjects (S) can easily be distinguished from partners (P) in our notation.
Table 1.1
ANOVA Source Table for the Hypothetical Balanced Case

Source                      df    Error Term    Parameter
Between Subjects            80
  Mean                       1    S/Z           a0
  Subject Gender (Z)         1    S/Z           a1
  Subject (S/Z)             78    P/XS/Z        σd²
Within Subjects            400
  Partner Gender (X)         1    X by S/Z      c0
  X by Z                     1    X by S/Z      c1
  X by S/Z                  78    P/XS/Z        σf²
  Error (P/XS/Z)           320    Not tested    σe²
Because in this example participants interact with a given partner only once, this source of variability cannot be distinguished from other, residual sources, such as measurement error in Y. We therefore call all of the remaining variance in Y error.

Within this model, there are three random effects: Subject (S/Z), Subject × Partner Gender (X by S/Z), and Error (P/XS/Z). It is possible to use the ANOVA mean squares to derive estimates for the Subject, Subject × Partner Gender, and Error variances. The subject variance, symbolized as σd² for reasons that will become clear in the multilevel modeling section of this chapter, measures variation in average intimacy scores after controlling for both subject and partner gender. The Subject × Partner Gender variance, symbolized as σf², measures the degree to which the effects of Partner Gender differ from subject to subject after controlling for the subject's gender. Denoting a as the number of levels of X (a = 2 in this example) and b as the number of partners within one level of X (b = 3 in this example), the standard ANOVA estimates of these variances are given by

Subject:

$$\sigma_d^2 = \frac{MS_{S/Z} - MS_{P/XS/Z}}{ab} \qquad (1.1)$$

Subject × Gender of Partner:

$$\sigma_f^2 = \frac{MS_{X\,\mathrm{by}\,S/Z} - MS_{P/XS/Z}}{b} \qquad (1.2)$$
As noted, an exact estimate of the partner variance cannot be obtained because it is confounded with error variance, and so we represent the combination of partner variance and error variance as σe². Finally, although not usually estimated, we could compute the covariance between Subject and Subject × Partner Gender by computing the covariance between the mean intimacy of the subject and the difference between his or her intimacy with male and female partners. Such a covariance would represent the tendency of those who report greater levels of intimacy to have more intimate interactions with female (or male) partners. Although this covariance is hardly ever estimated within ANOVA, the method still allows for such a covariance.

The table also presents the usual mixed-model error terms for each of the sources of variance. For the fixed between-subjects sources of variance, MS_S/Z is the error term. To test whether there are individual differences in intimacy, MS_S/Z is divided by MS_P/XS/Z. The error term for the fixed within-subject effects is MS_X by S/Z. Finally, the error term for MS_X by S/Z is MS_P/XS/Z, which itself cannot be tested.
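To make the arithmetic in Equations 1.1 and 1.2 concrete, suppose (purely hypothetically; these are not values from the design above) that MS_S/Z = 7.0, MS_X by S/Z = 2.5, and MS_P/XS/Z = 2.0. With a = 2 and b = 3,

$$\hat{\sigma}_d^2 = \frac{7.0 - 2.0}{2 \times 3} \approx 0.83, \qquad \hat{\sigma}_f^2 = \frac{2.5 - 2.0}{3} \approx 0.17.$$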
MULTILEVEL MODELS

Multilevel Data Structure

The ANOVA decomposition of variance just described applies only to the case of balanced data. For unbalanced data, a multilevel modeling approach becomes necessary. A key to understanding multilevel models is to see that these data have a hierarchical, nested structure. Although researchers typically do not think of repeated measures data as being nested, it is the case that the repeated observations are nested within persons. In hierarchically nested data with two levels, there is an upper-level unit and a lower-level unit. Independence is assumed across upper-level units but not lower-level units. For example, in the repeated measures context, person is typically the upper-level unit, and there is independence from person to person. Observation is the lower-level unit in repeated measures data, and the multiple observations derived from each person are not assumed to be independent. Predictor variables can be measured for either or both levels, but the outcome measure must be obtained for each lower-level unit. The following example should help to clarify the data structure.
Example Data Set

As an example of the basic data structure, we consider a study conducted by Kashy (1991) using the RIR. In the Kashy study, persons completed the RIR for 2 weeks. Like the previous balanced-data example, this study investigated the degree to which partner gender predicts the level of perceived intimacy in interactions with that partner and whether this relationship differs between men and women. Because persons often interacted more than once with the same partner, we computed the mean intimacy across all interactions with each partner; that is, for the purposes of this example, we created a two-level data set in which subject is the upper-level unit and partner is the lower-level unit. There are 77 subjects (51 women and 26 men) and 1,437 partners in the
study. The number of partners with whom each person interacted over the data collection period ranged from 5 to 51. The average intimacy across all interactions with a particular partner is the outcome variable, and it is measured for every partner with whom the person interacted.

Partner gender, symbolized as X, is the lower-level predictor variable. Note that X can be either categorical, as in the case of partner gender (X = -1 for male partners and X = 1 for female partners), or continuous (e.g., the degree to which the person finds the partner to be attractive). Subject gender is the upper-level predictor variable and is denoted as Z. In repeated measures research, upper-level predictor variables may be experimentally manipulated conditions to which each subject is randomly assigned or person-level variables such as gender, a person's extroversion, and so on. If Z were a variable such as a person's extroversion, it would be a continuous predictor variable, but because Z is categorical in the example, it is a coded variable (Z = -1 for males and Z = 1 for females). Finally, the outcome variable, average intimacy of interactions with the partner, is measured on a seven-point scale and is symbolized as Y.

Because a second example in which the X variable is continuous is helpful, we make use of the fact that Kashy (1991) also asked subjects to evaluate how physically attractive they perceived each of their interaction partners to be. Ratings of the partner's attractiveness were centered by subtracting the grand mean across subjects from each score. (We feel that it is generally inadvisable to center X for each subject, so-called group centering.) The second example addresses whether interactions with partners who are seen as more physically attractive tend to be more intimate. We can also use subject gender as an upper-level predictor variable, which allows us to test whether the relationship between attractiveness and intimacy differs for male and female subjects.

So, in the example data set, subject is the upper-level unit, and subject gender is the upper-level predictor variable, or Z. Partner is the lower-level unit, and partner gender or partner's physical attractiveness is the lower-level predictor, or X. Intimacy is the outcome variable, or Y, and there is an average intimacy score for each partner. The intimacy variable can range from 1 to 7, with higher scores indicating greater intimacy.
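To make the nesting concrete, a few rows of such a two-level data set might look as follows (the values shown here are invented for illustration and are not from Kashy, 1991):

  Subject (i)   Partner (j)   X (partner gender)   Z (subject gender)   Y (intimacy)
  1             1             -1                    1                    5.2
  1             2              1                    1                    6.0
  1             3              1                    1                    4.8
  2             1             -1                   -1                    3.1
  2             2              1                   -1                    3.9

Each row is a lower-level unit (a partner), and the upper-level variable Z simply repeats across all of a subject's rows.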
MOST BASIC APPROACH TO MULTILEVEL MODELING: ORDINARY LEAST SQUARES

Although it is certainly possible for multilevel modeling to be a challenging and complex data analytic approach, in its essence it is simple and straightforward. A separate analysis, relating the lower-level predictor, X, to the outcome measure, Y, is conducted for each upper-level unit, and then the results are averaged or aggregated across the upper-level units. In this section we introduce the ordinary least squares (OLS) approach to multilevel modeling without reference to formulas. Specific formulas describing
multilevel analyses follow. Using the partner's physical attractiveness example, this would involve computing the relationship between a partner's attractiveness and interaction intimacy with that partner separately for each subject. This could be done by conducting a regression analysis separately for each subject, treating partner as the unit of analysis. In the Kashy (1991) example, this would involve computing 77 separate regressions in which attractiveness is the predictor and intimacy is the criterion.

Table 1.2 presents a sample of the regression results derived by predicting average interaction intimacy with a partner using partner attractiveness as the predictor. For example, Subject 1 had an intercept of 5.40 and a slope of 1.29. The intercept indicates that Subject 1's intimacy rating for a partner whom he perceived to be of average attractiveness was 5.40. The slope indicates that, for this subject, interactions with more attractive partners were more intimate; that is, one could predict that, for Subject 1, interactions with a partner who was seen to be 1 unit above the mean on attractiveness would receive average intimacy ratings of 6.69. Subject 4, on the other hand, had an intercept of only 2.20 and a slope of -.37. So, not only did this subject perceive his interactions with partners of average attractiveness to be relatively low in intimacy, but he also reported that interactions with more attractive partners were even lower in intimacy. Note that, at this stage of the analysis, we do not pay attention to any of the statistical significance testing results. Thus, we do not examine whether each subject's coefficients differ from zero.

The second part of the multilevel analysis is to aggregate or average the results across the upper-level units. If the sole question is whether the lower-level predictor relates to the outcome, one could simply average the regression coefficients across the upper-level units and test whether the average differs significantly from zero using a one-sample t test. For the attractiveness example, the average regression coefficient is 0.43. The test that the average coefficient is different from zero is statistically significant [t(76) = 8.48, p < .001]. This coefficient indicates that there is a significant positive relationship between partner's attractiveness and interaction intimacy such that, on average, interactions with a partner who is one unit above the mean on attractiveness were rated as 0.43 points higher in intimacy. If meaningful, it is also possible to test whether the average intimacy ratings differ significantly from zero or some other theoretical value by averaging all of the intercepts and testing the average using a one-sample t test.

It is very important to note that the only significance tests used in multilevel modeling are conducted for the analyses that aggregate across upper-level units. One does not consider whether each of the individual regressions yields statistically significant coefficients. For example, it is normally of little value to tabulate the number of persons for whom the X variable has a significant effect on the outcome variable.
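The aggregate test just described is an ordinary one-sample t test on the 77 first-step slopes. Working backward from the rounded values reported above, the implied standard error of the average slope is

$$t(76) = \frac{\bar{b}_1}{SE(\bar{b}_1)} = \frac{0.43}{SE} = 8.48 \quad \Rightarrow \quad SE \approx 0.051,$$

where SE(b̄₁) is the standard deviation of the 77 slopes divided by the square root of 77.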
8
Kenny, Bolger, and Kashy
Table 1.2
A Sample of First-Step Regression Coefficients Predicting Interaction Intimacy with Partner's Physical Attractiveness

Men
  Subject Number    Intercept    Slope
   1                 5.40         1.29
   2                 3.38          .03
   3                 2.64          .44
   4                 2.20         -.37
   ...
   26                4.17          .48
  Mean               3.78          .38

Women
  Subject Number    Intercept    Slope
   27                4.07          .16
   28                4.10          .45
   29                3.88          .98
   30                5.53          .32
   ...
   77                4.31          .39
  Mean               4.31          .45
When there is a relevant upper-level predictor variable, Z, one can examine whether the coefficients derived from the separate lower-level regressions vary as a function of the upper-level variable. If Z is categorical, a t test or an ANOVA in which the slopes (or intercepts) from the lower-level regressions are treated as the outcome measure could be conducted. For example, the attractiveness-intimacy slopes for men could be contrasted with those for women using an independent groups t test. The average slope for men was M = 0.38 and for women M = 0.45. The t test that the two average slopes differ is not statistically significant, t(75) = 0.70, ns. Similarly, one could test whether the intercepts (intimacy ratings for partners of average attractiveness) differ for men and women. In the example, the average intercept for men was M = 3.78 and for women M = 4.31, t(75) = 2.19, p = .03, and so women tended to rate their interactions as more intimate than men. Finally, if Z were a continuous variable, the analysis that aggregates across the upper-level units would be a regression analysis. In fact, in most treatments of multilevel modeling, regression is the method of choice for the second step of the analysis, as it can be applied to both continuous and categorical predictors.
Multilevel Model Equations

In presenting the formulas that describe multilevel modeling, we return to the example that considers the effects of subject gender and partner gender on interaction intimacy. As we have noted, estimation in multilevel models can be thought of as a two-step procedure. In the first step, a separate regression equation, in which Y is treated as the criterion variable that is predicted by the set of X variables, is estimated for each person. In the formulas that follow, the term i represents the upper-level unit; for the Kashy example, i represents subject and takes on values from 1 to 77. The term j represents the lower-level unit, partner in the example, and may take on a different range of values for each upper-level unit because the data may be unbalanced. For the Kashy example, the first-step regression equation for person i is as follows:

$$Y_{ij} = b_{0i} + b_{1i}X_{ij} + e_{ij} \qquad (1.3)$$

where b0i represents the intercept for intimacy for person i, and b1i represents the coefficient for the relationship between intimacy and partner gender for person i. Table 1.3 presents a subset of these coefficients for the example data set. Given the way partner gender, or X, has been coded (-1, 1), the slope and the intercept are interpreted as follows:

b0i: the average mean intimacy across both male and female partners

b1i: the difference between mean intimacy with females and mean intimacy with males, divided by two
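These interpretations follow directly from the (-1, 1) coding. Person i's model-implied means for female and male partners are

$$\bar{Y}_{F,i} = b_{0i} + b_{1i} \qquad \text{and} \qquad \bar{Y}_{M,i} = b_{0i} - b_{1i},$$

so that b0i = (Ȳ_F,i + Ȳ_M,i)/2 and b1i = (Ȳ_F,i − Ȳ_M,i)/2.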
Table 1.3
Predicting Interaction Intimacy with Partner's Gender: Regression Coefficients, Number of Partners, and Variance in Partner Gender

Men
  Subject Number    Intercept (b0i)    Slope (b1i)    Number of Partners    Variance of X
   1                 5.35               .76            11                    .87
   2                 3.39              -.14             8                   1.14
   3                 2.86               .69            16                    .80
   4                 1.94              -.34            15                    .84
   ...
   26                4.41               .37            14                    .73
  Mean               3.85               .24

Women
  Subject Number    Intercept (b0i)    Slope (b1i)    Number of Partners    Variance of X
   27                4.49              -.11            35                    .50
   28                4.03               .03            22                    .62
   29                3.65               .42            15                    .50
   30                5.98               .47            21                    .86
   ...
   77                4.40               .32            19                    .98
  Mean               4.39              -.16

Note. Gender of partner is coded 1 = female, -1 = male.
Consider the values in Table 1.3 for Subject 1. The intercept, b0i, indicates that across all of his partners this individual rated his interactions to be 5.35 on the intimacy measure. The slope, b1i, indicates that this person rated his interactions with female partners to be 1.52 (0.76 × 2) points higher in intimacy than his interactions with male partners. For the second-step analysis, the regression coefficients from the first step (see Equation 1.3) are assumed to be a function of a person-level predictor variable Z:
$$b_{0i} = a_0 + a_1 Z_i + d_i \qquad (1.4)$$

$$b_{1i} = c_0 + c_1 Z_i + f_i \qquad (1.5)$$
There are two second-step regression equations, the first of which treats the first-step intercepts as a function of the Z variable and the second of which treats the first-step regression coefficients as a function of Z. In general, if there are p variables of type X and q of type Z, there would be p + 1 second-step regressions, each with q predictors and an intercept, for a total of (p + 1)(q + 1) second-step parameters. (In the example, p = q = 1, giving two second-step regressions and four parameters.) The parameters in Equations 1.4 and 1.5 estimate the following effects:

a0: the average response on Y for persons scoring zero on both X and Z

a1: the effect of Z on the average response on Y

c0: the effect of X on Y for persons scoring zero on Z

c1: the effect of Z on the effect of X on Y
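Substituting Equations 1.4 and 1.5 into Equation 1.3 shows, in a single reduced-form equation, what the two steps together estimate; the first four terms are the fixed effects and the last three are the random effects:

$$Y_{ij} = a_0 + a_1 Z_i + c_0 X_{ij} + c_1 Z_i X_{ij} + d_i + f_i X_{ij} + e_{ij}.$$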
Table 1.4 presents the interpretation of the four parameters for the example. For the intercepts (b0i, a0, and c0) to be interpretable, both X and Z must be scaled so that either zero is meaningful or the mean of the variable is subtracted from each score (i.e., the X and Z variables are centered). In the example used here, X and Z (partner gender and gender of the respondent, respectively) are both effect-coded (-1, 1) categorical variables. Zero can be thought of as an "average" across males and females. The estimates of these four parameters for the Kashy example data set are presented in the OLS section of Table 1.5.

As was the case in the ANOVA discussion for balanced data, there are three random effects in the multilevel model. First, there is the error component, eij, in the lower-level or first-step regressions (see Equation 1.3). This error component represents variation in responses across the lower-level units after controlling for the effects of the lower-level predictor variable, and its variance can be represented as σe². In the example, this component represents variation in intimacy across partners who are of the same gender (it is the partner variance plus error variance that was discussed in the ANOVA section).
Table 1.4
Definition of Effects and Variance Components for the Kashy Gender of Subject by Gender of Partner Example

Effect estimates:
  Constant (a0): Typical level of intimacy across all subjects and partners.
  Subject Gender (Z) (a1): Degree to which females see their interactions as more intimate than males.
  Partner Gender (X) (c0): Degree to which interactions with female partners are seen as more intimate than those with male partners.
  X by Z (c1): Degree to which the partner gender effect is different for male and female subjects.

Variances:
  Subject (σd²): Individual differences in the typical intimacy of a subject's interactions, controlling for partner and subject gender.
  X by Subject (σf²): Individual differences in the effect of partner gender, controlling for subject gender.
  Error (σe²): Within-subject variation in interaction intimacy, controlling for partner gender (includes error variance).
Table 1.5
Estimates and Tests of Coefficients and Variance Components for the Kashy Gender of Subject by Gender of Partner Example

Estimation Procedure
                               OLS               WLS               Multilevel (ML)
Parameter                      b       t         b       t         b       t
Constant (a0)                  4.120   34.08     4.097   32.99     4.105   34.14
Subject Gender (Z) (a1)        0.269    2.23     0.249    2.00     0.270    2.24
Partner Gender (X) (c0)        0.038    0.71     0.056    1.18     0.054    1.12
X by Z (c1)                   -0.200   -3.72    -0.181   -3.78    -0.188   -3.94

                               WLS               ML
Variance                       σ²      F         σ²      χ²
Subject (S/Z or d)             0.863   8.22      0.853   8.22
X by S/Z (f)                   0.026   1.22      0.025   1.22
Error (e)                      1.886             1.888

Note. OLS, WLS, and ML estimates were obtained using the SAS REG procedure, the SAS GLM procedure, and HLM, respectively.
the two second-step regression equations. In Equation 1.4, the random effect is di, and it represents variation in the intercepts that is not explained by Z. Note that di in this context is parallel to MS_S/Z within the balanced repeated measures ANOVA context, as shown in Equation 1.1. The variance in di is a combination of σs², which was previously referred to as Subject variance, and σe². Finally, in Equation 1.5, the random effect is fi, and it represents variation in the gender-of-partner effect. Note that fi here is parallel to MS_X by S/Z within the repeated measures ANOVA context, as shown in Equation 1.2. The variance in fi is a combination of σxs², which was previously referred to as the Subject by Gender of Partner variance, and σe². A description of these variances for the example is given in Table 1.4.

Recall that it was possible to obtain estimates of σs² and σxs² for balanced designs by combining mean squares. As can be seen in Equations 1.1 and 1.2, in the balanced case the formulas involve a difference in mean squares divided by a constant. In the unbalanced case (especially when there is a continuous X), this constant term becomes quite complicated. Although we believe a solution is possible, so far as we know none currently exists.

The multilevel model, with its multistep regression approach, seems radically different from the ANOVA model. However, as we have pointed out in both the text and Table 1.1, the seven parameters of this multilevel model correspond directly to the seven mean squares of the ANOVA model for balanced data. Thus, the multilevel model provides a more general and more flexible approach to analyzing repeated measures data than that given by ANOVA, and OLS provides a straightforward way of estimating such models.
Computer Applications of Multilevel Models with OLS Estimation

One of the major advantages of using the OLS approach with multilevel data is that, with some work, virtually any statistical computer package can be used to analyze the data. The simplest approach, although relatively tedious, is to compute separate regressions for each upper-level unit (each person in the case of repeated measures data). In SAS, separate regressions can be performed using a "BY" statement. If PERSON is a variable that identifies each upper-level unit, the SAS code for the first-step regressions could be:
    PROC REG;
       BY PERSON;
       MODEL Y = X;
    RUN;

Then a new data set that contains the values of b0i and b1i for each upper-level unit, along with any Z variables that are of interest, would be entered into the computer. The OLS approach is certainly easier, however, if the computer package that performs the first-step regressions can be used
to automatically create a data set that contains the first-step regression estimates. Although this can be done within SAS using the OUTEST=data-set-name and COVOUT options for PROC REG, it can be rather challenging because SAS creates the output data set in matrix form. Regardless of how the data set is created, the coefficients in it serve as outcome measures in the second-step regressions.
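To make the mechanics concrete, here is a minimal sketch of the two-step OLS analysis in SAS. The data set names (DIARY for the interaction records, ZDATA for the person-level file) and the new names introduced below are hypothetical stand-ins; without the COVOUT option, the OUTEST= data set holds one row of coefficients per person, with the intercept stored under the name INTERCEPT (INTERCEP in older SAS releases) and the slope stored under the name of its predictor:

    * Step 1: a separate regression of Y on X for each person;
    * OUTEST= saves each person's b0i and b1i;
    PROC REG DATA=diary OUTEST=firststep NOPRINT;
       BY PERSON;
       MODEL Y = X;
    RUN;

    * attach the person-level Z variable to the first-step coefficients;
    DATA secondstep;
       MERGE firststep(KEEP=PERSON INTERCEPT X) zdata;
       BY PERSON;
    RUN;

    * Step 2: regress the coefficients on Z (Equations 1.4 and 1.5);
    PROC REG DATA=secondstep;
       MODEL INTERCEPT = Z;   * estimates a0 and a1;
       MODEL X = Z;           * estimates c0 and c1;
    RUN;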
Complications in Estimation with Unbalanced Data

The OLS approach to multilevel modeling allows researchers to analyze unbalanced data that cannot be handled by ANOVA. As we have noted, there are two major reasons that data are not balanced. First, persons may have different numbers of observations. This is the case in the Kashy data set, where the number of partners varies from 5 to 51. Second, even if the number of observations were the same, the distribution of X might vary by person. In the example, X is partner gender, and the distribution of X does indeed vary from person to person, and so the variance of X differs (see Table 1.3). As noted earlier, data are unbalanced if either the number of observations per person is unequal or the distribution of the X variables differs by person. Note that a study might be designed to be balanced, but one missing observation makes the data set unbalanced.
MULTILEVEL ESTIMATION METHODS THAT WEIGHT THE SECOND-STEP REGRESSIONS

The OLS approach does not take into account an important ramification of unbalanced data: The first-step regression estimates from subjects who supply many observations, or who vary more on X, are likely in principle to be more precise than those from subjects who supply relatively few observations or who vary little on X. A solution to this problem is to weight the second-step analyses that aggregate over subjects by some estimate of the precision of the first-step coefficients. How best to derive the weights that are applied to the second-step analyses is a major question in multilevel modeling, and there are two strategies that are used: weighted least squares (WLS) and maximum likelihood (ML). Because the ML approach is treated in detail in other chapters in the volume, we focus most of our attention on the WLS solution. However, we later compare WLS, as well as OLS, with ML.
Multilevel Modeling with Weighted Least Squares

Expanding the multilevel model from an OLS solution to a WLS solution is relatively straightforward. As in OLS, in the WLS approach a separate analysis is conducted for each upper-level unit. This first-step analysis is identical to that used in OLS, as given in Equation 1.3. The second-step analysis also involves estimating Equations 1.4 and 1.5. However, in
the WLS solution, Equations 1.4 and 1.5 are estimated using weights that represent the precision of the first-step regression results. The key issue then is how to compute the weights. In WLS, the weights are the sums of squares for X, or SSi (Kenny et al., 1998). This weight is a function of the two factors that cause data to be unbalanced: the number of lower-level units sampled (partners in the example) and the variance of X (partner gender in the example).
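In symbols, the weight for person i is the within-person sum of squared deviations of X; writing ni for that person's number of lower-level observations, a standard identity makes the two sources of precision explicit:

    SSi = Σj (Xij − X̄i)² = (ni − 1) s²X,i

where X̄i is person i's mean on X and s²X,i is person i's sample variance of X. The weight therefore grows both with the number of observations and with how much X varies for that person, precisely the two features that make data unbalanced.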
Multilevel Modeling with Maximum Likelihood

The major difference between ML and WLS solutions to multilevel modeling is how the weights are computed. The ML weights are a function of the standard errors and the variance of the term being estimated (see chapter 5 for greater detail). For example, the weight given to a particular b0i is a function of its standard error and the variance of di. ML weighting is statistically more efficient than WLS weighting, but it is computationally more intensive. There is usually no closed-form solution for the estimate; that is, there is no formula that is used to estimate the parameter. Estimates are obtained by iteration, and the estimates that minimize a statistical criterion are chosen. In ML estimation, the first- and second-step regressions are estimated simultaneously. Several specialized stand-alone computer programs have been written that use ML to derive estimates for multilevel data: HLM/2L and HLM/3L (Bryk, Raudenbush, & Congdon, 1994), MIXREG (Hedeker, 1993), MLn (Goldstein, Rasbash, & Yang, 1994), and MLwiN (Goldstein et al., 1998). Within major statistical packages, SAS's PROC MIXED and BMDP's 5V are available.
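For comparison, the entire model can be estimated by ML in a single run using one of the packages just mentioned. The following is a minimal sketch using SAS's PROC MIXED under the same hypothetical data set and variable names used above; it illustrates the general setup rather than the specific analyses reported in this chapter:

    * one-step ML estimation of the two-level model;
    * METHOD=ML requests full maximum likelihood;
    PROC MIXED DATA=diary METHOD=ML COVTEST;
       CLASS PERSON;
       * fixed effects: X, Z, and the cross-level X by Z interaction;
       MODEL Y = X Z X*Z / SOLUTION;
       * random person-specific intercepts and X slopes;
       * TYPE=UN leaves their variances and covariance unrestricted;
       RANDOM INTERCEPT X / SUBJECT=PERSON TYPE=UN;
    RUN;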
ESTIMATION OF WLS USING STANDARD COMPUTER PROGRAMS

The estimation of separate regression equations is awkward and computationally inefficient. Moreover, this approach does not allow the researcher to specify that the X effect is the same across the upper-level units. It is possible to perform multilevel analyses that yield results identical to those estimated using the "separate regressions" WLS approach but that are more flexible and less awkward. This estimation approach treats the lower level or observation as the unit of analysis but still accounts for the random effects of the upper level. We illustrate the analysis using SAS's GLM procedure as an example. The analysis could be accomplished within most general linear model programs. We use SAS because it does not require that the user create dummy variables, but other statistical packages could be used. The WLS analysis that we describe requires that a series of three regression models be run, and then the multilevel parameters and tests are constructed from the results of these three models. Lower-level units are treated as the unit of analysis. In other words,
each observation is a separate data record. Each record has four variables: the lower-level predictor variable X, the upper-level predictor variable Z, the outcome variable Y, and a categorical variable, called PERSON in the example that follows, which identifies each individual or upper-level unit in the sample. In the first run, or Model 1, the setup is:
    PROC GLM;
       CLASS PERSON;
       MODEL Y = Z PERSON X Z*X PERSON*X;
    RUN;
The mean square error from the model is the pooled error variance, or se². Also, the F tests (using SAS's Type III sums of squares) for both PERSON and PERSON by X are the WLS tests of the variance of the intercepts (sd²) and the variance of the slopes (sf²), respectively. Note that this model supplies only the tests of the intercept and slope variances. The other tests are not WLS tests and should be ignored.

Footnote: The reader should be warned that, in the output, the Z effect has zero degrees of freedom. This should be ignored.

Model 2 is the same as Model 1, but the PERSON by X term is dropped:
    PROC GLM;
       CLASS PERSON;
       MODEL Y = Z PERSON X Z*X / SOLUTION;
    RUN;

This model gives the proper estimates for the main effect of X (c0) and the Z by X interaction (c1) (see Equation 1.5). The SOLUTION option in the MODEL statement enables these estimates to be viewed. Mean squares for these terms are tested using the PERSON by X mean square (SAS's Type III) from Model 1 as the error term. If there are multiple X variables, Model 2 must be re-estimated dropping each PERSON by X interaction singly. Finally, Model 3 is the same as Model 1 except that the PERSON term is dropped:
    PROC GLM;
       CLASS PERSON;
       MODEL Y = Z X Z*X PERSON*X / SOLUTION INT;
    RUN;

The INT option is added so that the intercept can be viewed. This model gives the estimates of the Z effect (a1) and the overall intercept (a0) from Equation 1.4. The mean squares for these terms are tested using the PERSON mean square (Type III) from Model 1. If there were two X variables, X1 and X2, then Model 2 would be estimated twice. In one instance, the PERSON by X1 term would be dropped; however, the effects of both X1 and X2 would remain in the equation, as would the PERSON by X2 interaction. In the other instance, the PERSON by X2 term would be dropped; however, the effects of both X1
and X2 would remain in the equation, as would the PERSON by X1 interaction. If there were more than one Z variable, they could all be tested using a single Model 3.

The results from the tests of the variances of Model 1 have important consequences for the subsequent tests. If there were evidence that an effect (e.g., f) does not significantly vary across upper-level units, and so sf² is not statistically significant, Model 1 should be re-estimated dropping that term. In this case, instead of using that variance as an error term for other terms in Model 2, those terms can be tested directly within Model 1 using the conventional Model 1 error term. So if sf² is not included in the model, c0 and c1 would be tested using se². Rarely, if ever, is the variance of the intercepts not statistically significant. However, if there were no intercept variance, a parallel procedure would be used to test a0 and a1.

Table 1.5 presents the OLS, WLS, and ML results for the Kashy data set. The OLS and WLS estimates were obtained from SAS using the methods described previously. The ML estimates were obtained using the HLM program (Bryk et al., 1994). Model 1 is estimated first to determine whether there is significant variance in the intercepts and slopes across persons. There is statistically significant evidence of variance in the intercepts [F(75, 1283) = 8.22, p < .001]; however, there is not evidence that the slopes significantly vary [F(75, 1283) = 1.22, p = .10]. We adopt the conservative approach and treat the slopes as if they differed.

We see that the intercept is near the scale midpoint of four. Because effect coding is used, effects for respondent gender, partner gender, and their interaction must be doubled to obtain the difference between males and females. We see from the subject gender effect that females say that their interactions are more intimate than reported by males by about half a scale point. The partner gender effect indicates that interactions with females are perceived as one tenth of a point more intimate than interactions with males. Finally, the interaction coefficient indicates that opposite-gender interactions are more intimate than same-gender interactions.

One feature to note in Table 1.5 is the general similarity of the estimates. This illustrates how WLS and even OLS can be used to approximate the more complicated ML estimates. Of course, this is one example, and there must be cases in which ML is dramatically different from the least-squares estimators. We discuss this issue further in the following section.
COMPARISON BETWEEN METHODS

In this section we consider the limitations and advantages of OLS, WLS, and ML estimation. The topics that we consider are between and within slopes, scale invariance, estimation of variances and covariances, statistical efficiency, and generality.
Figure 1.1. Individual within (solid lines), pooled within (small dashed line), and between (large dashed line) slopes.
Between and Within Slopes

The coefficient b1i measures the effect of X on Y for person i. In essence, OLS and WLS average these b1i values to obtain the effect of X on Y. However, there is another way to measure the effect of X on Y. We can compute the mean X and mean Y for each person, and then regress mean Y on mean X (again weighting in the statistically optimal way), treating person as the unit of analysis. So for the example, we could measure the effect of having more female partners on the respondent's overall level of intimacy. We denote this effect as bB and the average of the b1i, or within-subject, coefficients as bW.

Figure 1.1 illustrates these two different regression coefficients. There are three persons, each with four observations denoted by the small filled circles. We have fitted a slope for each person, designated by a solid line. We can pool these three slopes across persons to compute a common, pooled within-person slope, or bW; this pooled slope is shown in the figure as the small dashed line. The figure also shows the three points through which bB is fitted (the large filled circles). The slope bB is fitted through these points and is shown by the large dashed line. There are then two estimates of the effect of X on Y: bW and bB. In essence, bW is an average of the persons' slopes, and bB is the slope computed from the person means.

For the Kashy data set, we estimated these two slopes for the effect of partner gender on perceived intimacy. The value for bW is 0.056, indicating that interactions with female partners are seen as more intimate. However, the value for bB is negative, being −0.217. This indicates that people who have relatively more female partners
viewed their interactions as less intimate. (The coefficient is not statistically significant.)

The ML estimate, as we have described it, of the effect of X on Y is a compromise between the two slopes bW and bB, whereas the WLS and OLS estimates use only a version of bW. Note that in Table 1.5 the ML estimate for this effect (X) is somewhat lower than the WLS estimate because ML uses the negative between slope. In our experience, these two slopes are typically different, and, as the example shows, sometimes they even have different signs. So it is a mistake to assume, without even testing, that the two slopes are the same. The prudent course of action is to compute both slopes and evaluate empirically whether they are equal. If they differ, in most applications we feel that bW is the more appropriate.

To estimate both slopes, the following must be done: Create an additional predictor variable that is the mean of the X's for each person (Bryk & Raudenbush, 1992). Thus, there are two X predictors of Y: Xij and the mean X. The slope for Xij estimates bW and the slope for mean X estimates bB. Alternatively, the X variables can be "group-centered" by removing the subject mean from each variable (for more on centering in multilevel models, see Kreft, de Leeuw, & Aiken, 1995). We should note that, in the balanced case, mean X does not vary, and so bW can be estimated but bB is not identified. Perhaps the balanced case has misled us into thinking that there is just one X slope (bW) when in fact in the unbalanced case there are almost always two (that may or may not be equal).
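Written out under the group-centered specification just described, the two slopes appear as separate coefficients (a sketch in the chapter's notation; eij is the usual lower-level residual):

    Yij = b0 + bW (Xij − X̄i) + bB X̄i + eij

Here X̄i is person i's mean on X; the coefficient on the within-person deviation estimates bW, the coefficient on the person mean estimates bB, and the equality of the two slopes can be evaluated by comparing these two coefficients.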
Scale Invariance

There is a serious limitation to WLS estimation that is not present in either ML or OLS. Second-stage estimates using WLS estimation of intercepts are not scale invariant; that is, if an X variable were transformed by adding a constant to it, the WLS second-step solution for the intercepts cannot ordinarily be transformed back into the original solution. The reason for this lack of invariance is that the weights used in the step-two equations differ after transformation. The standard error for the intercept increases as the zero point is farther from the mean. Because of the differential weighting of the intercepts, estimates of cell "means," using the intercepts, will not be the same.

To illustrate this problem using the sample data set, we recoded the data using dummy coding (males = 0, females = 1) instead of effect coding for both the person and partner gender variables. Table 1.6 presents the estimated cell means for the four conditions. We see that there is a difference between the predicted "means," and so the coding system matters. Because ML estimates the weights simultaneously, it does not have this problem. Because OLS does not weight at all, OLS does not have this problem.

Footnote: However, if the same equation were estimated twice (e.g., an X variable is present in one equation and dropped in the other), ML is likely to weight the effect differently in the two equations. This differential weighting creates difficulties in the decomposition of indirect effects in mediation.
Table 1.6
Estimated Cell "Means" for the Four Conditions Using WLS

Person Gender   Partner Gender   Effect Coding   Dummy Coding
Female          Female           4.221           4.254
Female          Male             4.471           4.503
Male            Female           4.085           4.055
Male            Male             3.611           3.581
Thus, this serious problem applies only to WLS. One simple solution to the problem is to always center the X variables using the grand mean. It is fairly standard practice to do this anyway.
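In SAS, grand-mean centering can be done in one step before the WLS runs; the following is a minimal sketch under the same hypothetical names (PROC STANDARD with MEAN=0 subtracts each listed variable's grand mean):

    * grand-mean center the lower-level predictor before analysis;
    PROC STANDARD DATA=diary MEAN=0 OUT=diaryc;
       VAR X;
    RUN;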
Estimation of Variances and Covariances

One major advantage of ML is that it directly provides estimates of variances and covariances. A procedure for obtaining WLS estimates of variance has been developed (Kashy, 1991), but it is very complicated. We know of no appropriate method for estimating covariances within WLS. Because slopes and intercepts are typically weighted differently, it is unclear how to weight each person's estimates to form a covariance.
It seems logically possible that estimates of both variance and covariance could be developed within OLS. However, we know of no such estimates. If OLS were to be used more in estimation in multilevel models, it would be of value to determine these estimators. ML has the strong advantage of providing estimates of these variances and covariances. Unfortunately, we should note that all too often these terms are largely ignored in the analysis. Most of the focus is on the fixed effects. Very often the variances and covariances are as important as the fixed effects. Knowing that X has the same effect on Y for all subjects (i.e., sf² is zero) can often be a very interesting result because it implies that the effect of X on Y is not moderated by individual differences.
Statistical Efficiency

If we assume that the statistical model is correct, OLS is the least efficient, WLS the next, and ML the most. The complex weighting of ML creates this advantage. We wonder, however, whether this advantage may at times be more apparent than real. Consider the Kashy study. For both ML and WLS, why should people who have more partners count more than those with fewer? Statistically, more is better, but that may not be the case in all repeated measures studies. Perhaps, if there is a disparity in the number of observations per person, the researcher might want to test whether the number of observations (perhaps log transformed) is a moderating variable; that is, does the effect of X on Y increase or decrease when there are more observations? The number of observations would then become a Z variable entered in the second-step equations. We estimated such a model with the Kashy data and did not find evidence for moderation, but we did find a trend that persons with more interaction partners reported lower levels of intimacy.
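Setting up that moderator analysis takes only a counting step. A sketch under the same hypothetical names, in which each person's number of observations is counted, log transformed, and merged into the second-step data set (SECONDSTEP, from the earlier sketch) as an additional Z variable:

    * count each person's lower-level observations;
    PROC MEANS DATA=diary NOPRINT;
       BY PERSON;
       VAR Y;
       OUTPUT OUT=counts N=nobs;
    RUN;

    * add the log of the count as a person-level Z variable;
    DATA secondstep2;
       MERGE secondstep counts(KEEP=PERSON nobs);
       BY PERSON;
       logn = log(nobs);
    RUN;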
Generality

There are several complications of the model that we might want to consider. First, the outcome variable, Y, may be discrete, not continuous. For instance, in prevention studies, the outcome might be whether the person has a deviant status or not. Second, X or Y may be latent variables. In social-interaction diary studies, there may be several outcomes (intimacy, disclosure, and satisfaction) that measure the construct of relationship quality. It may make sense to treat them as indicators of a latent variable. Third, we have assumed that, after removing the effect of the X's, the errors are independent. However, the errors may be correlated across time, perhaps with an autoregressive structure. Fourth, the errors may have some distribution other than normal (e.g., log normal). Typically, behavioral counts are highly skewed and so are not normal. Fifth, the variance in the errors may vary by person. Some people may be inherently more predictable than others. Increasingly, ML programs allow for these and other complications. However, it would be difficult if not impossible to add these complications to a least-squares estimation solution. Thus, ML estimation is much more flexible than least-squares estimation.
SUMMARY

Multilevel modeling holds a great deal of potential as a basic data analytic approach for repeated measures data. An important choice that researchers will have to make is which multilevel estimation technique to use. Although statistical considerations suggest that ML is the best estimation technique
to use because it provides easy estimates of variance and covariance components, is flexible, and provides estimates that are scale invariant, there are times that OLS might also be very useful. We should note that ML estimation is iterative, and sometimes there can be a failure to converge on a solution. Moreover, ML estimation, as conventionally applied, pools the between and within slopes without evaluating their equality. Therefore, when ML is used in an unsophisticated manner, it is possible to end up confounding what may be conceptually very different effects.

OLS approaches are familiar and easy to apply, and results generated by OLS generally agree with those produced by ML. WLS has some advantages over OLS. Its estimates are more efficient, and estimates of variance components are possible. However, it suffers from the problem that the intercept estimates are not scale invariant. Notably, if the data set is balanced or very nearly balanced, there is only a trivial difference between the different techniques. ML estimation still has the advantage that variance components can always be estimated, but, if the design is perfectly balanced, the variance components can be estimated and tested using least squares.

A major advantage of both OLS and WLS solutions is that they can be accomplished by using conventional software (although SAS's PROC MIXED is available for ML). Thus, a researcher can use conventional software to estimate the multilevel model. WLS and OLS may serve as a bridge in helping researchers make the transition from simple ANOVA estimation to multilevel model estimation. It may also be a relatively easy way to estimate multilevel models without the difficulties of convergence and iteration. Finally, and most importantly, it can provide a way for researchers who are not confident that they have successfully estimated a multilevel model using new software to verify that they have correctly implemented their model. We have generations of researchers who are comfortable with ANOVA and who have difficulty working with multilevel regression models. These people can estimate models using a WLS approach that approximates the more appropriate ML.

Regardless of how the researcher estimates a multilevel model, we strongly urge the careful probing of the solution. Even the use of standard ANOVA is complicated, and errors of interpretation are all too easy to make. Researchers need to convince themselves that the analysis is correct by trying out alternative estimation methods (some of which may be suboptimal), plotting raw data, and creating artificial data and seeing if the analysis technique recovers the model's structure. We worry that, in the rush to use these exciting and extraordinarily useful methods, some researchers may not understand what they are doing and they will fail to make discoveries that they could have made using much simpler techniques.
ACKNOWLEDGMENTS

Supported in part by grants to the first author from the National Science Foundation (DBS-9307949) and the National Institute of Mental Health (R01-MH51964). Questions to the first author can be sent by email to [email protected].
REFERENCES

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Bryk, A. S., Raudenbush, S. W., & Congdon, R. T. (1994). Hierarchical linear modeling with the HLM/2L and HLM/3L programs. Chicago, IL: Scientific Software International.

Goldstein, H., Rasbash, J., Plewis, I., Draper, D., Browne, W., Yang, M., Woodhouse, G., & Healy, M. (1998). A user's guide to MLwiN. London: Institute of Education, University of London. (http://www.ioe.ac.uk/multilevel/)

Goldstein, H., Rasbash, J., & Yang, M. (1994). MLn: User's guide for version 2.3. London: Institute of Education, University of London.

Hedeker, D. (1993). MIXREG: A FORTRAN program for mixed-effects linear regression models. Chicago, IL: University of Illinois.

Kashy, D. A. (1991). Levels of analysis of social interaction diaries: Separating the effects of person, partner, day, and interaction. Unpublished doctoral dissertation, University of Connecticut, Storrs, CT.

Kenny, D. A., Kashy, D. A., & Bolger, N. (1998). Data analysis in social psychology. In D. Gilbert, S. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (Vol. 1, 4th ed., pp. 233-265). Boston, MA: McGraw-Hill.

Kreft, I. G. G., de Leeuw, J., & Aiken, L. S. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30, 1-21.

Reis, H. T., & Wheeler, L. (1991). Studying social interaction with the Rochester Interaction Record. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 24, pp. 269-318). San Diego, CA: Academic Press.
Chapter 2
Alternative Covariance Structures for Polynomial Models of Individual Growth and Change

Stephen W. Raudenbush
Michigan State University

In studies of psychological change, researchers seek statistical models that are developmentally meaningful and provide a reasonable fit to the data. They also seek inferences that are fairly insensitive to questionable assumptions about the random behavior of their data. This chapter compares, contrasts, and integrates two modeling approaches in light of these concerns: a hierarchical linear model and a multivariate model for incomplete data. If the complete data are multivariate normal with homogeneous covariance structure, now-standard hierarchical models are submodels of the multivariate model. This principle can be exploited to compare the fit of alternative hierarchical models with each other and with an unrestricted multivariate model. However, hierarchical models often imply heterogeneity of covariance structure and are therefore more general than the conventional multivariate models for incomplete data. Both models can readily be extended to include the clustering within groups of repeatedly observed participants. Robust standard errors for the fixed regression coefficients are available within both approaches. Taken together, these approaches allow a thorough investigation of the sensitivity of key inferences to alternative model assumptions. The two approaches are illustrated by reanalysis of data from two large-scale longitudinal studies.

Hierarchical linear models ("HLM") have become increasingly popular
in studies of growth and change. As presented by Laird and Ware (1982) and Strenio, Weisberg, and Bryk (1983), the approach is based on a nested pair of models. At the first level, an outcome varies within each person over time as a function of a polynomial growth or change curve plus a within-person random error. The parameters of the individual curve are then viewed as outcomes in a second, between-person model. In this second-level model, individual differences in background and experience can be specified to account for individual differences in trajectories of change. Bryk and Raudenbush (1987) showed how this modeling framework can supply estimates of the mean trajectory, of individual variation around that mean, of the reliability of measures of change, of the correlation between true status at any time and true rate of change, and of correlates of change. The approach typically provides reasonable estimates of the individual change function even when an individual's data are sparse. HLMs are elsewhere described as "multilevel models" (Goldstein, 1995) or "random coefficient models" (Gibbons, Hedeker, Waternaux, & Davis, 1988; Longford, 1993), but the essential approach in polynomial studies of change is similar (cf. Goldstein, 1987, 1989). The models are typically estimated via maximum likelihood (ML) or restricted maximum likelihood (REML), with empirical Bayes estimation of individual growth. Bayesian estimation may be preferred to maximum likelihood when the number of participants is small (Seltzer, 1993).

One major advantage of the approach is flexibility in handling time-series data (Ware, 1985). The analyst can make use of all available data, so that any participant with one or more time points can be included in the analysis. The assumption is that the data are missing at random (Little & Schenker, 1995), although the use of all available data increases the robustness of the results to nonrandom missingness (Schafer, 1996).

Footnote: The robustness of inferences to nonrandom missingness depends strongly on the fraction of missing information, which is minimized when all data are used in the analysis.

The approach readily handles unequal spacing of time points across participants. For example, in a study of age-related change, one might wish to assess each subject on a monthly basis or on a given birthday, but the logistics of field research may make this impossible. Then the distance between time points will vary across participants. By viewing each person's time-series data set as nested within a person, the model readily incorporates these individual differences in the number and spacing of the time-series observations. Neither missing data nor varying timing is gracefully handled within the framework of conventional univariate or multivariate analysis of variance of repeated measures.

A second important advantage of the approach is its flexibility in modeling individual change and correlates of change. The change parameters can be defined in a variety of interesting ways. For example, in a linear growth structure, it might be useful to characterize the individual trajectory simply in terms of an initial status and a constant rate of change
(Bryk & Raudenbush, 1987). Alternatively, one might characterize growth in terms of a mean level, an average velocity, and a rate of acceleration. Time-varying covariates can readily be included in the polynomial change model (Raudenbush & Chan, 1993; Ware, 1985). Piecewise linear models may be more useful than the standard polynomial model (Bryk & Raudenbush, 1992, Chapter 6). Examples of creative first-level modeling include Francis, Fletcher, Stuebing, Davidson, and Thompson's (1991) study of recovery from head injury; Huttenlocher, Haight, Bryk, and Seltzer's (1991) study of vocabulary growth during the first year of life; and Horney, Osgood, and Marshall's (1995) evaluation of contextual effects on antisocial behavior. Predictors of change in the second-level model can be continuous or discrete, and the model can be tailored to allow one set of predictors for status and other sets of predictors for a given aspect of growth.

Footnote: A general rule of thumb is that any predictor of a high-level term in the polynomial (e.g., a quadratic term) should also be included as a predictor of each lower-level term (e.g., the intercept and linear terms) in order that results will be invariant under linear transformations of the predictors. This rule can be justifiably ignored when only a specific scaling of the predictors is of interest.

The foundation of this approach to studying change is the "level-1" model, that is, the model for individual change (Rogosa, Brand, & Zimowski, 1982). This model must be psychologically meaningful because it is the level-1 model that defines the parameters of change that will become outcomes at level 2. However, the specification of the model has strong implications for the marginal variances and covariances of the time-series data; that is, a given hierarchical model "induces" a set of assumptions about how the outcomes vary and covary over time. A potentially important empirical test of the model is whether these induced assumptions are consistent with the manifest variation and covariation over time. To test model fit requires that a broad array of covariance structures can be estimated and compared with the covariance structure induced by the hierarchical model.

Willett and Sayer (1994) showed that two-level models for change typical in a growing number of applications of HLM can also be estimated within the framework of standard structural equation modeling (SEM) software such as LISREL (Jöreskog & Sörbom, 1989). The measurement model of SEM corresponds to the level-1 model of HLM. Here SEM's manifest outcomes are the time-series data of HLM; the latent variables of SEM are the individual change parameters of HLM; the factor loadings of SEM are the polynomial predictors of HLM; and the measurement error variance of SEM coincides with the HLM's within-person or "level-1" variance. The structural model in SEM then corresponds to HLM's level-2 or "between-person" model.

Footnote: Willett and Sayer built their approach directly upon the "latent growth curve" modeling approach of McArdle (1986); see also Meredith and Tisak (1990). In contrast with Willett and Sayer's approach, the latent growth curve approach does not specify the factor loadings. Rather, these are estimated from the data, allowing a flexible and parsimonious data-driven representation of the mean growth curve in addition to allowing a family of flexible models for the variance-covariance structure of the residuals. Although both approaches have important applications, the current chapter considers only the a priori specification of the growth parameters.
A distinct advantage of the reformulation proposed by Willett and Sayer (1994) is that, once the model is translated into the framework of SEM, the full range of covariance structures implemented in SEM software becomes available. Thus, it is easy within SEM to allow not only randomly varying intercepts and slopes but also heterogeneous or autocorrelated within-person residuals. Using SEM, one can therefore test a wide range of covariance structures, enabling empirical tests of the covariance structure induced by any given two-level model. A distinct disadvantage of the approach is that currently most available SEM software requires "time-structured" data; that is, each participant is required to have the same number and spacing of time points. A second disadvantage is that SEM does not admit estimation of time-varying covariates having unequal distributions across persons.

Thus it appears that analysts face a forced choice between the HLM approach, which allows a wide variety of data structures and level-1 models but a limited choice of covariance structures, and the SEM approach, which allows a wide array of covariance structures but is somewhat inflexible otherwise. Ironically, psychological researchers have been largely unaware of seminal work by Jennrich and Schluchter (1986), who developed a flexible approach to studying time-series data having a multivariate normal distribution given a set of covariates. Their approach is founded on maximum likelihood estimation of the multivariate normal variance-covariance matrix for incomplete time-series data. It also allows estimation of several alternative restricted models: "random effects" covariance structures identical to those normally specified in HLM, autocorrelation models, models for heteroscedastic level-1 variances, and models having a factor-analytic structure. This approach assumes that the design is "time-structured" (i.e., spacing between intended time points will not vary from person to person), but it does allow randomly missing time-series data. This approach, popular in biomedical applications for more than a decade, thus combines advantages of flexibility in level-1 modeling and missing data while allowing a broad range of covariance structures. We label this approach the "generalized multivariate linear model" approach or "GMLM."

Footnote: This work was originally embodied in BMDP's Program 5V, which has strongly influenced more recent software development by SPSS and SAS.

Footnote: Also of intense interest currently are repeated measures models for discrete data, but this topic goes beyond the scope of this chapter.

Taken together, the HLM, SEM, and GMLM approaches make available to longitudinal researchers an array of modeling approaches of remarkable flexibility for the study of continuously distributed repeated measures. Indeed, there is evidence of a convergence of approaches as those interested in hierarchical models begin to build in richer covariance structures (Goldstein, 1995) and those interested in SEM generalize their approaches to facilitate new and interesting extensions of growth curve modeling (Meredith & Tisak, 1990; Muthén, 1991), including, for example, the use of latent initial status to predict subsequent growth. However, at least two considerations
remain unaddressed in the applications cited so far. First, it will often be the case that persons who are repeatedly observed will also be clustered within groups such as families, neighborhoods, schools, HMOs, and so on. The hierarchical model can readily incorporate such nesting by adding a level to the model, as illustrated by Bryk and Raudenbush (1992, Chapter 8), who studied school differences in rates of mathematics learning using a three-level model. At level 1 was a model for individual change over time. The second level described individual differences within schools, and the third level described differences among schools. In effect, a two-level growth model was estimated within each of many schools. Extensions of the SEM or GMLM approaches to the clustering of persons, however, are absent. Within the hierarchical model, Thum's (1997) approach has potential for direct application to this problem, as does the multilevel multivariate model of Goldstein (1995).

Footnote: Raudenbush and Chan (1993) and Rasbash and Goldstein (1994) provide estimation theory and computational approaches for cross-classified random effects models. Such models allow repeated measures on persons who migrate across social contextual boundaries during the course of a longitudinal study.

A second issue for applications of either HLM, SEM, or GMLM is that nearly all of the published applications so far assume the multivariate normality of the residuals. Exceptions include Thum (1997) and Seltzer (1993), who allow specification of a multivariate t distribution for the residuals. This specification enables robust estimation in the presence of "heavy-tailed" data, that is, data with more extreme values than are expected under normality. Alternatively, if interest is confined to the mean growth trajectory and its correlates, and if the number of participants is reasonably large, the generalized estimating equation approach of Zeger, Liang, and Albert (1988) can be applied to obtain standard errors that are essentially nonparametric.

Footnote: Inference for hierarchical models based on bootstrap standard errors appears in Goldstein (1996) and Raudenbush and Willms (1995).

This chapter considers methods for studying alternative covariance assumptions in studies of polynomial growth and change within the framework of the hierarchical model. It begins with a brief review of the now-standard two-level model for change and its correlates and describes how this model induces assumptions about variances and covariances over time. It then reconceives the multivariate model as a hierarchical model in which the multiple measures constitute a level in the model, following Goldstein (1987, 1995), Kalaian and Raudenbush (1996), and Raudenbush, Rowan, and Kang (1991). It then combines the ideas from the standard hierarchical model and the multivariate model into a single, hierarchical multivariate model that allows one to:

1. test a variety of covariance structures, from the simplest HLM to a completely saturated (unrestricted) model;
2. allow for randomly missing data and time-varying covariates having different distributions within persons;
3. examine sensitivity of inferences about change to alternative specifications of the covariance structure;
4. examine robustness of inferences about fixed effects to the assumption of multivariate normality;
5. extend application to incorporate nesting of persons within social settings.

These analytic approaches are illustrated by re-analysis of longitudinal data from the National Youth Survey and the Sustaining Effects Study of Compensatory Education.
HIERARCHICAL MODEL AND ITS IMPLICATIONS FOR VARIATION AND COVARIATION OVER TIME

Data

To illustrate what has become a standard application of HLM in studies of change, we reanalyze data from the first cohort of the National Youth Survey (Elliott, Huizinga, & Menard, 1989). These data, summarized in Table 2.1, were analyzed by Raudenbush and Chan (1993). Members of the first cohort were sampled in 1976 at age 11 and interviewed annually until 1980, when they were 15. The outcome, described in detail in Raudenbush and Chan (1993), is a measure of attitudes toward deviant behavior, with higher values indicating greater tolerance of pro-deviant activities such as lying, cheating, stealing, vandalism, and drug use. We shall refer to this outcome as "tolerance." The table appears to indicate an increase in tolerance as a function of age during these early adolescent years. However, the means at each age are based on different sample sizes because of missing data. In fact, 168 persons had a full complement of five time-series measurements, whereas 45 had only four, 14 had three, 5 had two, and 7 had one. To illustrate the SEM approach to the study of change, Willett and Sayer (1994) analyzed the subset of 168 participants with complete data. Our analysis, in contrast, makes use of all available data from the 239 participants.
Simple Model

The general theory of crime of Gottfredson and Hirschi (1990) predicts a near-linear increase in antisocial behavior during these early adolescent years, and it may be that tolerance of deviant thinking is similarly linear. Thus we might formulate the simple linear model for each person:

    Yij = π0j + π1j aij + rij    (2.1)
Table 2.1
Description of NYS Sample

Tolerance of Deviant Attitudes        Number of Observations
Age     n      m       sd             No. of observations   Frequency
11      237    .217    .197           5                     168
12      232    .241    .212           4                      45
13      230    .332    .270           3                      14
14      220    .410    .290           2                       5
15      219    .444    .301           1                       7
where Yij is the tolerance score for person j at occasion i; aij is the age minus 13 of that person at that occasion, so that π0j represents the expected tolerance level for participant j at age 13; π1j represents the annual rate of increase in tolerance between the ages of 11 and 15; and rij is the within-person residual, assumed independently normally distributed with mean 0 and constant variance σ². In sum, j indexes persons (j = 1, ..., 239) and i indexes occasions (i = 1, ..., nj), where nj is the number of interviews for person j, with a maximum of 5 in these data.

The change trajectory for each person j consists of two parameters: π0j = status at age 13 and π1j = annual rate of increase. This pair of person-specific change parameters become outcomes in a level-2 model for variation between persons. The simplest level-2 model enables us to estimate the mean trajectory and the extent of variation around the mean:

    π0j = β00 + u0j
    π1j = β10 + u1j    (2.2)
Thus, β00 is the population mean status at age 13 and β10 is the population mean annual rate of increase from age 11 to 15. The person-specific random effects are u0j, the deviation of person j's status at 13 from the population mean, and u1j, the deviation of person j's rate of increase from the population mean rate. These random effects are assumed bivariate normally distributed; that is,

    (u0j, u1j)' ~ N(0, T),    T = [τ00  τ01; τ10  τ11]

so that τ00 is the variance in status at age 13, τ11 is the variance of the rates of change, and τ01 = τ10 is the covariance between status at age 13 and rate of change.
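As a point of reference, a model of this form can be fit with general mixed-model software. The following is a sketch in SAS's PROC MIXED under hypothetical names (NYS for the person-period data set, TOL for the tolerance score, AGE13 for age minus 13); the chapter's own results were produced with standard HLM software, so this illustrates the setup rather than the reported analysis:

    * random linear change model of Equations 2.1 and 2.2;
    PROC MIXED DATA=nys METHOD=ML COVTEST;
       CLASS person;
       MODEL tol = age13 / SOLUTION;
       * random status at age 13 and rate of change,
         with their covariance left unrestricted;
       RANDOM INTERCEPT age13 / SUBJECT=person TYPE=UN;
    RUN;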
Results

We first consider the results for the fixed effects (Table 2.2, under "random linear slope"). Mean tolerance at age 13 is estimated to be β̂00 = 0.327, se = 0.013. The mean rate of increase is significantly positive, β̂10 = 0.065, se = 0.0049, t = 13.15. In terms of the standard deviation of the outcome (Table 2.2), this is equivalent to an increase of roughly 20 to 25% of a standard deviation per year.

The variance-covariance estimates give information about the degree of individual variation in status and change. For example, the variance of the rates of change is estimated at τ̂11 = .0025, equivalent to a standard deviation of about .050. This implies that a participant who is one standard deviation below the mean rate of change (β̂10 = .065) would have a rate of .065 − .050 = .015, quite near to zero, whereas a participant with a rate one standard deviation above the mean would have a rate of .065 + .050 = .115, quite a rapid rate of increase (at least a third of a standard deviation per year).
Implied Assumptions Concerning Variation and Covariation over Time

If we combine the level-1 model (Equation 2.1) and the level-2 model (Equation 2.2), we have the combined model

    Yij = β00 + β10 aij + u0j + u1j aij + rij

or

    Yij = β00 + β10 aij + εij

where

    εij = u0j + u1j aij + rij,

which has a mean of zero and a variance

    Var(εij) = τ00 + 2 aij τ01 + aij² τ11 + σ²    (2.7)
Thus, under the linear model, the variance of an observation at a particular occasion is a quadratic function of aij = age − 13 for person j at time i. By taking the first derivative with respect to age, we also see that the variance will change as a linear function of age:

    dVar(εij)/daij = 2 τ01 + 2 aij τ11    (2.8)

Thus, the rate of change in the variance has an intercept proportional to τ01 and a slope proportional to τ11. These are strong assumptions, and it is natural to ask whether the variances across the five time points behave in the way implied by the model.
The model also has strong implications for the covariance between two outcomes Yij and Yi'j for person j, that is, outcomes observed at occasion i and occasion i' for person j:

    Cov(εij, εi'j) = τ00 + (aij + ai'j) τ01 + aij ai'j τ11    (2.9)
Again the question is whether the covariances between pairs of time points implied by Equation 2.9 accurately capture the "true" covariances.

If a study is designed to collect T time points per participant, and if each person has the same variance-covariance matrix, there will be T variances and T(T − 1)/2 covariances, for a total of T(T + 1)/2 variance-covariance parameters overall. In the current example, with T = 5, there will be 5 variances and 10 covariances, for a total of 15 variance-covariance parameters. Yet our simple linear model of Equations 2.1 and 2.2 implies that these 15 parameters are effectively linear functions of four underlying parameters: τ00, τ01, τ11, and σ² (see Equations 2.7 and 2.9). It is possible that four parameters are insufficient to adequately represent the 15 variances and covariances that might be estimated. In this case, our model, which is based on randomly varying linear change functions across the population of persons, is too simple to adequately represent the variation and covariation over time. We might then elaborate the model, for example, by formulating a quadratic model, which would have three random effects per person. The level-1 model might be

    Yij = π0j + π1j aij + π2j aij² + rij    (2.10)
In this model, π0j remains the status of person j at age 13; π1j becomes the "average velocity," that is, the average rate of increase in tolerance; and π2j becomes "acceleration." According to past research, tolerance of pro-deviant thinking, although increasing during adolescence, will reach a peak and then decline in early adulthood. The quadratic model enables us to assess whether this diminishing rate of increase has begun to occur as early as 15. If so, values of π2j will tend to be negative. We might decide to keep the structure of the level-1 variance simple here, so that the level-1 residuals are independent and homoscedastic. However, the variance-covariance structure is now elaborated at level 2:

    π0j = β00 + u0j
    π1j = β10 + u1j
    π2j = β20 + u2j    (2.11)
where we assume that the three random effects (u0j, u1j, u2j)' are trivariate normal, each with mean zero and with a 3 by 3 covariance matrix T having variances τ00, τ11, and τ22 and covariances τ01, τ02, and τ12.

Footnote: This is the homogeneity of dispersion assumption common in multivariate repeated measures. It provides a reasonable starting point for a multivariate analysis, although the modeling framework to be presented is not limited to the homogeneity assumption.
Note that the level-2 model has six unique parameters: three variances and three covariances. Together with the level-1 variance, then, the model uses 7 parameters to represent the 15 marginal variance-covariance parameters of the five time points. It is of interest to assess whether this model provides a significantly better fit to the data than does the linear change model, which, as we have seen, generated 4 parameters to account for the 15 marginal variances and covariances.

Alternatively, it might be that an even simpler between-person model might fit the 15 variances and covariances as well as those given by Equations 2.7 and 2.9. Suppose, for example, that in the linear model the variance of the linear rates of change were null, that is, τ11 = τ01 = 0. Then the expression for the variance in Equation 2.7 would simplify to Var(εij) = τ00 + σ², and the expression for the covariance in Equation 2.9 would simplify to Cov(εij, εi'j) = τ00. This model, which constrains the linear rates of change of all persons to be the same but allows the intercept to vary, would then generate the compound symmetry model commonly assumed in univariate repeated measures analysis of variance. According to this model, variances are constant across time, as are the covariances, and the 15 possible variance-covariance parameters associated with the five time points would be effectively reproduced by two parameters.

These possibilities are explored in Table 2.2. The fits of the alternative models (linear mean change with compound symmetry, linear mean change with varying linear functions, quadratic mean change with varying quadratic functions) are compared by comparing model deviance statistics. Models can be compared by computing the differences between deviances, which are asymptotically distributed as χ² variates under the null hypothesis that the simpler model fits the data as well as the more complex model does. The degrees of freedom for the test is the difference between the numbers of parameters estimated in the two models. The total number of parameters is the number of variance-covariance parameters plus the number of fixed effects. The results indicate clearly that the compound symmetry model with fixed linear slopes provides a poorer fit than does the model with randomly varying linear slopes. We reach this conclusion by computing the difference between the deviance based on the compound symmetry model and the deviance based on the model with randomly varying linear slopes, obtaining −229.00 − (−338.07) = 109.07, comparable to the percentiles of the χ² distribution with df = 2, the difference in the number of parameters estimated (the compound symmetry model has 4 parameters and the randomly varying linear model has 6), p = 0.000.
Footnote: If the model is estimated via restricted maximum likelihood, the number of parameters is just the number of covariance parameters (see Bryk & Raudenbush, 1992, Chapters 3 and 10).
In comparing the model with randomly varying linear functions with the model with randomly varying quadratic functions, there is marginally significant evidence that the quadratic model fits better. Here the difference in deviances is −338.07 − (−348.23) = 10.16, df = 4, p = 0.037.

Note also that the standard error estimated for β̂10 is considerably smaller under the compound symmetry assumptions than under the other two models. Given that the compound symmetry model provides a poorer fit to the data than does either of the other two models, we must conclude that this smaller standard error is unjustified and that inferences about the fixed effects are sensitive to incorrectly assuming that compound symmetry holds. This illustrates a key point about inference for these models: The question is not only whether the data support the variance-covariance assumptions but whether inferences about fixed effects are sensitive to misspecification of the variance-covariance structure. Note, however, that none of the three models reported in Table 2.2 is compared with a model that estimates all 15 parameters associated with the 5 time points, nor have we considered alternative covariance structures, including autocorrelated or heteroscedastic level-1 errors. We now turn to that problem.
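Stated generally, the comparisons above are likelihood-ratio tests. Writing D for a model's deviance (−2 times its maximized log-likelihood) and p for its number of estimated parameters, the test of a simpler model nested within a more complex one is

    D(simple) − D(complex) ~ χ² with df = p(complex) − p(simple)

under the null hypothesis that the simpler model fits adequately; for the compound symmetry versus random linear slope comparison above, df = 6 − 4 = 2.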
tions with the model with randomly varying quadratic functions, there is marginally significant evidence that the quadratic model fits better. Here the difference in deviances is -338.07 - (-348.23) = 10.16, df = 4, p = 0.037. Note also that the standard error estimated for is considerably smaller under the compound symmetry assumptions than under the other two models. Given that the compound symmetry model provides a poorer fit to the data than does either of the other two models, we must conclude that this smaller standard error is unjustified and that inferences about the fixed effects are sensitive to incorrectly assuming that compound symmetry holds. This illustrates a key point about inference for these models. The question is not only whether the data support the variance-covariance assumptions but whether inferences about fixed effects are sensitive to misspecification of the variance-covariance structure. Note, however, that none of the three models reported in Table 2.3 is compared with a model that estimates all 15 parameters associated with the 5 time points nor have we considered alternative covariance structures, including autocorrelated or heteoscedastic level-1 errors. We now turn to that problem.
THE GENERALIZED MULTIVARIATE LINEAR MODEL AS A HIERARCHICAL MODEL Following Jennrich and Schluchter (1986), we now seek to estimate multivariate regression model, where each person, in principal, has five time points, so that it will be possible to estimate the 5 by 5 variance-covariance matrix by pooling data across persons. This would not be difficult if each person had complete data and our results would reproduce the results of a conventional mutivariate analysis of variance (Bock, 1975) or of unrestricted SEM model based on a single population. When the data are unbalanced, however, the task is more challenging. The problem has been solved, of course, not only by Jennrich and Schluchter (1986) but in various algorithms for the imputation of missing data, following the work of Little and Rubin (1987).
Reformulation as a Two-Level Model To address this problem in a way that will readily generalize to a variety of two- and three-level hierarchical models, we adopt the approach of Goldstein, 1995; (see also Raudenbush, 1995; Kalaian & Raudenbush, 1996), who construct a level-1 model that relates the observed to the complete data:
    Yij = Σ(t=1 to T) mtij Y*tj    (2.13)
where Yij is again the outcome for person j associated with occasion i. Here Y*tj is the value that person j would have displayed if that person had been observed at time t, and mtij is an indicator variable taking on a value of unity if Yij is observed at time t and zero otherwise. Thus Y*tj, t = 1, ..., T, represents the complete data for person j, whereas Yij, i = 1, ..., nj, represents the observed data, and the five indicator variables mtij tell us the pattern of missing data for person j. To make this clear, consider a person in the National Youth Survey who was interviewed at ages 11, 12, and 14 but not at ages 13 or 15; that is, this person would have data at times 1, 2, and 4 but not at times 3 and 5. Then Equation 2.13, in matrix notation, would be
$$\begin{pmatrix} Y_{1j} \\ Y_{2j} \\ Y_{3j} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} Y^{*}_{1j} \\ Y^{*}_{2j} \\ Y^{*}_{3j} \\ Y^{*}_{4j} \\ Y^{*}_{5j} \end{pmatrix} \qquad (2.14)$$

or

$$Y_j = M_j Y^{*}_j \qquad (2.15)$$
This model says simply that the three available data points for person j were observed at times 1, 2, and 4, so that data were missing at times 3 and 5. Although these data were missing, they do exist, in principle. Thus, every participant has a full 5 by 1 vector of "complete data," Y*_j, even though the n_j by 1 vector of observed data, Y_j, will vary in length across persons. The level-2 model describes how the complete data change over time. We have, in the case of a mean structure that is linear,
$$Y^{*}_{tj} = \beta_{00} + \beta_{10} u_{tj} + \varepsilon_{tj} \qquad (2.16)$$
where u_tj is age - 13 at time t for person j. Thus β_00 and β_10 retain their earlier definitions. If we allow the residuals ε_tj to have arbitrary variances and covariances as a function of time, we might write

$$\mathrm{Var}(\varepsilon_{tj}) = \delta^{2}_{t}, \qquad \mathrm{Cov}(\varepsilon_{tj}, \varepsilon_{t'j}) = \delta_{tt'} \qquad (2.17)$$
With these definitions in mind, we may generally write the level-2 model, in matrix notation, as

$$Y^{*}_j = X_j \beta + \varepsilon_j, \qquad \varepsilon_j \sim N(0, \Delta) \qquad (2.18)$$
where Y*_j is again the T by 1 vector of complete data for person j, X_j is the T by f matrix of predictors associated with the f "fixed effects" regression coefficients contained in β, and ε_j is a T by 1 vector of residuals assumed T-variate normal, each with mean zero and with a T by T covariance matrix Δ having variances δ_t² and covariances δ_tt'. In the present case, T = 5 and f = 2. In sum, our two-level formulation poses a level-1 model (Equation 2.14) that relates the observed data Y to the "complete data" Y*, that is, the data that would have been observed if the researchers had been successful in obtaining outcome data at every time point. Our level-2 model (Equation 2.18) is a standard multivariate normal regression model for the complete data. Algebraically substituting the level-2 expression for Y* into the level-1 model yields the combined model

$$Y_j = M_j X_j \beta + M_j \varepsilon_j \qquad (2.19)$$

Our strategy for estimating this model is described in the technical appendix.
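To make the level-1 indicator formulation concrete, the following minimal numpy sketch (our illustration, not code from any particular package; the zero-based indexing and the age - 13 metric are assumptions drawn from the NYS design described above) constructs M_j for the person observed at times 1, 2, and 4 and forms the observed-data design matrix M_j X_j that appears in the combined model.

```python
import numpy as np

# Build M_j (Equations 2.14-2.15) for a person observed at times 1, 2, and 4
# out of T = 5 planned occasions.
T = 5
observed_times = [0, 1, 3]                 # zero-based indices of times 1, 2, 4
M_j = np.zeros((len(observed_times), T))
for i, t in enumerate(observed_times):
    M_j[i, t] = 1.0                        # m_tij = 1 when occasion i falls at time t

# Complete-data design for the linear model of Equation 2.16: a column of
# ones and the age - 13 predictor at the five planned ages.
ages = np.array([11, 12, 13, 14, 15])
X_j = np.column_stack([np.ones(T), ages - 13])

X_obs = M_j @ X_j                          # observed-data design, M_j X_j
print(X_obs)                               # rows for ages 11, 12, and 14 only
```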
Placing Restrictions on the Model

To replicate the linear change model of Equations 2.1 and 2.2 (see also Equations 2.7 and 2.9), we might constrain these variances and covariances in Δ to be quadratic functions of four parameters:

$$\delta^{2}_{t} = \tau_{00} + 2\tau_{01} u_t + \tau_{11} u^{2}_{t} + \sigma^{2}, \qquad \delta_{tt'} = \tau_{00} + \tau_{01}(u_t + u_{t'}) + \tau_{11} u_t u_{t'} \qquad (2.20)$$
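A brief numerical sketch of this constraint (the variance components below are made-up illustrative values, not estimates from this chapter): under the linear change model, the implied complete-data covariance matrix is Δ = Z τ Z' + σ²I, which reproduces Equation 2.20 element by element.

```python
import numpy as np

# Implied 5 x 5 covariance matrix under a random-intercept, random-slope
# model: Delta = Z tau Z' + sigma2 * I (Equation 2.20).
u = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # age - 13 at the five time points
Z = np.column_stack([np.ones(5), u])         # random-effects design

tau = np.array([[0.040, 0.005],              # tau_00, tau_01 (illustrative)
                [0.005, 0.003]])             # tau_01, tau_11 (illustrative)
sigma2 = 0.030                               # level-1 variance (illustrative)

Delta = Z @ tau @ Z.T + sigma2 * np.eye(5)
print(np.round(Delta, 4))                    # quadratic in u_t, as in Eq. 2.20
```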
Thus, the "standard" hierarchical models for growth and change can be viewed as special cases of the multivariate model for incomplete data. In fact, using the algorithm described in the technical appendix, the results of Table 2.2, produced with "standard" HLM software, were exactly reproduced by constraining the variances and covariances of the multivariate model as in Equation 2.20. This principle holds when the design calls for time-structured data (the same time-series design for each participant), although the time-series data are incomplete as a result of data missing at random.¹⁰

¹⁰The assumption of missing at random is not as restrictive as it may sound (Little & Rubin, 1987). The assumption is that the probability of a time-series outcome Y*_tj being missing is conditionally independent of Y*_tj given the observed data.

The idea of representing well-known models for change as special cases of an unrestricted multivariate model with missing data is at the heart of Jennrich and Schluchter's (1986) approach. The fit of the model in each case can be compared with that provided by the unrestricted model (Equation 2.17). For illustration, Table 2.3 presents results of estimating a series of models of increasing complexity. All models are of the form
$$Y^{*}_{tj} = \beta_{00} + \beta_{10} a_{tj} + \beta_{20} a^{2}_{tj} + \varepsilon_{tj} \qquad (2.21)$$
What varies are the assumptions about the variances and covariances of the residuals ε_tj. Of interest are the fit of the alternative models for the covariance structure and the robustness of the inferences about the fixed effects, β, to misspecification of the variance-covariance matrix. All model comparison tests are reported in Table 2.4. Starting with the simplest, the following models are estimated.

1. The compound symmetry model. We have

$$\varepsilon_{tj} = u_{0j} + e_{tj} \qquad (2.22)$$

where u_0j and e_tj are mutually independent and e_tj is independent of e_t'j, implying

$$\delta^{2}_{t} = \tau_{00} + \sigma^{2}, \qquad \delta_{tt'} = \tau_{00} \qquad (2.23)$$

Thus, the compound symmetry model represents the 15 variances and covariances as functions of 2 parameters. The results are similar to those in Table 2.3 because the additional variance component is estimated to be very close to zero.

2. A first-order autoregressive "AR(1)" model, which has the same form as Equation 2.22 but where e_tj and e_t'j are dependent:

$$\varepsilon_{tj} = (1 - \rho)\, e_{tj} + \rho\, e_{t-1,j} \qquad (2.24)$$

The implied marginal variances and covariances are given in Equations 2.25 and 2.26. Note that the autocorrelation parameter is significantly different from zero, a result that can be deduced by referring to the model comparison test (model 1 versus model 2), χ² = 65.30, df = 1, p = .000. The estimate is ρ̂ = .397, se = .054.

3. A model in which the linear rates of change vary randomly while the quadratic rates are held constant:

$$\varepsilon_{tj} = u_{0j} + u_{1j} a_{tj} + e_{tj} \qquad (2.27)$$
Here we have a standard HLM with a 2 by 2 covariance matrix at level 2 (Equation 2.3) and independent, homogeneous level-1 variance. Thus, the 15 marginal variances and covariances are represented as functions of 4 underlying parameters.
Table 2.3: Some Alternative Covariance Structures for Y_ij = β_00 + β_10 a_ij + β_20 a_ij² + e_ij

AR(1) at level 1
  Fixed effects (Coeff, SE, Ratio): Intercept β_00: 0.3276, 0.0153, 21.46; Linear β_10: 0.0614, 0.0048, 12.70; Quadratic β_20: 0.0002, 0.0034, 0.06
  Level 2: τ̂_00 = 0.0243
  Level 1: σ̂² = 0.0416, ρ̂ = .3968
  Model fit: Deviance = -294.32, df = 6

Log-linear at level 1
  Fixed effects: Intercept β_00: 0.3281, 0.0152, 21.56; Linear β_10: 0.0620, 0.0048, 13.01; Quadratic β_20: -0.0002, 0.0032, -0.06
  Level 2 (lower triangle of τ̂): 0.0403; 0.0077, 0.0035; -0.0021, -0.0000, 0.0008
  Level 1: α̂_0 = -3.503, α̂_1 = -0.063, α̂_2 = -0.221
  Model fit: Deviance = -360.99, df = 12

Separate level-1 variances for each time point
  Fixed effects: Intercept β_00: 0.3276, 0.0152, 21.48; Linear β_10: 0.0608, 0.0047, 12.85; Quadratic β_20: -0.0005, 0.0032, -0.17
  Level 2 (lower triangle of τ̂): 0.0407; 0.0074, 0.0038; -0.0024, -0.0003, 0.0011
  Level 1: a separate σ̂_t² estimated at each of the five time points
  Model fit: Deviance = -363.21, df = 14

Unrestricted
  Fixed effects: Intercept β_00: 0.3202, 0.0150, 21.37; Linear β_10: 0.0593, 0.0047, 12.60; Quadratic β_20: 0.0003, 0.0031, 0.10
  Unrestricted 5 by 5 matrix Δ̂ (15 variance-covariance parameters)
  Model fit: Deviance = -378.27, df = 18
Table 2.4: Summary of Model Comparisons

(a) Summary of Fit

Model                                                                    Deviance    df
1. Random intercept model (homogeneous level-1 variance)                  -229.02     5
2. Random intercept model [AR(1) at level 1]                              -294.32     6
3. Random linear slope (homogeneous level-1 variance)                     -338.07     7
4. Random quadratic model (homogeneous variance at level 1)               -348.23    10
5. Random quadratic model (log-linear at level 1)                         -360.99    12
6. Random quadratic model (separate level-1 variance per time point)      -363.21    14
7. Unstructured                                                           -378.27    18

(b) Comparison of Nested Models

Comparison           Difference between deviances    df      p
Model 1 versus 2                  65.30               1    0.000
Model 1 versus 3                 109.02               2    0.000
Model 1 versus 4                 119.21               5    0.000
Model 1 versus 5                 131.97               7    0.000
Model 1 versus 6                 134.19               9    0.000
Model 1 versus 7                 149.25              13    0.000
Model 2 versus 7                  83.95              12    0.000
Model 3 versus 4                  10.16               3    0.017
Model 3 versus 5                  22.92               5    0.001
Model 3 versus 6                  25.14               7    0.001
Model 3 versus 7                  40.20              11    0.000
Model 4 versus 5                  12.76               2    0.002
Model 4 versus 6                  14.98               4    0.005
Model 4 versus 7                  30.04               8    0.000
Model 5 versus 6                   2.22               2    0.330
Model 5 versus 7                  17.28               6    0.009
Model 6 versus 7                  15.06               4    0.005
This model also fits better than does the compound symmetry model, χ² = 109.02, df = 2.¹¹

¹¹Note that this model is not directly comparable with the AR(1) model. They are not nested models (one cannot be obtained by simplifying the other). It was not possible simultaneously to obtain a non-negative estimate of the variance of the linear growth rates and an estimate of the autocorrelation parameter. It appears that models with random coefficients and autocorrelated level-1 errors will be useful only when the number of time points, T, is relatively large.

4. The "standard" HLM quadratic model (where both the linear and quadratic rates of change vary randomly):

$$\varepsilon_{tj} = u_{0j} + u_{1j} a_{tj} + u_{2j} a^{2}_{tj} + e_{tj} \qquad (2.28)$$

This model corresponds to the HLM with the 3 by 3 level-2 covariance matrix of Equation 2.12 and homogeneous, independent level-1 variance, σ_t² = σ² (results are in Table 2.2). Now the quadratic as well as the linear growth rates are viewed as varying at random. This model also fits better than does the compound symmetry model, χ² = 119.21, df = 5; it fits marginally better than does model 3, χ² = 10.16, df = 3, p = .017, providing some evidence that participants do vary in rates of "acceleration" (that is, in quadratic components of change).

5. A log-linear model for heteroscedastic level-1 variances (nine covariance parameters). In this case, we estimated the 3 by 3 level-2 covariance matrix of Equation 2.12. However, the level-1 errors, although independent, were heteroscedastic and modeled according to a log-linear model

$$\log \sigma^{2}_{t} = \alpha_0 + \alpha_1 a_t + \alpha_2 a^{2}_{t} \qquad (2.29)$$

This model fits better than does model 4, χ² = 12.76, df = 2, p = .002. In particular, there is evidence of a quadratic relationship between age and the log of the level-1 variance, α̂_2 = -.221, se = .077.

6. A model with time-varying level-1 variances (11 covariance parameters). Again, the level-2 covariance matrix is 3 by 3 as in Equation 2.12, and the level-1 errors are independent, but now a separate σ_t² is estimated for each t. This model does not fit better than the log-linear model, χ² = 2.22, df = 2, p = .33.

7. The unrestricted model, like the others, represents mean change with three fixed effects but estimates 15 variance-covariance parameters, that is, five variances, δ_t², t = 1, ..., 5, and 10 covariances δ_tt'. The results suggest that none of the previous models fits the data as well as does this unrestricted model. The random quadratic model with log-linear level-1 variance structure comes close, χ² = 17.28, df = 6, p = .009. The unrestricted model offers no theoretically interesting interpretation regarding variation in individual change. However, as we shall see, this unrestricted model can be viewed as a special HLM for individual change. Once seen in this light, we can assess the sensitivity of inferences regarding variation in individual change to alternative models.
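Each p-value in Table 2.4 follows from a chi-square test on the deviance difference, with degrees of freedom equal to the difference in parameter counts. A brief sketch (using the deviances and parameter counts reported above) reproduces the model 3 versus model 4 comparison:

```python
from scipy.stats import chi2

# Deviance and parameter count for models 3 and 4 (Table 2.4).
fits = {3: (-338.07, 7), 4: (-348.23, 10)}

dev3, df3 = fits[3]
dev4, df4 = fits[4]
x2 = dev3 - dev4              # 10.16
p = chi2.sf(x2, df4 - df3)    # upper tail at df = 3
print(f"chi2 = {x2:.2f}, p = {p:.3f}")   # chi2 = 10.16, p = 0.017
```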
The Unrestricted Model as a "Standard" HLM Model

For time-structured data, there is always a "standard" HLM that will duplicate the results of the unrestricted model. Specifically, if the level-1 model is a polynomial of degree T - 2 with all coefficients random, and if the level-1 variance is allowed to vary with time (one variance per time point), the two-level HLM will include T(T + 1)/2 covariance parameters, exactly reproducing the marginal variance and covariance estimates of the unrestricted model. However, as T increases, this model will become theoretically uninteresting. In our example, with T = 5, a random cubic model with heterogeneous variances will reproduce the results of the unrestricted model. However, little theory is available to make sense of the cubic term in the change function.
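The parameter count can be verified directly: a polynomial of degree T - 2 has T - 1 random coefficients, whose covariance matrix contributes (T - 1)T/2 parameters, and the T time-specific level-1 variances supply the rest:

$$\frac{(T-1)T}{2} + T = \frac{T(T+1)}{2}, \qquad \text{for } T = 5:\; 10 + 5 = 15.$$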
Sensitivity of Inferences to Alternative Covariance Specifications

Researchers ought to evaluate models not only by their fit to the data but also according to their interpretability. Suppose that interest were to focus primarily on the mean and variance of the "average velocity," that is, the average rate of increase in tolerance between the ages of 11 and 15. In that light, higher-order effects might be viewed as incidental to this primary focus. An important question would then be the sensitivity of key inferences to alternative specifications of the variance-covariance structure. Table 2.5 summarizes the results depending on the order of polynomial specified for mean change and for variation in change, as well as on the assumptions about heterogeneity at level 1. The results across six models are generally similar. However, there are some differences. In particular, models d and e, which have quadratic mean change, quadratic functions varying across persons, and heterogeneous level-1 variances, indicate somewhat greater between-person variation in linear rates than do models a, b, and c, which fit the data less well. However, the best-fitting model (model f) has a cubic mean structure, varying cubic effects, and heterogeneous level-1 variances. This model, in fact, corresponds to an unrestricted variance-covariance structure with a cubic mean structure. Inferences about the mean and variation in linear rates for this model are more similar to those based on the simpler models (a, b, and c) than to those based on the moderately more complex models (d and e). It will generally be sensible to assess sensitivity of key inferences to alternative plausible covariance specifications.
On the "Generality" of the Unrestricted Model

With time-structured data, it is tempting to view the unrestricted model as the most general model that can be estimated within a family of multivariate normal models. However, the unrestricted model and all submodels presented in Table 2.3 share the assumption of homogeneity of dispersion;
Table 2.5: Sensitivity of Inferences Regarding Mean and Variation in Linear Rate of Increase^a

   Fixed Effects    Level 2    Level 1          Mean Rate         Variance    Deviance    df
a) Linear           Linear     homogeneous      .0647 (.00492)    .00251      -338.07      6
b) Quadratic        Linear     homogeneous      .0647 (.00492)    .00251      -338.07      7
c) Quadratic        Quad.      homogeneous      .0647 (.00493)    .00277      -348.23     10
d) Quadratic        Quad.      log-quadratic    .0620 (.00477)    .00346      -360.99     12
e) Quadratic        Quad.      heterogeneous    .0609 (.00474)    .00382      -363.21     14
f) Cubic            Cubic      heterogeneous    .0639 (.00489)    .00297      -389.23     19

^a Standard errors are in parentheses.
that is, the complete data of each participant are assumed to have a common marginal variance-covariance matrix, Δ. In fact, many interesting models will not meet that assumption. For example, in our data, we have at each occasion a measure of the exposure of the participant to deviant peers, that is, a measure of the extent to which that person's peers are tolerant of deviant behavior (see Raudenbush & Chan, 1993, for details). This "exposure" variable will have a different distribution over time for different participants. Suppose we specify "exposure" as a predictor (a "time-varying covariate") in the level-1 model of HLM. Then the marginal variance-covariance matrix will be heterogeneous if either of two conditions holds: exposure has a random coefficient, or exposure is related to the level-1 variance. These models can readily be estimated within the framework of HLM. However, these models are not special cases of the generalized multivariate linear model, nor can they be estimated within the framework of SEM. To illustrate, we estimate a "standard" two-level model, where at level 1 we have

$$Y_{ij} = \pi_{0j} + \pi_{1j} a_{ij} + \pi_{2j} a^{2}_{ij} + \pi_{3j}(\text{exposure})_{ij} + e_{ij} \qquad (2.30)$$

The level-1 variance is allowed to remain homogeneous. At level 2, all coefficients are random:

$$\pi_{pj} = \beta_{p0} + u_{pj}, \qquad p = 0, 1, 2, 3 \qquad (2.31)$$
Thus, the level-2 model has 10 parameters, with 11 covariance parameters overall. We compare this with the "unrestricted model," which involves 15 variance-covariance parameters. Results are displayed in Table 2.6. Note that inferences about the fixed effects are essentially identical. However, the deviance associated with the HLM model based on only 15 parameters is actually smaller than the deviance associated with the unrestricted model, which has 19 parameters. The models are not nested because the HLM model, which has fewer parameters, is not a submodel of the unrestricted model. The HLM model induces heterogeneous variance-covariance matrices across participants as a function of participant variation in exposure, whereas the unrestricted model is general only within the class of models assuming homogeneous variance-covariance matrices.¹²

¹²Note also that exposure is essentially continuous. Thus, the HLM results cannot be duplicated by construction of a finite set of covariance matrices.
Robust Standard Errors

All of the results discussed so far assume that the random effects in the model are multivariate normal in distribution. All but the last HLM model assume homogeneous covariance matrices, and that model implies a definite structure to the heterogeneity (it must depend on exposure). If interest is confined to the fixed effects of the model (the β's) and their standard errors, one can compute Huber-corrected robust standard errors that assume neither normality nor any particular covariance structure (cf. Zeger et al., 1988). The consistency of these standard errors depends on the number of participants. (See the technical appendix for details.) Table 2.7 displays results of standard error estimation for the model having a mean function that is quadratic. The model-based standard errors are founded on the assumption of random variation in quadratic change and homogeneous variance. The "robust" standard errors require no assumptions about the specific structure of the variances and covariances and do not require normality of the residuals. Yet the two sets of standard errors are identical to three significant digits, implying that the results are not sensitive to model assumptions in this case. This happy result will certainly not always arise, and assessing the robustness of standard errors is a useful strategy, especially when the number of participants is reasonably large.
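A minimal sketch of the Huber-type "sandwich" computation (our rendering of the general idea in Zeger et al., 1988, not the exact HLM implementation): the "bread" uses the model-based covariance V_j, while the "meat" substitutes each participant's observed residual cross-product.

```python
import numpy as np

def robust_cov(X_list, V_list, resid_list):
    """Huber-type robust covariance of the fixed effects (a sketch)."""
    bread = sum(X.T @ np.linalg.solve(V, X)
                for X, V in zip(X_list, V_list))
    meat = sum(X.T @ np.linalg.solve(V, np.outer(e, e)) @ np.linalg.solve(V, X)
               for X, V, e in zip(X_list, V_list, resid_list))
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv   # robust SEs: sqrt of the diagonal
```

Because the meat term uses only the empirical residuals, the resulting standard errors remain consistent under misspecification of the assumed covariance structure, which is why they serve as a useful benchmark for the model-based standard errors in Table 2.7.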
INCORPORATING CLUSTERING OF PERSONS WITHIN SOCIAL SETTINGS

Two-level models for repeated measures on persons can readily be adapted to incorporate the clustering of persons within social settings.
Table 2.6: A Comparison between an Unrestricted Model (Homogeneous Dispersion) and a Model with Heterogeneous Dispersion

Model 1: Complete data have unstructured but homogeneous dispersion. Model 2: Dispersion depending on exposure to deviant peers.

Fixed Effects          Model 1: Coeff     SE       Ratio     Model 2: Coeff     SE       Ratio
Intercept, β_00               0.3252    0.0127     25.67            0.3251    0.0125     25.86
Linear, β_10                  0.0487    0.0045     10.74            0.0466    0.0047     10.00
Quadratic, β_20              -0.0006    0.0030     -0.21            0.0006    0.0030      0.21
Exposure, β_30                0.3186    0.0244     13.07            0.3430    0.0295     11.62

Variance-covariance components. For Model 1, the lower triangle of the unrestricted Δ̂ is

  0.035
  0.011  0.035
  0.014  0.018  0.054
  0.015  0.016  0.034  0.062
  0.014  0.016  0.028  0.042  0.062

For Model 2, the level-2 covariance matrix τ̂ is 4 by 4 (10 parameters), with homogeneous level-1 variance σ̂² = 0.0210.

Model fit: Model 1: Deviance = -517.26, df = 19. Model 2: Deviance = -520.63, df = 15.
Table 2.7: Robustness of Model-Based Standard Errors for Y_ij = β_00 + β_10 a_ij + β_20 a_ij² + ε_ij

HLM with homogeneous level-1 variance, ε_ij = u_0j + u_1j a_ij + u_2j a_ij² + e_ij:

                        Model-based                    Robust
Parameter               Coeff      SE        t         SE         t
Mean, β_00              0.3272     0.0153    21.38     0.0153     21.38
Linear, β_10            0.0647     0.0049    13.14     0.0049     13.14
Quadratic, β_20         0.0002     0.0032     0.05     0.0032      0.05
Bryk and Raudenbush (1988) studied school differences in children's growth by adding a third level to the standard two-level model for individual change. The first level thus represented individual change over time, the second level represented individual differences in change within schools, and the third level represented variation between schools. Given time-structured data, the three-level HLMs are submodels of a two-level general multivariate linear model (cf. Thum, 1997). Thus, by adding a level to Jennrich and Schluchter's (1986) multivariate model, we can estimate a range of covariance structures within a model that incorporates clustering. We shall reanalyze the Sustaining Effects data earlier analyzed by Bryk and Raudenbush (1988). Mathematics achievement, measured on a vertically equated scale to reflect growth, is the outcome. Children were observed at the spring of kindergarten and twice annually during first and second grades. Although the aim was to obtain five repeated measures, some children were absent at various testing times. Thus, the data were time-structured in design but incomplete in practice. The 618 participants were nested within 86 schools.
Three-Level HLM Model

Bryk and Raudenbush (1988) formulated a three-level model, where, at level 1, the mathematics outcome for student j in school k was represented as depending linearly on time plus the effect of a time-varying covariate:

$$Y_{ijk} = \pi_{0jk} + \pi_{1jk}(\text{time})_{ijk} + \pi_{2jk}(\text{summer})_{ijk} + e_{ijk} \qquad (2.33)$$
Here time = 0, 1, 2, 3, 4 at times 1, 2, 3, 4, 5. Thus, π_0jk represents the initial status of student j in school k. "Summer" takes on a value of 1 if the previous time was summer and 0 if not. It can readily be shown, then, that π_1jk is the calendar-year growth rate for student j in school k; π_1jk + π_2jk is the summer growth rate, and π_1jk - π_2jk is the academic-year growth rate. Thus, three π's capture the growth of each student as a function of an initial status, a calendar-year growth rate, and a summer effect. At level 2, the π's become outcome variables in a model that explains variation between students within schools. For example, we might have

$$\pi_{pjk} = \beta_{p0k} + \beta_{p1k}(\text{child pov})_{jk} + u_{pjk} \qquad (2.35)$$

Here (child pov)_jk is an indicator taking on a value of 1 if child jk is in poverty (as indicated by eligibility for a free lunch) and 0 if not. The variances and covariances among the u_pjk are collected in a matrix τ. Combining these two models, we have

$$\begin{aligned} Y_{ijk} = {} & \beta_{00k} + \beta_{10k}(\text{time})_{ijk} + \beta_{20k}(\text{summer})_{ijk} + \beta_{01k}(\text{child pov})_{jk} \\ & + \beta_{11k}(\text{child pov})_{jk}(\text{time})_{ijk} + \beta_{21k}(\text{child pov})_{jk}(\text{summer})_{ijk} \\ & + u_{0jk} + u_{1jk}(\text{time})_{ijk} + u_{2jk}(\text{summer})_{ijk} + e_{ijk} \end{aligned} \qquad (2.36)$$

The model implies, then, that, within a school, math achievement at a given time depends upon "time," "summer," and "child poverty," the two-way interactions of child poverty with "time" and "summer," plus a random error.
The "standard" three-level HLM can test alternative covariance structures by setting one or two of the u's in Equation 2.35 to zero. The level-3 model accounts for variation between schools. For example, we might simply estimate

$$\beta_{00k} = \gamma_{000} + w_{00k}, \quad \beta_{10k} = \gamma_{100} + w_{10k}, \quad \beta_{20k} = \gamma_{200} + w_{20k}, \quad \beta_{p1k} = \gamma_{p10} \qquad (2.38)$$

Here the variances and covariances of the w_p0k are collected in a matrix Ω. In subsequent analyses, the random effect of summer is dropped from Equations 2.33 and 2.36.
Reformulation as a Hierarchical Multivariate Model with Incomplete Data

Level-1 Model

The first level of the model again represents the relationship between the observed and complete data. Schools are incorporated simply by adding a subscript to Equation 2.13:

$$Y_{ijk} = \sum_{t=1}^{5} m_{tijk}\, Y^{*}_{tjk} \qquad (2.39)$$

Thus Y_ijk is the math achievement score on the ith occasion (i = 1, ..., n_jk) for student j in school k, and Y*_tjk is the score that would have been observed if student j within school k had been present at time t (t = 1, ..., 5). The indicator m_tijk takes on a value of 1 if occasion i corresponds to time t.
Level-2 Model

The level-2 model is a multivariate, within-school model for the complete data:

$$\begin{aligned} Y^{*}_{tjk} = {} & \beta_{00k} + \beta_{10k}(\text{time})_{tjk} + \beta_{20k}(\text{summer})_{tjk} + \beta_{01k}(\text{child pov})_{jk} \\ & + \beta_{11k}(\text{child pov})_{jk}(\text{time})_{tjk} + \beta_{21k}(\text{child pov})_{jk}(\text{summer})_{tjk} + \varepsilon_{tjk} \end{aligned} \qquad (2.40)$$

This level-2 model has the same form as the level-2 model of HLM, but now the residuals may have an arbitrary variance-covariance matrix, Δ, composed of elements

$$\mathrm{Var}(\varepsilon_{tjk}) = \delta^{2}_{t}, \qquad \mathrm{Cov}(\varepsilon_{tjk}, \varepsilon_{t'jk}) = \delta_{tt'} \qquad (2.41)$$

Alternatively, the covariance matrix Δ may be structured as in the two-level case. The level-3 model has the same form as Equation 2.38.
Results

Table 2.8 compares results from three models. The first two models are three-level hierarchical models with randomly varying intercepts and annual growth rates at level 2 (between children within schools) and level 3 (between schools). They differ in that the first model assumes homogeneous level-1 variance whereas the second model allows a separate level-1 variance for each time point. This second model fits better than the first, χ² = 29998.90 - 29966.80 = 32.10, df = 4, p = .000. However, neither fits as well as the model with unrestricted variances and covariances at level 2. Despite the poorer fit of the simpler models, inferences about the fixed effects
are remarkably similar. Of most importance in the original analysis was the extent of variation between schools in the annual rates of growth. The estimate of 14.78 based on the model with unrestricted level-2 covariance structure is similar to that based on the hierarchical model with heterogeneous variances, 14.33, and a bit smaller than the estimate based on the hierarchical model with homogeneous level-1 variance. As in the two-level case, it is possible and often useful to estimate key parameters under a variety of plausible alternative specifications as a sensitivity analysis. It will also be useful and straightforward to compare standard errors for fixed effects based on robust estimation, especially when the number of level-3 units is reasonably large.
DISCUSSION

In applying hierarchical models to repeated measures data, the level-1 or "within-person" model is often a polynomial function of age or time. Its parameters might include status, rate of change, and acceleration at a given age, as illustrated by reanalyses of data from the National Youth Survey (NYS). As we have seen, the level-1 model might include other time-varying covariates, such as exposure to deviant peers in the NYS data or the summer effect on learning in the Sustaining Effects data. Any parameter of the level-1 model might be viewed as varying randomly over persons. Such modeling decisions induce assumptions about the covariance structure of the time-series data. This chapter has considered how to compare the fit of alternative models and how to assess the sensitivity of key findings to alternative assumptions about the covariance structure of the time-series data. In comparing these alternative models, it is essential to make three fundamental distinctions: (a) between models that assume homogeneity of dispersion of the "complete data" and models that do not, (b) between models that assume multivariate normality and those that do not, and (c) between models that do or do not incorporate nesting of participants within social settings. These distinctions will have definite implications for choosing approaches to estimation and computational algorithms. Moreover, an awareness of these distinctions is essential in considering sensitivity of findings to alternative assumptions.
Homogeneity Versus Heterogeneity of Dispersion

This chapter has contrasted what have become conventional hierarchical linear models (HLMs) with multivariate models for incomplete data (generalized multivariate linear models, or GMLMs). The GMLM can be viewed as a special case of a hierarchical model where level 1 represents the relationship between the observed and complete data and level 2 represents the model for the complete data. If the complete data have homogeneous covariance
Table 2.8: Model Comparison for Repeated Measures on Children within School Fixed Effects Intercept, yo0 Child Poverty, yolo Time out, 7100 Child Poverty x Time out, yllo Summer Drop, yzoo Child Poverty x Summer Drop, 7 2 1 0 Variance-Covariance Parameters
Coeff
SE
Ratio
Coeff
SE
Ratao
Coeff
SE
Ratio
403.40 0.49 27.74 0.37 -27.27 -0.12
3.70 1.87 0.83 0.42 2.09 1.06
109.03 0.26 33.56 0.89 -13.03 -0.49
403.20 0.60 27.74 0.29 -27.27 -0.51
3.67 1.85 0.83 0.42 2.05 1.03
109.86 0.33 33.47 0.70 -13.33 -0.49
402.98 0.66 27.58 0.35 -27.46 -0.43
3.64 1.84 0.82 0.42 1.91 0.96
110.97 0.32 33.46 0.85 -14.38 -0.45
R=
116.36
-0.09 17.36
R=
126.71
-6.36 14.33
R=
140.07
1707
u 2 = 610.70
8.1= " 543.99 8; = 555.92 8; = 683.53 8: = 468.05 6-52 = 844.03
29998.90 13
29966.80 17
-2.80 14.78 1119 1143 1255 1212 2080 1463 1853
Model Fat VI r
Deviance Df
29941.26 24
1 1128 1260 1367 1346 2201
structure (each participant within a given population has the same time-series covariance structure), alternative "conventional" HLMs can, in turn, be viewed as submodels of the GMLM. This was illustrated in our reanalysis of the NYS data. We showed that by imposing certain specific restrictions on a full T by T covariance matrix, we could replicate the results of the now-standard HLM analysis. Within that framework, it is straightforward to compare alternative HLMs with each other, with the unrestricted model, and with models that have not been widely applied within the HLM framework, including models with autocorrelated level-1 errors and log-linear models for heterogeneity of level-1 variance over time (see Tables 2.2, 2.3, and 2.4 and associated discussion). The conventional HLMs are therefore submodels of a GMLM, but only within the class of models that assume the complete data to have homogeneity of dispersion within subpopulations. A key feature of such "homogeneity" models is that level-1 predictors having random effects must have the same distribution within the complete data of every participant in a given subpopulation. For example, in the NYS data, age has the same distribution within every participant's complete-data record: the planners of that study aimed to collect data annually, at ages 11, 12, 13, 14, and 15. However, exposure to deviant peers will have a different distribution for different participants. Thus, a model with randomly varying effects of exposure falls outside the class of GMLMs but not outside the class of HLMs. In that sense the GMLM is a special case of the HLM when the within-population covariance structure is constant over participants (see Table 2.6 and associated discussion). Another example arises when the level-1 variance depends on a continuous level-1 predictor that varies randomly over participants. For example, if the level-1 variance were a function of exposure, the marginal variance-covariance structure would be different for every participant. In many cases, the theoretical focus will be on a small number of key parameters, with other parameters of more incidental interest. One can and generally should assess the sensitivity of key inferences to plausible alternative covariance specifications (see Table 2.5 and associated discussion).
Normal Versus Non-normal Models

Both standard HLM and the GMLM require the assumption that the residual vector for each participant has a multivariate normal distribution. Seltzer (1993) and Thum (1997) relax this assumption by allowing the random effects at level 2 to have a multivariate t distribution, an approach that is particularly appropriate when the number of participants is small and one seeks robustness with respect to outlying participants. If the number of participants is large and attention is restricted to fixed effects, Huber-type robust standard errors are available that are essentially nonparametric. One can compare inferences based on these robust standard
errors with inferences based on a given model for the covariance structure as a way of assessing sensitivity of inferences to assumptions about the covariance structure (see Table 2.8 and the associated discussion). An important but separate set of models is available for discrete outcomes, including binary data, ordinal data, and counts. The level-1 models are generalized linear models (McCullagh & Nelder, 1989; Diggle, Liang, & Zeger, 1994), defined by a sampling distribution (e.g., binomial, multinomial, or Poisson) and a nonlinear link function (e.g., a logit or log link). The coefficients of the level-1 model then vary randomly over the participants in a level-2 model. The level-2 random effects have most often been assumed multivariate normal (e.g., Goldstein, 1995; Longford, 1993). However, robust standard errors are available (e.g., Bryk et al., 1996; Zeger et al., 1988; see also Chapters 5 and 6).
Clustering of Participants

Having reformulated the GMLM as a hierarchical model, it is straightforward to add levels to incorporate the clustering of repeatedly observed persons within social settings such as schools or treatment centers. The level-1 model relates the observed to the complete data for each participant; the level-2 model is a multivariate normal regression model for variation within clusters, the regression coefficients of which vary randomly over clusters in a level-3 model. Thus, it is possible to compare and contrast alternative level-1 covariance structures just as in the case of two-level models (see Table 2.8 and associated discussion). Again, robust standard errors are available for the fixed effects, and these are most applicable when the number of clusters is relatively large.
FINAL REMARKS

In approaching the study of individual growth and change, this chapter emphasized certain key principles: (a) the formulation of level-1 models that are developmentally meaningful, (b) the specification of level-2 models that link to key hypotheses about individual differences in development, (c) the examination of the sensitivity of key inferences to alternative models of the covariance structure in light of the adequacy of their fit to the data, and (d) the examination of the sensitivity of inferences about fixed effects to parametric assumptions. A topic of vital interest not considered here is the robustness of inferences to non-ignorable missingness. An explicit model for the complete data and its relationship to the observed data, key to the approach we have adopted, is a foundation for inquiry into this problem (Schafer, 1996). As an increasing variety of new algorithms and packages becomes increasingly accessible, it is essential to keep the fundamental issues of model specification and statistical conclusion validity in the foreground. An assessment of the robustness of key inferences within a study appears crucial in building durable knowledge in behavioral science.
ACKNOWLEDGEMENTS

Research reported here was supported by the Project on Human Development in Chicago Neighborhoods, with funding from the John D. and Catherine T. MacArthur Foundation, the National Institute of Justice, and the National Institute of Mental Health.
TECHNICAL APPENDIX

We can estimate the complete-data model of Equation 2.18 by applying maximum likelihood to the observed-data model of Equation 2.19. Setting X_j = M_j X*_j and ε_j = M_j ε*_j, where the starred quantities are the complete-data design matrix and residual vector, we have

$$Y_j = X_j \beta + \varepsilon_j, \qquad \varepsilon_j \sim N(0, V_j) \qquad (2.42)$$

where V_j = M_j Δ M_j'. Then, given V_j, β can be estimated via generalized least squares:

$$\hat{\beta} = \Big( \sum_j X_j' V_j^{-1} X_j \Big)^{-1} \sum_j X_j' V_j^{-1} Y_j \qquad (2.43)$$

Iterative re-estimation of Equation 2.43 based on an updated value of V_j gives the Fisher-scoring algorithm for estimation of β. To estimate V_j, we need to estimate the unknown elements of Δ. Let δ be the vector of unique elements of Δ, consider the partial derivatives of V_j with respect to δ, and let e_j = Y_j - X_j β̂. Then, given the current estimates of V_j and β, δ can be estimated by generalized least squares (Equation 2.44). Iterative recomputation of Equation 2.44 based on updates of β from Equation 2.43 gives the Fisher scoring algorithm for maximizing the likelihood (Raudenbush, 1994). All complete-data models estimated in this chapter used this approach. The various applications require specification of V_j and its derivatives. Of course, algebraic manipulation is required to render these equations computationally feasible in each case. Details are available upon request to the author. Robust standard errors for β are computed at convergence of the algorithm as the square roots of the diagonal elements of

$$\hat{V}(\hat{\beta}) = A^{-1} B A^{-1}, \qquad A = \sum_j X_j' \hat{V}_j^{-1} X_j \qquad (2.45)$$

$$B = \sum_j X_j' \hat{V}_j^{-1} \hat{e}_j \hat{e}_j' \hat{V}_j^{-1} X_j \qquad (2.46)$$
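A compact numpy sketch of the generalized least squares step in Equation 2.43 (a didactic rendering, not the chapter's production code; X_list, V_list, and y_list are assumed per-person observed-data designs, covariance matrices, and outcome vectors):

```python
import numpy as np

def gls_beta(X_list, V_list, y_list):
    """One GLS step (Equation 2.43); iterating with re-estimated V_j
    yields the Fisher-scoring algorithm for beta."""
    XtVX = sum(X.T @ np.linalg.solve(V, X)
               for X, V in zip(X_list, V_list))
    XtVy = sum(X.T @ np.linalg.solve(V, y)
               for X, V, y in zip(X_list, V_list, y_list))
    return np.linalg.solve(XtVX, XtVy)
```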
REFERENCES

Bock, R. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.

Bryk, A. S., & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147-158.

Bryk, A. S., & Raudenbush, S. W. (1988). Toward a more appropriate conceptualization of research on school effects: A three-level hierarchical linear model. American Journal of Education, 97, 65-108.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Bryk, A. S., Raudenbush, S. W., & Congdon, R. T. (1996). HLM: Hierarchical linear and nonlinear modeling with the HLM/2L and HLM/3L programs. Chicago, IL: Scientific Software International.

Diggle, P. J., Liang, K.-Y., & Zeger, S. L. (1994). Analysis of longitudinal data. Oxford: Clarendon Press.

Elliott, D., Huizinga, D., & Menard, S. (1989). Multiple problem youth: Delinquency, substance use, and mental health problems. New York: Springer-Verlag.

Francis, D., Fletcher, J., Stubing, K., Davidson, K., & Thompson, N. (1991). Analysis of change: Modeling individual growth. Journal of Consulting and Clinical Psychology, 39, 27-37.

Gibbons, R., Hedeker, D., Waternaux, C., & Davis, J. (1988). Random regression models: A comprehensive approach to the analysis of longitudinal psychiatric data. Psychopharmacology Bulletin, 24, 438-443.

Goldstein, H. (1987). Multilevel models in educational and social research. New York: Oxford University Press.

Goldstein, H. (1989). Models for multilevel response variables with an application to growth curves. New York: Academic Press.

Goldstein, H. (1995). Multilevel statistical models (2nd ed.). New York: Halstead Press.

Goldstein, H. (1996). Consistent estimators for multilevel generalized linear models using an iterated bootstrap. Multilevel Modeling Newsletter, 8, 3-6.

Gottfredson, M., & Hirschi, T. (1990). A general theory of crime. Stanford, CA: Stanford University Press.

Horney, J., Osgood, D., & Marshall, I. (1995). Criminal careers in the short-term: Intra-individual variability in crime and its relation to local life circumstances. American Sociological Review, 60, 805-820.

Huttenlocher, J. E., Haight, W., Bryk, A. S., & Seltzer, M. (1991). Early vocabulary growth: Relation to language input and gender. Developmental Psychology, 27, 236-248.

Jennrich, R., & Schluchter, M. (1986). Unbalanced repeated-measures model with structured covariance matrices. Biometrics, 42, 809-820.

Joreskog, K., & Sorbom, D. (1989). LISREL 7 user's reference guide. Mooresville, IN.

Kalaian, H., & Raudenbush, S. W. (1996). A multivariate mixed linear model for meta-analysis. Psychological Methods, 1, 227-235.

Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-974.

Little, R., & Shenker, N. (1995). Missing data. In G. Arminger, C. Clogg, & M. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 39-75). New York: Plenum Press.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: John Wiley and Sons.

Longford, N. T. (1993). Random coefficient models. Oxford: Clarendon Press.

McArdle, J. J. (1986). Latent growth within behavior genetic models. Behavioral Genetics, 16, 163-200.

McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. New York: Chapman and Hall.

Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.

Muthén, B. O. (1991). Analysis of longitudinal data using latent variable models with varying parameters. In L. C. Collins & J. L. Horn (Eds.), Best methods for the analysis of change (pp. 1-17). Washington, DC: American Psychological Association.

Rasbash, J., & Goldstein, H. (1994). Efficient analysis of mixed hierarchical and cross-classified random structures using a multilevel model. Journal of Educational and Behavioral Statistics, 19, 337-350.

Raudenbush, S. W. (1994). Equivalence of Fisher scoring to iterative generalized least squares in the normal case with application to hierarchical linear models. Unpublished manuscript.

Raudenbush, S. W. (1995). Hierarchical linear models to study the effects of social context on development. In J. Gottman (Ed.), The analysis of change (pp. 165-201). Hillsdale, NJ: Lawrence Erlbaum.

Raudenbush, S. W., & Chan, W. (1993). Application of a hierarchical linear model to the study of adolescent deviance in an overlapping cohort design. Journal of Clinical and Consulting Psychology, 61, 941-951.

Raudenbush, S. W., Rowan, B., & Kang, S. (1991). A multilevel, multivariate model for studying school climate in secondary schools with estimation via the EM algorithm. Journal of Educational Statistics, 16, 295-330.

Raudenbush, S. W., & Willms, J. D. (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20, 307-335.

Rogosa, D. R., Brand, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 90, 726-748.

Schafer, J. (1996). Analysis of incomplete multivariate data. London: Chapman & Hall.

Seltzer, M. H. (1993). Sensitivity analysis for fixed effects in the hierarchical model: A Gibbs sampling approach. Journal of Educational Statistics, 18, 207-235.

Strenio, J., Weisberg, H., & Bryk, A. S. (1983). Empirical Bayes estimation of individual growth curve parameters and their relationships to covariates. Biometrics, 39, 71-86.

Thum, Y. M. (1997). Hierarchical linear models for multivariate outcomes. Journal of Educational and Behavioral Statistics, 22, 77-108.

Ware, J. H. (1985). Linear models for the analysis of longitudinal data. The American Statistician, 39, 95-101.

Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381.

Zeger, S., Liang, K.-Y., & Albert, P. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics, 44, 1049-1060.
Chapter 3
Structural Equation Modeling of Repeated Measures Data: Latent Curve Analysis

Patrick J. Curran & Andrea M. Hussong
University of North Carolina

The statistical analysis of repeated measures data over time can be a remarkably challenging task that, if successful, has the potential for allowing significant insight into many important theoretical questions of interest. Over the years, a wide variety of longitudinal statistical models have been proposed to address this challenge, including repeated measures t-tests, analysis of variance (ANOVA), analysis of covariance (ANCOVA), multivariate analysis of variance (MANOVA), multiple regression, and path analysis. Advances in structural equation modeling (SEM) over the past 25 years have provided many additional statistical methods for analyzing longitudinal data. One SEM method that has had a long and important history within a wide variety of social science research settings is the autoregressive crosslagged (ARCL) panel model. However, because of several limitations associated with this modeling approach when applied under certain conditions (e.g., Rogosa, 1995), the past decade has witnessed the rise of an alternative SEM-based analytic approach to modeling longitudinal data, the latent curve model. Although latent curve analysis overcomes a number of limitations associated with the ARCL model, it is not without its own limitations. Applied researchers must be able to weigh the advantages and disadvantages of each of these analytic approaches so that an informed decision can be made about the optimal analytic strategy for evaluating
the particular research question at hand (Curran & Bollen, in press). The goal of this chapter is to explicate the advantages and disadvantages of using the SEM-based latent curve model in applied longitudinal research. This will be accomplished both through a discussion of the basic concepts and equations underlying latent curve analysis and through an applied example concerning the development of antisocial behavior in children. We begin the chapter with a description of the theoretical framework, specific hypotheses, empirical sample, and measures that will be used in the applied example. We then briefly review the ARCL model and discuss the potential advantages and disadvantages of this analytic strategy for evaluating longitudinal research hypotheses. We follow this with an introduction to latent curve analysis and a detailed application of these models to a set of theoretically derived research questions. Our primary intent is for this chapter to address the needs of applied researchers by providing a detailed pedagogical introduction to the latent curve model that describes the analytic technique and highlights its advantages, limitations, and potential future directions.
THE DEVELOPMENT OF ANTISOCIAL BEHAVIOR IN CHILDREN

The onset and escalation of antisocial behavior during early childhood can place a child at increased risk for a variety of negative developmental outcomes in adolescence and adulthood, affecting academic attainment, mental health, substance abuse, social adjustment, criminality, and employment success (Caspi, Bem, & Elder, 1989; Loeber & Dishion, 1983; Reid, 1993). Relations between early childhood behavioral problems and later adjustment difficulties unfold in a developmental process (Patterson, Reid, & Dishion, 1992) such that more severe forms of adolescent and young adult conduct problems are likely to be initiated early in childhood and, without intervention, become increasingly difficult to modify over time (Coie & Jacobs, 1993). The seemingly intractable nature of antisocial behavior indicates that early prevention efforts targeting high-risk children within this developmental process are likely to be our most successful mode of intervention (Conduct Problems Prevention Research Group [CPPRG], 1992; Kazdin, 1993; Reid, 1993). However, many previous attempts to prevent and treat childhood antisocial behaviors have been ineffective (CPPRG, 1992; Kazdin, 1985, 1987, 1993), and the lack of developmental theory in conceptualizing these interventions has been at least partially implicated as underlying such intervention failures (Cicchetti, 1984; CPPRG, 1992; Dodge, 1986, 1993). The development of antisocial behavior in children is embedded within a series of complex reciprocal relationships among parents, children, and teachers set across the contexts of the home, school, and peer group (see, e.g., CPPRG, 1992; Patterson et al., 1992). For example, previous research has shown that parents may contribute to their children's academic readiness for school entry by providing both emotional support and a cognitively stimulating home environment for their children (CPPRG, 1992). Children who show a lack of academic readiness at school entry often experience greater impediments to learning at school, especially if combined with pre-existing inattentiveness, antisociality, and hyperactivity (Moffitt, 1990). As the child progresses through school, continued aggressive and antisocial behavior decreases the time children spend on school-related tasks, further delaying the development of academic skills (Patterson, 1982; Patterson et al., 1992; Wilson & Herrnstein, 1985). As a result, conduct-disordered children are more likely to display a number of academic deficiencies, particularly in the development of age-appropriate reading skills (I
to the initial levels or rates of change in antisocial behavior? Finally, are earlier levels of antisocial behavior related to later development in reading recognition, and are earlier levels of reading recognition related to later development in antisocial behavior? A series of latent curve models are applied to the empirical data both to better understand these research questions and to provide a detailed example of this analytic strategy.
SAMPLE AND MEASURES

Subjects for the present study were drawn from the National Longitudinal Survey of Youth (NLSY) of Labor Market Experience in Youth, a study that was initiated in 1979 by the U.S. Department of Labor to examine the transition of young people into the labor force. The original 1979 panel included a total of 12,686 respondents, 6,283 of whom were women. Beginning in 1986, an extensive set of assessment instruments was administered to the children of the 6,283 female respondents of the original NLS Youth sample. A total of 4,971 children was assessed in 1986, which represented approximately 95% of those children eligible for interview. These child assessments were again administered every other year following the 1986 interview: 6,266 children were interviewed in 1988, 5,803 in 1990, and 6,509 in 1992. The mothers of the children continued to participate in the NLS Youth annual interviews during this time. As of 1992, at least one interview had been obtained on 9,360 biological children of the original 6,283 women first interviewed for the NLSY in 1979. Although a very large number of children of the original NLS Youth mothers was interviewed at least one time to date (N = 9,360 by 1992), a much smaller number of mother-child pairs was considered for the present study. Three key criteria were required for inclusion in the sample. First, children must have been between the ages of 6 and 8 years at the first wave of measurement. This choice of age range ensured that all children were eligible to be assessed on the measures of interest to the present study. Second, children must have reported complete data on all measures of interest to the present study at the first wave of measurement. Subjects were included in the present sample if they were missing data on some or all measures after the first wave of measurement. Finally, only one biological child was considered from each mother. Based upon these three criteria, a total sample of N = 405 children was considered for the present analyses. All N = 405 children and mothers were interviewed at Time 1, N = 374 were interviewed at Time 2, N = 297 were interviewed at Time 3, N = 294 were interviewed at Time 4, and N = 221 were interviewed at all four assessments. Of the total sample of N = 405, 49.9% were female, and the average child age at Time 1 was 6.9 years (sd = .64). Data for the NLSY Child survey were primarily collected using personal home interviews administered by trained interviewers. Measures used in the present study were a subset drawn from the much larger complete battery
of assessments administered to the NLSY mothers and children. Antisocial behavior was measured using the Behavior Problems Index (BPI) antisocial behavior subtest, one of six subtests of the BPI developed by Zill and Peterson (Baker, Keck, Mott, & Quinlan, 1993). The antisocial behavior subscale consisted of the mother's report on six items that assessed the child's antisocial behavior over the previous 3 months. The three possible response options were not true (scored 0), sometimes true (scored 1), or often true (scored 2). These six items were summed to compute an overall measure of antisocial behavior, and scores could range in value from 0 to 12. The child's reading recognition skill was measured using the 84-item Peabody Individual Achievement Test (PIAT) Reading Recognition subtest, one of five subtests of the PIAT. The reading recognition subtest measures word recognition and pronunciation ability, components considered essential to reading achievement. The reading recognition measure was computed by summing the total number of correct items for the 84-item subtest, and scores could range in value from 0 to 84. The final reading recognition scores were divided by 10 to better equate these variances with those of the other variables under consideration for later analyses. Antisocial behavior and reading recognition were assessed every other year for four assessments. Four explanatory variables were of interest to the current paper. Emotional support and cognitive stimulation provided to the child were assessed using the Home Observation for Measurement of the Environment-Short Form (HOME-SF; based on Baker et al., 1993). Emotional support was computed as a summation of 13 dichotomously scored items as reported by the mother and as observed by the interviewer, and scores could range in value from 0 to 13. Cognitive stimulation was computed as a summation of 14 dichotomously scored items as reported by the mother, and scores could range from 0 to 14. Only Time 1 measures of emotional support and cognitive stimulation were considered. Finally, child age was measured in years at Time 1, and child gender was dichotomously scored such that female was coded as zero and male was coded as one. The means, standard deviations, and correlations for all measured variables are presented in Table 3.1. Given that there were missing data at Times 2, 3, and 4, these descriptive statistics are based upon all available cases at each particular time point. A standard statistical analysis of these data would use a listwise deletion procedure in which only cases that had measures at all time points would be included. If this approach were applied here, 45% of the cases (184 of 405) would be deleted. However, recent developments in both statistical theory and available software have provided extremely important options to avoid this significant loss of data. For all models presented here, the newly developed program Mplus (Muthén & Muthén, 1998) was used, which allows for the utilization of a missing data estimator. Drawing on statistical theory described by Little and Rubin (1987), maximum likelihood estimation is used incorporating the EM algorithm, which provides proper parameter estimates, standard errors,
Table 3.1: Means, Standard Deviations, and Correlations for All Measured Variables

                  (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)   (12)
(1) T1 Anti      1.00
                 (405)
(2) T2 Anti       .45   1.00
                 (374)  (374)
(3) T3 Anti       .42    .56   1.00
                 (297)  (280)  (297)
(4) T4 Anti       .41    .58    .58   1.00
                 (294)  (275)  (277)  (294)
(5) T1 Read      -.07   -.08   -.02   -.10   1.00
                 (405)  (374)  (297)  (294)  (405)
(6) T2 Read      -.13   -.12   -.13   -.10    .66   1.00
                 (375)  (362)  (283)  (279)  (375)  (375)
(7) T3 Read      -.14   -.10   -.15   -.08    .54    .78   1.00
                 (275)  (263)  (271)  (255)  (275)  (264)  (275)
(8) T4 Read      -.22   -.18   -.19   -.14    .45    .76    .80   1.00
                 (270)  (260)  (256)  (264)  (270)  (261)  (239)  (270)
(9) Gender        .22    .24    .21    .18   -.05   -.08   -.04   -.04   1.00
                 (405)  (374)  (297)  (294)  (405)  (375)  (275)  (270)  (405)
(10) Age          .09    .01    .06   -.07    .60    .31    .19    .12   -.04   1.00
                 (405)  (374)  (297)  (294)  (405)  (375)  (275)  (270)  (405)  (405)
(11) Cognitive   -.12   -.21   -.14   -.22    .12    .19    .20    .20   -.03    .03   1.00
                 (405)  (374)  (297)  (294)  (405)  (375)  (275)  (270)  (405)  (405)  (405)
(12) Emotional   -.24   -.18   -.23   -.22    .05    .14    .18    .22    .04   -.03    .36   1.00
                 (405)  (374)  (297)  (294)  (405)  (375)  (275)  (270)  (405)  (405)  (405)  (405)

Means            1.66   2.03   2.06   2.52   1.83   4.08   5.01   5.77   0.50   6.93   8.89   9.20
SD               1.66   1.90   2.04   2.15   0.92   1.08   1.16   1.25   0.50   0.64   2.58   2.31

Note: Numbers in parentheses indicate the available sample size for each pairwise correlation. Numbers on the horizontal axis of the table refer to variables numbered on the vertical axis of the table. The mean and standard deviation for each variable are reported in the last two lines of the table.
and an omnibus likelihood ratio (or χ²) test statistic. A full discussion of this method of estimation is beyond the scope of this chapter, but the end result of this approach is the utilization of all available cases at each time point. See Little and Rubin (1987) and Muthén and Muthén (1998) for further details. Prior to applying the latent curve models to the empirical data, we first present a brief review of the structural equation model followed by an introduction to the ARCL modeling approach.
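Before turning to that review, the logic of the missing-data estimator just described can be illustrated with a toy EM sketch (our simplified illustration, not the Mplus algorithm): with one of two jointly normal variables missing at random for some cases, each E-step replaces the missing values by their conditional expectations given the observed variable, and each M-step re-estimates the means and covariance matrix from the completed moments.

```python
import numpy as np

# Toy EM for a bivariate normal with y2 missing at random for ~30% of cases.
rng = np.random.default_rng(0)
n = 400
y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
miss = rng.random(n) < 0.3                         # mask y2 for these cases

obs = ~miss
mu = np.array([y[:, 0].mean(), y[obs, 1].mean()])  # complete-case start
S = np.cov(y[obs].T, bias=True)

for _ in range(100):
    # E-step: conditional mean and residual variance of y2 given y1.
    b = S[0, 1] / S[0, 0]
    cond_mean = mu[1] + b * (y[:, 0] - mu[0])
    y2 = np.where(miss, cond_mean, y[:, 1])
    resid_var = S[1, 1] - b * S[0, 1]
    # M-step: moments of the completed data, plus the imputation variance.
    mu = np.array([y[:, 0].mean(), y2.mean()])
    S = np.cov(np.column_stack([y[:, 0], y2]).T, bias=True)
    S[1, 1] += miss.mean() * resid_var

print(np.round(mu, 2), np.round(S, 2))             # close to the true moments
```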
THE STRUCTURAL EQUATION MODELING FRAMEWORK

Structural equation modeling, or SEM, is a term used to define a broad class of multivariate statistical models that are widely used in the social sciences. The SEM approach simultaneously estimates relations between observed variables and the corresponding underlying latent constructs and between the latent constructs themselves (Bentler, 1980, 1983; Joreskog, 1971a, 1971b; Joreskog & Sorbom, 1978). From this perspective, the factor analytic model relates the observed variables y to the underlying latent construct η such that

$$y = \nu + \Lambda \eta + \varepsilon \qquad (3.1)$$

where ν is a vector of measurement intercepts, Λ is a matrix of factor loadings (or measurement slopes), and ε is a vector of measurement residuals. Conceptually, this reflects that the observed measures y are used to define the unobserved latent construct, or factor, believed to have given rise to the measures. Further, the observed variance in y is partitioned into two parts: (a) that which is explained by the underlying factors and (b) the remainder, which is considered error. Thus, the variances of the underlying factors are theoretically "error free." Once these factors are estimated, relations among the factors can be examined such that

$$\eta = \alpha + B\eta + \zeta \qquad (3.2)$$
where α is a vector of structural intercepts, B is a matrix of structural slopes, ζ is a vector of structural residuals, and V(ζ) = Ψ represents the covariance structure among the latent factors. Thus, Equation 3.1 relates the observed measures to the underlying latent factors, and Equation 3.2 relates the latent factors to one another. The SEM framework allows for the estimation of a wide array of powerful and flexible statistical models that can be applied to test research hypotheses in many areas of social science research. Although a comprehensive review of all such techniques is beyond the scope of this chapter, more detailed discussions of these models can be found in Dwyer (1983) and Windle (1997). However, two longitudinal SEMs that have received significant recent attention are the autoregressive crosslagged (ARCL) model and
the latent curve model. We next provide a brief review of the ARCL model followed by a more detailed discussion of the latent curve model.
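As a concrete, if deliberately simplified, illustration of Equations 3.1 and 3.2, the following sketch simulates two latent factors measured by three indicators each, with one factor regressed on the other. All parameter values are invented; an SEM program would recover them from the observed covariance matrix, which is all the analyst actually sees.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Structural part (Equation 3.2): eta2 = alpha + b*eta1 + zeta
eta1 = rng.normal(0.0, 1.0, n)
eta2 = 0.5 + 0.7 * eta1 + rng.normal(0.0, 0.6, n)
eta = np.column_stack([eta1, eta2])

# Measurement part (Equation 3.1): y = nu + Lambda @ eta + eps
nu = np.zeros(6)
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.9, 0.0],
                [0.0, 1.0], [0.0, 0.7], [0.0, 0.8]])
eps = rng.normal(0.0, 0.5, (n, 6))
y = nu + eta @ Lam.T + eps          # only y is observed; eta stays latent

print(np.cov(y, rowvar=False).round(2))   # the structure an SEM would fit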
THE AUTOREGRESSIVE CROSSLAGGED MODEL

Our analytic goal is to empirically evaluate the four research questions described earlier. One technique available to accomplish this is the ARCL modeling strategy. The ARCL approach is based on the classic simplex model developed by Guttman (1954) and extended by Anderson (1960), Humphreys (1960), Heise (1969), and Joreskog (1979). This modeling strategy incorporates two main components. First, later measures of a construct are predicted by earlier measures of the same construct, thus giving rise to the term "autoregressive." For example,
ANTISOCIAL_it = μ_t + ρ_t ANTISOCIAL_i,t−1 + ε_it    (3.3)
indicating that the measure of antisocial behavior for individual i at time point t is an additive combination of a time-specific intercept (μ_t), a weighted contribution of the prior measure of antisocial behavior (ρ_t), and an individual- and time-specific random error (ε_it). Larger positive values of the regression parameter are usually interpreted as indicating greater stability of the construct over time (but see Rogosa, 1995, for concerns about this interpretation); that is, on average, scores above the mean at time t tend also to be above the mean at time t + 1. Whereas the model described in Equation 3.3 is univariate (given that the repeated measures of only a single construct are considered), this can be extended to a multivariate model in which two or more constructs are examined simultaneously. Here, not only are the later measures of one construct regressed upon earlier measures of the same construct, but the later measures of one construct are also regressed on earlier measures of other constructs as well. For example, the measures of reading recognition could be incorporated such that
ANTISOCIAL_it = μ_t + ρ_t ANTISOCIAL_i,t−1 + γ_t READ_i,t−1 + ε_it    (3.4)
indicating that later measures of antisocial behavior are a function of an intercept, the weighted contribution of the prior measure of antisocial behavior, the weighted contribution of the prior measure of reading recognition, and a random error term. The model in Equation 3.4 thus has the autoregressive component as before, but now also has the crosslagged prediction such that earlier measures of reading predict later measures of antisocial behavior. This is sometimes referred to as a residualized change model given that earlier measures of reading recognition predict later measures of antisocial behavior above and beyond the effects of earlier antisocial
behavior. This model can be extended to examine bidirectional relations such that earlier measures of antisocial behavior predict later measures of reading recognition as well. Despite the widespread use of the ARCL modeling approach in many areas of social science research, this analytic technique has been subjected to a great deal of criticism. Critiques address both statistical and theoretical aspects of the model, and many of these are articulated in Rogosa (1995), Rogosa and Willett (1985), and Willett (1988). Briefly, there are three primary concerns with ARCL modeling when addressing hypotheses similar to our four research questions. First, the ARCL is a fixed effects model, meaning that a single value is estimated for the regression parameters (e.g., the ρ's) that holds for all subjects in the sample. As such, the magnitude of these influences does not vary across individuals, often an unrealistic condition to impose upon the model. Second, the observed mean structure is usually omitted from the ARCL model, thus ignoring potentially important information about mean changes over time. Although means can be introduced into the ARCL model (as was shown in Equations 3.3 and 3.4), it is difficult to explicitly model the structure of the means as a function of time. Finally, change in the construct between two time points is independent of the influence of both earlier changes and later changes in the same construct. So, change between Times 2 and 3 considers neither change between Times 1 and 2 nor change between Times 3 and 4. This approach thus tends to "chop up" multiple repeated measures into a series of two-time-point comparisons, which is often consistent neither with substantive theory nor with the structure of the observed empirical data.
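To make the ARCL specification concrete, the sketch below simulates data from Equations 3.3 and 3.4 with invented parameter values and recovers one autoregression by ordinary least squares; a full ARCL analysis would instead estimate all of the lagged equations simultaneously within an SEM.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 500, 4
anti = np.zeros((n, T))
read = np.zeros((n, T))
anti[:, 0] = rng.normal(1.7, 1.3, n)
read[:, 0] = rng.normal(2.5, 0.9, n)
for t in range(1, T):
    # Equation 3.4: mu_t + rho_t*anti(t-1) + gamma_t*read(t-1) + error
    anti[:, t] = (0.5 + 0.6 * anti[:, t - 1] - 0.1 * read[:, t - 1]
                  + rng.normal(0, 1.0, n))
    read[:, t] = 1.5 + 0.8 * read[:, t - 1] + rng.normal(0, 0.5, n)

# One residualized-change regression: Time 2 antisocial on Time 1 measures.
X = np.column_stack([np.ones(n), anti[:, 0], read[:, 0]])
coef, *_ = np.linalg.lstsq(X, anti[:, 1], rcond=None)
print(coef.round(2))    # approximately [0.5, 0.6, -0.1]
```

Note that the single coefficient 0.6 here is the "fixed effect" criticized in the text: it is forced to hold for every subject in the sample.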
LATENT CURVE ANALYSIS

Given these limitations, there has been a call for the development of alternative statistical models for analyzing repeated measures over time. One particularly promising technique is often referred to as latent curve analysis. Although the historical roots underlying the latent curve model can be traced back to the seminal work of Tucker (1958) and Rao (1958), the SEM-based latent curve model was first proposed by Meredith and Tisak (1984) and formalized by Meredith and Tisak (1990). This was further developed and expanded upon in a variety of important ways by McArdle (1986, 1988, 1989, 1991), McArdle and Epstein (1987), Muthén (1991, 1993, in press), and Muthén and Curran (1997), among many others. Examples of recent substantive applications using latent curve analysis include Curran and Bollen (in press), Curran and Muthén (1999), Curran, Stice, and Chassin (1997), Duncan, Duncan, and Hops (1996, 1998), and Stoolmiller (1994). Latent curve analysis draws on the many strengths of the structural modeling framework. One of the basic concepts in structural modeling is that, although we have a set of observed measures of a theoretical construct of interest (e.g., "antisocial behavior"), we are not inherently interested in
this set of observed measures. Instead, we are interested in the unobserved latent factor that is thought to have given rise to the set of observed measures. Similarly, within the latent curve framework we are not inherently interested in the observed repeated measures of the construct over time. Instead, we are interested in the unobserved factors that are hypothesized to underlie these repeated measures. However, unlike in the standard SEM factor model, where we would like to estimate the latent factor of, say, "antisocial behavior," in the latent curve model we would like to estimate latent factors that represent the growth trajectories thought to have given rise to the repeated measures over time. McArdle and Epstein (1987) nicely described these as chronometric factors. So, instead of evaluating the repeated measures from the ARCL framework in which the relations among the observed variables are examined between Time 1 and Time 2, and then between Time 2 and Time 3, and so on, the latent curve model attempts to smooth over the observed measures to estimate the continuous trajectory that gave rise to these time-specific observed measures. This conceptualization highlights the elegance of the SEM framework in the estimation of these unobserved growth factors that generated the observed repeated measures. Instead of focusing our explicit interest on the observed repeated measures themselves, we instead use these observed measures to estimate an unobserved latent component thought to underlie our set of measures. Once estimated, the key focus of the analyses is then on these unobserved components of growth. To better understand the basics of the latent curve model, we now progressively explore the research questions about the relation between antisocial behavior and reading recognition described previously. We first present the equations that describe the various latent curve models and then apply these models to the empirical data and interpret the resulting findings. We start by examining the characteristics of developmental trajectories in antisocial behavior over an 8-year period. We then extend this model to include our four explanatory variables of interest and explore several options available for including the influences of reading recognition. Developmental trajectories in reading will then be examined, and this model will be combined with the antisocial behavior latent curve model to evaluate a comprehensive developmental model of the relations between these two constructs. Finally, we discuss limitations of this modeling approach along with interesting directions for future research.
LATENT CURVE MODELING OF THE PROPOSED RESEARCH QUESTIONS

To summarize thus far, the ARCL model approaches the analysis of the repeated measures data in a series of two-time-point comparisons. Regardless of how many assessments are made, the relations among the constructs are examined between time t and time t + 1. The latent curve model
approaches the analysis of the repeated measures data in a rather different way. Instead of focusing on specific time-adjacent comparisons, the latent curve model uses the SEM analytic framework to estimate the growth factors believed to underlie the observed measures. It is these growth factors that then become of primary interest for later analysis. To accomplish this, the observed repeated measures of antisocial behavior are expressed as
ANTISOCIAL_it = η_αi + λ_t η_βi + ε_it    (3.5)
where λ_t = 0, 1, 2, 3 for the T = 4 assessments. In other words, Equation 3.5 reflects that the observed measure of antisocial behavior for each person i at time point t is expressed as an additive combination of the individual's own intercept (η_αi), their own slope (η_βi) multiplied by the coefficient of time (λ_t), and an individual- and time-specific random error (ε_it). Note the subscripting used on these latent factor η's. First, both the intercept and slope are subscripted with an i indicating that these values are allowed to vary across individuals. This helps overcome one of the limitations of the ARCL model in which one parameter estimate was used to represent the stability of the construct for all individuals. Further, there is not a subscript of t on either the intercept or slope, indicating that, although these influences are individually specific, they are independent of time. This means that the observed repeated measures taken at each time point t have been used to estimate the underlying growth trajectories that are independent of t. What this modeling strategy thus provides is a way of expressing the individual intercept and slope components of growth as a function of group and individual influences. These are expressed as

η_αi = μ_α + ζ_αi    (3.6)

η_βi = μ_β + ζ_βi    (3.7)
indicating that an individual's own intercept and slope can be expressed as an additive function of an overall mean intercept (μ_α) and mean slope (μ_β) for the entire sample plus the individual's own deviation from each of these mean values. The mean values are sometimes referred to as fixed effects and the deviation values as random effects. This formulation accomplishes two important things. First, the observed repeated measures of antisocial behavior have now been smoothed over to provide an estimate of the trajectories thought to underlie the repeated measures. Second, the variance of the deviation terms in Equations 3.6 and 3.7 is a direct estimate of the degree of individual variability in the intercepts and slopes within the sample; the greater the variance, the greater the individual differences in starting point and rates of change over time. This model is often referred to as an unconditional model given that there are no predictor variables
on the right-hand side of Equations 3.6 and 3.7; that is, variance in the intercept and slope factors is not modeled as a function of other explanatory variables. We will see in a moment that these equations can easily be extended to incorporate predictors of the individual differences in these developmental trajectories. As a starting point for the analysis, the unconditional univariate latent curve model presented in Equations 3.5, 3.6, and 3.7 was fit to the sample data (see Figure 3.1).

Figure 3.1: Unconditional univariate latent curve model of antisocial behavior.

This model will provide initial insight into the first research question relating to the characteristics of developmental trajectories in antisocial behavior prior to introducing additional explanatory measures of interest. For this and all subsequent analyses, model fit was evaluated using the following criteria: the relative magnitude of the omnibus χ² test statistic and the associated p-value, where small χ² values indicate better model fit; the root mean squared error of approximation (RMSEA; Browne & Cudeck, 1993) accompanied by a 90% confidence interval (CI90), where values falling below about .05 to .08 indicate better model fit; examination of modification indices, which are 1 df tests of the improvement in model fit associated with the freeing of a particular parameter; and examination of covariance and mean structure residuals as indications of potential local misspecification (see Curran, 2000, for further discussion of assessing model fit). Other traditional incremental goodness-of-fit indices are not available due to the use of the missing data estimator. Using these fit criteria, we concluded that the unconditional latent curve model for antisocial behavior fit the data moderately well [χ²(5) = 14.9, p = .01; RMSEA = .07, CI90 = .03, .11; p(RMSEA < .05) = .18]. There was a significant mean estimate for both the intercept (μ̂_α = 1.7) and slope (μ̂_β = .15) factors, indicating that, on average, the group was reporting
significant initial antisocial behavior of 1.7 units and significant linear increases of .15 units per time point. Further, there was a significant variance estimate for both the intercept (ψ̂_α = 1.29) and slope (ψ̂_β = .13) factors, indicating that there was meaningful individual variability around both of these group-level estimates. Thus, some children were reporting high levels of initial antisocial behavior whereas others were reporting lower levels, and some children were reporting steep increases in antisocial behavior over time, whereas others were reporting no increases at all. Finally, there was a marginally significant correlation between the intercept and slope factors (.36), suggesting that children who reported higher initial levels of antisocial behavior also tended to report steeper increases in antisocial behavior over time. These modeling results indicate that the developmental trajectories thought to underlie the four repeated measures of antisocial behavior are characterized by both significant fixed and random effects.
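The unconditional model can be illustrated with a small simulation. The sketch below generates trajectories from Equations 3.5 through 3.7 using values similar in spirit to the estimates just reported (all numbers are illustrative) and then recovers the growth components by per-person least squares. A latent curve analysis estimates the same quantities simultaneously within the SEM; the crude per-person variances shown here are inflated by measurement error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
lam = np.arange(4.0)                                # lambda_t = 0, 1, 2, 3
alpha = 1.7 + rng.normal(0, np.sqrt(1.29), n)       # eta_alpha,i (Eq. 3.6)
beta = 0.15 + rng.normal(0, np.sqrt(0.13), n)       # eta_beta,i (Eq. 3.7)
y = alpha[:, None] + beta[:, None] * lam + rng.normal(0, 1.0, (n, 4))

X = np.column_stack([np.ones(4), lam])
coefs = np.linalg.lstsq(X, y.T, rcond=None)[0].T    # per-person intercept, slope
print(coefs.mean(axis=0).round(2))    # fixed effects, about [1.70, 0.15]
print(coefs.var(axis=0).round(2))     # individual variability (plus error)
```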
Because we found significant individual variability in the intercept and slope factors, we may now introduce our four explanatory variables to try to predict this observed variability. To accomplish this, Equation 3.5 remains the same, but Equations 3.6 and 3.7 are extended to include the effects of our predictor variables of interest such that:

η_αi = μ_α + γ1 AGE_i + γ2 GEN_i + γ3 COG_i + γ4 EMOTE_i + ζ_αi    (3.8)

η_βi = μ_β + γ5 AGE_i + γ6 GEN_i + γ7 COG_i + γ8 EMOTE_i + ζ_βi    (3.9)
This model is often referred to as a conditional growth model because individual differences observed in the initial starting point and rate of change over time in antisocial behavior are being modeled as a function of the child's age, gender, and cognitive and emotional support in the home. Note that the explanatory variables are denoted with a subscript i to clarify that these values vary across individuals, but the variables are not denoted with a t for time. This is because all of these variables were measured only at the first time point. Variables such as these are often referred to as time-invariant covariates to denote that they do not vary as a function of time (Bryk & Raudenbush, 1992). Note also that the regression parameters linking the explanatory variables to the growth factors (e.g., the eight γ parameters) are not subscripted by i, indicating that these values are estimated pooling over all individuals. The path diagram corresponding to Equations 3.8 and 3.9 is presented in Figure 3.2.

Figure 3.2: Conditional univariate latent curve model of antisocial behavior with four exogenous variables. All paths between exogenous variables and growth factors were estimated, but only significant (p < .05) paths are displayed. Values associated with each path are standardized regression coefficients.

This model was estimated and found to fit the data moderately well [χ²(13) = 25.2, p = .02; RMSEA = .05, CI90 = .02, .08; p(RMSEA < .05) = .50]. Results indicated that gender, age, and emotional support all significantly predicted the intercept factor such that boys, older children, and children with lower emotional support at home reported higher
initial levels of antisocial behavior. Further, age and cognitive support significantly predicted the slope factor, indicating that younger children and children with lower cognitive stimulation at home reported steeper increases in antisocial behavior over time. Taken together, the set of four explanatory variables accounted for 27% of the observed variance in the intercept factor and 11% of the observed variance in the slope factor. We have found thus far that there are significant initial levels of antisocial behavior, there are significant linear increases in antisocial behavior over time, there is significant individual variability in both starting point and rate of change, and this variability can be modeled as a function of our four explanatory variables. In addition to these important questions, recall that one of our key interests is to better understand the relation between antisocial behavior and reading recognition over time. One option would be to include the Time 1 measure of reading recognition in Equations 3.8 and 3.9 and thus be able to draw the same type of conclusions that were drawn for the other four predictors. However, this is not ideal given that we would be ignoring the latter three measures of reading recognition, thus disposing of a great deal of important information. A better option would be to treat the repeated measures of reading as a time-varying covariate. Here, instead of considering the effects of Time 1 reading on the underlying growth trajectories of antisocial behavior (as would be accomplished by including Time 1 reading in Equations 3.8 and 3.9), we instead can introduce the effects of reading on antisocial behavior into Equation 3.5. This is given as
ANTISOCIAL_it = η_αi + λ_t η_βi + γ_t READ_it + ε_it    (3.10)
indicating that the observed measure of antisocial behavior for person
i at time t is a function of the underlying intercept factor, the underlying slope factor, and the time-specific reading recognition score for person i at time t. The model in Equation 3.10 is nearly equivalent to the time-varying covariate model within the HLM framework (Bryk & Raudenbush, 1992, Equation 6.21). The gamma (γ) parameter estimates in Equation 3.10 would indicate the unique effects of reading directly upon the time-specific measures of antisocial behavior above and beyond the effects of the underlying developmental trajectory of antisociality. Although Equation 3.10 is limited to a cross-sectional relation between reading and antisocial behavior, Curran, Muthén, and Harford (1998) describe extensions to the latent curve model that allow for longitudinal prediction from the time-varying covariates as well. The model described in Equation 3.10 (extended to allow for the longitudinal effects of reading on antisocial behavior) is presented in Figure 3.3.

Figure 3.3: Conditional univariate latent curve model of antisocial behavior with effects of reading as a time-varying covariate. All paths between exogenous variables and growth factors were estimated, but only significant (p < .05) paths are displayed. All paths between time-varying covariates and repeated measures of antisocial behavior were estimated, and nonsignificant paths are denoted with a value of zero. All other values associated with each path are standardized regression coefficients.

This model was estimated and found to fit the data well [χ²(19) = 29.1, p = .06; RMSEA = .05, CI90 = 0, .06; p(RMSEA < .05) = .30]. Prior to interpretation of the final results, equality constraints were imposed across the regression parameters relating each reading measure to all later antisocial measures (e.g., the regressions of all four measures of antisocial behavior on Time 1 reading were set to be equal). These constraints were imposed to evaluate whether the regression parameters were of the same magnitude across all time points; these constraints were not associated with a significant decrement in model fit and were thus retained. The final model fit the data well [χ²(25) = 36.9, p = .06; RMSEA = .06, CI90 = 0, .06; p(RMSEA < .05) = .37]. All the results relating the growth factors to the four explanatory variables remained nearly identical to those found for the previous model. Interestingly, the Time 1 measure of reading recognition was significantly and negatively associated with all four measures of antisocial behavior, although no significant relations were found for the Time 2, 3, or 4 reading recognition assessments. This indicates that higher levels of Time 1 reading recognition were associated with lower levels of antisocial behavior above and beyond the effects of the developmental trajectory underlying antisocial behavior. However, the results suggest that, once the effects of Time 1 reading recognition are considered, later measures of reading exerted no unique effect on antisocial behavior.
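A generative sketch of Equation 3.10 may help fix ideas. Here the reading scores enter as a time-varying covariate acting directly on the time-specific antisocial scores, over and above the growth factors. All parameter values are invented, and a single γ is used at every time point in the spirit of the equality constraints described above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
lam = np.arange(4.0)
alpha = rng.normal(1.7, 1.1, n)                    # intercept factor
beta = rng.normal(0.15, 0.35, n)                   # slope factor
read = rng.normal(2.5, 1.0, (n, 4)) + lam          # covariate varying over t
gamma = -0.2                                       # time-specific reading effect
anti = (alpha[:, None] + beta[:, None] * lam
        + gamma * read + rng.normal(0, 1.0, (n, 4)))

# Reading relates to antisociality beyond the underlying trajectory:
resid = anti - (alpha[:, None] + beta[:, None] * lam)
print(np.corrcoef(resid[:, 2], read[:, 2])[0, 1].round(2))   # negative
```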
Although this model provides initial evidence that there is indeed some longitudinal relation between reading recognition and antisocial behavior over time, two key limitations remain. First, both developmental theory and previous research would strongly suggest that reading recognition itself is characterized by an underlying developmental process that is clearly not being modeled here. Given that significant intercept and slope factors were found to underlie the four measures of antisocial behavior, it is important that a similar process be examined for reading recognition. Second, although earlier reading recognition was found to predict later antisocial behavior, this model does not evaluate the effects of earlier antisocial behavior on later reading recognition. The model must be further extended to allow for the introduction of both of these important characteristics. Prior to estimating this more complicated latent curve model, we must first take a step back and estimate a univariate model to examine the developmental trajectories in reading recognition. This is because we must first accurately understand the unconditional growth process of reading recognition prior to examining the relation of this process with other influences of interest (see, e.g., MacCallum, Kim, Malarkey, & Kiecolt-Glaser, 1997). We thus repeat the unconditional latent curve model presented in Equation 3.5 with the simple modification that instead of modeling the four repeated measures of antisocial behavior, we consider the four repeated measures of reading recognition. This is given as
READ_it = η_αi + λ_t η_βi + ε_it    (3.11)
It is extremely important to appreciate the differences between Equations 3.10 and 3.11. Placement of reading recognition on the right side of Equation 3.10 indicates that these measures are treated as independent variables and are used to predict the time-specific measures of antisocial behavior. However, reading recognition is placed on the left side of Equation 3.11, indicating that these measures are now dependent variables such that the observed scores depend upon the influences of the underlying growth trajectories. This model was estimated and was found to fit the observed data
extremely poorly [χ²(5) = 175.04, p < .001; RMSEA = .29, CI90 = .25, .33; p(RMSEA < .05) < .001]. Although significant parameter estimates were found for both the fixed and random effects of the intercept and slope factors, these cannot be interpreted given the remarkably poor fit of the overall model. Further information is needed to better understand this lack of fit. First, an examination of the observed means and variances of the four reading recognition scores indicates that, although there appears to be a monotonic increase in these measures as a function of time, the increment of increase is not equal across time. For example, there is a 1.5-unit increase in the means from Times 1 to 2, a 1.0-unit increase from Times 2 to 3, and a .70-unit increase from Times 3 to 4. Thus, it appears that the pattern of mean change over time is curvilinear, whereas the model estimated in Equation 3.11 forces the functional form of the developmental trajectory to be linear (e.g., λ_t = 0, 1, 2, 3). Large and significant modification indices (1 df tests that evaluate the degree of improvement in model fit if some of the fixed factor loadings were to be freed) provided further support that the source of model misfit stems from imposing a linear growth function on data that demonstrate a strong curvilinear pattern of change. There are three general options for dealing with this nonlinear form of growth. The first is to simply ignore the issue and move ahead with the analyses. However, we cannot stress strongly enough how ill advised this choice can be. Given that the latent curve model is incorrectly specifying the pattern of growth that is evident in the observed data, no inferences can be made about any aspect of this model given the likely bias that exists throughout many (if not all) of the parameter estimates. Thus, if there is a significant curvilinear trend in the data, it is imperative that this be incorporated in some way into the model. The second choice is to retain the intercept and linear growth factor, but to add a third factor to account for the nonlinear component in the trajectory (see, e.g., Willett & Sayer, 1994). To accomplish this, Equation 3.11 is extended to add this third factor such that
READ_it = η_αi + λ_t η_β1i + λ_t² η_β2i + ε_it    (3.12)
where λ_t = 0, 1, 2, 3 as before, and λ_t² = 0, 1, 4, 9. Now η_αi represents the intercept, η_β1i represents the linear component of the trajectory, and η_β2i represents the curvilinear component of the trajectory. Then, just as was done in Equations 3.8 and 3.9, the intercept, linear, and curvilinear components of growth can be regressed upon our four explanatory variables. The model in Equation 3.12 was fit to the observed data and, although the inclusion of the quadratic factor led to a drastic improvement in fit over the linear-only model, the overall model still did not adequately reproduce the observed data [χ²(1) = 10.65, p < .001; RMSEA = .15, CI90 = .08, .24; p(RMSEA < .05) < .01]. Examination of the fixed effects indicated a significant positive intercept, a significant positive linear component, and a significant negative curvilinear component. These parameters are
consistent with what we would have expected given the pattern of the observed means. However, although there was a significant random component for the intercept and linear factors, there was a near-zero random component for the quadratic factor. This suggests that, although there is a curvilinear component of growth for the overall group, there is not significant individual variability in these curvilinear rates of change. Given the nonsignificant variability in the curvilinear component of growth, we re-estimated this model fixing the variance of the quadratic factor to zero, as well as the covariances of the quadratic factor with the other two growth factors (as described by Bryk & Raudenbush, 1992). This resulted in an equally poor fit to the observed data [χ²(4) = 28.0, p < .001; RMSEA = .12, CI90 = .08, .17; p(RMSEA < .05) < .002]. It appears, then, that the quadratic model does not adequately represent the four repeated measures of reading recognition. The third option that we consider is to return to the intercept and linear growth factor model presented in Equation 3.11, but instead of fixing the factor loadings to 0, 1, 2, and 3 as we did before, we fix the first two loadings to 0 and 1 (to set the metric of the slope factor) and freely estimate the second two loadings from the data. This approach was described by McArdle (1989), who proposed this completely latent curve model to allow for "stretching" and "shrinking" of the time measure to account for nonlinearity in the observed data. We estimated this model and found it to fit the observed data quite well [χ²(3) = 4.4, p = .22; RMSEA = .03, CI90 = 0, .10; p(RMSEA < .05) = .58]. The first two factor loadings were fixed to 0 and 1, and the third and fourth loadings were estimated to be 1.6 and 2.1, respectively. There were significant and positive fixed and random effects for both the intercept and slope factors. The interpretation of the significant fixed effect for the slope factor is that there is a large positive increase in reading recognition over the four time points, but the magnitude of increase diminishes as a function of time. Further, the significant random effects indicate that there is substantial individual variability in both the initial starting point and rate of change over time. We thus conclude that this final model best characterizes the observed mean and covariance structure in the four repeated measures of reading recognition.
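The three specifications just compared differ only in their factor-loading matrices, which can be written out directly. In the sketch below, the freed loadings of 1.6 and 2.1 are the estimates reported in the text; the factor means are invented values chosen only to show how a "linear" model with stretched time loadings produces a curvilinear mean trajectory.

```python
import numpy as np

linear = np.array([[1, 0], [1, 1], [1, 2], [1, 3]])                  # Eq. 3.11
quadratic = np.array([[1, 0, 0], [1, 1, 1], [1, 2, 4], [1, 3, 9]])   # Eq. 3.12
freed = np.array([[1, 0.0], [1, 1.0], [1, 1.6], [1, 2.1]])           # completely
                                                                     # latent curve
mu = np.array([2.5, 1.55])          # hypothetical intercept and slope means
print(freed @ mu)                   # increases that diminish over time
```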
Now that we better understand the characteristics of the developmental process underlying reading recognition, we are ready to combine this univariate latent curve model with the latent curve model estimated for antisocial behavior described earlier. Given that we are also interested in the influences of the four explanatory variables, these will be included as well. We thus estimated a single multivariate latent curve model that included the four repeated measures of antisocial behavior and the underlying growth factors, the four repeated measures of reading recognition and the underlying growth factors, and the four explanatory variables. The intercept and slope factors were allowed to covary within construct, the intercept factors were covaried across construct, the slope factors were covaried across construct, each slope factor was regressed upon the intercept factor across construct, and the residual variances of the repeated measures were covaried within each time point. Finally, all four growth factors were regressed upon the four explanatory variables of gender, age, cognitive stimulation, and emotional support. We estimated this model and found it to fit the data very well [χ²(35) = 49.8, p = .05; RMSEA = .03, CI90 = 0, .05; p(RMSEA < .05) = .93]. This model is presented in Figure 3.4, in which only significant (p < .05) regression parameters are shown for the sake of clarity.

Figure 3.4: Final multivariate conditional trajectory model with exogenous variables. All paths between exogenous variables and growth factors were estimated, but only significant (p < .05) paths are displayed. Values associated with each path are standardized regression coefficients.

Of initial interest are the relations among the latent growth factors. There was a marginally significant (p < .10) negative correlation between the residual variance of the intercept factor for antisocial behavior and the residual variance of the intercept factor for reading recognition. This suggests that, after estimating the influences of the exogenous variables, children reporting higher levels of initial antisocial behavior also tended to report lower levels of initial reading recognition. Further, the antisocial behavior intercept factor negatively predicted the reading recognition slope factor, indicating that children reporting higher initial levels of antisociality also tended to report less steep increases in reading over time. However, a similar relation was not found between earlier reading recognition and later changes in antisocial behavior. This latter finding is quite interesting given the results of the time-varying covariate model (e.g., Equation 3.10) in which Time 1 reading was found to modestly predict later antisociality. This discrepancy in findings highlights the importance of articulating precisely what type of change is of most interest. The time-varying covariate model indicated that Time 1 reading negatively predicted later time-specific levels of antisociality after controlling for the underlying developmental trajectories of antisociality. In contrast, the final latent curve model indicates that earlier reading recognition is not reliably associated with later developmental trajectories of antisociality. So, the former is examining the relation while controlling for the antisocial trajectories, and the latter is examining the relation with the antisocial trajectories. These are not necessarily contradictory findings, but instead are different insights into different questions. Also of interest is the relation between the four explanatory variables and the four growth factors. To summarize, boys were found to report higher initial levels of antisocial behavior, but no differences were found on any of the other growth factors. Older children were found to report higher initial levels and less steep increases in both antisocial behavior and reading recognition. Higher levels of cognitive stimulation were associated with less steep increases in antisocial behavior and, although higher levels of cognitive stimulation were associated with higher levels of initial reading recognition, cognitive stimulation was not associated with rates of change in reading recognition. Finally, higher levels of emotional support were associated with lower initial levels of antisocial behavior and were thus indirectly associated with later changes in reading recognition as mediated
through the antisocial intercept. The results of this final model thus allow for a number of interesting insights into the fourth theoretically derived research question described earlier.
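The cross-construct structure of this final model can be sketched at the level of the growth factors themselves. The simulation below generates the four factors with an invented negative path from the antisocial intercept to the reading slope and recovers it by least squares; in the actual analysis these factors are latent and the path is estimated within the SEM, together with the repeated measures and the exogenous predictors.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
anti_int = rng.normal(1.7, 1.1, n)
anti_slp = 0.15 + 0.10 * anti_int + rng.normal(0, 0.3, n)
read_int = 2.5 - 0.20 * anti_int + rng.normal(0, 0.8, n)
read_slp = 1.5 - 0.10 * anti_int + rng.normal(0, 0.2, n)   # the negative path

X = np.column_stack([np.ones(n), anti_int])
print(np.linalg.lstsq(X, read_slp, rcond=None)[0].round(2))  # about [1.5, -0.1]
```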
CONCLUSIONS

The goal of this chapter was to review the basics of latent curve analysis and to present an extended example demonstrating the analysis of individual differences in rates of change over time. The motivating substantive example for these analyses was an examination of the relation between developmental trajectories of antisocial behavior and reading recognition and the relation between these trajectories and child age, gender, emotional support, and cognitive stimulation. Drawing on theories of developmental psychopathology, four research questions of key interest were identified. These questions related to individual and group characteristics of developmental trajectories in antisocial behavior, in reading recognition, between trajectories of antisocial behavior and reading recognition, and between these developmental trajectories and measures of age, gender, emotional support, and cognitive stimulation. We estimated a series of latent curve models that allowed for inferences to be drawn about all four questions of interest. Specifically, both antisocial behavior and reading recognition were found to be characterized by underlying intercept and slope factors, and there were significant group and individual components of growth. Initial starting points of antisocial behavior and reading recognition were negatively related such that children reporting higher initial levels of antisocial behavior also tended to report lower levels of initial reading recognition. Further, although higher initial levels of antisocial behavior were associated with less steep increases in reading recognition, initial reading recognition was not reliably associated with later increases in antisocial behavior. Finally, interesting relations were found between developmental trajectories of both antisocial behavior and reading recognition and the set of four explanatory variables. Taken together, these results provide important information about the characteristics and predictors of developmental trajectories of antisocial behavior and reading recognition over an 8-year period. We believe that the latent curve modeling framework provides a number of potential advantages for the empirical study of repeated measures data over time. First, this approach is much more consistent with many types of theoretical questions of development and change compared to the ARCL model. This is because many developmental questions are inherently interested in the continuous underlying trajectories that vary across individuals, not necessarily the time-specific levels of a particular measure. However, the ARCL approach can be advantageous in some settings, and recent work has moved toward integrating elements of both the ARCL and latent curve models into a single modeling framework (e.g., Bollen & Curran, 1999; Curran & Bollen, in press). Second, the latent curve model can easily
be extended in a variety of interesting ways using the strength of the SEM framework. For example, multiple-indicator latent factors could be used to model measurement error within each time point (e.g., Sayer, in press). Additional variables could be examined as potential mediators between the explanatory variables and the growth factors or between the growth factors themselves (e.g., Muthén & Curran, 1997). Multiple group analysis could be used to examine the potential moderating influences of discrete measures such as gender or ethnicity (see, e.g., Curran et al., 1997; McArdle, 1989). Indeed, most any advantage offered by the general structural modeling framework is applicable to the latent curve model. Having said that, there are, of course, a number of limitations of the latent curve modeling approach as well, and these should be closely considered prior to adopting this technique in practice. First, as with all growth modeling approaches, latent curve analysis requires a minimum of three time points to estimate individual variability in growth trajectories. These models cannot be used for two-time-point comparisons. Second, although recent advances in software have allowed for the inclusion of partially missing data, the maximum likelihood estimator used here still assumes multivariate normal distributions, both for the repeated measures and for the latent growth factors. This poses a problem in many areas of research, such as child drug use or psychiatric symptomatology, where distributions are expected to be significantly skewed. Third, when predicting the growth factors in latent curve analysis, the intercept and slope components are considered separately from one another, thus sometimes making interpretation difficult. For example, the models presented here discussed predictions of the slope factor, but this did not simultaneously consider the corresponding intercept level. Considering both the height and the rate of change of the trajectory simultaneously may be very important in many research applications. Finally, it remains difficult to evaluate latent curve models in multilevel settings. For example, in the models presented here we assume that all of the children are independent from one another. However, there are many situations in which children might be nested within a higher level such as classroom or treatment condition, and this can pose a challenge for latent curve analysis (but see Muthén & Muthén, 1998, for new developments in this area). In conclusion, latent curve analysis is one example of a new generation of statistical models that is well suited for evaluating research questions about repeated measures data over time. The latent curve framework is extremely flexible and can be applied across a variety of applied research settings. Recall that all of the previous equations distinguished both i for individual and t for time. Although "individual" referred to children here, this could equivalently denote mouse, treatment group, school, or city. Further, although the increment of "time" referred to 24-month periods here, this could equivalently denote minute, week, or decade. There are thus a remarkably wide number of research settings in which latent curve
analysis can be applied to examine questions about individual differences in rates of change over time.
REFERENCES

Anderson, T. W. (1960). Some stochastic process models for intelligence test scores. In K. J. Arrow, K. Karlin, & P. Suppes (Eds.), Mathematical methods in the social sciences, 1959. Stanford, CA: Stanford University Press.

Baker, P. C., Keck, C. K., Mott, F. L., & Quinlan, S. V. (1993). NLSY child handbook: A guide to the 1986-1990 National Longitudinal Survey of Youth child data. Columbus, OH: Center for Human Resource Research.

Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31, 419-456.

Bentler, P. M. (1983). Some contributions to efficient statistics for structural models: Specification and estimation of moment structure. Psychometrika, 48, 493-517.

Bollen, K. A., & Curran, P. J. (1999). An autoregressive latent trajectory (ALT) model: A synthesis of two traditions. Paper presented at the 1999 meeting of the Methodology Section of the American Sociological Association.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Caspi, A., Bem, D. J., & Elder, G. H. (1989). Continuities and consequences of interactional styles across the life course. Journal of Personality, 57, 375-406.

Cicchetti, D. (1984). The emergence of developmental psychopathology. Child Development, 55, 1-7.

Coie, J. D., & Jacobs, M. R. (1993). The role of social context in the prevention of conduct disorder. Development and Psychopathology, 5, 263-275.

Curran, P. J. (2000). A latent curve framework for the study of developmental trajectories in adolescent substance use. In J. Rose, L. Chassin, C. Presson, & S. Sherman (Eds.), Multivariate applications in substance use research (pp. 1-42). Hillsdale, NJ: Erlbaum.
Curran, P. J., & Bollen, K. A. (in press). The best of both worlds: Combining autoregressive and latent curve models. In A. Sayer & L. Collins (Eds.), New methods for the analysis of change. Washington, DC: APA.

Curran, P. J., & Muthén, B. O. (1999). The application of latent curve analysis to testing developmental theories in intervention research. American Journal of Community Psychology, 27, 567-595.

Curran, P. J., Muthén, B. O., & Harford, T. C. (1998). The influence of changes in marital status on developmental trajectories of alcohol use in young adults. Journal of Studies on Alcohol, 59, 647-658.

Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adolescent and peer alcohol use: A longitudinal random coefficients model. Journal of Consulting and Clinical Psychology, 65, 130-140.

Dodge, K. A. (1986). A social information processing model of social competence in children. In Minnesota symposium in child psychology (pp. 7-125). Hillsdale, NJ: Erlbaum.

Dodge, K. A. (1993). The future of research on the treatment of conduct disorder. Development and Psychopathology, 5, 311-319.

Duncan, S. C., Duncan, T. E., & Hops, H. (1996). Analysis of longitudinal data within accelerated longitudinal designs. Psychological Methods, 1, 236-248.

Duncan, T. E., Duncan, S. C., & Hops, H. (1998). Latent variable modeling of longitudinal and multilevel alcohol use data. Journal of Studies on Alcohol, 59, 399-408.

Dwyer, J. H. (1983). Statistical models for the social and behavioral sciences. New York: Oxford.

Elliott, D. S., Huizinga, D., & Ageton, S. S. (1985). Explaining delinquency and drug use. Beverly Hills, CA: Sage.

Guttman, L. A. (1954). A new approach to factor analysis: The radex. In P. F. Lazarsfeld (Ed.), Mathematical thinking in the social sciences (pp. 258-348). New York: Columbia University Press.

Heise, D. R. (1969). Separating reliability and stability in test-retest correlation. American Sociological Review, 34, 93-101.

Humphreys, L. G. (1960). Investigations of the simplex. Psychometrika, 25, 313-323.

Joreskog, K. G. (1971a). Simultaneous factor analysis in several populations. Psychometrika, 36, 409-426.
Joreskog, K. G. (1971b). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109-133.

Joreskog, K. G. (1979). Statistical estimation of structural models in longitudinal developmental investigations. In J. R. Nesselroade & P. B. Baltes (Eds.), Longitudinal research in the study of behavior and development. New York: Academic Press.

Joreskog, K. G., & Sorbom, D. (1978). Advances in factor analysis and structural equation models. Cambridge, MA: Abt Books.

Kazdin, A. E. (1985). Treatment of antisocial behavior in children and adolescents. Homewood, IL: Dorsey.

Kazdin, A. E. (1987). Treatment of antisocial behavior in children: Current status and future directions. Psychological Bulletin, 102, 187-203.

Kazdin, A. E. (1993). Treatment of conduct disorder: Progress and directions in psychotherapy research. Development and Psychopathology, 5, 277-310.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: John Wiley and Sons.

Loeber, R., & Dishion, T. J. (1983). Early predictors of male delinquency: A review. Psychological Bulletin, 94, 68-99.

MacCallum, R. C., Kim, C., Malarkey, W., & Kiecolt-Glaser, J. (1997). Studying multivariate change using multilevel models and latent curve models. Multivariate Behavioral Research, 32, 215-253.

McArdle, J. J. (1986). Latent growth within behavior genetic models. Behavior Genetics, 16, 163-200.

McArdle, J. J. (1988). Dynamic but structural equation modeling of repeated measures data. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed.). New York: Plenum Press.

McArdle, J. J. (1989). Structural modeling experiments using multiple growth functions. In P. Ackerman, R. Kanfer, & R. Cudeck (Eds.), Learning and individual differences: Abilities, motivation and methodology (pp. 71-117). Hillsdale, NJ: Lawrence Erlbaum Associates.

McArdle, J. J. (1991). Structural models of developmental theory in psychology. In P. Van Geert & L. P. Mos (Eds.), Annals of theoretical psychology (Vol. VII, pp. 139-160). New York: Plenum Press.

McArdle, J. J., & Epstein, D. (1987). Latent growth curves within developmental structural equation models. Child Development, 58, 110-133.
Meredith, W., & Tisak, J. (1984). "Tuckerizing" curves. Paper presented at the annual meeting of the Psychometric Society.

Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.

Moffitt, T. E. (1990). Juvenile delinquency and attention deficit disorder: Boys' developmental trajectories from age 3 to age 15. Child Development, 61, 893-910.

Muthén, B. (1993). Latent variable modeling of growth with missing data and multilevel data. In C. M. Cuadras & C. R. Rao (Eds.), Multivariate analysis: Future directions 2 (pp. 199-210). Amsterdam: North Holland.

Muthén, B. (in press). Second-generation structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class/latent growth modeling. In A. Sayer & L. Collins (Eds.), New methods for the analysis of change. Washington, DC: APA.

Muthén, B. O. (1991). Analysis of longitudinal data using latent variable models with varying parameters. In L. Collins & J. L. Horn (Eds.), Best methods for the analysis of change (pp. 1-17). Washington, DC: American Psychological Association.

Muthén, B. O., & Curran, P. J. (1997). General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods, 2, 371-402.

Muthén, L. K., & Muthén, B. O. (1998). Mplus user's guide. Los Angeles: Muthén & Muthén.

Patterson, G. R. (1982). Coercive family process. Eugene, OR: Castalia.

Patterson, G. R. (1986). Performance models for antisocial boys. American Psychologist, 41, 432-444.

Patterson, G. R., Reid, J. B., & Dishion, T. J. (1992). Antisocial boys. Eugene, OR: Castalia.
Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrika, 51, 83-90.

Reid, J. B. (1993). Prevention of conduct disorder before and after school entry: Relating interventions to developmental findings. Development and Psychopathology, 5, 243-262.
Rogosa, D. R. (1995). Myths and methods: "Myths about longitudinal research" plus supplemental questions. In J. Gottman (Ed.), The analysis of change (pp. 2-65). Hillsdale, NJ: Lawrence Erlbaum Associates.

Rogosa, D. R., & Willett, J. B. (1985). Satisfying simplex structure is simpler than it should be. Journal of Educational Statistics, 10, 99-107.

Sayer, A. (in press). Multiple indicator latent variables in latent curve analysis. In A. Sayer & L. Collins (Eds.), New methods for the analysis of change. Washington, DC: APA.

Stoolmiller, M. (1994). Antisocial behavior, delinquent peer association, and unsupervised wandering for boys: Growth and change from childhood to early adolescence. Multivariate Behavioral Research, 29, 263-288.

Tucker, L. R. (1958). Determination of parameters of a functional relation by factor analysis. Psychometrika, 23, 19-23.

Willett, J. B. (1988). Measuring change: The difference score and beyond. In H. J. Walberg & G. D. Haertel (Eds.), The international encyclopedia of educational evaluation. Oxford, England: Pergamon Press.

Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381.

Wilson, J. Q., & Herrnstein, R. J. (1985). Crime and human nature. New York: Simon and Schuster.

Windle, M. (1997). Alternative latent-variable approaches to modeling change in adolescent alcohol involvement. In K. J. Bryant & M. Windle (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 43-78). Washington, DC: American Psychological Association.
Chapter 4
Multilevel Modeling of Longitudinal and Functional Data

J. O. Ramsay
McGill University

I signed on to write this commentary because I was confused. Thirty-two years of teaching multivariate statistics to psychology graduate students, and of being a consultant on research projects in a range of disciplines, had not left me with a clear sense of what multilevel analysis was all about. Seemingly related terms in statistics journals, such as hierarchical linear model (HLM), empirical Bayes estimation, random coefficient model, variance component, repeated measures, longitudinal data analysis, latent curve analysis, and even the term multilevel itself seem to have similar but not completely consistent implications. Moreover, my own research preoccupation is with the modeling of functional data, that is, observations that are curves or images, and I wished that I had the links with these areas of statistics better arranged in my mind. So a read through these three fine papers convinced me to organize what I thought I knew, learn some new things, and try to share some of this with readers of this valuable volume. All three papers are solidly data oriented, all three sets of authors were already on my admiration list, and all propose some rather up-market statistical technology for a rather challenging data context. If I can add a little something, either to a reader's appreciation or to future research efforts, it will be a great pleasure. I begin by exploring the term multilevel, which, it appears, means rather different things if applied to data and research designs than it means if applied to models and parameter spaces. Because for me data always have precedence over models and mathematics, I'm happy to agree with the
authors in their use of the term. Moreover, more of my focus in what follows will be on data rather than model characteristics; hence, first two multilevel data sections, one on the general situation and one focused on longitudinal data. There follow two multilevel model sections, again offering a general multilevel framework, followed by specifics for the data of focus. Then we return to consider features of longitudinal data, an important special type of multilevel data. The chapter concludes with a review of some modeling strategies along with an assessment of what these three papers have achieved. The references that have been especially helpful to me are annotated at the end. To save space, I hope that neither the readers nor the authors will object if I refer to Curran and Hussong; Kenny, Bolger, and Kashy; and Raudenbush as CH, KBK, and R, respectively.
MULTILEVEL DATA

An observation, Y_ij, is organized on two levels. The lower or most basic level is associated with subscript j = 1, ..., n_i and corresponds to the most elementary unit of data that we wish to describe. For CH and R, who consider longitudinal data, this is the value of a variable at the year indexed by j. For KBK, on the other hand, this basic unit is the partner with whom a person interacts. On the other hand, lower-level units may, themselves, be collections of variable values. For example, we may refer to an individual's entire longitudinal record as the vector y_ij, so that the individual is the lower-level unit. However, observations are also organized by a higher-level index i = 1, ..., m, and this refers to one of m categories within which lower units fall, or within which they are nested. For the National Longitudinal Survey of Youth analyzed by CH, the National Youth Survey analyzed by R, and most longitudinal data, the upper-level unit is the individual who is measured at n_i time points. The upper level for KBK corresponds to the m = 77 persons, each of whom interacts with a number of partners. The number of lower basic units n_i can vary over upper-level index i. Upper-level units are considered either random or fixed, with random units being categories that will normally change if the data collection process is replicated (e.g., subjects or schools), whereas fixed units are those, such as treatment groups, that will be reused over replications. Third and higher-level types of classification of units are often needed, and R, for example, considers schools to be a third level for the Sustaining Effects Data, where students are second-level units and time points are at the lowest level. This necessitates another subscript, so that the observation in this case would be Y_ijk. Covariates are often a part of multilevel data designs, and these may be indicated by the vectors x_i and z_ij, having p and q_i elements, respectively, and with the number and nature of subscripts indicating whether these
covariates are associated with upper or lower levels of data organization. We shall use x_i to refer to covariates for upper-level units and z_ij for lower-level covariates. For example, R's models for the National Youth Survey data use as covariates a constant level, coded by 1, and age, a_ij (he exchanges the roles of i and j in his notation). This means that the covariate vector z_ij = (1, a_ij)'. In fact, as we see, the same covariate vector is used for the upper-level units (individuals), so that a set of covariates may appear twice in the same model, once for modeling lower-level effects and once for modeling upper-level effects. Similarly, CH propose the same multilevel model in their Equations 4.5 to 4.7, with the covariates being, in their notation, 1 and t (their t corresponding to my j). KBK use the gender and physical attractiveness of the partner as lower-level covariates and subject gender as the upper-level covariate.
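The layout just described can be rendered as a toy data structure; the field names and values below are invented, and the point is only the bookkeeping: each upper-level unit i carries its own covariates x_i together with n_i lower-level records, each holding z_ij and Y_ij.

```python
from dataclasses import dataclass, field

@dataclass
class UpperUnit:
    x: list                                   # upper-level covariates x_i
    z: list = field(default_factory=list)     # one z_ij per lower-level unit
    y: list = field(default_factory=list)     # one observation Y_ij per unit

# A person (upper level) measured at n_i = 3 occasions (lower level),
# with z_ij = (1, a_ij) as in the National Youth Survey example above.
person = UpperUnit(x=[1, 0])                  # e.g., a constant and gender
for age in (11.0, 12.0, 13.0):
    person.z.append([1.0, age])
    person.y.append(2.0 + 0.3 * age)          # made-up response values
print(len(person.y))                          # n_i = 3; n_i may vary with i
```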
SPECIAL FEATURES OF LONGITUDINAL DATA Repeated measures, longitudinal data, and times series are all terms with a long history in statistics referring to measurements taken over time within a sampling unit. Add to these the term functional data, used by Ramsay and Dalzell (1991) to describe data consisting of samples of curves. Each curve or functional datum is defined by a set of discrete observations of a smooth underlying function, where smooth usually means being differentiable a specified number of times. A comparison of Analysis of Longitudinal Data by Diggle et al. (1994) with the more recent Functional Data Analysis by Ramsay and Silverman (1997) highlights the strong emphasis that the latter places on using derivative information in various ways. The more classic repeated measures data are typically very short time series as compared with longitudinal or functional data, and time series data are usually much longer. Most time series analysis is directed at the analysis of a single curve and is based on the further assumptions that the times at which the process is observed are equally spaced and that the covariance structure for the series has the property of stationarity, or time invariance.
The Missing Data or Attrition Problem

Longitudinal or functional data are expensive and difficult to collect if the time scale is months or years and the sampling units are people. Attrition due to moving away, noncompliance, illness, death, and other events can quickly reduce a large initial sample to comparatively modest proportions. Much more information is often available about early portions of the process, and the amount of data defining correlations between early and late times can be much reduced. Although there are techniques for plugging holes in data by estimating missing values, a few of which are mentioned by CH and R, these methods
implicitly assume that the causes of missing data are not related to the size or location of the missing data. This is clearly not applicable to measurements missing by attrition in longitudinal data. Indeed, the probability of a datum being missing can often be related to the features of the curve; one can imagine that a subject with a rapidly increasing antisocial score over the early measurements in CH's data is more likely to have missing later measurements. I am proposing, therefore, to use the term missing data to refer to data missing for unrelated reasons and the term attrition to indicate data missing because of time- or curve-related factors. The data analyst is therefore faced with two choices, both problematic to some extent. If only complete data records are used, then the sample size may be too limited to support sophisticated analyses such as multilevel analysis. Moreover, subjects with complete data are, like people who respond to questionnaires, often not typical of the larger population being sampled. Thus, an analysis of complete data, or a balanced design, must reconcile itself, like studies of university undergraduates, with being only suggestive, by a sort of extrapolation, of what holds in a more diverse sampling context. Alternatively, one can use methods that use all of the available data. The computational overhead can be much greater for unbalanced designs, and in fact the easily accessible software packages tend to work only with balanced data. In any case, as already noted, results for later times will tend to both have larger sampling variance, due to lack of data, and be biased if attrition processes are related to curve characteristics. Thus, although using "missing-at-random" hole-filling algorithms, such as those developed by Jennrich and Schluchter (1986) and Little and Rubin (1987), may make the design balanced in terms of the requirements of software tools in packages such as SAS and BMDP, the results of these analyses may have substantial biases, especially regarding later portions of the curves, and this practice is risky. I must say that my own tendency would be, lacking appropriate full data analysis facilities, to opt to live with the bias problems of the complete data subsample rather than those that attrition and data substitution are all too likely to bring.
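To make the analyst's two choices concrete, here is a minimal Python sketch; the file name and column names are assumptions, and the mixed model call is included only to show that likelihood-based fitting does not itself require balanced data.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per (subject, time) measurement;
    # rows are simply absent for measurements lost to attrition.
    long = pd.read_csv("longitudinal.csv")   # assumed columns: subject, time, y

    # Choice 1: complete-case analysis, keeping only fully observed subjects.
    n_waves = long["time"].nunique()
    complete = long.groupby("subject").filter(lambda d: len(d) == n_waves)

    # Choice 2: use every available record; a likelihood-based mixed model
    # does not require balanced data, so no hole-filling is involved.
    fit_all = smf.mixedlm("y ~ time", long, groups=long["subject"]).fit()
    print(complete["subject"].nunique(), "subjects survive the complete-case filter;",
          long["subject"].nunique(), "contribute to the mixed model fit")

Neither choice removes the attrition bias the text warns about; the sketch only shows that the balanced-data requirement, unlike the bias problem, is a software limitation rather than a statistical necessity.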
Registration Problem

Figure 4.1 displays curves showing the acceleration in growth for ten boys, and we see there that two types of variation are happening. Phase variation occurs when a curve feature, such as the pubertal growth spurt, occurs at different times for different individuals, whereas the more familiar amplitude variation occurs when two units show differences in the characteristics of a feature even though their timings may be similar. We can see that averaging these curves results in a mean curve that does not resemble any of the actual curves; this is because the presence of substantial phase variation causes the average to be smoother than the actual curves. In effect, phase variation arises because there are two systems of time
Figure 4.1: Acceleration curves for the growth of 10 boys showing a mixture of phase and amplitude variation. The heavy dashed line is the mean curve, and it is a poor summary of curve shapes.
involved here. Aside from clock time, there is, for each child or subject, a biological, maturational, or system time. In clock time, different children hit puberty at different times, but, in system time, puberty is an event that defines a distinct point in every child's growth process, and two children entering puberty can reasonably be considered at equivalent or identical system times, even though their clock ages may be rather different. From this perspective, we can imagine a possibly nonlinear transformation of clock time t for subject i, which we can denote as h_i(t), such that, if two children arrive at puberty at ages t_1 and t_2, then h_1(t_1) = h_2(t_2). These transformations must, of course, be strictly increasing. They can be referred to as time warping functions. If we can estimate for each growth curve a suitable warping function h such that salient curve features or events are coincident with respect to warped time, then we say that the curves have been registered. In effect, phase variation has been removed from the registered curves, and what remains is purely amplitude variation. The first panel of Figure 4.2 displays the warping functions h_i(t) that register the 10 acceleration curves in Figure 4.1. The second panel of Figure 4.2 shows the registered acceleration curves. The multilevel modeling methods considered by these three authors, and in most applications, concern themselves solely with amplitude variation. However, it seems clear that children evolve in almost any measurable respect at different rates and at rates that vary from time to time within individuals. The potential for registration methods to clarify and enhance multilevel analyses seems important. Recent references on curve registration are Ramsay and Li (1998) and Ramsay and Dalzell (1995).
Figure 4.2: The left panel contains the warping functions for registering the growth acceleration curves in Figure 4.1, and the right panel contains the registered curves.
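As an illustration of the registration idea only, and not of the estimation methods in Ramsay and Li (1998), the following sketch performs simple landmark registration on two synthetic one-bump curves; the piecewise-linear warp, the target age, and the curves themselves are all invented for the example.

    import numpy as np

    def register_to_landmark(t, y, landmark, target):
        """Landmark registration: warp clock time so that a subject's landmark
        (e.g., age at the pubertal spurt) maps to a common target age, then
        resample the curve on the original grid."""
        # h maps common (system) time to the subject's clock time; it is a
        # strictly increasing, piecewise-linear warping function.
        h = np.interp(t, [t[0], target, t[-1]], [t[0], landmark, t[-1]])
        return np.interp(h, t, y)   # registered curve, y(h(t))

    # Two 'growth acceleration' curves whose bump occurs at ages 11 and 13;
    # register both to a common target age of 12.
    t = np.linspace(8, 18, 101)
    curves = [np.exp(-(t - 11) ** 2), np.exp(-(t - 13) ** 2)]
    registered = [register_to_landmark(t, y, lm, 12.0)
                  for y, lm in zip(curves, [11.0, 13.0])]

    # After registration the bumps coincide, so the mean curve keeps the
    # bump's amplitude instead of smearing it out, as Figure 4.1 warns.
    print(np.mean(registered, axis=0).max(), np.mean(curves, axis=0).max())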
Serial Correlation Problem

Successive longitudinal observations usually display some amount of correlation, even after model effects are removed. Typically, this correlation between successive residuals decays fairly rapidly as the time values become more widely separated. This serial correlation is obvious in CH's Table 3.1, for example, and we know from studies of human growth that heights measured 6 months apart have correlations between residuals of about -0.4. This may be due to "catch-up" processes that ensure that slow growth is followed by more intense growth episodes. Negative serial correlation causes actual observations to oscillate rapidly around a smooth curve that captures the longer-scale effects, whereas positive serial correlation results in slow, smooth oscillations. Modeling these serial effects requires introducing some structure in the covariances among successive errors e_ij. Diggle et al. (1994) have an excellent discussion of these effects, as well as modeling strategies, and all three papers mention the problem. We return to this issue in the modeling sections.
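A first-order autoregressive error structure of the kind Diggle et al. (1994) discuss is easy to simulate, and doing so displays the two oscillation patterns just described. This is a toy sketch; the value -0.4 merely echoes the height-residual example above.

    import numpy as np

    rng = np.random.default_rng(0)

    def ar1_residuals(n, rho, sigma=1.0):
        """Simulate first-order autoregressive residuals e_t = rho*e_{t-1} + z_t."""
        e = np.zeros(n)
        for t in range(1, n):
            e[t] = rho * e[t - 1] + rng.normal(scale=sigma)
        return e

    for rho in (-0.4, 0.4):          # negative vs. positive serial correlation
        e = ar1_residuals(2000, rho)
        lag1 = np.corrcoef(e[:-1], e[1:])[0, 1]
        print(f"rho = {rho:+.1f}: lag-1 residual correlation = {lag1:+.2f}")

    # Negative rho makes successive residuals tend to flip sign (rapid
    # oscillation about the smooth trend); positive rho gives slow excursions.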
Resolution of Longitudinal Data

The amount of information about a curve that is available in a set of n_i measurements is not well captured by n_i itself. Let us consider other ways of thinking about this. An event in a curve is a feature that is defined by one or more measurable characteristics.

1. Levels are heights of curves and are defined by a minimum of one observation. The antisocial score of a child at time 1 in CH's data is a level.
2. Slopes are rates of change and require at least two observations to define. Both CH and R are concerned with slope estimation.

3. Bumps are features having (a) the amplitude of a maximum or minimum, (b) the width of the bump, and (c) its location; they are therefore three-parameter features.

We may define the resolution of a curve as the smallest size on the time scale of the features that we wish to consider. Discrete observations are usually subject to a certain amount of noise or error variation. Although three error-free observations are sufficient, if spaced appropriately, to define a bump, even the small error level present in height measurements, which have a signal-to-noise ratio of about 150, means that five observations are required to assess a bump accurately, corresponding to twice-yearly measurements of height. More noise than this would require seven measurements, and the error levels typical of many social science variables will imply the need for even more. This is why R says that slope inferences cannot be made with only two observations. We may, therefore, define the resolution of the data to be the resolution of the curve that is well identified by the data. The five observations per individual considered by CH are barely sufficient to define quadratic trend, that is, a bump, and confining inferences to slopes for these data is probably safer, especially given their high attrition level. However, the data resolution is not always so low, and many data collection methods under development in psychology and other disciplines permit the measurement of subjective states over time scales of hours and days rather than weeks and months. Brown and Moskowitz (1998) offer an interesting example of higher-resolution emotional state data. Equipment for monitoring physiological or physical data is now readily available with time scales of seconds or milliseconds and with impressive signal-to-noise ratios. The optical motion tracking equipment that is used in studies such as Ramsay, Heckman, and Silverman (1997) has sampling rates of up to 1,200 Hz and records positions accurate to within half a millimeter.
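The dependence of feature resolution on noise can be seen in a small simulation: fit a quadratic to n noisy observations of a one-bump curve and watch how the estimated peak location stabilizes as n grows. The curve and noise level below are arbitrary choices made for illustration, not the height-measurement figures quoted above.

    import numpy as np

    rng = np.random.default_rng(1)
    bump = lambda t: -(t - 0.5) ** 2 + 0.25    # a one-bump (quadratic) curve

    # Fit the three-parameter quadratic to n noisy observations and report
    # how variable the estimated peak location is across replications.
    for n in (3, 5, 9):
        peaks = []
        for _ in range(500):
            t = np.linspace(0, 1, n)
            y = bump(t) + rng.normal(scale=0.02, size=n)   # modest noise
            c2, c1, _ = np.polyfit(t, y, 2)                # quadratic fit
            peaks.append(-c1 / (2 * c2))                   # peak location
        print(f"n = {n}: SD of estimated peak location = {np.std(peaks):.3f}")

    # With noise present, three points barely pin down the bump; adding
    # observations sharpens the estimate of its location.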
MULTILEVEL MODELS

Notation

Here we use notation that has become fairly standard in statistics texts and journals, such as Diggle et al. (1994) and Searle, Casella, and McCulloch (1992), as well as for multilevel analysis in other sciences. We shall use β and u_i to indicate regression coefficient vectors to be applied to covariate vectors x_i and z_ij, respectively, so that

    y_ij = x_i^t β + z_ij^t u_i + e_ij
This model can be expressed more cleanly in matrix notation, so we will need the vector notation y to indicate a vector containing all of the lower-level observations. The length of this vector is N = Σ_i n_i. Within this vector, the lower level index j will vary inside the upper index i. At the same time, we need to gather the covariate vectors x_i and z_ij into matrices X and Z, respectively. These two matrices will both have N rows. Here is what these matrices look like for R's rather typical longitudinal data model specified in his Equations 2.1 and 2.2, which, when these equations are combined and we exchange the roles of i and j, is
    Y_ij = β_00 + β_10 a_ij + u_0i + u_1i a_ij + r_ij

First, let matrix T_i have n_i rows and 2 columns; in column 1 place 1's, and in column 2 place the centered age values a_ij. Then the matrix X has 2 columns and contains

        [ T_1 ]
        [ T_2 ]
    X = [  :  ]
        [ T_m ]

and matrix Z contains 2m columns and is block diagonal:

        [ T_1  0   ...  0  ]
        [  0   T_2 ...  0  ]
    Z = [  :    :   .   :  ]
        [  0   0   ... T_m ]

Corresponding to matrix Z, we also define vector u as containing, one after another, the individual coefficient vectors u_i = (u_0i, u_1i)^t.
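A short sketch may make the construction concrete; the centered ages below are invented, and scipy's block_diag does the work of placing the T_i down the diagonal of Z.

    import numpy as np
    from scipy.linalg import block_diag

    def design_matrices(ages):
        """ages: list of 1-D arrays, one per individual, holding the centered
        ages a_ij. Returns X (the stacked T_i) and the block-diagonal Z."""
        T = [np.column_stack([np.ones(len(a)), a]) for a in ages]  # T_i: n_i x 2
        X = np.vstack(T)        # N x 2
        Z = block_diag(*T)      # N x 2m
        return X, Z

    # Two individuals measured at different sets of (centered) ages:
    X, Z = design_matrices([np.array([-2.0, -1.0, 0.0]), np.array([-1.0, 1.0])])
    print(X.shape, Z.shape)     # (5, 2) (5, 4)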
Upper Model Level

The behavior of observations y_ij, given fixed values of the regression coefficients β and u_i, is determined by the behavior of the residual or error term e_ij. In the most commonly used multilevel model, this error distribution is defined to be normal with mean 0. In some applications, it is assumed that these errors are independently distributed for different pairs of subscripts and that they have variance σ². However, for longitudinal data, where the lower level of data is an observation of an outcome variable at a specific time, this may be too simple. In this setting, therefore, we specify the covariances among errors by gathering the errors e_ij into a large vector e, just as we did for the observations y_ij, and declare in the upper level model that e is normally distributed with mean vector 0 and with variance-covariance matrix R of order N. The upper level model may also be expressed as, conditional on β and u,

    y | β, u ~ N(Xβ + Zu, R)

or, in other words, given fixed known values of β and u, y has a normal distribution with mean Xβ + Zu and variance-covariance matrix R. Thus, the upper level model for statisticians corresponds to the lowest level of the data. Perhaps this indicates that they see themselves as reporting up to mathematicians rather than down to us scientists on the ground.
Lower Model Level

One of the paradigm shifts in statistics has been the gradual acceptance by statisticians of a Bayesian perspective. The process has been fraught with controversy, and there is no lack of statisticians who would not like to be thought of as Bayesians, but the concept that parameters, too, have random distributions has caught on. According to the Bayesian view, we must now specify the random behavior of our regression coefficients β and u.
Our lower model level assumptions are that (i) β and u are independently distributed, (ii) β is normally distributed with mean β_0 and variance-covariance matrix B, and (iii) u is normally distributed with mean 0 and variance-covariance matrix D. These assumptions imply that the observation vector y is unconditionally normally distributed with mean Xβ_0 and variance-covariance matrix

    var(y) = X B X^t + Z D Z^t + R

Upper level (in the model sense) parameters β and u are often called effects, lower model level matrices B, D, and R are called variance components, and coefficient vector β_0 can be called a hyperparameter.
Variance Components

Notice that R is N by N, and therefore too large to estimate from the data without strong restrictions on its structure. It is often assumed to be diagonal and of the form σ²I. However, special alternative structures may be required for longitudinal data, and we consider these in the next section. D may also be large, being of order 2m in the models used by CH and R. Because m may be large, special structural assumptions are required to reduce the number of variances and covariances to be estimated from the data. If we can assume that between-subject covariances are 0, then D has a block-diagonal structure, with the same variance-covariance submatrix, denoted here by C, in each diagonal block, that is, D has the structure

        [ C  0  ...  0 ]
        [ 0  C  ...  0 ]
    D = [ :  :   .   : ]
        [ 0  0  ...  C ]
This simplifies things considerably because we now only have to estimate q(q + 1)/2 variance components. For CH's example, q = 2, and C is specified in their Equation 3.1. However, sometimes this independence between upper data level units is not sensible; schools as an upper level, for example, which are geographically close, will share many features in common, and their effects should be considered to be correlated. Parameter independence assumption (i) is especially important and is perhaps the central structural characteristic of the multilevel model. It asserts that lower level variation in the data sense is unrelated to higher level variation. There are certainly situations where this might be challenged. For example, we might suppose that where lower level units are students and upper level units are schools, certain types of students will respond differently to being placed in certain types of schools, such as parochial, than they will in other types, such as public schools. In fact the assumption is critical to methods for fitting the model. Assumptions (ii) and (iii) indicate, by their specifications of the mean vectors for β and u, that the general or upper-level mean structure is determined by the hyperparameter β_0, and that lower-level unit means are centered on this mean structure.
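Numerically, the pieces fit together as follows; this toy sketch, with invented covariance values, simply assembles var(y) = XBX^t + ZDZ^t + R from the block structure just described.

    import numpy as np
    from scipy.linalg import block_diag

    # Unconditional covariance of y when both beta and the u_i are random:
    # var(y) = X B X' + Z D Z' + R, with D block diagonal holding the same
    # submatrix C for every upper-level unit, and R taken as sigma^2 I here.
    ages = [np.array([-2.0, -1.0, 0.0]), np.array([-1.0, 1.0])]   # two units
    T = [np.column_stack([np.ones(len(a)), a]) for a in ages]
    X, Z = np.vstack(T), block_diag(*T)

    B = np.diag([1.0, 0.1])                   # covariance of the random beta
    C = np.array([[0.5, 0.1], [0.1, 0.2]])    # intercept/slope block of D
    D = block_diag(*([C] * len(ages)))
    R = 0.25 * np.eye(X.shape[0])
    V = X @ B @ X.T + Z @ D @ Z.T + R
    print(V.shape)              # (5, 5); C contributes q(q+1)/2 = 3 parameters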
Effects as Fixed or Determined by Further Covariates

A specialization of this model is often assumed: the upper data level parameter β is declared to be a fixed parameter, which corresponds to the assumption that B^(-1) → 0. This case is referred to as the mixed multilevel model. In this case, the variance-covariance matrix for y simplifies to

    var(y) = Z D Z^t + R,  with β = β_0
On the other hand, because we can think of β as a random variable, we can propose a linear model for its variation. This is what CH do in their Equations 3.8 and 3.9. Consequently, let W be a p by r matrix of covariates for modeling β, and let α be a vector of r regression coefficients for the model β = Wα + e*. Then we have

    var(y) = X W B W^t X^t + Z D Z^t + R

where B is now the variance-covariance matrix for α. Actually, this does not really require a modification of our multilevel model notation because the net effect is simply to replace X by XW and β by α, and this is therefore formally the same model.
Multilevel Modeling Objectives

The multilevel model does three things. First, it specifies a relatively specific covariance structure for the observation vector y. The smaller the numbers of columns for X and Z, and/or the more specialized their structure, the
more restrictive this model is. The papers of CH and R offer examples of how a simple initial covariance structure specification can be incrementally extended, with a test after each extension indicating whether the fit to the observed variance-covariance matrix has significantly improved. Moreover, as we shall see in discussing longitudinal data, there is the potential in most multilevel software to further specialize the structure of the two variance-covariance matrices D and R. Second, the multilevel model is especially powerful for estimating lower-level data effects defined by the u_i's. It is often the case that there are few measurements within upper-level units to define these effects, or in extreme cases insufficient data to define them uniquely, but the model permits each unit to borrow information from other units to supplement sparse or missing information. The result is much like increasing the sample size for each unit. This borrowing of strength depends, first, on the data within a unit being relatively weak, either because of the limited number of observations available or because of the high noise level in measurements, and, second, on the degree of uniformity of variation across upper level units, that is, the extent to which variance component matrix D is small relative to error component R. Finally, upper model level effects captured in β are often the main interest, and in this respect multilevel analysis shares goals with ordinary multiple regression. Indeed, β may be captured by averaging across equivalent lower level effects within the u_i's. This is what KBK aim at in their use of ordinary and weighted least squares methods, and they are quite right to point out that, relative to this task, there may be little achieved by going to the more demanding multilevel technology. How important is it to get the variance-covariance structure right, if all we need are upper-level fixed effect estimates? Diggle et al. (1994) suggest that in many situations a simple variance components model, such as that in classical repeated measures ANOVA, may be preferable from the perspective of effect estimation, even when the data confirm a more complex model. I tend to agree; when the sample size is modest, it is more important to conserve degrees of freedom for error by economizing on the model than to pull out all the stops to make bias small. However, in social science applications, one has the impression that the primary focus is on estimating variance components, so that the multilevel model belongs squarely in the long tradition of such models, which includes principal components, factor, and canonical correlation analysis as well as structural equation modeling. Both CH and R assess how the fit to the data improves when both effects and variance components are added to an existing model.
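The borrowing of strength can be caricatured with a precision-weighted combination of a unit's own slope estimate and the population mean slope; the weights below are the standard BLUP-style reliabilities, and all numbers are invented for the sketch.

    import numpy as np

    def shrunken_slope(b_i, s2_i, b_pop, d2):
        """Pull a unit's raw slope b_i toward the population mean b_pop;
        d2 is the between-unit slope variance, s2_i the sampling variance
        of the unit's own estimate."""
        w = d2 / (d2 + s2_i)      # reliability of the unit's own estimate
        return w * b_i + (1 - w) * b_pop

    b_pop, d2 = 0.9, 0.04         # population mean slope, between-unit variance
    for b_i, s2_i in [(1.5, 0.50), (1.5, 0.01)]:
        print(f"raw slope {b_i}, error variance {s2_i}: "
              f"shrunken slope = {shrunken_slope(b_i, s2_i, b_pop, d2):.2f}")

    # The noisy unit (s2 = .50) is pulled nearly to .9, while the precisely
    # measured unit keeps most of its own value: weak within-unit data and
    # small D (relative to R) produce heavy borrowing, exactly as described.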
SPECIAL FEATURES OF LONGITUDINAL MODELS AND FUNCTIONAL DATA

Let us now have a look at modeling longitudinal data, with the possibility that the number of observations n_i may be rather larger than for the data sets considered by CH and R. These remarks are selected from material in Ramsay and Silverman (1997).
Within-Subject or Curve Modeling

There are two general approaches to modeling curves:

1. Parametric: A family of nonlinear curves defined by a specific model is chosen. For example, test theorists are fond of the three-parameter logistic model, P(θ) = c + (1 − c)/{1 + exp[−a(θ − b)]}, for modeling the probability of getting a test item correct as a function of ability level.

2. Nonparametric: A set of K standard fixed functions φ_k, called basis functions, is chosen, and the curve is expressed in the form

    f(t) = Σ_{k=1}^{K} a_k φ_k(t)

The parametric approach is fine if one has a good reason for proposing a curve model of a specific form, but this classic approach obviously lacks the flexibility that is often called for when good data are available. The so-called nonparametric approach can be much more flexible given the right choice of basis functions and a sufficient number of them, although it is, of course, also parametric in the sense that the coefficients a_k must be estimated from the data. However, the specific functional form of the basis functions tends not to change from one application to another. Moreover, this approach is linear in the parameters and therefore perfect for multilevel analysis. Here are some common choices of basis functions (a short computational sketch appears at the end of this subsection):

1. Polynomials: φ_k(t) = t^(k−1). These well-known basis functions (or equivalent bases such as centered monomials, orthogonal polynomials, and so on) are great for curves with only one or two events, as defined previously. A quadratic function, for example, with K = 3, is fine for modeling a one-bump curve with 3 to 7 or more data points. These bases are not much good for describing more complex curves, though, and have been more or less superseded by the B-spline basis described subsequently.

2. Polygons: φ_k(t_j) = 1 if j = k, and 0 otherwise. For this simple basis, coefficient a_k is simply the height of the curve at time t_k, and there is a basis function for each time at which a measurement is taken. Thus, there is no data compression, and there are as many parameters to estimate as observations.
3. B-splines: This is now considered the basis of choice for nonperiodic data with sufficient resolution to define several events. A B-spline consists of polynomials of a specified degree joined together at junction points called knots and required to join smoothly in the sense that a specified set of derivatives must match. The B-spline basis from which these curves are constructed has the great advantage of being local, that is, nonzero over only a small number of adjacent intervals. This brings important advantages at both the modeling and computational levels.

4. Fourier series: No list of bases can omit these classics. However, they are more appropriate for periodic data, where the period is known, than for unconstrained curves of the kind considered by CH and R. They, too, are in a sense local, but in the frequency domain rather than in the time domain.
5. Wavelets: These basis functions, which are the subject of a great deal of research in statistics, engineering, and pure mathematics, are local simultaneously in both the time and frequency domains and are especially valuable if the curves have sharp local features at unpredictable points in time.

Which basis system to choose depends on (a) whether the process is periodic or not and (b) how many events or features the curve is assumed to have that are resolvable by the data at hand. The trick is to keep the total number K of basis functions as small as possible while still being able to fit the events of interest. Basis functions can be chosen so that they separate into two orthogonal groups, one group measuring the low-frequency or smooth part of the curve and the other measuring the high-frequency and presumably more variable component of within-curve variation. This opens up the possibility of a within-curve separation of levels of model. The large literature on smoothing splines follows this direction and essentially uses multilevel analysis, but refers to this as regularization.
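As a concrete illustration of the expansion f(t) = Σ a_k φ_k(t), the sketch below evaluates a cubic B-spline basis in Python; the knot placement is an arbitrary choice, and building the basis from unit coefficient vectors is just one convenient route to the basis matrix.

    import numpy as np
    from scipy.interpolate import BSpline

    def bspline_basis(t, knots, degree=3):
        """Evaluate a B-spline basis at the points t: one column per basis
        function, each obtained by handing BSpline a unit coefficient vector."""
        n_basis = len(knots) - degree - 1
        cols = [BSpline(knots, np.eye(n_basis)[k], degree)(t)
                for k in range(n_basis)]
        return np.column_stack(cols)          # N x K basis matrix

    t = np.linspace(0.0, 1.0, 50)
    # Clamped knot sequence with two interior knots -> K = 6 cubic functions.
    knots = np.r_[[0.0] * 4, 0.33, 0.67, [1.0] * 4]
    Phi = bspline_basis(t, knots)
    print(Phi.shape)                          # (50, 6)
    # Each basis function is local, nonzero over only a few adjacent intervals:
    print((np.abs(Phi) > 1e-12).sum(axis=0))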
Variance Components

We have here three variance components: (a) R, containing the within-curve variances and covariances; (b) D, containing the second level variances and covariances; and (c) B, containing the variances and covariances among the whole-sample parameters α. The behavioral science applications that I have encountered tend to regard the components in R as of little direct interest. For nonlongitudinal data, it is usual to use R = σ²I, and, as Diggle et al. (1994) suggest, this may even be a wise choice for longitudinal data because least squares estimation is known to be insensitive to mis-specification of the data's variance-covariance structure. In any case, the counsel tends to be to keep this part of
the model as simple as possible, perhaps using a single first-order autocorrelation model if this seems needed. More elaborate serial correlation models can burn up precious degrees of freedom very quickly and are poorly specified unless there are large numbers of observations per curve. It is the structure of D that is often the focus of modeling strategies. If, as is usual, between-subject covariances are assumed to be 0, then, as we noted previously, D is block-diagonal and the m submatrices C in the diagonal will be of order K and all equal. Both CH and R consider saturated models in which the entire diagonal submatrix is estimated from the data. Of course, as the number of basis functions K increases, the number of variances and covariances, namely K(K + 1)/2, increases rapidly. The great virtue of local basis systems, such as B-splines, Fourier series, and wavelets, is that all covariances sufficiently far away from the diagonal of this submatrix can be taken as 0, so that the submatrix is band-structured. The PROC MIXED program in SAS (SAS Institute, 1995) supports a number of specialized variance-covariance structures. The contents of B will be of interest only if the group parameters in β are considered random. This seems not often to be the case for behavioral science applications.
CONCLUSIONS

Multilevel analysis requires a substantial investment in statistical technology. Programs such as SAS PROC MIXED can be difficult to use because of unfriendly documentation, poor software design, and options not well adapted to the structure of the data at hand. Although this is not the place to comment on algorithms used to fit multilevel models, these are far from being bulletproof, and failures to converge, extremely long computation times, and even solutions that are in some respect absurd are real hazards. Indeed, the same may be said for most variance components estimation problems, and there is a large and venerable literature on computational strategies in this field. The best known variance components model for behavioral scientists is probably factor analysis, where computational problems and the sample sizes required for stable estimates of unique variances have led most users with moderate sample sizes to opt for the simpler and more reliable principal components analysis. The emphasis that CH and R place on testing the fit of the various extensions of their basic models might persuade one that fitting the data better is always a good thing. It is, of course, if what is added to the model to improve the fit is of scientific or practical interest. In the case of multilevel models, what are added are the random components and whatever parameters are used to define the variance-covariance structures in D and R. On the other hand, adding fitting elements may not be wise if the effects of interest can be reasonably well estimated with simpler methods and fewer
parameters. For example, if there is little variability in the curve shapes across individuals, then adding random coefficients burns up precious degrees of freedom for error. The resulting instability of fixed effect parameter estimates, and loss of power in hypothesis tests for these effects, will more than offset the modest decrease in bias achieved relative to that for a simple fixed effects model. As for variance components, it is worth saying again, along with Diggle et al. (1994) and others, that modeling the covariance structure in D and R will usually not result in any big improvement in the estimation and testing of the fixed effect vector β. KBK have added an important message of caution in this regard. If you are specifically interested in the structure of D and/or R, and have the sample size to support the investigation, then multilevel analysis is definitely for you. If you want to augment sparse or noisy data for a single individual by borrowing information from other individuals, so that you can make better predictions for that person, then this technique has much to offer. However, if it is the fixed effects in β that are your focus, then I wish we knew more about when multilevel analysis really pays off. If you have longitudinal data, it might be worth giving some consideration to the new analyses, such as curve registration and differential models, emerging in functional data analysis. To sum up, which method to use very much depends on your objectives.
REFERENCES

Brown, K. W., & Moskowitz, D. S. (1998). Dynamic stability of behavior: The rhythms of our interpersonal lives. Journal of Personality, 66, 105-134.

Diggle, P. J., Liang, K.-Y., & Zeger, S. L. (1994). Analysis of longitudinal data. Oxford: Clarendon Press.

Jennrich, R., & Schluchter, M. (1986). Unbalanced repeated-measures models with structured covariance matrices. Biometrics, 42, 809-820.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: John Wiley and Sons.

Ramsay, J. O., & Dalzell, C. (1991). Some tools for functional data analysis (with discussion). Journal of the Royal Statistical Society, Series B, 53, 539-572.

Ramsay, J. O., & Dalzell, C. (1995). Incorporating parametric effects into functional principal components analysis. Journal of the Royal Statistical Society, Series B, 57, 673-689.

Ramsay, J. O., Heckman, N., & Silverman, B. (1997). Spline smoothing with model-based penalties. Behavior Research Methods, 29(1), 99-106.
Ramsay, J. O., & Li, X. (1998). Curve registration. Journal of the Royal Statistical Society, Series B, 60, 351-363. Ramsay, J. O., & Silverman, B. W. (1997). Functional data analysis. New York: Springer.
SAS Institute. (1995). Introduction to the MIXED procedure course notes (Tech. Rep.). Cary, NC: SAS Institute Inc.

Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York: Wiley.
Chapter 5
Analysis of Repeated Measures Designs with Linear Mixed Models

Dennis Wallace
University of Kansas Medical Center
Samuel B. Green
Arizona State University

Repeated measures or longitudinal studies are broadly defined as studies for which a response variable is observed on at least two occasions for each experimental unit. These studies generate serial measurements indexed by some naturally scaled characteristic such as age of experimental unit, length of time in study, calendar date, trial number, student's school grade, individual's height as a measure of physical growth, or cumulative exposure of a person to a risk factor. The experimental units may be people, white rats, classrooms, school districts, or any other entity appropriate to the question that a researcher wishes to study. To simplify language, we will use the term "subject" to reference the experimental units under study and the term "time" to reference the longitudinal indexing characteristic. In most general terms, the objectives of repeated measures studies are to characterize patterns of the subjects' response over time and to define relationships between these patterns and covariates (Ware, 1985). Repeated measures studies may focus simply on change in a response variable for subjects over time. Examples include examining changes in satisfaction with health as a function of time since coronary bypass surgery, assessing the pattern of change in self-concept for adolescents as a function of height, and characterizing the growth of mathematical ability in elemen-
tary school children as a function of grade level. Studies may incorporate simple, non-time-varying covariates that differentiate subjects into two or more subpopulations. The primary emphasis in these studies is on characterizing change over time for each subpopulation and assessing whether the patterns of change differ across subpopulations. For example, an objective of a study might be to assess whether the growth rate of emotional maturity across grade levels differs between male and female students. Besides having an interest in differences between average growth rates for subpopulations, we could also be interested in the differences between variability in growth rates for these subpopulations. The non-time-varying covariate could also be a continuous variable, such as family income at the beginning of the study, and the focus could be on growth rate as a function of income. Alternatively, a repeated measures design might include a covariate that varies over time. For example, a study examining weekly ratings by knee-surgery patients of their health satisfaction might include weekly ratings by physicians of the inflammation level of the patients' knees. Regardless of whether a study includes non-time-varying covariates and/or time-varying covariates, we may choose not to include time as a variable in our analysis but to have time act simply as an index for defining different observations. One example might be a study in which we explore the relationship between body image and sexual self-confidence over time. We could ignore time as a predictor in the analysis and investigate the hypothesis that the slope coefficients in predicting sexual self-confidence from body image vary more for obese men than for normal weight men. As suggested by the previous research questions, we may want to use repeated measures designs to answer questions about variability in behavior as well as average behavior. The linear mixed model allows us to model both means and variances so that we can investigate questions about both of these components. In addition, because the linear mixed model allows accurate depiction of the within-subject correlation inherently associated with the repeated measurements, we can make better statistical inferences about the mean performances of subjects. This chapter introduces the linear mixed model as a tool for analyzing repeated measures designs. Basic terminology for repeated measures studies (Helms, 1992) is introduced to simplify the subsequent discussion. A repeated measures study has a regularly timed schedule if measurements are scheduled at uniform time intervals for each subject; it has regularly timed data if measurements are actually obtained at uniform time intervals for each subject. Note that the uniform intervals can be unique to each subject, but the time intervals between observations are constant for any given subject. A repeated measures study has a consistently timed schedule if all subjects are scheduled to be measured at the same times (regardless of whether the intervals between times are constant) and has consistently timed data if the measurements for all subjects are actually made at the same times. If studies are designed with a regular and consistent schedule and data are actually collected regularly and consistently, analyses can be accomplished relatively easily with
general linear models techniques. Although many experimental studies are designed to have regularly and consistently timed data, they often yield mistimed data or data with missing observations for some subjects. In nonexperimental, observational studies, data are rarely designed and collected in this manner. Consequently, we need flexible analytic methods for analyzing data that are not regularly and consistently timed, and the linear mixed model meets this need. Although many articles have been published throughout the history of statistics discussing methods for the analysis of repeated measures data, the literature has expanded rapidly in the past 15 years, as evidenced by the publications of Goldstein (1987), Lindsey (1993), Diggle et al. (1994), Bryk and Raudenbush (1992), and Littell, Milliken, Stroup, and Wolfinger (1996). Some authors have focused on the analysis of repeated measures data from studies with rigorous experimental designs and discuss analysis of variance (ANOVA) techniques using univariate and multivariate procedures. Procedures using an ANOVA approach are described in detail by Lindquist (1953), Winer (1971), Kirk (1982a, 1982b), and Milliken and Johnson (1992). Although these methods work well for some experimental studies, they encounter difficulties if studies do not have regularly and consistently timed data. Consequently, researchers faced with observational data have approached repeated measures data through the use of regression models, including hierarchical linear models (HLM) and random coefficients models. The discussions of HLM by Bryk and Raudenbush (1992) and of random coefficients models by Longford (1993) and Diggle et al. (1994) take this approach. As we illustrate in this chapter, linear mixed models provide a general approach to modeling repeated measures data that encompasses both the ANOVA and HLM/random-coefficients approaches. The remaining sections of the chapter outline how the mixed model can be used to analyze data from repeated measures studies. We first introduce the linear mixed model and discuss it in the context of a simple example. Next we describe methods for estimating model parameters and conducting statistical inference in the mixed model. We also briefly introduce how to implement these methods with the SAS procedure MIXED. Then we outline some of the issues that must be considered in constructing models and discuss strategies for building models. In the last section, we present an example analysis illustrating the use of the MIXED procedure and the strategies for building models.
THE MIXED MODEL

We describe the linear mixed model in the context of most repeated measures applications and ignore some of its most general features. With the linear mixed model, we predict y from the fixed component Xβ and the random components Zu and error e:
    y = Xβ + Zu + e                                                    (5.1)

y is an N × 1 vector containing response variable scores y_ij for subject i at time j:

    y' = [y_11 y_12 ... y_1N_1 y_21 y_22 ... y_2N_2 ... y_I1 ... y_IN_I]    (5.2)

where the number of times that subject i is observed is N_i, the number of subjects is I, and the total number of observations is, therefore, N = Σ_{i=1}^{I} N_i. The first term on the right side of the equation, Xβ, is defined in the same way as it is in the general linear model. X is an N × q design matrix for the fixed component, containing the N scores for the q predictors assumed to be measured without error, whereas β is the q × 1 vector of unknown population parameters, β_0, ..., β_{q−1}. The βs are fixed-effects parameters for predicting subjects' responses in general or, in other words, parameters that describe population mean behavior. The population parameters in β may inaccurately depict the relationship between the response variable scores, y_ij, and the predictors for some subjects. The random-effects component Zu allows us to define different relationships among subjects. Z_i is an N_i × r design matrix for an individual subject and includes r predictors. For most models, the r predictors are a subset of the q predictors in the X design matrix, and the subset of predictors is the same for all subjects. Z is a block diagonal N × Ir design matrix for the random component, with the matrices Z_1, Z_2, ..., Z_I down its diagonal. The vector u_i, where u_i' = [u_0i u_1i ... u_{(r−1)i}], is an r × 1 vector of unknowns associated with Z_i. The unknowns in u_i are subject-specific random effects and represent the differences between the coefficients for the r predictors for subject i and the βs for the same predictors. The vectors u_1, u_2, ..., u_I are stacked within u, an Ir × 1 vector containing the subject-specific random effects for all subjects. Finally, the random-effects component e is an N × 1 vector containing the within-subject random errors, e_ij:

    e' = [e_11 e_12 ... e_1N_1 e_21 e_22 ... e_2N_2 ... e_I1 ... e_IN_I]    (5.3)

The distributional assumptions for the model are that the residual coefficients in u are normally distributed with a mean of 0 and covariance matrix G. G is an Ir × Ir block diagonal matrix, with each block, G_i, containing the between-subject variances and covariances among the residual coefficients. As characterized by the block diagonality of G, the vectors for any two subjects u_i and u_i' are assumed to be independent. Also, the errors in e are independent of the residuals in u and are normally and independently distributed with a mean of 0 and covariance matrix R. R is an N × N block diagonal matrix, with each block, R_i, containing the variances and covariances of the within-subject errors. Typically, we assume that G_i and R_i are homogeneous across subjects; however, the mixed model is flexible and permits different covariance matrices for the between-subjects random effects (G_i) or for the within-subject errors (R_i) for different treatment groups.
Table 5.1: Data for a Simple Longitudinal Study

                  Time 1       Time 2       Time 3       Time 4
                (0 months)   (3 months)   (6 months)   (9 months)
    Subject 1        2            3            7            9
    Subject 2        5           10           18           20
    Subject 3        3            4            4            5
The variances and covariances among the y_ij around their expected value Xβ are given by V, where

    V = Z G Z' + R                                                     (5.4)

The parameters of the model and, consequently, the parameters that must be estimated are the fixed effects (β), the random-effect variances and covariances (G), and the error variances and covariances (R). The random effects (u) are not model parameters and often are not of interest to researchers because they are specific to particular subjects (although the random effects can be predicted, and the predictions could be included in hypotheses and tested).
A Simple Example

To illustrate the mixed model, we consider a very simple example in which infants are assessed in their perceptual skills at 0, 3, 6, and 9 months of age, and we are interested in their rate of growth over this time period. The data are presented in Table 5.1. In developing a model, we assume that perceptual skills develop linearly across time, but that this relationship might vary from person to person. First we substitute the appropriate values into the prediction equation for the mixed model, y = Xβ + Zu + e:
Figure 5.1: Graph for simple example illustrating fixed and random effects.
    [  2 ]   [ 1  0 ]            [ 1  0  0  0  0  0 ]              [ e_11 ]
    [  3 ]   [ 1  3 ]            [ 1  3  0  0  0  0 ]              [ e_12 ]
    [  7 ]   [ 1  6 ]            [ 1  6  0  0  0  0 ]              [ e_13 ]
    [  9 ]   [ 1  9 ]            [ 1  9  0  0  0  0 ]   [ u_01 ]   [ e_14 ]
    [  5 ]   [ 1  0 ]            [ 0  0  1  0  0  0 ]   [ u_11 ]   [ e_21 ]
    [ 10 ] = [ 1  3 ] [ β_0 ]  + [ 0  0  1  3  0  0 ]   [ u_02 ] + [ e_22 ]   (5.5)
    [ 18 ]   [ 1  6 ] [ β_1 ]    [ 0  0  1  6  0  0 ]   [ u_12 ]   [ e_23 ]
    [ 20 ]   [ 1  9 ]            [ 0  0  1  9  0  0 ]   [ u_03 ]   [ e_24 ]
    [  3 ]   [ 1  0 ]            [ 0  0  0  0  1  0 ]   [ u_13 ]   [ e_31 ]
    [  4 ]   [ 1  3 ]            [ 0  0  0  0  1  3 ]              [ e_32 ]
    [  4 ]   [ 1  6 ]            [ 0  0  0  0  1  6 ]              [ e_33 ]
    [  5 ]   [ 1  9 ]            [ 0  0  0  0  1  9 ]              [ e_34 ]
The design matrix X and the fixed-effects coefficients β are defined in exactly the same manner as they are in the general linear model, and β_0 and β_1 have comparable meaning. According to a mixed model analysis, β_0 = 3.300 and β_1 = .933; the intercept, β_0, represents the mean perceptual skill for all children at 0 months of age, whereas the slope, β_1, represents the mean change in perceptual skill for each month of age. The fixed-effects (or population-mean) prediction line is shown in Figure 5.1. Next we examine the random-effects component, Zu. Along the diagonal of the random-effects design matrix Z are the design matrices for the individual subjects, and these design matrices are the same for all subjects:

          [ 1  0 ]
    Z_i = [ 1  3 ]                                                     (5.6)
          [ 1  6 ]
          [ 1  9 ]
The random coefficients in u, u_0i and u_1i, represent residual intercepts and slopes, respectively. For this model, the first two observations for subject 1 can be written algebraically as:
    y_11 = β_0 + 0β_1 + u_01 + 0u_11 + e_11                            (5.7)

and

    y_12 = β_0 + 3β_1 + u_01 + 3u_11 + e_12                            (5.8)

Figure 5.2: Graph for simple example illustrating the variability of the prediction lines for individual subjects around the line for the population. The individual lines are E(y_1j | u_01, u_11) = .69x_1j + 2.40 for subject 1, E(y_2j | u_02, u_12) = 1.79x_2j + 5.11 for subject 2, and E(y_3j | u_03, u_13) = .32x_3j + 2.39 for subject 3; the fixed-effects line is E(y_ij) = .93x_ij + 3.30.
As shown in Figure 5.1, the prediction line for subject 2 has an intercept of β_0 + u_02 and a slope of β_1 + u_12, whereas the prediction line for the population has an intercept of β_0 and a slope of β_1. Accordingly, the residual intercept, u_02, is the difference between the intercept for subject 2 and the intercept for the population prediction line. Similarly, the residual slope for subject 2, u_12, is the difference between the slope for subject 2 and the slope for the population prediction line. Although typically we would not estimate the random coefficients, we did so for illustrative purposes. The residual intercepts are -.900, 1.806, and -.907 for subjects 1, 2, and 3, respectively, whereas the residual slopes are -.241, .852, and -.612 for the same three subjects. Given that the fixed-effects intercept and slope are 3.300 and .933, respectively, the intercepts and slopes are 2.400 (3.300 - .900) and .692 (.933 - .241) for subject 1, 5.106 (3.300 + 1.806) and 1.785 (.933 + .852) for subject 2, and 2.393 (3.300 - .907) and .321 (.933 - .612) for subject 3. The differences in the intercepts and slopes for subjects are graphically displayed in Figure 5.2. The model parameters in G are the variances and covariances among elements of the subject-specific vectors embedded in the vector u. In this example, we included not only the variances for the residual intercepts and residual slopes but also the covariance between the residual intercepts and slopes. Given that no distinctions were made among subjects, we
assumed homogeneity of the variances and covariance of the random effects across subjects. Under these conditions, the covariance matrix G_i is the same for all subjects,

          [ σ²_I   σ_IS ]
    G_i = [ σ_IS   σ²_S ]                                              (5.9)

where σ²_I is the variance in intercepts between subjects, σ²_S is the variance in slopes between subjects, and σ_IS is the covariance between the intercepts and slopes. The structure of G is:
        [ σ²_I  σ_IS   0     0     0     0   ]
        [ σ_IS  σ²_S   0     0     0     0   ]
        [  0     0    σ²_I  σ_IS   0     0   ]
    G = [  0     0    σ_IS  σ²_S   0     0   ]                        (5.10)
        [  0     0     0     0    σ²_I  σ_IS ]
        [  0     0     0     0    σ_IS  σ²_S ]
For our example, σ²_I, σ²_S, and σ_IS were estimated to be 2.707, .591, and 1.130, respectively. The magnitudes of these variances and covariance are consistent with the values for the estimated residual intercepts and slopes. For example, the estimated residual intercepts and slopes appear strongly positively related, and the correlation between them based on the variances and covariance in G is .89 (= 1.130/sqrt(2.707 × .591)). As illustrated in Figure 5.1, a within-subject error, e_ij, represents the deviation of an observation at time j for subject i from that subject's individual prediction line. These deviations may represent departures of the subject's growth curve from linearity, measurement errors, or a combination of both. The variance of these errors is a model parameter and was estimated in our example to be 1.333. Given that we do not hypothesize any covariance among the within-subject errors and assume homogeneity of these errors across all subjects, R is a 12 × 12 matrix with 1.333 along its diagonal. The covariance matrix among the dependent variable scores is a function of the random-effects design matrix, the covariance matrix among the random effects, and the covariance matrix among the within-subject errors, that is, V = ZGZ' + R. In our example, we estimate V as
        [ V_i   0    0  ]
    V = [  0   V_i   0  ] ,  where                                    (5.11)
        [  0    0   V_i ]

          [  4.04   6.10   9.49  12.88 ]
    V_i = [  6.10  16.15  23.52  32.22 ]
          [  9.49  23.52  38.88  51.57 ]
          [ 12.88  32.22  51.57  72.25 ]
V indicates that variances increase with time and that correlations are larger between later observations. These results are consistent with the data as displayed in Figures 5.1 and 5.2, which show a greater divergence of the Y scores from the fixed-effects line at later observations.
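For readers who want to try the example outside SAS, here is a sketch using the Python statsmodels package; with only three subjects the optimizer may warn about convergence, but the estimates should land close to the values reported above (3.300 and .933 for the fixed effects; 2.707, 1.130, and .591 for G_i).

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Table 5.1 in long format.
    data = pd.DataFrame({
        "subject": np.repeat([1, 2, 3], 4),
        "months":  np.tile([0, 3, 6, 9], 3),
        "score":   [2, 3, 7, 9, 5, 10, 18, 20, 3, 4, 4, 5],
    })

    # Random intercept and slope per subject (unstructured G_i); statsmodels
    # uses REML by default.
    model = smf.mixedlm("score ~ months", data, groups=data["subject"],
                        re_formula="~months")
    fit = model.fit()
    print(fit.fe_params)       # fixed effects: compare with 3.300 and .933
    print(fit.cov_re)          # estimated G_i: compare with (5.9) above
    print(fit.random_effects)  # predicted residual intercepts and slopes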
ESTIMATION AND STATISTICAL INFERENCE

Estimation is more complicated in the mixed model than in the general linear model in that not only are the fixed-effects parameters in β unknown, but the variance and covariance parameters in G and R must also be estimated. Unlike in the general linear model, they generally cannot be estimated independently of the fixed-effects parameters. As noted by Littell et al. (1996), the appropriate method for estimating the components of β is no longer ordinary least squares but rather generalized least squares using the inverse of V as the weight matrix. However, the elements of V are seldom known. Consequently, estimation is generally based on likelihood-based methods as described in detail by Laird and Ware (1982), Harville (1977), and Jennrich and Schluchter (1986), among others, assuming that u and e are normally distributed. Likelihood-based methods yield distinct estimating equations for the fixed-effects parameters and variance components, with the fixed-effects estimating equations requiring estimates of the variance components and the estimating equations for the variance components requiring estimates of the fixed effects. Consequently, the overall estimation procedure requires iteration between the fixed-effects equations and variance components equations to reach a final solution. First, we consider methods to obtain variance component estimating equations, followed by a discussion of the fixed-effects estimating equations. The two principal methods for estimating variance components are maximum likelihood (ML) and restricted (or residual) maximum likelihood (REML). Under the assumption of normality for both the random effects and the error terms, the log-likelihood function for the mixed model for all subjects combined has the form (Hocking, 1985):

    l = -(N/2) log(2π) - (1/2) log|V| - (1/2)(y - Xβ)'V⁻¹(y - Xβ)     (5.12)
      = c - (1/2) log|V| - (1/2)(y - Xβ)'V⁻¹(y - Xβ)

in which the variables in the model are defined previously. Standard maximization procedures can be used to derive a set of nonlinear estimating equations for the components of V, but these equations generally involve the elements of β. Patterson and Thompson (1971) proposed REML as an alternative method for estimating covariance parameters. This procedure relies on
using the error space of the fixed-effects design matrix to estimate the covariance parameters. The justification for REML suggested by Patterson and Thompson (1971) relies on modifying the ML procedure via factorization of the likelihood function. Conceptually, this process involves partitioning the data space into two orthogonal components: the estimation space, defined as the column space of X, and the orthogonal complement of the estimation space, the error space. Mathematically, one can project the data vector onto the error space and estimate the variance components from this error space independently of the fixed effects. This procedure accounts for the loss of degrees of freedom needed to estimate the fixed effects and is analogous to using the mean square error to estimate σ² in classic ANOVA. Harville (1974) offers a justification of this procedure from a Bayesian framework and presents a convenient form of a likelihood function that can be used to obtain REML estimates:

    l_r = c - (1/2) log|X'V⁻¹X| - (1/2) log|V| - (1/2)(y - Xβ̂)'V⁻¹(y - Xβ̂)    (5.13)
Estimates for the components of V can be obtained by maximizing this function, again yielding a set of nonlinear estimating equations dependent on the elements of β. The SAS procedure MIXED can be used to develop either ML or REML estimates, with REML estimates generally preferred unless the data sets are quite large. Conditional upon V being known, either ML methods or generalized least-squares methods can be used to derive the following set of estimating equations for the fixed effects parameters:
    β̂ = [X'V⁻¹X]⁻¹X'V⁻¹y                                              (5.14)
To obtain estimates for β, the estimates of the variance components from the ML or REML estimation procedure are used in the previous estimating equation. Either ML or REML estimation can be implemented with the SAS MIXED procedure. Statistical inference in the mixed model is generally concerned with tests concerning the fixed effects, which can be formulated in terms of secondary parameters that are linear combinations of the fixed-effects parameters in β. More precisely, c secondary parameters are in a column vector θ, with θ = C_1β − θ_0, where C_1 is a c × q matrix that contains the coefficients for the c linear combinations of fixed-effects parameters and θ_0 is a c × 1 vector consisting of constants, in practice almost always zeros. The hypotheses of interest are of the form H_0: θ = 0 versus H_a: θ ≠ 0. Three types of test statistics have been proposed for testing hypotheses of this type: likelihood ratio tests, Wald tests, and approximate F tests (Littell et al., 1996). Simulation studies indicate that Wald tests tend to be overly liberal, possibly by as much as a factor of 2 for mixed models (Woolson & Leeper, 1980), and that likelihood ratio tests also tend to be overly liberal (Andrade
& Helms, 1986). 11s contrast, simulation studies by McCarroll and Helms (1987) suggested that an approximate F-test provides reasonable Type I error protection. Consequently, we recommend that the approximate F test, as described by Littell et al. be used. Using hypothesis tests as defined previously, the approximate F-test has the form:
where V = $I' . The numerator degrees of freedom for the test are generally accepted t o be the rank of C1, but as discussed in the section on modeling issues, no general consensus has been reached as to the best choice of denominator degrees of freedom for the test. The test statistic is implemented in the SAS MIXED procedure.
ISSUES IN APPLYING MIXED MODELS TO LONGITUDINAL DATA

Statistical modeling provides best results if researchers have a clear understanding of the characteristics and structure of their data and implement a well-formulated analysis strategy in response to explicitly defined research questions. Issues such as how measurements are made, how data are collected, bias in subject selection, and distributional properties of outcomes that are important in constructing a statistical model using classic regression or ANOVA techniques are also important for the analysis of repeated measures data with mixed models. Investigators must address these issues to obtain reliable results from mixed model analyses; however, we will focus on issues that are unique to mixed models and repeated measures data. Longitudinal data with multiple observations on the same subject are inherently more complicated statistically than independent observations from different subjects. Two interrelated factors that contribute to this complexity are that observations on the same subject are correlated and that the data contain both between-subject and within-subject information. The linear mixed model provides great flexibility for handling these added complexities, but the flexibility comes with a price, as investigators must decide among diverse options in selecting a model structure. All statistical models involve distributional assumptions about the mean and variance structures of the data, but modelers tend to focus on the mean or expected value component, given the limited choices among the variance components for most statistical procedures. In contrast, application of linear mixed models to longitudinal data requires the modeler to address both the expected value (fixed effects) and the variance (or random effects) components explicitly. Recall that the mathematical formulation of the mixed model is:

    y = Xβ + Zu + e                                                   (5.16)
with β representing the fixed effects and both u and e representing random effects, although the actual parameters of the random effects are associated with the covariance matrices G and R of u and e, respectively. Hence, in developing the model, the researcher must specify the parameters that are included in the vector β, generally through the specification of the design matrix X, and the structure of G and R. Specifying the structure of the individual parameters in the fixed-effects vector β is a bit more complicated in the mixed model with longitudinal data than in the classic linear model for independent data in that the investigator must determine how to handle between-subject and within-subject information. The paragraphs that follow first discuss some of the options available for characterizing the variance components of the model. Then, issues related to handling within-subject and between-subject information in the fixed-effects component of the model are discussed.
Random-Effects Components of the Mixed Model

The variance-covariance components in the random-effects components of the linear mixed model can be partitioned into two types: those associated with u, the between-subject variance-covariance components, and those associated with e, the within-subject variance-covariance components. The ways in which the variability in the outcome can be partitioned between these two components and the structures that can be used for G and R are quite diverse and are limited only by the capabilities of the software and the insight of the researcher. Because the number of structures that one might encounter is so wide ranging, we describe only some of the more widely used ones in this chapter.
Between-Subjects Random Effects (G Matrix). One of the simpler models that falls within the framework of the mixed model is the classic univariate approach to repeated measures analysis of variance. In this model, the variance is partitioned into two components, a between-subject variance component ($\sigma^2_B$) and a within-subject variance component ($\sigma^2_W$). We start with this relatively simple model and then generalize to models with more complex random-effects components. Consider a simple repeated measures experiment with 2 groups representing 2 subpopulations, 3 subjects per group, and 3 replications, for a total of 18 observations. The fixed-effects component of the model can be modeled in the usual way and is not of concern here. To obtain the appropriate structure for the variance components, which is a compound symmetric V matrix under the usual assumptions of repeated measures ANOVA, we construct a block diagonal Z matrix with 18 rows and 6 columns as shown in Figure 5.3. The G matrix is simply $\sigma^2_B I_6$ (where $I_N$ is an N × N identity matrix), and the R matrix is $\sigma^2_W I_{18}$. Under these conditions, V is compound symmetric, as can be shown by substituting into the equation V = ZGZ′ + R.
$$Z = I_6 \otimes \mathbf{1}_3 = \begin{bmatrix} \mathbf{1}_3 & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_3 & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{1}_3 & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{1}_3 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{1}_3 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{1}_3 \end{bmatrix}, \qquad \mathbf{1}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

Figure 5.3: Random effects design matrix for repeated measures model.
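For a single subject, substituting these matrices into V = ZGZ′ + R shows the compound symmetry directly:

$$V_i = \sigma^2_B \mathbf{1}_3 \mathbf{1}_3' + \sigma^2_W I_3 = \begin{bmatrix} \sigma^2_B + \sigma^2_W & \sigma^2_B & \sigma^2_B \\ \sigma^2_B & \sigma^2_B + \sigma^2_W & \sigma^2_B \\ \sigma^2_B & \sigma^2_B & \sigma^2_B + \sigma^2_W \end{bmatrix}$$

that is, a common variance $\sigma^2_B + \sigma^2_W$ on the diagonal and a common covariance $\sigma^2_B$ off the diagonal.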
Conceptually, this classic model can be viewed as a model in which the subjects are random effects (hence the blocks in the Z matrix), with each subject having a mean score (or intercept) that deviates from the mean for the subject's subpopulation. The residual intercepts are assumed to be normally distributed with a mean of 0 and a variance of $\sigma^2_B$.

Now let's assume that we want to apply a more complicated model that allows for different linear growth curves for different subpopulations. The data are from a study like the one just described (3 subjects in each of 2 groups, with each subject measured at 3 times), except that the three times at which each subject is measured differ across subjects. Ignoring the fixed-effects component of the model, the random-effects component can be viewed as an expansion of the one for the traditional repeated measures ANOVA model: Subjects are still random, but each subject deviates from the subpopulation average behavior not only in location but also in slope. Because different subjects are measured at different times in this study, they are likely to show differential slopes assuming growth is occurring, even if in theory growth is uniform over a fixed period of time for all subjects. The appropriate Z matrix for this model is shown in Figure 5.4. Note the block diagonal structure of this matrix, with each block of two columns representing a single subject. For each subject, the structure in Z is constructed in essentially the same way that a design matrix is structured in a univariate linear model.
$$Z = \begin{bmatrix} Z_1 & & & & & \\ & Z_2 & & & & \\ & & Z_3 & & & \\ & & & Z_4 & & \\ & & & & Z_5 & \\ & & & & & Z_6 \end{bmatrix}, \qquad Z_i = \begin{bmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ 1 & t_{i3} \end{bmatrix}$$

(blank entries are zero, and $t_{ik}$ is the $k$th measurement time for subject $i$)

Figure 5.4: Random effects design matrix for random slope model.
Within each block, the first column is a column of 1's associated with the subject-specific deviations from the population average (fixed-effect) intercept, and the second column contains the times of each observation for a specific subject. These times are associated with the subject-specific deviations from the population average (fixed-effect) slope. Assume that the R matrix for this model is the same as in the earlier model, that is, $\sigma^2_W I_{18}$. We can begin to see the flexibility of the mixed model by assessing how to model the G matrix. The matrix must have greater structure than in the traditional repeated measures model because we must now consider between-subject variability in slopes as well as intercepts. In addition, we must consider whether the subject-specific deviations in intercept and slope are independent or correlated with each other. We could allow the variances for residual intercepts and slopes to differ or constrain them to be equal; however, we would have difficulty developing a research scenario that would justify constraining these variances to be equal in the context of our example. To demonstrate the flexibility of the mixed model, we also introduce the possibility that the variance components for the intercept and the slope differ between groups. Figures 5.5-5.7 outline three different G matrices that might be specified based on these considerations.
$$G = \operatorname{diag}\bigl(\sigma^2_{I}, \sigma^2_{S}, \sigma^2_{I}, \sigma^2_{S}, \sigma^2_{I}, \sigma^2_{S}, \sigma^2_{I}, \sigma^2_{S}, \sigma^2_{I}, \sigma^2_{S}, \sigma^2_{I}, \sigma^2_{S}\bigr)$$

(where $\sigma^2_I$ is the variance of the subject-specific intercept deviations and $\sigma^2_S$ the variance of the subject-specific slope deviations)

Figure 5.5: Alternative form of G matrix for random slopes model.
Figure 5.5 presents a covariance matrix for the random effects in which the residual intercepts and slopes can have different variances, the covariance between the residual intercepts and slopes is constrained to 0, and the variances are constrained to be equal between the two groups. Figure 5.6 presents a covariance matrix for the random effects in which the residual intercepts and slopes can have different variances, the covariance between the residual intercepts and slopes is constrained to 0, but the variance components are allowed to differ between the two groups. Finally, Figure 5.7 presents a covariance matrix for the random effects in which the residual intercepts and slopes can have different variances, the covariance between the intercept and slope is allowed to be nonzero, and the variance and covariance parameters are constrained to be equal between the two groups. We address how we should choose among these different structures in a later section about modeling strategy.
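In PROC MIXED, whose syntax is described later in this chapter, these three structures correspond to different options on the RANDOM statement. The following is a sketch only, with hypothetical names (subj for the subject identifier, grp for the group, and t for the time variable):

PROC MIXED data=example method=reml;
   CLASS subj grp;
   MODEL y = grp t grp*t;
   /* Figure 5.5: separate intercept and slope variances, zero covariance,
      common to both groups (TYPE=VC is the default diagonal structure) */
   RANDOM int t / sub=subj type=vc;
   /* Figure 5.6: replace the RANDOM statement above with
        RANDOM int t / sub=subj type=vc group=grp;
      Figure 5.7: replace it with
        RANDOM int t / sub=subj type=un;                                 */
RUN;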
Within-Subjects Random Effects (R Matrix). The previous discussion examined alternative approaches to modeling the between-subject random-effects component with little attention to the within-subject variance component. In all cases, we assumed that the R matrix would have a very simple structure, that is, $\sigma^2_W I$. An alternative strategy for modeling is to focus on the R rather than the G matrix. As discussed earlier, the R matrix characterizes the variance structure for the within-subject residuals after the model accounts for the population or fixed effects and the subject-specific random-effects deviations from those fixed effects. One approach to modeling longitudinal data is to choose the Z matrix to be 0 and to model all of the variation from the fixed effects with the R matrix. Even within this relatively general approach, the assumptions described earlier place some constraints on the structure of the R matrix. First, subjects are generally assumed to be independent, forcing the structure of the matrix to be block diagonal with each subject representing a block. Second, at least some level of homoscedasticity across subjects is
$$G = \operatorname{diag}\bigl(\sigma^2_{I1}, \sigma^2_{S1}, \sigma^2_{I1}, \sigma^2_{S1}, \sigma^2_{I1}, \sigma^2_{S1}, \sigma^2_{I2}, \sigma^2_{S2}, \sigma^2_{I2}, \sigma^2_{S2}, \sigma^2_{I2}, \sigma^2_{S2}\bigr)$$

(where the subscripts 1 and 2 index the two groups; subjects 1-3 belong to group 1 and subjects 4-6 to group 2)

Figure 5.6: Alternative form of G matrix for random slopes model.
$$G = I_6 \otimes \begin{bmatrix} \sigma^2_{I} & \sigma_{IS} \\ \sigma_{IS} & \sigma^2_{S} \end{bmatrix}$$

(one 2 × 2 block per subject, with $\sigma_{IS}$ the covariance between the intercept and slope deviations)

Figure 5.7: Alternative form of G matrix for random slopes model.
$$R_i = \begin{bmatrix} \sigma^2_1 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma^2_2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 \end{bmatrix}$$

Figure 5.8: Alternative structure for the R matrix. Unstructured.
assumed, constraining the variance components to be the same in each block. If a study contains multiple subpopulations, this constraint may be relaxed somewhat to allow different variance components for different subpopulations, but within subpopulations, subjects are assumed to have the same variance-covariance components.

Again, consider the study with three subjects in each of two groups, with measurements at three time points for each subject. Assume that all deviations from the fixed effects are to be modeled in the R matrix. Figures 5.8-5.13 present six common variance-covariance structures that are used to model the within-subject variability. Each matrix represents a block for a single subject ($R_i$) from the block-diagonal R matrix. Figure 5.8 represents the most flexible structure, the unstructured covariance matrix. This matrix allows the variances to differ at all time points and a different covariance between any pair of time points, and it requires estimation of 6 variance-covariance parameters. In contrast to Figure 5.8, Figure 5.9 represents the most constrained structure, with variances at all time points assumed to be equal and covariances between all time points assumed to be 0. This structure implies that the observations on the same individual are independent of each other and is often unrealistic for longitudinal studies, unless subject effects are addressed in the G matrix. The structure in Figure 5.10 expands the structure from Figure 5.9 to allow different variances at the three time points. Again, unless subject effects are addressed in the G matrix, this model is likely to be unrealistic for longitudinal studies because it does not permit dependence among observations for a subject. Figure 5.11 represents a compound symmetric structure with equal variances at each time point and correlations across time assumed to be equal (i.e., the correlation between subject i's observations at times 1 and 2 is equivalent to the correlation between the observations at times 1 and 3). The resultant V matrix differs slightly from the one for the repeated measures ANOVA model in that the correlation across time may be negative or positive in Figure 5.11 but is constrained to be non-negative with the repeated measures ANOVA structure. The matrix in Figure 5.12 is a slight expansion of that in Figure 5.11 in that variances are allowed to differ at different time points, but correlations between time points are constrained to be equal. Finally, Figure 5.13 represents a first-order autoregressive model in which the covariances decrease between observations that are further apart.
$$R_i = \sigma^2 I_3 = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}$$

Figure 5.9: Alternative structure for the R matrix. Independence.
$$R_i = \begin{bmatrix} \sigma^2_1 & 0 & 0 \\ 0 & \sigma^2_2 & 0 \\ 0 & 0 & \sigma^2_3 \end{bmatrix}$$

Figure 5.10: Alternative structure for the R matrix. Unequal time variance.
$$R_i = \begin{bmatrix} \sigma^2 & \rho\sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \rho\sigma^2 & \sigma^2 \end{bmatrix}$$

Figure 5.11: Alternative structure for the R matrix. Compound symmetric.
$$R_i = \begin{bmatrix} \sigma^2_1 & \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_3 \\ \rho\sigma_1\sigma_2 & \sigma^2_2 & \rho\sigma_2\sigma_3 \\ \rho\sigma_1\sigma_3 & \rho\sigma_2\sigma_3 & \sigma^2_3 \end{bmatrix}$$

Figure 5.12: Alternative structure for the R matrix. Heterogeneous compound symmetric.
$$R_i = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix}$$

Figure 5.13: Alternative structure for the R matrix. First-order autoregressive.

The variance-covariance structures depicted in Figures 5.8 through 5.13 vary greatly. As few as one variance component is estimated for Figure 5.9, whereas as many as six variance and covariance components are estimated for Figure 5.8. These structures represent only a few of the types of structures that might be considered. Each of these structures could obviously be extended to allow different variance-covariance components for different subpopulations. However, the complexity of the structure that can be developed is often limited by the amount of data available for the study.

As suggested by the previous paragraphs, the combination of G and R matrices provides great flexibility in modeling variance-covariance structures. The model provides some redundancy in that the "same" model can be generated by partitioning variance components differently across the two matrices. For example, specifying a single between-subject variance component in G with an identity matrix for R results in the same structure as specifying no random components and a compound symmetric R matrix. Because these alternative formulations of the model produce the same results, the researcher's selection of one over the other has no practical consequences. Wolfinger (1993) suggests that random effects are more suited to modeling correlation among a large number of observations, whereas covariance modeling in R is more local. Typically, our approach is to select an overall structure for V and partition the variance components between G and R in a way that makes the variances and covariances most interpretable within the study design.
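To make the redundancy concrete, the following two specifications (a sketch; the data set and variable names are hypothetical) imply the same compound symmetric V matrix whenever the between-subject variance is non-negative:

/* Specification 1: between-subject variance in G, identity R */
PROC MIXED data=example method=reml;
   CLASS subj;
   MODEL y = time;
   RANDOM int / sub=subj;
RUN;

/* Specification 2: no random effects, compound symmetric R */
PROC MIXED data=example method=reml;
   CLASS subj;
   MODEL y = time;
   REPEATED / sub=subj type=cs;
RUN;

The second form is slightly more general: as noted for Figure 5.11, TYPE=CS permits a negative common covariance, whereas a variance component in G cannot be negative.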
Fixed-Effects Component of the Mixed Model

Generally, the strategies for developing the fixed-effects components of the model are similar to those used to develop linear models when the observations are independent. However, one additional issue that must be addressed is the potential confounding of within-subject and between-subject relationships in the computation of fixed-effects estimates (Jacobs et al., 1999; Kreft & de Leeuw, 1998). Consider a study in which longitudinal data are collected at grades 1 through 6 and the primary question is whether student achievement in mathematics and teacher expertise in mathematics covary as students progress through these six grades. If a model is constructed that simply predicts student achievement from teacher expertise, the β₁ estimate for teacher expertise is based on both
the within-subject and the between-subject relationships between these two variables. This estimate is interpretable only if the within-subject and the between-subject relationships are the same. In other words, to use this estimate, the observations within and between subjects are assumed to be exchangeable; that is, the relationship between achievement and expertise for a particular student at two time points (the within-subject relationship) is assumed to be the same as the relationship between achievement and expertise for two different students at the same time point (the between-subject relationship). This assumption is not likely to be viable in most behavioral science applications. For these cases, the fixed-effects component of the model should be constructed in such a way as to partition the relationship into between-subject and within-subject estimates. As described by both Kreft and de Leeuw (1998) and Jacobs et al. (1999), this partitioning can be accomplished by including in X (i.e., the fixed-effects design matrix) the mean predictor scores for subjects (the subject-specific means) and the deviations of the predictor scores around their subject-specific means. In our example, a subject-specific mean is computed by averaging the teacher expertise scores for a student across grades, whereas a deviation score is a student's teacher expertise score for a grade minus that student's mean teacher expertise score. The coefficient for the mean is then interpreted as a between-subject effect, and the coefficient for the deviation score is interpreted as a within-subject effect.
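As a sketch of how this partitioning might be computed in SAS (the data set name study and the variables id and expertise are hypothetical, and study is assumed to be sorted by id):

/* Compute each subject's mean predictor score */
PROC MEANS data=study noprint nway;
   CLASS id;
   VAR expertise;
   OUTPUT out=submeans mean=expert_mn;
RUN;

/* Merge the subject-specific means back in and form deviation scores */
DATA study2;
   MERGE study submeans(keep=id expert_mn);
   BY id;
   expert_dev = expertise - expert_mn;   /* deviation from subject mean */
RUN;

Both expert_mn and expert_dev would then be listed as predictors in the MODEL statement, yielding separate between-subject and within-subject coefficients.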
IMPLEMENTING MIXED MODELS USING SAS PROC MIXED

PROC MIXED is implemented through a series of commands or statements. Many models can be fit using only five statements: PROC MIXED, CLASS, MODEL, RANDOM, and REPEATED. Each of these statements has a distinct purpose that is consistent with the mathematical structure of the mixed model described earlier. We briefly describe each of the statements.
PROC MIXED

This statement initiates the algorithm and identifies the source of data to be used in the analysis. The METHOD= option allows the investigator to specify the method used to estimate parameters (e.g., REML and ML).
CLASS

This statement identifies predictors as classification variables and creates dummy variables for these classification variables in the X and Z matrices.
MODEL

The MODEL statement in PROC MIXED is similar to the MODEL statement in PROC GLM and establishes the structure of the fixed-effects component (Xβ). The fixed-effect predictors may include classification or continuous variables, and the model includes the intercept by default. PROC MIXED uses a less-than-full-rank parameterization for classification variables, which is identical to the one used in PROC GLM (Green, Marquis, Hershberger, Thompson, & McCollam, 1999).
RANDOM

This statement specifies the structure of the G matrix by listing the random-effects predictors that are to be included in the Z matrix. Unlike the MODEL statement, an intercept is not included in the RANDOM statement by default and must be specified (INT) if required. The key options on the RANDOM statement are SUBJECT=, which defines the subjects in the data set needed to create the block diagonal structure in the Z matrix; TYPE=, which specifies the type of structure in the G matrix; and GROUP=, which allows heterogeneity in variance components across subpopulations.
REPEATED

This statement specifies the structure of the R matrix. If no REPEATED statement is used, the structure of R is assumed to be $\sigma^2 I$. Key options on the REPEATED statement are SUBJECT=, which defines the subjects in the data set needed to create the block diagonal structure in the R matrix; TYPE=, which specifies the type of structure in the R matrix; and GROUP=, which allows heterogeneity in variance components across subpopulations.

The goal for this section was to show that the syntax for PROC MIXED was developed to reflect the various parts of the mixed model. Other sources are required to understand how to structure the data set and how to write the code to conduct analyses using PROC MIXED. Readers are encouraged to read the SAS documentation (SAS, 1997) on PROC MIXED and to consult the excellent references by Littell et al. (1996) and Singer (1998).
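Putting the five statements together, a skeletal PROC MIXED program has the following shape. This is a sketch only: the data set name mydata and the variables y, time, id, and group are hypothetical, and the particular structures chosen for G and R are purely illustrative.

PROC MIXED data=mydata method=reml;
   CLASS id group;
   MODEL y = group time group*time / solution;  /* fixed effects (X, beta) */
   RANDOM int / sub=id;                         /* G: random intercepts    */
   REPEATED / sub=id type=ar(1);                /* R: AR(1) within subject */
RUN;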
A FOUR-STEP MODELING STRATEGY FOR BUILDING AND REFINING MIXED MODELS

As with any model-based data analysis, investigators must clearly define the research questions that the model is designed to answer. If the questions are not well defined, the likelihood of the results yielding unambiguous answers is negligible. Although research questions may address parameters associated with the random-effects part of the model, most research questions are answered by the fixed-effects part. In constructing the design matrix
for the fixed-effects component, investigators should include not only the predictors that directly address the research questions but also possible confounding variables and moderator variables. In addition, if there are time-varying predictors, decisions must be made concerning how to include them in the model so as to differentiate or not differentiate between the between-subject and within-subject effects for them. If the random effects or fixed effects are misspecified, the hypothesis tests for these effects are biased. Accordingly, investigators must use some modeling process that requires them to evaluate what fixed-effects parameters are important to include in the model given that the random-effects parameters are correctly specified, and to assess what random-effects parameters are important to include in the model given that the fixed-effects parameters are correctly specified. No single approach yields the ideal modeling strategy because of the complexity of the model, and different modeling strategies are often necessary for different types of analyses. Nevertheless, we suggest a modeling strategy that provides a platform for investigators who are novice users of mixed models and that can be adapted with experience in analyzing data with mixed models. Multiple replications of a modeling strategy such as the one described subsequently are often needed to address the research questions posed by a study.

Step 1. Review past literature in the substantive area and conduct descriptive analyses on the sample data to formulate an understanding of the data for the development of the initial mixed model.

By reviewing theories explaining the phenomenon of interest and the relevant empirical literature, investigators should be in a better position to describe what they should find when they analyze their data. For example, an investigator might conclude from a literature review that growth on an outcome variable should generally show marked improvement over the initial observations but then show slower improvement. In addition, the pattern of growth should vary over individuals. The first conclusion would help an investigator formulate the fixed-effects component of the model, whereas the second conclusion would aid in the development of the random-effects component of the model. Investigators should gain further insights into the probable outcome of their analyses by computing descriptive statistics and creating graphs. For example, investigators could make judgments about the fixed-effects components by examining box plots of the outcome variable for each occasion as well as their means and standard deviations for occasions. Investigators could make decisions about both fixed-effects and random-effects components by developing plots of the subject-specific empirical growth curves (sometimes called spaghetti plots because of their appearance) that show the changes over time for individual subjects in a study. Depending on the number of subjects in the study, a single plot can contain the growth curves for all subjects (for a small number of subjects) or only a random selection of subjects if the number is so extensive that patterns are obliterated by the volume of data.
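In SAS/GRAPH, such a plot can be produced with a few statements; the following is a sketch only, assuming a person-period data set named long with variables id, time, and y (all hypothetical names) and at most 200 subjects:

/* Join each subject's observations into one line; suppress symbols and legend */
SYMBOL1 interpol=join value=none repeat=200;
PROC GPLOT data=long;
   PLOT y*time=id / nolegend;
RUN;
QUIT;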
Literature is developing that should help investigators with this step. Christensen, Pearson, and Johnson (1992) have extended graphical and diagnostic techniques developed for the general linear model to mixed models, and these techniques are often useful for model development. Also, Grady and Helms (1995) describe some ad hoc techniques useful for selecting an appropriate structure for the R matrix.

Step 2. Formulate an initial mixed model and evaluate the fixed-effects parameters.

At this step, select a relatively simple structure for the random-effects component that does not conflict with the conclusions drawn from Step 1. Next, choose a fixed-effects component with a relatively large number of parameters that is still consistent with the conclusions from Step 1. Then use a data-driven strategy to generate a more parsimonious fixed-effects component by eliminating predictors that do not appear to be related to the outcome. However, regardless of the results, some parameters are likely to be retained in the model to address the research questions appropriately. Use the approximate F test to evaluate predictors (see Equation 5.6).

Step 3. Evaluate the structure of the random-effects component, while the fixed-effects component includes the predictors that appeared important on the basis of Step 2.

Using the fixed-effects structure from Step 2, model the covariance structure, starting with the most complex structure that is feasible and consistent with the conclusions from Step 1. Next, use a data-driven strategy to move toward a simpler structure. For nested models with no change in fixed effects, a likelihood ratio test based on differences in REML likelihoods can be used to discriminate between models. For models that are not nested, likelihood-based selection criteria, such as Akaike's and Schwarz's criteria, have been adapted for mixed models, but information about their performance in small to moderate samples is limited (Bozdogan, 1987; Wolfinger, 1993).

Step 4. Fine-tune the fixed-effects structure and random-effects covariance structure.

Accuracy of tests of the fixed effects depends on whether the random-effects component is correctly specified, and accuracy of tests of the random effects depends on whether the fixed-effects component is correctly specified. Consequently, the fixed-effects component of the model could be re-examined next, whereas the random-effects component is specified based on the results of Step 3. Potentially, this process of alternately assessing random-effects and fixed-effects components of the model continues until no changes in the model structure are necessary.
EXAMPLE MIXED MODEL ANALYSIS

Schumacher et al. (1999) conducted a longitudinal study that investigated the relative effectiveness of two treatment programs designed to reduce
addiction in homeless individuals with cocaine addiction. Each participant in the study was assigned to one of two treatment groups and assessed on multiple measures prior to treatment and at 2, 6, and 12 months during treatment. One treatment group received immediate abstinent-contingent housing followed by abstinent-contingent housing, work therapy, and aftercare (DT+), whereas the other group received only work therapy and aftercare (DT). The hypotheses for our analyses focus on the effect of treatment on drug addiction and the relationship between depression and drug addiction. Drug addiction was measured by the Addiction Severity Index (ASI), which yields aggregate scores ranging from 0 to 100, whereas depression was measured by the Beck Depression Inventory (BDI), which produces total scores ranging from 0 to 63, with a score of 16 used as the cutoff for labeling individuals as depressed. Our analyses included 141 subjects, 69 from the DT group and 72 from the DT+ group. The number of subjects with scores on the ASI varied across the observations at 2, 6, and 12 months postintervention: 48, 49, and 43, respectively, for the DT subjects and 62, 61, and 57 for the DT+ subjects. Retention in the study was greater for DT+ than for DT subjects, but for simplicity of presentation we conducted our analyses assuming observations were missing at random.
Step 1

We examined a series of issues in Step 1. Because both the ASI and the BDI measures were assessed over time, we had to consider whether the relationship between them is the same within a subject as it is between subjects. Equality of the within-subject and the between-subject relationships appeared conceptually untenable. Accordingly, we created two predictors to examine the relationships separately: a between-subject BDI predictor and a within-subject BDI predictor. Scores on the between-subject BDI predictor were computed as the mean BDI across time for each subject, and scores on the within-subject BDI predictor were calculated as deviations between a BDI score for a particular time and the mean BDI across time for that subject. Also at Step 1, we examined graphs to assess the within-subject relationship between depression and addiction across time for each subject, as well as graphs to evaluate the between-subject relationship between depression and addiction across subjects at each time. For both sets of graphs, the variables appeared to be monotonically related with no readily apparent nonlinear effects. The level of addiction and the magnitude of the linear slope seemed to vary across subjects, which would suggest the possibility of nonzero variances for both of these terms. We decided not to consider nonlinear relationships between the BDI and ASI scores for three reasons: (a) graphical results failed to support nonlinear relationships; (b) no theoretical rationale could be provided for these relationships; and (c) an insufficient number of time points (at most three observations) were collected to
examine these nonlinear relationships reliably. We also visually examined the plots to evaluate whether the model should include interactions between depression and treatment. Although the plots showed no apparent differences in the relationship between drug addiction and depression as a function of treatment, both between-subject and within-subject interactions were considered for inclusion in the model because of the difficulty of detecting interactions in complex graphical displays and because theoretical rationales could be provided for their existence. Finally, general practice in the addiction literature is to control for baseline level of addiction in reporting results; consequently, we included the baseline drug addiction score in all models.
Step 2

At Step 2, we developed an initial model with a relatively simple structure for the random-effects component and a relatively complex fixed-effects structure that was consistent with the conclusions from Step 1. The random-effects component included only a random intercept or location effect in G and a common residual variance in R. In this case, both G and R are diagonal matrices, with G having the between-subject variance and R having the within-subject variance on the diagonal. The dimension of G is the number of subjects in the study (141), and the dimension of R is the total number of observations at all three time points (320). Letting treat = treatment (0 = DT and 1 = DT+), beck_mn = the between-subject BDI predictor, beck_dev = the within-subject BDI predictor, id_n = subject ID, drug_b = baseline ASI, and drug = ASI at a particular time, the key SAS code for Model 1 is shown in Table 5.2. This code creates a less-than-full-rank fixed-effects design matrix (X) with 10 columns: a column for the intercept, 2 columns for the treatment effect, 1 column for the between-subject BDI predictor, 1 column for the within-subject BDI predictor, 2 columns for the interaction of treatment with the between-subject BDI predictor, 2 columns for the interaction of treatment with the within-subject BDI predictor, and 1 column for the baseline ASI predictor. The model was fit with REML estimation techniques.

Results from this model (Model 1) were examined with a primary emphasis on simplifying the fixed effects using approximate F tests. The results (not shown) suggested that treatment-specific slopes were not needed for the within-subject relationship between the BDI and the ASI (p = 0.51 for the treatment by beck_dev interaction). Results also suggested that the treatment intercept did not differ by group (p = 0.46) and that the baseline ASI was unimportant (p = 0.23). However, these latter parameters were retained in the model; the first addresses one of the research hypotheses, and the second is required to minimize confounding. A revised model (Model 2) was fit next; the SAS code for this model is shown in Table 5.2. Model 2 was identical to Model 1 except that the interaction of treatment with the within-subject BDI predictor was eliminated based
Table 5.2
SAS Code for Example Models of Drug Addiction

Model 1
PROC MIXED data=work.analysis noclprint method=reml;
   CLASS id_n treat;
   MODEL drug=treat beck_dev beck_mn treat*beck_dev treat*beck_mn drug_b / ddfm=satterth;
   RANDOM int / sub=id_n;
RUN;

Model 2
PROC MIXED data=work.analysis noclprint method=reml;
   CLASS id_n treat;
   MODEL drug=treat beck_dev beck_mn treat*beck_mn drug_b / ddfm=satterth;
   RANDOM int / sub=id_n;
   ESTIMATE 'DT Intercept' int 1 treat 1 0;
   ESTIMATE 'DT+ Intercept' int 1 treat 0 1;
   ESTIMATE 'Between Slope--DT' beck_mn 1 treat*beck_mn 1 0;
   ESTIMATE 'Between Slope--DT+' beck_mn 1 treat*beck_mn 0 1;
   ESTIMATE 'Within Slope' beck_dev 1;
   ESTIMATE 'DT Level Beck of 16' int 1 treat 1 0 beck_mn 16 treat*beck_mn 16 0;
   ESTIMATE 'DT+ Level Beck of 16' int 1 treat 0 1 beck_mn 16 treat*beck_mn 0 16;
   ESTIMATE 'Treatment Effect Beck of 16' treat*beck_mn 16 -16;
RUN;

Model 3
PROC MIXED data=work.analysis noclprint method=reml;
   CLASS id_n treat;
   MODEL drug=treat beck_dev beck_mn treat*beck_mn drug_b / ddfm=satterth;
   RANDOM int beck_dev / sub=id_n group=treat type=un;
RUN;

Model 4
PROC MIXED data=work.analysis noclprint method=reml;
   CLASS id_n treat;
   MODEL drug=treat beck_dev beck_mn treat*beck_mn drug_b / ddfm=satterth;
   RANDOM int / sub=id_n group=treat;
RUN;
on the results of Model 1. Table 5.3 gives the SAS output for Model 2 from the PROC MIXED analysis. (Ignore for the moment the ESTIMATE statements and their associated results.) First, the iteration history summary indicates that the estimation procedure converged. The covariance parameter estimates show that the between-subject variance associated with the intercept (INTERCEPT) is approximately 14, whereas the within-subject residual variance (Residual) is approximately 73. Of primary interest are the tests of fixed effects, which indicate that the ASI scores are related to depression both longitudinally (p < .0001 for the BECK_DEV effect) and cross-sectionally (p < .0001 for BECK_MN), and that the cross-sectional effect differs by treatment group (p = 0.005 for BECK_MN*TREAT). Because we wish to retain the treatment intercept effect and the baseline ASI in the model regardless of the significance test results, no further simplification of the fixed-effects component of the model is required. In other words, Model 2 represents a final model for Step 2.
Step 3

In Step 3, we consider models with alternative random-effects components while maintaining the same fixed-effects component based on the results of Step 2. We initially considered two alternative covariance structures. For both structures, we included variances for both intercepts and slopes and a covariance between the intercept and slope. For Model 3a (see Table 5.2 for SAS code) we allowed separate covariance parameters for the treatment groups (by using the GROUP=treat option in the RANDOM statement), whereas for Model 3b we constrained the variance and covariance parameters to be the same for the two treatment groups. For both of these models, R was the same diagonal matrix of dimension 320 used earlier. For both models, the G matrix is block diagonal with blocks of dimension 2, each block having a between-subject variance in intercept deviations, a between-subject variance in slope deviations, and the covariance between intercept and slope deviations. The structure of the two models is the same, but one model allows for six variance components and the second model constrains the number of components to three. Both models failed to converge, but the small estimates for the slope variance on the last iteration of each model suggested that the models might be overspecified. Convergence problems with the slope in the model may also have been related to the small number of observations per subject (3) and the limited variability in BDI scores for some subjects. One final model, Model 4, was fit (see Table 5.2 for SAS code) that allowed for a different variance for the intercepts for each treatment group but constrained the variance for the slopes to be zero. Because Model 2 is nested within Model 4, we compared the REML likelihoods, with the -2 REML log likelihoods being 2295.42 for Model 2 (see Table 5.3) and 2295.38 for Model 4 (result not shown). The likelihood ratio test for comparing the two models is the difference in these two values. The resulting χ² = 0.04 with 1 df suggests
Table 5.3
Illustrative PROC MIXED Output Based on Model 2

REML Estimation Iteration History
Iteration   Evaluations      Objective      Criterion
    0            1         1731.7981954
    1            2         1725.6730361    0.00000006
    2            1         1725.6729872    0.00000000

Convergence criteria met.

Covariance Parameter Estimates (REML)
Cov Parm     Subject       Estimate
INTERCEPT    ID_N       14.14489415
Residual                73.03833753

Model Fitting Information for DRUG
Description                        Value
Observations                    316.0000
Res Log Likelihood              -1147.71
Akaike's Information Criterion  -1149.71
Schwarz's Bayesian Criterion    -1153.44
-2 Res Log Likelihood           2295.415

Tests of Fixed Effects
Source           NDF   DDF   Type III F   Pr > F
TREAT              1   115         0.55   0.4587
BECK_DEV           1   192        52.76   0.0001
BECK_MN            1   120        29.84   0.0001
BECK_MN*TREAT      1   120         8.13   0.0051
DRUG_B             1   108         1.44   0.2330

ESTIMATE Statement Results
Parameter                   Estimate     Std Error    DF      t    Pr > |t|
DT Intercept              1.47005477    1.77670353   110   0.83     0.4098
DT+ Intercept             2.86058241    1.91948090   107   1.49     0.1391
Between Slope--DT         0.72189909    0.13367473   126   5.40     0.0001
Between Slope--DT+        0.22235728    0.11145007   112   2.00     0.0485
Within Slope              0.62155641    0.08557450   192   7.26     0.0001
DT Level Beck of 16      13.02044016    2.05050588   116   6.35     0.0001
DT+ Level Beck of 16      6.41829881    1.74330889   111   3.68     0.0004
Trtmnt Eff Beck of 16     7.99266899    2.80361672   120   2.85     0.0051
that the simpler Model 2 is preferred. Because Model 2 was selected in Step 2 as the final model, the fitting process is complete, making Step 4 unnecessary.

The results from the ESTIMATE statements used in Model 2 are now helpful in interpreting the final model. The ESTIMATE statement generates an estimate of a linear combination of the fixed-effects parameters, the standard error of this linear combination, and the test statistic and p value for the test that the true population value for this linear combination is 0 (the arithmetic for two of these contrasts is worked out below). The results indicate that a within-subject change of 10 units on the BDI results in a change of about 6 units on the ASI scale, whereas a between-subject difference of 10 units on the BDI results in a difference of 7.2 units in the DT group and a difference of 2.2 units in the DT+ group. At a BDI score of 0, the DT and DT+ groups show a nonsignificant difference in the drug addiction score of 1.4 units, whereas at a Beck score of 16 (a level defined as signifying a depressed state) the difference in means between the DT and DT+ groups is about 8 units, with the DT+ group having on average lower scores on the ASI than the DT group (p = 0.005). The results suggest that addiction level is related to depression and that the abstinent-contingent housing component has a greater effect in reducing addiction in more depressed individuals.

Potentially, additional analyses could be conducted to address the research questions. For example, time could be introduced into the analyses as a covariate to assess whether the relationship between the BDI and the ASI maintains itself controlling for time. To address this question, we would have to initiate the modeling strategy again, but now include not only the predictors that we previously used but also time and higher-order interaction terms with time.
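As a check on how the ESTIMATE statement combines the fixed-effects estimates, two of the Table 5.3 contrasts can be reproduced by hand. The 'Treatment Effect Beck of 16' contrast is 16 times the difference between the two between-subject slopes, 16 × (0.72189909 − 0.22235728) ≈ 7.99, matching the reported 7.99266899. Likewise, the 'DT Level Beck of 16' estimate is the DT intercept plus 16 times the DT between-subject slope, 1.47005477 + 16 × 0.72189909 ≈ 13.02.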
CONCLUSION

We believe that the mixed model has wide applicability in the analysis of repeated measures data collected by behavioral science researchers. We have attempted to describe the mixed model and how it can be applied so that behavioral scientists can see how they might use it to answer their research questions. Because this chapter was written as only an introduction to mixed models, we encourage readers to seek more in-depth treatments of this topic (e.g., Diggle et al., 1994; Laird & Ware, 1982; Littell et al., 1996).
REFERENCES

Andrade, D. F., & Helms, R. W. (1986). ML estimation and LR tests for the multivariate normal distribution with general linear model mean and linear-structure covariance matrix: K-population, complete-data case. Communications in Statistics, Theory and Methods, 15, 89-107.
Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Christensen, R., Pearson, L. M., & Johnson, W. (1992). Case-deletion diagnostics for mixed models. Technometrics, 34, 38-45.

Diggle, P. J., Liang, K.-Y., & Zeger, S. L. (1994). Analysis of longitudinal data. Oxford: Clarendon Press.

Goldstein, H. (1987). Multilevel models in educational and social research. New York: Oxford University Press.

Grady, J. J., & Helms, R. W. (1995). Model selection techniques for the covariance matrix for incomplete longitudinal data. Statistics in Medicine, 14, 1397-1416.

Green, S. B., Marquis, J. G., Hershberger, S. L., Thompson, M. S., & McCollam, K. M. (1999). The overparameterized analysis-of-variance model. Psychological Methods, 4, 214-233.

Harville, D. A. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61, 383-385.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72, 320-338.

Helms, R. W. (1992). Intentionally incomplete longitudinal designs: I. Methodology and comparison of some full span designs. Statistics in Medicine, 11, 1889-1913.

Hocking, R. R. (1985). The analysis of linear models. Monterey, CA: Brooks/Cole.

Jacobs, D. R., Hannan, P. J., Wallace, D., Liu, K., Williams, O. D., & Lewis, C. E. (1999). Interpreting age, period and cohort effects in plasma lipids and serum insulin using repeated measures regression analysis: The CARDIA study. Statistics in Medicine, 18, 655-679.

Jennrich, R., & Schluchter, M. (1986). Unbalanced repeated-measures models with structured covariance matrices. Biometrics, 42, 809-820.

Kirk, R. E. (1982a). Experimental design: Procedures for the behavioral sciences (2nd ed.). Pacific Grove, CA: Brooks/Cole.

Kirk, R. E. (1982b). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA: Brooks/Cole.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. London: Sage Publications.

Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-974.

Lindquist, E. G. (1953). Design and statistical analysis of experiments in psychology and education. Boston: Houghton Mifflin.

Lindsey, J. K. (1993). Models for repeated measurements. Oxford: Clarendon Press.

Littell, R. C., Milliken, G. A., Stroup, W. W., & Wolfinger, R. D. (1996). SAS system for mixed models. Cary, NC: SAS Institute.

Longford, N. T. (1993). Random coefficient models. Oxford: Clarendon Press.

McCarroll, K., & Helms, R. W. (1987). An evaluation of some approximate F statistics and their small sample distributions for the mixed model with linear covariance structure. Chapel Hill, NC: University of North Carolina, Department of Biostatistics.

Milliken, G. A., & Johnson, D. E. (1992). Analysis of messy data (Vol. 1). Belmont, CA: Wadsworth.

Patterson, H. D., & Thompson, R. (1971). Recovery of interblock information when block sizes are unequal. Biometrika, 58, 545-554.

Schumacher, J. E., Milby, J. B., McNamara, C. L., Wallace, D., Michael, M., Popkin, S., & Usdan, S. (1999). Effective treatment of homeless substance abusers: The role of contingency management. In T. S. Higgins & K. Silverman (Eds.), Motivating behavior change among illicit-drug abusers (pp. 77-94). Washington, DC: American Psychological Association.

Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 23, 323-355.

Ware, J. H. (1985). Linear models for the analysis of longitudinal data. The American Statistician, 39, 95-101.

Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.

Wolfinger, R. D. (1993). Covariance structure selection in general mixed models. Communications in Statistics, Simulation and Computation, 22, 1079-1106.
Woolson, R. F., & Leeper, J. D. (1980). Growth curve analysis of complete and incomplete longitudinal data. Communications in Statistics, A9, 1491-1513.
Chapter 6

Fitting Individual Growth Models Using SAS PROC MIXED

Judith D. Singer
Harvard University

PROC MIXED is a flexible statistical computer program suitable for fitting individual growth models to data. Its position as an integrated program within the SAS statistical package makes it an ideal choice for empirical researchers seeking to do data reduction, management, and analysis within a single program. Because PROC MIXED was developed from the perspective of a "mixed" statistical model with random and fixed effects, its syntax and programming logic may appear unfamiliar to users who express individual growth models using sets of linked multilevel models. This chapter is written as a step-by-step tutorial that shows how to use PROC MIXED to fit individual growth models to repeated measures data on individuals.
INTRODUCTION

As individual growth models increase in popularity, the need for credible, flexible software to fit them to data increases. In their 1998 review of software for multilevel analysis, de Leeuw and Kreft found that most programs require users to conduct preliminary data reduction and data processing in a general all-purpose statistical package before outputting data files to the more specialized packages for analysis. Although the last few years have seen improvements in the front ends of the two most popular packages, HLM for Windows (Bryk et al., 1996) and MLwiN (Prosser, Rasbash, & Goldstein, 1996), it is clearly attractive to be able to do all one's data analysis in a single multipurpose piece of software.
In 1992, SAS Institute made this integration possible by adding PROC MIXED to their extensive menu of statistical analysis routines. In subsequent releases, SAS has updated and expanded the models and options available as part of PROC MIXED to the point that the program is now a reasonable choice for researchers fitting individual growth models. Although the documentation for PROC MIXED is complex (SAS Institute, 1992, 1996) and the "defaults" must be overridden to yield the specifications appropriate for individual growth modeling (Littell et al., 1996), the ability to do data reduction, management, and analysis in a single package is the hallmark advantage of using PROC MIXED.

Because PROC MIXED was developed from a perspective distinctly different from that employed by most social and behavioral scientists, its syntax and programming logic can appear unusual. Unlike HLM and MLwiN, which were written with the kinds of models used by social scientists in mind, PROC MIXED was written by agricultural and physical scientists seeking a generalization of the standard linear model that allows for both fixed and random effects (McLean, Sanders, & Stroup, 1991). Although the SAS documentation does not make it immediately obvious, it is indeed the case that, with careful specification, you can fit many of the individual growth models discussed in this book.

In a recent paper, I presented a step-by-step tutorial for using PROC MIXED to fit a wide variety of multilevel models (Singer, 1998). In this chapter, I narrow this tutorial to focus exclusively on the individual growth model. Rather than try to cover a broad array of models (without providing sufficient depth to clarify the logic underlying the syntax), I focus exclusively on the two-level individual growth model, with and without person-level covariates. In addition, because the use of PROC MIXED does not obviate the need for substantial data processing in preparation for analysis, I begin with a brief discussion of strategies for handling longitudinal data in SAS.

This chapter does not substitute for the comprehensive documentation available through SAS, including the general PROC MIXED documentation (SAS Institute, 1996) and The SAS System for Mixed Models (Littell et al., 1996). To use this chapter effectively, you need a basic understanding of the ideas behind individual growth modeling (as well as a basic understanding of the use of SAS). In particular, you must understand: (a) the difference between a fixed effect and a random effect; (b) the notion that the error variance-covariance matrix can take on different structures; and (c) that centering can be a helpful way of parameterizing models so that the results are more easily interpreted. My goal is to provide a bridge for users familiar with growth modeling because the SAS documentation is thin in this regard. I have found that PROC MIXED's flexibility has led many an unsuspecting user to write a program, obtain results, and have no idea what model has been fit. The goal for the user, then, is to specify the model and to learn the syntax necessary for ensuring that this is the model fit to the data.
CREATING A PERSON-PERIOD DATA SET

Before using PROC MIXED to fit an individual growth model, you must structure your data file in a format suitable for analysis. When working with cross-sectional nonhierarchical data, this task is relatively straightforward, as there is only one logical way of arranging the data: using a single record for each individual. When working with longitudinal data, this task is more complex, as there are two equally plausible ways of arranging the information: (a) as a person-level data set, in which each person has one record and you use multiple variables to record the data for each occasion of measurement; or (b) as a person-period data set, in which each person has multiple records, one for each occasion of measurement. In a person-level file, there are only as many records as there are people. As the number of occasions of measurement grows, the file gains new variables, but no new cases. In a person-period file, there are many more records, one for each person-period combination. As data collection lengthens, so does the data set grow.

I illustrate the difference between these two file formats using a small data set presented in Willett (1988). On each of four equally spaced occasions of measurement, 35 individuals completed an inventory assessing their performance on a simple cognitive task called "opposite naming." At the outset of the study, each participant also completed a baseline inventory on a covariate thought to be associated with the growth of skill in this domain. Table 6.1 presents these data in a person-level format. Each individual has his or her own row of data containing the values of the outcome variable on each of the four occasions (SCORE1, SCORE2, SCORE3, and SCORE4). Each record also contains an identifying variable, ID, as well as the covariate, COVAR. Table 6.2 presents the same data in a person-period format. To conserve space, I present only three of the cases: IDs 1, 2, and 35. In the person-period format, the data set contains two variables identical to those in the person format (ID and COVAR) and two new variables: WAVE, which identifies the occasion of measurement to which the record refers, and Y, which records the individual's score on that occasion of measurement. The entire person-period data set for this study has a total of 140 records, 4 for each of the 35 individuals in Table 6.1.

To use PROC MIXED to fit an individual growth model, your data must be arrayed in a person-period format. If your data are already organized this way, you are ready for analysis. If your data have been stored in the person format, you must first convert the structure. Fortunately, this task is relatively simple, even for complex longitudinal studies. If the data set in Table 6.1 is called person, with six variables (ID, SCORE1-SCORE4, and COVAR), you can convert the file to a new data set called persper using the code:

data persper;
   set person;
Table 6.1
Person-Level Data Set with Four Waves of Data on the Growth of Opposite Naming over Time

ID    SCORE1    SCORE2    SCORE3    SCORE4    COVAR
 1      205       217       268       302      137
 2      219       243       279       302      123
 3      142       212       250       289      129
 4      206       230       248       273      125
 5      190       220       229       220       81
 6      165       205       207       263      110
 7      170       182       214       268       99
 8       96       131       159       213      113
 9      138       156       197       200      104
10      216       252       274       298       96
11      180       225       215       249      125
12       97       136       168       222      115
13      145       161       151       177      109
14      195       184       209       213       95
15      162       138       204       195      118
16      119       148       164       208      120
17      144       166       236       261      118
18      107       165       193       262      115
19      167       201       233       216      120
20      156       156       197       246      118
21      165       228       279       290      126
22      197       181       185       217      121
23      206       209       230       255      108
24      182       196       217       199      104
25      174       198       229       236      118
26      199       238       253       282      104
27      160       178       189       229      124
28      184       231       260       292      130
29      174       194       189       188       87
30      215       226       257       310      131
31      147       188       197       232      109
32      127       172       222       273      115
33      165       217       230       286      104
34       76       139       150       214      110
35      166       197       203       233      110
Table 6.2
Selected Records from the Person-Period Data Set on the Growth of Opposite Naming over Time

ID    WAVE     Y     COVAR
 1      1     205     137
 1      2     217     137
 1      3     268     137
 1      4     302     137
 2      1     219     123
 2      2     243     123
 2      3     279     123
 2      4     302     123
         etc.
35      1     166     110
35      2     197     110
35      3     203     110
35      4     233     110
   array score [4] score1-score4;
   do i=1 to 4;
      wave=i;
      y=score[i];
      output;
   end;
   drop i score1-score4;
run;
Without going line-by-line through the program, I draw your attention to the most important aspect of the code: the presence of the output statement within the do loop. Placing the output statement within the loop ensures that the code creates a person-period structure, because it outputs a new record to the persper file multiple times, once each time the loop is executed. As you work with longitudinal data in SAS, you will discover a need to move back and forth between data sets in the two different formats (person and person-period). Strategies for most of the important conversions are given in Singer (1998). To illustrate the ease with which you can move from this person-period data set back to a person-level data set, the code:
data person;
   array score [4] score1-score4;
   do i=1 to 4 until(last.id);
      set persper;
      by id;
      score[i]=y;
   end;
   drop i wave y;
run;
will convert the person-period data set (persper) back to a person-level data set (person). In this program, it is the presence of the set statement within the do loop that creates the requisite structure. Were we to run this program using the person-period data set in Table 6.2, we would obtain the person-level data set in Table 6.1.
FITTING A BASIC INDIVIDUAL GROWTH MODEL TO DATA

Individual growth models can be expressed in at least three different ways: (a) by writing separate equations at multiple levels; (b) by writing separate equations at multiple levels and then substituting in to arrive at a single equation; and (c) by writing a single equation that specifies the multiple
sources of variation. Bryk and Raudenbush (1992) specify the model for each level separately, and their software program (HLM) never requires you to substitute back to derive a single-equation specification. Goldstein (1995) expresses the multilevel model directly using a single equation, and his software program, MLwiN, works from that single-level representation. PROC MIXED also requires that you provide a single-level representation. For pedagogic reasons, in this chapter I take the middle ground, initially writing the model at multiple levels (kept here to two) and then substituting in to arrive at a single-equation representation.
Unconditional Means Model

Let us begin with an unconditional means model, in which we explore the variation in the focal outcome across the multiple occasions of measurement. In this model, we do not explore any systematic variation in Y over time, instead simply quantifying the extent to which Y varies. Let $Y_{ij}$ represent the value of the outcome for individual $j$ on the $i$th occasion of measurement. One way of expressing the variation in Y is to use the familiar one-way random effects ANOVA model:
    y_ij = μ + α_j + r_ij                                              (6.1)

    where α_j ~ iid N(0, τ_00) and r_ij ~ iid N(0, σ²)
In Equation 6.1, μ represents the grand mean of Y across individuals and occasions of measurement, the α_j represent the deviation of person j from that grand mean, and r_ij represents a random error associated with individual j on the ith occasion of measurement. When we use sample data to fit the one-way random effects ANOVA model, we estimate the values of the one fixed effect (μ) and the two variance components: τ_00, representing the variation in Y that occurs between persons; and σ², representing the variation in Y that occurs within persons. The representation in Equation 6.1 is not the only way of parameterizing an unconditional means model. An alternative approach is to use a two-level growth model that generalizes more easily to the inclusion of predictors. Under this strategy, we express the occasion-specific outcome, y_ij, using a pair of linked models: a within-person model (the level-1 model) and a between-persons model (the level-2 model). By convention (and to facilitate extension to 3-level models in which individuals within groups are tracked over time), we use the symbol π to represent the parameters in the level-1 (within-person) model and the symbol β to represent parameters in the level-2 (between-persons) model. Because this is an unconditional means model, we do not include the effect of TIME in either equation.
At level-1 (within-person), we express individual j's score on the ith occasion of measurement as the sum of an "intercept" for that person (π_0j) and a random error (r_ij) associated with that person on that occasion:
    y_ij = π_0j + r_ij,   where r_ij ~ N(0, σ²)                        (6.2a)
Although it may appear unusual to label the π_0j's "intercepts" instead of "means," we adopt this nomenclature because it adapts so easily to the inclusion of additional predictors (such as TIME). At level-2 (between-persons), we express the person-level intercepts (the π_0j) as the sum of a common intercept (β_00) and a series of random deviations from that common intercept (u_0j):

    π_0j = β_00 + u_0j,   where u_0j ~ N(0, τ_00)                      (6.2b)
Substituting Equation 6.2b into Equation 6.2a yields the multilevel model known as the unconditional means model:
    y_ij = β_00 + u_0j + r_ij,
    where u_0j ~ N(0, τ_00) and r_ij ~ N(0, σ²)                        (6.3)
Notice the direct equivalence between the one-way random effects ANOVA model in Equation 6.1 and the unconditional means model in Equation 6.3. The grand mean μ is now represented by β_00, the effect of person (the α_j) is now represented by the person-level random effects (the u_0j), and the residual for person j on the ith occasion of measurement remains r_ij. Although the names for the parameters have changed, their interpretation remains the same. In essence, then, the unconditional means model is identical to a one-way random effects ANOVA model. One important feature of the multilevel representation in Equation 6.3 is that we can partition it explicitly into two components: (a) a fixed part, which contains the single effect β_00 (for the overall intercept); and (b) a random part, which contains two random effects (one for the intercept, u_0j, and another for the within-person residual, r_ij). When we fit this (or any other) multilevel model to data, we are equally interested in learning about both the fixed effects (here, β_00, which tells us about the average value of Y in the population) and the random effects (here, τ_00, which tells us about the variability in person-level means, and σ², which tells us about the variability in Y within individuals). Another important feature of the multilevel representation in Equation 6.3 is that it postulates that the variance and covariance components take on a very particular structure. First, because we have not indicated otherwise, the model assumes that the r_ij and the u_0j are independent. Second, if we combine the variance components for the two random effects together into a single matrix, we would find a highly structured block diagonal matrix. For example, if there were three occasions of measurement for each person, we would have:
    ( τ_00+σ²   τ_00      τ_00      0         0         0        …  )
    ( τ_00      τ_00+σ²   τ_00      0         0         0        …  )
    ( τ_00      τ_00      τ_00+σ²   0         0         0        …  )
    ( 0         0         0         τ_00+σ²   τ_00      τ_00     …  )   (6.4)
    ( 0         0         0         τ_00      τ_00+σ²   τ_00     …  )
    ( 0         0         0         τ_00      τ_00      τ_00+σ²  …  )
    ( …                                                          ⋱  )

If the number of occasions of measurement per person varied, the size of each of these submatrices would also vary, but they would still have this common structure. The variance in Y at any specific occasion of measurement is assumed to be τ_00 + σ². The covariance of Y across any two occasions for a single person is assumed to be τ_00. And the covariance of Y for any two occasions of measurement for different individuals is assumed to be 0. The highly constrained structure shown in Equation 6.4 is known as compound symmetry. The representation of the multilevel model in Equation 6.3 leads directly to the specification of the unconditional means model in PROC MIXED. The syntax is:
proc mixed covtest;
  class id;
  model y = /solution;
  random intercept/subject=id;
run;
Each statement in this program has an important function. The PROC statement invokes the procedure and specifies any options that you might want to select for the entire model. In this program, the COVTEST option indicates that you would like SAS to print hypothesis tests for the variance and covariance components (described subsequently). The CLASS statement indicates that ID is a CLASSification (nominal) variable whose values do not contain quantitative information. But it is the MODEL statement, which specifies the fixed effects, and the RANDOM statement, which specifies the random effects, that are most important for a user to understand. Let us therefore examine each of these two statements in some detail. Begin with the MODEL statement, which in this program appears odd because it has no explicit predictors. Like all MODEL statements in SAS, the MODEL statement in PROC MIXED always includes one implied predictor: the vector 1, which represents an intercept. PROC MIXED, like HLM and most computer programs for fitting regression models to data,
includes an intercept by default. Other programs, such as MLwiN and Hedeker's MIXREG, require the user to specify the intercept explicitly. (If you ever want to fit a model without an intercept, just add the option NOINT to the MODEL statement.) The SOLUTION option does just what it says: it tells SAS to print the "solution" to the fixed part of the model specified on this line, the estimates of the fixed effects. The RANDOM statement is crucial, and its specification is usually the trickiest part of fitting growth models to data. By default, a mixed model always includes at least one random effect, here the lowest-level (within-individual) residual, r_ij. (This is similar to the default "random effect" in a typical regression model, the error term.) By explicitly including the variable INTERCEPT on this RANDOM statement, we indicate that we want to fit a model with a second random effect. This tells SAS that the INTERCEPT in the MODEL statement (which is not explicitly present but implied) should be treated not only as a fixed effect (represented by β_00) but also as a RANDOM effect (represented by u_0j, whose variance is τ_00). The second crucial aspect of the RANDOM statement is the SUBJECT= option, which specifies the multilevel structure. In essence, the SUBJECT= option (which may be abbreviated as SUB=) indicates how the level-1 units (the within-person observations) are divided into level-2 units (persons). In most growth modeling contexts, the subject identifier will be an individual's ID number, as it is here. The importance of correctly specifying the SUB= option cannot be overemphasized. Had this code not included the SUBJECT=ID option, SAS would fit the model y_ij = β_00 + r_ij, not the unconditional means model in Equation 6.3. In other words, the variance component representing the effect of person (the u_0j, which has variance τ_00) would be omitted. Table 6.3 presents the results of fitting this unconditional means model to the person-period data set for the opposite naming task. After documenting the ID numbers of the cases used in the analysis, SAS provides the iteration history, which describes the rate at which the estimates converged. In a completely balanced data set like this, convergence is rapid. Here, it took just two iterations to derive stable estimates, the minimum number needed to evaluate convergence. PROC MIXED is a very efficient program, making it particularly well suited to fitting a wide range of models. Of course, when fitting more complex models to data sets that have missing values, collinearity, or a high degree of imbalance, convergence will take longer to achieve. When fitting individual growth models, it is common to first examine the estimates for the fixed effects (ironically presented in the last section of the output). As there is only one fixed effect in the unconditional means model, the estimate of 204.81 tells us that the "average person" in this sample has an average score, across his four measurement occasions, of 204.81. Because each person was observed for an equal number of occasions, this estimate is identical to the average score across all members of the sample. If the
Table 6.3 Results of Fitting an Unconditional Means Model

                        Class Level Information
    Class    Levels    Values
    ID           35    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                       19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
                       34 35

                            Iteration History
    Iteration    Evaluations    -2 Res Log Like     Criterion
            0              1      1467.49961722
            1              1      1454.95385040    0.00000000

                        Convergence criteria met.

                      Covariance Parameter Estimates
    Cov Parm     Subject    Estimate    Standard Error    Z Value      Pr Z
    Intercept    ID           602.85            248.33       2.43    0.0076
    Residual                 1583.72            218.57       7.25    <.0001

                            Fitting Information
    Res Log Likelihood                    -727.5
    Akaike's Information Criterion        -729.5
    Schwarz's Bayesian Criterion          -731.0
    -2 Res Log Likelihood                 1455.0

                        Solution for Fixed Effects
    Effect       Estimate    Standard Error    DF    t Value    Pr > |t|
    Intercept      204.81            5.3420    34      38.34      <.0001
number of observations per person varies, remember that this fixed effect is an average of the person-level averages, not an average of the individual scores. Next turn to the estimates for the random effects portion of the model, shown under the label "Covariance Parameter Estimates." Here we find that the estimated value of τ_00 is 602.85 and the estimated value of σ² is 1583.72. There are many things we can do with these estimated variance components. One is to conduct tests of the null hypothesis that the population value of each is 0. In this case, both tests reject, suggesting that: (a) there is variation between individuals in their opposite naming skill; and (b) there is also variation within individuals in their opposite naming skill over time. Unfortunately, the validity of these hypothesis tests is questionable because they rely on large sample approximations (not useful with small sample sizes such as these) and variance components have skewed (and bounded) sampling distributions that render normal approximations questionable. Although many other multilevel programs use similar testing strategies (e.g., MLwiN and MIXREG), SAS has responded to these concerns by eliminating these tests from the default PROC MIXED output. That is why we need the COVTEST option on the PROC MIXED statement: it tells SAS to output the information shown. We can also compare the relative magnitude of the variance components so that we can learn more about how the outcome varies. A simple method of comparison is to take the ratio of the estimates. Here, the fact that the variance component within individuals is roughly 2.6 times larger than the component between individuals suggests not only that people do differ in their average values of Y but also that there is even more variation across occasions of measurement within persons. Another comparative approach for variance components is to compute the intraclass correlation, ρ, which tells us what portion of the total variance occurs between individuals. We compute ρ by taking the ratio of the variance component between persons (here, 602.85) to the sum of the variance components between and within persons (here, 602.85 + 1583.72):

    ρ = τ_00 / (τ_00 + σ²) = 602.85 / (602.85 + 1583.72) = .38         (6.5)

This substantial value indicates that there is a great deal of similarity in the values of Y within persons across occasions. Harkening back to the one-way random effects ANOVA model introduced earlier in this section, the intraclass correlation is a function of elements of the variance-covariance matrix shown in Equation 6.4. Specifically, it estimates the nonzero off-diagonal elements in this matrix expressed in correlation form (with 1's on the diagonal and ρ on the appropriate off-diagonal elements). When fitting a series of growth models to data, we also examine the model goodness-of-fit statistics (presented in the "Fitting Information" section of the output). However, as examination of these statistics requires a comparison model (which we do not yet have because we have only fit
the unconditional means model), I defer discussion of this information to a later section of the chapter.
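As a convenience, the intraclass correlation above can be computed in a short DATA step; a minimal sketch, with the two variance components from Table 6.3 hard-coded for illustration:

data icc;
  tau00  = 602.85;                     /* between-person variance component */
  sigma2 = 1583.72;                    /* within-person variance component  */
  rho    = tau00 / (tau00 + sigma2);   /* intraclass correlation, about .38 */
  put rho=;
run;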
Unconditional Linear Growth Model

We are now ready to fit an unconditional growth model, which allows us to explore systematic changes in Y over time. Let us begin by assuming that growth is linear (a testable assumption that can be investigated by fitting multiple models). We can write an unconditional linear growth model by first specifying a level-1 model that uses two parameters, an intercept (π_0j) and a slope (π_1j), to describe the trajectory of growth for individual j over time. For example, using the variable WAVE that is already on the person-period data set, we can write:
    y_ij = π_0j + π_1j(WAVE)_ij + r_ij,   where r_ij ~ N(0, σ²)        (6.6a)

We can then write a pair of level-2 models, each expressing the variation in the level-1 parameters, both the intercepts (the π_0j's) and the slopes (the π_1j's), as a function of an overall parameter (either β_00 for the intercept or β_10 for the slope) and a set of random effects:

    π_0j = β_00 + u_0j
    π_1j = β_10 + u_1j                                                 (6.6b)

    where

    ( u_0j )       ( ( 0 )   ( τ_00   τ_01 ) )
    ( u_1j )  ~  N ( ( 0 ) , ( τ_10   τ_11 ) )
Because this is an unconditional linear growth model, the level-2 models in Equation 6.6b do not include any person-level predictors. The unconditional linear growth model represented in Equations 6.6a and 6.6b differs from the unconditional means model (represented in Equation 6.3) in three important ways. First, it includes the linear effect of a level-1 predictor, WAVE, which allows the model to describe systematic linear change in the outcome (opposite-naming skill) over time. Second, having included this additional fixed effect, the model also includes an additional random effect (the u_1j). Thus, not only do we stipulate that an individual's value of Y varies linearly over time, but we also stipulate that the rate of growth in Y (the slopes, π_1j) can vary across individuals. (If we did not want to allow the slopes to vary across individuals, we could have fixed this term by eliminating u_1j from the equation for the slopes, π_1j, in Equation 6.6b.) Third, having allowed the intercepts and slopes to vary across individuals, we now have a larger tau matrix to represent the random effects. Not only are there two elements representing the variance components for the intercept and slope (τ_00 and τ_11, respectively) but
there is also a covariance component, a function of the correlation between intercepts and slopes (τ_10, which is numerically identical to τ_01). Although this unconditional linear growth model can be fit easily in PROC MIXED, I do not present the code for doing so because of an interpretation issue that stems from this specific parameterization. The problem surrounds the interpretation of the intercept coefficients: the π_0j's in the level-1 model and β_00 in the level-2 model. Because the temporal predictor WAVE never takes on the value 0, the intercepts do not represent the expected value of Y at a meaningful point in time, but rather at an arbitrary, imaginary point in time, one occasion of measurement earlier than the beginning of the study. If this lack of meaning seems problematic now, it becomes even more so as person-level predictors are added to the model. This is because, as we add person-level predictors, we generally like to explore how the intercepts relate to the predictors. For statements about these relationships to have substantive meaning, the temporal predictor in the level-1 model must be scaled in such a way that the intercept is meaningful. One approach to facilitating interpretation is to center the level-1 temporal predictor, subtracting its sample mean from each observed value. Unlike some specialized software programs (e.g., HLM), which can center automatically, the user of PROC MIXED must make explicit decisions about centering, which are then implemented through the construction of new variables. Given the misconceptions and misunderstandings surrounding the rationale behind, and the effects of, centering (Kreft et al., 1995), some might argue that this lack of automatic provision is an advantage of the program. In growth modeling, instead of centering around the sample mean, we usually rescale the level-1 temporal predictor so that the intercept describes the value of Y at a meaningful point in time, often the first occasion of measurement. For these data, we can achieve this goal easily by creating a new predictor TIME defined as WAVE - 1 (a small data step constructing TIME is sketched after the equations below). Having constructed this predictor, we now write the unconditional growth model as:

    y_ij = π_0j + π_1j(TIME)_ij + r_ij
    π_0j = β_00 + u_0j                                                 (6.7a)
    π_1j = β_10 + u_1j

    where r_ij ~ N(0, σ²) and

    ( u_0j )       ( ( 0 )   ( τ_00   τ_01 ) )
    ( u_1j )  ~  N ( ( 0 ) , ( τ_10   τ_11 ) )                         (6.7b)
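A minimal sketch of the data step that creates TIME, assuming the person-period data set persper from Table 6.2:

data persper;
  set persper;
  time = wave - 1;   /* wave 1 becomes time 0, so the intercept is initial status */
run;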
In this model, the intercept (β_00) estimates the true value of opposite-naming skill at occasion 0 ("initial status") and the slope (β_10) estimates the rate of change in true opposite-naming skill across occasions. Having chosen this scale for TIME, the parameters in the level-1 growth model are
interesting in their own right, making their exploration in a level-2 model a useful vehicle for answering research questions about interindividual differences in growth. To emphasize that the unconditional growth model in Equation 6.7a can be expressed as the sum of two parts, a fixed part and a random part, let us substitute the level-2 models into the level-1 model and collect together the fixed and random effects using brackets:

    y_ij = [β_00 + β_10(TIME)_ij] + [u_0j + u_1j(TIME)_ij + r_ij]      (6.7c)
The two terms in the first bracket represent the fixed part of the model, which are just the two betas. The three terms in the second bracket represent the random part of the model, which includes the u_0j (representing variation in intercepts), the u_1j (representing variation in slopes), and the r_ij (representing the remaining variation that occurs within individuals). When examining the representation in Equation 6.7c, remember that we do not estimate the random effects in terms of the parameters in the second bracket directly, but rather through their respective variance components shown in Equation 6.7b. The multilevel representation of the unconditional growth model in Equations 6.7c and 6.7b provides the necessary information for writing the correct PROC MIXED code. As with the unconditional means model, we specify the fixed effects on the MODEL statement and random effects on the RANDOM statement. In both cases, the model (by default) already includes one effect of each type: the intercept (for the fixed part) and the within-individual variance component (for the random part). We may therefore fit the unconditional growth model in Equations 6.7c and 6.7b using the code:

proc mixed noclprint covtest;
  class id;
  model y = time/solution ddfm=bw;
  random intercept time/subject=id type=un;
run;
The only change in the first two lines of code is the addition of the NOCLPRINT option, which tells SAS not to print the classification information to the listing file. This option can dramatically reduce the volume of output when you are studying large numbers of individuals, but you should invoke this option only after you have convinced yourself that you are including all appropriate cases in your analysis.
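The next paragraph discusses alternative codings of TIME that change the meaning of the intercept; a hypothetical sketch of three such codings (the variable names ctime and ftime are illustrative, not from the chapter):

data persper;
  set persper;
  time  = wave - 1;     /* intercept = status at the first wave (initial status)  */
  ctime = wave - 2.5;   /* intercept = status at the average of waves 1-4         */
  ftime = wave - 4;     /* intercept = status at the last wave (final status)     */
run;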
The primary function of the MODEL statement in an unconditional growth model is to identify the functional form for growth over time. Although here we assume growth is linear, be sure to consider the wide range of alternative options discussed elsewhere in this book. If your data set includes more than 3 waves of data, for example, you might want to explore models allowing for curvilinear growth. You can also tinker with the coding of the TIME variable so that you alter the meaning of the intercept. For example, in addition to specifying a model in which the intercept represents initial status (as it does here), you might consider models in which the intercept represents average status (by centering TIME) or even final status (by coding time using negative numbers and letting 0 represent the last wave). The DDFM option on the MODEL statement tells SAS how to compute the denominator degrees of freedom for the tests of fixed effects. The default method, DDFM=CONTAIN, is generally not appropriate for individual growth models. The preferred option for these models is DDFM=BW, as invoked here. This tells SAS to use the between/within method, an intuitively appealing strategy that divides the residual degrees of freedom into two components: (a) a between-persons portion, based on the number of individuals in the data set; and (b) a within-persons portion, based on the number of person-periods in the data set. Tests for person-level predictors that are constant across all records for an individual use the between-persons portion; tests for within-person predictors that vary across the multiple records for an individual use the within-persons portion. The more computationally intensive DDFM=SATTERTH method uses Satterthwaite's approximation to compute the residual degrees of freedom for each test. Although this approximation may be suitable for a variety of growth models, little is known about its small-sample properties. Readers wishing to learn more about the different ways of specifying the denominator degrees of freedom should consult Littell et al. (1996) and the SAS documentation. The RANDOM statement, which indicates the identity and structure of the random effects, is usually the most difficult statement to write correctly. When you write any RANDOM statement, remember that one random effect is always included by default: the r_ij, representing variation within individuals. To fit the unconditional growth model in Equation 6.7c, you must add the two additional sources of variation: one for the INTERCEPTs and one for the slopes for TIME, as in the previous code. The options on the RANDOM statement are also crucial, for they indicate how to structure the variance-covariance matrix representing these sources of variation:

• The SUB=ID option continues to indicate that the level-1 units represent data values from different individuals, a hallmark feature of the individual growth model.

• The TYPE= option specifies the structure of the variance-covariance matrix for the random effects. Specifying TYPE=UN indicates that you would like the variance-covariance matrix for the intercepts and
slopes to be "UNstructured," with a separate variance (or covariance) component for each element. The UNstructured option is essential when using a RANDOM statement to fit an "intercepts and slopes as outcomes" representation of a growth model, as only this option ensures that you will not place any structure on the variance-covariance matrix, as specified in Equation 6.7b. This feature is important because these variances (and covariances) are unlikely to be identical (e.g., τ_00 is unlikely to equal τ_11) and the matrix is unlikely to have any specific type of structure.

Table 6.4 presents the results of fitting the unconditional linear growth model to the person-period data set excerpted in Table 6.2. Focus first on the estimates of the fixed effects. Because this model includes no person-level covariates, interpretation is simple: 164.37 is our estimate of the average intercept across persons (the average value of initial status) and 26.96 is our estimate of the average slope (the average rate of growth) across persons. Hence, we estimate that the average person began the study with a score of 164 and gained an average of 27 points per testing occasion. Standard errors and tests for the fixed effects are interpreted in the usual ways. As in PROC GLM, two sets of tests are given: (a) t tests of the null hypothesis that the associated parameter is 0 in the population; and (b) F tests of the pooled null hypothesis that the set of associated parameters is 0 in the population. When each predictor in the fixed portion of the model is a single variable, the two sets of tests are redundant (as they are here). When one or more predictors is a nominal variable identified on the CLASS statement, the pooled tests become useful supplements to the individual tests. (As in PROC GLM, the "Type 3" label for the pooled test indicates that it adjusts for all other variables in the model.) Because we have no nominal predictors in this data set, for the remainder of this chapter I discuss only the individual t tests (omitting the F tests). Here we find that both tests (for the intercept and for the slope) reject. Although this information is not very helpful when it comes to interpreting the intercept (we have no reason to expect a score of 0 on the first testing occasion), the test for the slope does indicate that, on average, there is systematic linear change over time.

Focus next on the variance-covariance estimates for the random effects displayed in the Covariance Parameter Estimates section. Although SAS presents the estimated variance-covariance components in list form, it is often helpful to rewrite the first three elements in the list as:

    ( 1198.78   -179.26 )
    ( -179.26    132.40 )

In this format, it is easy to see that 1198.78 tells us about the variability in initial status (intercepts), 132.40 tells us about the variability in growth rates (slopes), and -179.26 tells us about the covariance between initial status and growth. Estimated standard errors and tests of the null hypotheses
Table 6.4 Results of Fitting an Unconditional Linear Growth Model

                            Iteration History
    Iteration    Evaluations    -2 Res Log Like     Criterion
            0              1      1387.72627343
            1              1      1266.82273974    0.00000000

                        Convergence criteria met.

                      Covariance Parameter Estimates
    Cov Parm    Subject    Estimate    Standard Error    Z Value      Pr Z
    UN(1,1)     ID          1198.78            318.38       3.77    <.0001
    UN(2,1)     ID          -179.26           88.9634      -2.01    0.0439
    UN(2,2)     ID           132.40           40.2107       3.29    0.0005
    Residual                 159.48           26.9566       5.92    <.0001

                            Fitting Information
    Res Log Likelihood                    -633.4
    Akaike's Information Criterion        -637.4
    Schwarz's Bayesian Criterion          -640.5
    -2 Res Log Likelihood                 1266.8

                        Solution for Fixed Effects
    Effect       Estimate    Standard Error    DF    t Value    Pr > |t|
    Intercept      164.37            6.1188    34      26.86      <.0001
    time          26.9600            2.1666   104      12.44      <.0001

                       Type 3 Tests of Fixed Effects
    Effect    Num DF    Den DF    F Value    Pr > F
    time           1       104     154.84    <.0001
that each of these components is 0 are given in the remaining columns of the list. What do we see? First, that the intercepts are quite variable, indicating that people do differ in their initial status in opposite naming skill. Second, that the slopes are also variable, indicating that different people improve at this skill at different rates. Third, that there is a statistically significant relationship between intercepts and slopes (covariance component -179.26, p = .0439). This tells us that individuals with weaker initial opposite naming skills improve at a faster rate, on average, than individuals with stronger initial skills. When interpreting the random effects across a series of individual growth models, it is important to recognize that the use of identical symbols across multiple models does not imply that the resulting variance components retain the same meaning. For example, although both Equations 6.3 and 6.7b include a variance component for intercepts labeled τ_00, these two components have very different meanings arising from the very different meanings of their associated random effects. In the unconditional means model, the intercept β_00 represents the average value of Y across all occasions of measurement. In the unconditional growth model, β_00 represents the average value of Y on the first occasion of measurement. As a result, it makes no sense to compare the magnitude of the associated variance components (the τ_00's) for these two models, as they do not represent the same quantities. There is one variance component, however, that does retain a stable meaning across all growth models we fit: σ², the estimate of the within-person variance component. In the unconditional means model, we estimated this variance to be 1583.72; in the unconditional growth model, the estimate plummets to 159.48. This dramatic decline (a decrease of 1424.24) indicates that the predictor in the second model (here, just TIME) "explains" a large portion of the within-person variation in opposite naming skill. Just how much of the original total variation does the predictor TIME explain? We can address this question by computing the reduction in the magnitude of this variance component from one model to the next. For these two models, we compute 1424.24/1583.72 = .899, or 89.9%. This percentage tells us that 90% of the original within-person variability in opposite naming skill is "explained by time." As this is a short term study, this improvement is not likely a result of cognitive development per se, but rather represents a "practice effect." The substantial magnitude of this percentage tells us that most of the original within-person variability in the unconditional means model is not "random scatter," but rather is systematically related to (linear) time. If you are tempted to conclude that explaining 90% of the total variation suggests that the job of longitudinal analysis is done, think again. Even though this percentage is very high, we can still assess whether additional variation remains to be explained by conducting a hypothesis test for this variance component. As shown in the final row of the random effects output,
this test rejects strongly at the .0001 level, indicating that there is still additional variation within persons that might potentially be explained by adding time-varying predictors to the model. Let us conclude our examination of this model by discussing the utility of its goodness-of-fit statistics (a topic deferred from the previous section). To understand the value of this information, you must first understand how PROC MIXED fits models to data. Under the method of maximum likelihood (ML), we estimate a model's parameters using an iterative strategy in which we: (a) construct a likelihood function, an equation that expresses the probability of observing the sample data as a function of the model's unknown parameters (both the fixed and random effects); and (b) numerically examine the relative performance of competing alternative estimates of these unknown parameters until those values that maximize the likelihood function are found. Although ML estimation is easy for simple models, it can be difficult for complex models such as the ones being studied here. The crux of the problem is that the growth models necessitate the creation of a circular web: to estimate the random effects we need estimates of the fixed effects, and to estimate the fixed effects we need estimates of the random effects. Statisticians have developed two approaches to resolving this conundrum: (a) full information ML (FML); and (b) restricted ML (REML). Under FML, we estimate the random effects under the (admittedly false) assumption that the estimated values of the fixed effects are the unknown population values. Uncertainty concerning the true values is ignored, as is the inevitable loss of degrees of freedom. By overstating the degrees of freedom for random effects, FML leads to underestimated (biased) variance components. Under REML, in contrast, we account for the uncertainty in the estimates of the fixed effects by adjusting the estimates of the random effects accordingly. This iterative strategy is more appealing conceptually, in that we iterate through the circular web, and is less biased in small samples. Unfortunately, though, it is also less precise. In their discussion of the different estimation methods, Kreft and de Leeuw (1998) conclude that neither approach is uniformly better, although REML is certainly more popular in practice. By default, PROC MIXED uses the REML method (and I do so throughout this chapter). Although this may seem like an unimportant technical detail, it has an important practical consequence. Under REML, the associated log-likelihood statistic (-2RLL) and its relatives the AIC (Akaike's Information Criterion) and the SBC (Schwarz's Bayesian Criterion), that is, all the information presented in the "Fitting Information" section of the output, only describe the quality of fit for the random effects portion of the model. As a result, you can use these statistics only to compare the random effects specification of multiple models with the same fixed effects. If you wish to compare models with different fixed effects, you must use the FML method (which you can do by adding the option METHOD=ML to the initiating PROC statement). With a small sample like this, however, FML may be
unwise. So, despite the temptation to compare the goodness of fit of the unconditional and conditional growth models using the -2RLL statistic (or the AIC and SBC), this makes no sense under REML estimation. It is only when fitting models with identical fixed effects and varying random effects that changes in these statistics are meaningful. Because such models do arise as an integral part of further longitudinal analysis, we defer discussion of this approach to a later section of the chapter.
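For completeness, a sketch of the FML request just mentioned; only the METHOD= option on the PROC statement changes relative to the earlier code:

proc mixed method=ml noclprint covtest;   /* full information ML instead of REML */
  class id;
  model y = time/solution ddfm=bw;
  random intercept time/subject=id type=un;
run;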
ADDING PERSON-LEVEL COVARIATES TO THE INDIVIDUAL GROWTH MODEL

Having fit an unconditional linear growth model, we may now consider a conditional model in which we explore whether variation in intercepts (initial status) and slopes (growth rates) is related to a covariate (here, COVAR). One way of writing this model is:

    y_ij = π_0j + π_1j(TIME)_ij + r_ij,   where r_ij ~ N(0, σ²)
    π_0j = β_00 + β_01(COVAR)_j + u_0j                                 (6.10)
    π_1j = β_10 + β_11(COVAR)_j + u_1j

    where

    ( u_0j )       ( ( 0 )   ( τ_00   τ_01 ) )
    ( u_1j )  ~  N ( ( 0 ) , ( τ_10   τ_11 ) )
Were we to fit this model to data, the interpretation of the fixed effects would require us to think about an individual who obtained the value 0 on the covariate. As this predictor never even approaches 0, this parameterization is not particularly meaningful. Instead, let us parameterize the conditional growth model by centering the covariate at its grand mean, subtracting its average value across all individuals in the sample. This leads to the model:
    y_ij = π_0j + π_1j(TIME)_ij + r_ij,   where r_ij ~ N(0, σ²)
    π_0j = β_00 + β_01(COVAR_j - mean(COVAR)) + u_0j                   (6.12)
    π_1j = β_10 + β_11(COVAR_j - mean(COVAR)) + u_1j

    where

    ( u_0j )       ( ( 0 )   ( τ_00   τ_01 ) )
    ( u_1j )  ~  N ( ( 0 ) , ( τ_10   τ_11 ) )
Now, the interpretation of the fixed effects is straightforward: β_00 represents the average individual's initial status, β_10 represents the average individual's rate of growth, β_01 indicates the relationship between initial status and the covariate, and β_11 represents the relationship between the rate of growth and the covariate. Substituting the pair of level-2 models into the level-1 model yields the combined representation that most closely resembles the statements needed to use PROC MIXED:

    y_ij = [β_00 + β_10(TIME)_ij + β_01(COVAR_j - mean(COVAR))
            + β_11(COVAR_j - mean(COVAR))(TIME)_ij]                    (6.13)
           + [u_0j + u_1j(TIME)_ij + r_ij]
To fit the model in Equation 6.13 to data, we need to compute a new variable that represents the centered covariate. Letting CCOVAR represent this variable (one way of constructing it is sketched after the code below), we can fit this model by writing:

proc mixed noclprint covtest;
  class id;
  model y = time ccovar time*ccovar/s ddfm=bw;
  random intercept time/type=un sub=id gcorr;
run;
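One hedged way to construct CCOVAR, computing the grand mean with PROC MEANS and merging it back (the data set name gm and variable name covbar are illustrative, not from the chapter):

proc means data=persper noprint;
  var covar;
  output out=gm mean=covbar;          /* grand mean of the covariate */
run;

data persper;
  if _n_ = 1 then set gm(keep=covbar);   /* covbar is retained across records */
  set persper;
  ccovar = covar - covbar;    /* centered covariate; with balanced data this
                                 equals centering on the person-level mean  */
run;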
Notice the similarity between this syntax and that for the unconditional linear growth model. We have made only two minor changes: (a) the MODEL statement includes two additional predictors that register the main effect of the centered covariate and the interaction between the centered covariate and time; and (b) the RANDOM statement includes the option GCORR, which tells SAS to print the estimated correlation matrix among the random effects. The results of fitting this conditional growth model to the opposites-naming data are shown in Table 6.5. Because we grand-mean centered the level-2 covariate, the estimates for the INTERCEPT and for TIME (i.e., for β_00 and β_10) are numerically identical to their values in the unconditional model shown in Table 6.4. As a result, their interpretation is similar as well; the only difference now is that we must add the phrase "controlling for the covariate" to every statement. The coefficients for the centered covariate and its interaction with time are new. The coefficient for CCOVAR (-0.11) quantifies the relationship between the covariate and initial status. As the standard error is over four times larger than the estimate itself, we conclude that there is no relationship between these quantities. With respect to the growth rates, however, we do find a statistically significant effect of the covariate. The estimate of .43 indicates that individuals who differ by 1.0 on CCOVAR have growth rates that differ by 0.43.
Table 6.5 Results of Fitting a Conditional Linear Growth Model

                            Iteration History
    Iteration    Evaluations    -2 Res Log Like     Criterion
            0              1      1381.00965468
            1              1      1260.28476370    0.00000000

                        Convergence criteria met.

                      Estimated G Correlation Matrix
    Row    Effect       ID       Col1       Col2
      1    Intercept     1     1.0000    -0.4895
      2    time          1    -0.4895     1.0000

                      Covariance Parameter Estimates
    Cov Parm    Subject    Estimate    Standard Error    Z Value      Pr Z
    UN(1,1)     ID          1236.41            332.40       3.72    <.0001
    UN(2,1)     ID          -178.23           85.4298      -2.09    0.0370
    UN(2,2)     ID           107.25           34.6767       3.09    0.0010
    Residual                 159.48           26.9566       5.92    <.0001

                            Fitting Information
    Res Log Likelihood                    -630.1
    Akaike's Information Criterion        -634.1
    Schwarz's Bayesian Criterion          -637.3
    -2 Res Log Likelihood                 1260.3

                        Solution for Fixed Effects
    Effect          Estimate    Standard Error    DF    t Value    Pr > |t|
    Intercept         164.37            6.2061    33      26.49      <.0001
    Time             26.9600            1.9939   103      13.52      <.0001
    Ccovar           -0.1136            0.5040    33      -0.23      0.8231
    Time*ccovar       0.4329            0.1619   103       2.67      0.0087
The estimate for σ² has remained unchanged at 159.48. This is not surprising, as it is difficult for a person-level covariate to help explain within-person variability. Estimates for the variance-covariance matrix for the intercepts and slopes have changed, however, to:

    ( τ_00   τ_01 )     ( 1236.41   -178.23 )
    ( τ_10   τ_11 )  =  ( -178.23    107.25 )                          (6.14)
Comparing these estimates to those from the unconditional growth model (in the previous section and Table 6.4), we see that, when it comes to estimating initial status, inclusion of the covariate does not help at all (it did not reduce the size of the variance component for intercepts). Indeed, this variance component actually increased slightly! However, inclusion of the covariate did improve the fit of the growth rates. The variance component for growth rates went from 132.40 to 107.25. Computing (132.40 - 107.25)/132.40 = 0.19, we find a 19% reduction. In other words, the covariate accounts for 19% of the "explainable" variation in growth rates. [Note that this is not the same as a traditional R² statistic. This percentage speaks only to the fraction of "explainable" variation that is explained. If the amount of variation between individuals is small, we might be explaining a large amount of very little! For further discussion, see Snijders and Bosker (1994) or Kreft and de Leeuw (1998).]
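A sketch of how such proportional reductions might be computed programmatically, assuming the ODS table name CovParms documented for PROC MIXED in Version 8 (the data set names cp0, cp1, and reduction are illustrative):

ods output CovParms=cp0;                 /* unconditional growth model */
proc mixed noclprint covtest;
  class id;
  model y = time/solution ddfm=bw;
  random intercept time/subject=id type=un;
run;

ods output CovParms=cp1;                 /* conditional growth model */
proc mixed noclprint covtest;
  class id;
  model y = time ccovar time*ccovar/s ddfm=bw;
  random intercept time/type=un sub=id;
run;

data reduction;     /* one-to-one merge; rows align as UN(1,1), UN(2,1), UN(2,2), Residual */
  merge cp0(keep=CovParm Estimate rename=(Estimate=Before))
        cp1(keep=Estimate rename=(Estimate=After));
  PctReduced = 100 * (Before - After) / Before;
run;

proc print data=reduction; run;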
EXPLORING THE STRUCTURE OF THE VARIANCE-COVARIANCE MATRIX WITHIN PERSONS

The classic growth models fit so far place a common, but sometimes unrealistic, assumption on the behavior of the r_ij, the within-person residuals over time. Were we to fit a model in which only the intercepts varied across persons (which we will do in a minute), we would be assuming a compound symmetric error covariance matrix for each person. When we fit a model in which the slopes vary as well, we introduce heteroscedasticity into this error covariance matrix (which can be seen through the inclusion of the effect of TIME in the random portion of the model in Equation 6.7c). How realistic are such assumptions? One of the strengths of PROC MIXED is that it allows you to compare different structures for the error covariance matrix. Instead of the intercepts and slopes as outcomes model in Equations 6.7a to 6.7c, consider the following simpler model for observations over time:

    y_ij = β_00 + β_10(TIME)_ij + r_ij                                 (6.15)

    where the vector of residuals for person j, r_j = (r_1j, …, r_4j)', is distributed N(0, Σ)
In this model, the intercepts and growth rates are assumed to be constant across people. But the model introduces a different type of complexity: the residual observations within persons (after controlling for the linear effect of TIME) are correlated through the within-person error variance-covariance matrix Σ. By considering alternative structures for Σ (that ideally derive from theory), and by comparing the goodness of fit of the resulting models, you can determine what type of structure is most appropriate for the data at hand. Many different types of error-covariance structures are possible. If there are only three waves of data, it is worth exploring only a few of these possibilities because there are so few data for each person. With additional observations per person (in this example we have four), additional structures for the Σ matrix (called the R matrix in the language of PROC MIXED) are possible. The interested reader is referred to pages 92-102 in the SAS System for Mixed Models (Littell et al., 1996), the PROC MIXED documentation, and the helpful paper by Wolfinger (1996) devoted to this topic. The structure of the within-person error covariance matrix is specified using a REPEATED statement. To fit the model in Equation 6.15 under the assumption that Σ is compound symmetric, we write:

proc mixed noclprint covtest noitprint;
  class id wave;
  model y = time/s notest;
  repeated wave/subject=id type=cs r;
run;
Notice that I have added a second CLASS variable (WAVE) to indicate the time-structured nature of the data within person, and I have used WAVE on the REPEATED statement. By including WAVE on the CLASSification statement, SAS treats this variable as a series of dummies. By not including TIME on the CLASSification statement, SAS treats that predictor as continuous. In this representation, then, we have two ways of including the temporal information. The REPEATED statement uses syntax similar to the RANDOM statement to specify the structure of the error variance-covariance matrix. However, the matrix being described exists at level-1, not level-2. The variable specified on the REPEATED statement must be categorical (although it need not be equal interval). The SUBJECT=ID option continues to tell SAS that the matrix should have separate blocks, one for each subject. Once again, the TYPE= option is crucial, for it specifies the form of each of the within-person error variance-covariance submatrices. However, when using a REPEATED statement, we do not routinely invoke the unstructured specification usually used on the RANDOM statement. Although the UNstructured option is still viable, other possibilities include the compound symmetry specification (CS) shown here and AR(1) for autoregressive with a lag of 1. Indeed, the possibility of fitting alternative structures is a major attraction of the REPEATED statement; sketches of the AR(1) and unstructured variants appear below.
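To compare alternative structures for Σ, only the TYPE= option changes; a minimal sketch of the two other fits discussed next:

proc mixed noclprint covtest noitprint;
  class id wave;
  model y = time/s notest;
  repeated wave/subject=id type=ar(1) r;   /* first-order autoregressive */
run;

proc mixed noclprint covtest noitprint;
  class id wave;
  model y = time/s notest;
  repeated wave/subject=id type=un r;      /* completely unstructured */
run;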
Fitting the model in Equation 6.15 under three different assumptions about the structure of Σ yields the following goodness-of-fit statistics:

    Assumption             N parameters        AIC        SBC      -2RLL
    Compound symmetry                 2     -652.17    -655.10    1300.34
    AR(1)                             2     -636.73    -641.66    1273.47
    Unstructured                     10     -641.71    -656.35    1263.42
How do we decide which model to adopt? Although a smaller -2RLL statistic indicates better fit, improvement often requires the use of additional parameters. This is where the AIC and SBC statistics come in. Both take the log-likelihood (half of the negated -2RLL) and penalize it for the number of parameters estimated, with the SBC exacting a higher penalty for increased complexity. The larger the AIC and SBC statistics, the better the fit. (Note that, when their values are negative, as they are here, numbers that are lower in absolute value are preferred.) Although there are no formal tests for comparing these statistics across models, Raftery (1995) offers some rough rules of thumb for differences in SBC: 0 to 2 suggests "weak" evidence of a better model; 2 to 6 suggests "positive" evidence; 6 to 10 suggests "strong" evidence; and any difference in excess of 10 suggests "very strong" evidence. Readers interested in learning more about comparing models using these statistics should consult Raftery (1995). What do we find when we compare error structures for the opposites naming data? From the perspective of the -2RLL statistic, the totally unstructured Σ yields the smallest value. The estimated variance-covariance matrix from this model is:
    ( 1308    977    921    641 )
    (  977   1120   1018    856 )
    (  921   1018   1289   1081 )                                      (6.16)
    (  641    856   1081   1415 )
But this superior value of -2RLL requires 10 parameters, many more than we might need given the structure of the estimated variance-covariance matrix. In examining this matrix, notice that it has similar variances along the diagonal, and the off-diagonal elements decrease for covariances between errors that are further spaced in time. This type of structure is exactly that
Table 6.6 Results of Assuming Compound Symmetry for the Within-Person Covariance Structure

                        Estimated R Matrix for ID 1
    Row       Col1       Col2       Col3       Col4
      1    1280.71     904.81     904.81     904.81
      2     904.81    1280.71     904.81     904.81
      3     904.81     904.81    1280.71     904.81
      4     904.81     904.81     904.81    1280.71

                      Covariance Parameter Estimates
    Cov Parm    Subject    Estimate    Standard Error    Z Value      Pr Z
    CS          ID           904.81            242.59       3.73    0.0002
    Residual                 375.90           52.1281       7.21    <.0001

                            Fitting Information
    Res Log Likelihood                    -650.2
    Akaike's Information Criterion        -652.2
    Schwarz's Bayesian Criterion          -653.7
    -2 Res Log Likelihood                 1300.3

                        Solution for Fixed Effects
    Effect       Estimate    Standard Error    DF    t Value    Pr > |t|
    Intercept      164.37            5.7766    34      28.45      <.0001
    Time          26.9600            1.4656   104      18.40      <.0001
specified by the lagged autoregressive structure. As a result, it is not surprising that the AR(1) model, which uses only two parameters, yields AIC and SBC statistics that are considerably superior to those of the totally unstructured model, even though its -2RLL statistic is actually larger (worse). The AR(1) model estimates Σ to be:

    ( 1324   1092    901    743 )
    ( 1092   1324   1092    901 )
    (  901   1092   1324   1092 )
    (  743    901   1092   1324 )

which is quite similar to the unstructured estimate, but with only two parameters, σ² and ρ. These analyses suggest that the AR(1) structure provides a better fit to the data. Were this an actual analysis, however, we would also consider alternative structures before stopping at this conclusion. Having established a method for specifying the structure of the within-person error covariance matrix, we may now consider what happens when we combine this specification with the intercepts and slopes as outcomes specification considered earlier. We allow the intercepts and slopes to vary across people by writing:

proc mixed noclprint covtest;
  class id wave;
  model y = time ccovar time*ccovar/solution notest ddfm=bw;
  random intercept time/type=un sub=id g;
  repeated wave/type=ar(1) subject=id r;
run;
which yields the output shown in Table 6.7. Notice that we have added the option "g" to the RANDOM statement, asking SAS to print the variance-covariance matrix for the random effects in matrix form (in addition to list form). When interpreting this output, it is useful to compare it with the simpler model in Table 6.5, which included the covariate and random effects for the intercepts and slopes, but which imposed no additional structure on the error covariance matrix (beyond the heteroscedastic structure of the intercepts and slopes as outcomes model). When we make these comparisons, all signs point toward the conclusion that we do not need to add the extra complexity of the autoregressive error structure, once the covariate has been taken into account. I emphasize this last phrase because the error covariance structure within persons describes the behavior of the errors: in other words, what is "left over" after removing the other fixed and random effects in the model. In this instance, and in many others, the autoregressive structure is no longer needed after other fixed and random effects are taken into account.
Table 6.7 Results of Assuming an Autoregressive Within-Person Covariance Matrix and Including a Person-Level Covariate

                        Estimated R Matrix for ID 1
    Row        Col1        Col2        Col3        Col4
      1      141.37    -19.3631      2.6522     -0.3633
      2    -19.3631      141.37    -19.3631      2.6522
      3      2.6522    -19.3631      141.37    -19.3631
      4     -0.3633      2.6522    -19.3631      141.37

                           Estimated G Matrix
    Row    Effect       ID        Col1       Col2
      1    Intercept     1     1258.10    -182.41
      2    time          1     -182.41     110.94

                      Covariance Parameter Estimates
    Cov Parm    Subject    Estimate    Standard Error    Z Value      Pr Z
    UN(1,1)     ID          1258.10            333.25       3.78    <.0001
    UN(2,1)     ID          -182.41           84.5520      -2.16    0.0310
    UN(2,2)     ID           110.94           34.5299       3.21    0.0007
    AR(1)       ID          -0.1370            0.2589      -0.53    0.5968
    Residual                 141.37           36.3449       3.89    <.0001

                            Fitting Information
    Res Log Likelihood                    -630.0
    Akaike's Information Criterion        -635.0
    Schwarz's Bayesian Criterion          -638.9
    -2 Res Log Likelihood                 1260.0

                        Solution for Fixed Effects
    Effect          Estimate    Standard Error    DF    t Value    Pr > |t|
    Intercept         164.42            6.1990    33      26.52      <.0001
    Time             26.9082            1.9775   103      13.61      <.0001
    Ccovar           -0.1234            0.5034    33      -0.25      0.8079
    Time*ccovar       0.4357            0.1606   103       2.71      0.0078
What evidence am I using to reach this conclusion? First, consider the estimate for the autoregressive parameter. We are unable to reject the null hypothesis that this estimate, -0.14, could have been obtained from a population in which its true value was 0. Thus, there is little supporting evidence to increase the complexity of Σ by adding off-diagonal elements. Second, when comparing the two models that include the covariate and its interaction with time, differing only in the inclusion of the autoregressive parameter, the fit improves only trivially: the REML log-likelihood goes from -630.1 without this assumption to -630.0 with this assumption. This improvement is so small that the AIC and SBC, which both penalize for the additional parameter, actually get worse. Therefore, despite the fact that there appears to be an autoregressive error structure when the covariate is not included and the slopes are not treated as random, the need for this additional structure disappears when these features are added to the model. As this example shows, a range of models can be fit to the same data. Experienced data analysts know that selecting among competing models can be tricky, especially when the number of observations per person is small. Were we conducting this analysis to reach substantive conclusions about the relationship between the outcome and predictors, we would fit several additional models to these data, including one with an AR(1) error covariance matrix and random intercepts. Readers interested in learning more about specifying the error covariance matrix and comparing results across models should consult VanLeeuwen (1997); Goldstein, Healy, and Rasbash (1994); and Wolfinger (1993, 1996).
A BRIEF COMMENT ABOUT EXAMINING ASSUMPTIONS

Whenever we fit a statistical model, we invoke assumptions. When fitting individual growth models, the number of assumptions mushrooms, as we invoke them both at level-1 and at level-2. Some of these assumptions have already been spotlighted in the context of model building: for example, the need to investigate the shape of the underlying growth trajectory and the need to explore the structure of the error variance-covariance matrix. Others have been mentioned only in passing (such as the normality assumptions that appear in every model specified). Despite a pressing need for sound strategies for investigating the many assumptions, model criticism in the multilevel world remains an underdeveloped area of methodological research. Many excellent books on multilevel modeling do not even mention the issue, and both the SAS documentation and Littell et al. (1996) offer little practical advice with regard to PROC MIXED. Among the statistical packages for fitting multilevel models, MLwiN currently offers the widest menu of diagnostic tools. PROC MIXED is thinner in this regard, although residuals and predicted values can be easily output to a data set for analysis. For example, to obtain the level-1
residuals, all you need do is add the option OUTP=datasetname to the MODEL statement (a sketch appears at the end of this section). Below, I comment briefly on how you can gain insight into the tenability of your model's assumptions. Let us begin with the linearity assumption, because violations of it can often be spotted quickly using simple sample plots. Through judicious use of PROC GPLOT and PROC REG you can detect most violations early enough that you can avoid wasting analytic energy fitting ill-specified models. At level-1, examine separate plots, for each member of the sample, of the outcome over time. Superimpose the hypothesized trajectory fitted using ordinary least squares (OLS) regression. Even though the OLS estimates are imperfect measures of the underlying true trajectories, they are usually adequate for diagnostic work. If you see consistent evidence of nonlinearity for many members of your sample, try transforming the outcome, the temporal predictor, or both. So, too, if you see evidence of skewness (or ceiling or floor effects), transformations will usually help. Don't proceed to model fitting until you have convinced yourself that you have identified a workable functional form. At level-2, linearity can be investigated reasonably well by plotting OLS estimates of the intercepts and slopes versus the values of the level-2 predictors. The unbiased OLS estimates are imprecise, but they are usually adequate for this purpose. Equally important, these plots are invaluable for reminding the empirical researcher what is being modeled (at least conceptually) when using PROC MIXED. Too many researchers move far away from their data when fitting complex models like these. These simple sample plots often provide a necessary anchor to ground the research back in a data analytic framework. Normality assumptions about the errors in both the level-1 and level-2 growth models can be evaluated by examining residual plots. For the conditional growth model in Table 6.5, Figure 6.1 presents three types of plots that you might want to examine. The top panel presents a normal probability plot of the level-1 residuals across all person-period observations; the other panels present normal probability plots of the level-2 residuals (computed, not surprisingly, at the person level). For each person, there is only one set of level-2 residuals: one for the intercept (in the middle panel) and one for the slope (in the bottom panel). This stands in contrast to the level-1 residuals, of which each person has as many as he or she has periods of observation. If the normality assumptions hold, we should see a straight line in each plot. As when examining any normal probability plot of residuals, we look for any consistent sign of skewness or lumpiness. For these data at least, we see little evidence of a problem. These quick data-analytic tools are only the first steps in model criticism. Ideally, we would separately examine each of these residuals versus each potential predictor. Further details on model criticism in the context of multilevel models are given in Bryk and Raudenbush (1992) and Goldstein (1995).
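A hedged sketch of the level-1 residual diagnostics just described, using the OUTP= option (the data set name resid1 is illustrative; the residual variable in the OUTP= data set is named Resid per the PROC MIXED documentation):

proc mixed noclprint;
  class id;
  model y = time ccovar time*ccovar/solution ddfm=bw outp=resid1;
  random intercept time/type=un sub=id;
run;

/* Normal probability (Q-Q) plot of the level-1 residuals */
proc univariate data=resid1 noprint;
  var Resid;
  qqplot Resid / normal(mu=est sigma=est);
run;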
Figure 6.1: Normal probability plots for the conditional growth model in Table 6.5. Top panel: level-1 residuals; middle panel: level-2 residuals for the intercept; bottom panel: level-2 residuals for the slope.
CONCLUSION

Statistical software does not a statistician make. That said, without software, few statisticians and even fewer empirical researchers would fit the kinds of sophisticated statistical models being promulgated today. The availability of flexible integrated software for fitting individual growth models holds the possibility that larger numbers of users will be able to fit reasonable statistical models to their data. Of course, as software becomes easier to use, we face the danger that statistical programming will substitute for clear statistical thinking and model development. Readers of the 1995 special issue of the Journal of Educational and Behavioral Statistics entitled Hierarchical Linear Models: Problems and Prospects (Kreft et al., 1995) were reminded that no piece of software will resolve the challenging statistical issues underlying decisions about model specification with complex data structures. Yet readers of this special issue were also reminded that, without software, few users would fit the models we would like to see applied in the social sciences. The ideas presented in this chapter can be easily extended to three-level (and higher-level) models. If your longitudinal analyses track individuals nested within groups, the specifications in this chapter can be combined with hierarchical specifications (given in Singer, 1998) to yield three-level models. For example, if you have longitudinal data on students nested within teachers, you can fit a three-level individual growth model with the syntax:

proc mixed covtest;
  class student teacher;
  model y = time/solution ddfm=bw;
  random intercept time/type=un sub=teacher;
  random intercept time/type=un sub=student(teacher);
run;
Note that we have specified the option /TYPE=UN on both RANDOM statements to ensure that estimation of the variance-covariance matrix is totally unconstrained. Many other options are available to the user interested in fitting more complex growth models. For example, it is easy to fit models in which:

- Not all predictors are continuous. Dichotomies can be included directly on the MODEL statement, and polychotomies can be included simply by declaring their measurement properties on the CLASS statement.

- Each individual has a different number of observations. A completely balanced data set was used for illustration only. All analyses presented in this chapter can be replicated using the identical code even if the number of observations varies across individuals.

- The multiple observations on each individual are not equally spaced. Balanced or not, all you need do is ensure that the level-1 temporal predictor on the person-period data set appropriately notes the particular moment when the observation was made.

- The error variance-covariance matrix is not homogeneous. If you think that the variance-covariance matrix might be different for men and women, for example, you can specify a model with two distinct matrices by using a GROUP option on the RANDOM statement (a sketch appears after the closing paragraph of this section).

- The outcome is not assumed to be normal (or even continuous). Version 8 of SAS includes a new procedure, PROC NLMIXED, which can fit generalized linear mixed models and nonlinear mixed models. This allows the modeling of dichotomies and other non-normal outcomes.

These options and more are discussed fully in the SAS documentation. If I can offer one final word of advice to researchers contemplating the use of PROC MIXED it is this: think very carefully about how to specify both the fixed and random portions of the model. I find it easiest to write the model out at separate levels, interpreting each of the parameters and making sure they are meaningful, before writing computer code to fit the model. Experience has convinced me that, as the models get more complex, it is not always obvious how to express the model in such a way that the parameters estimated directly address interesting research questions. I have seen too many people, even experienced users, inadvertently fit the wrong model and then be surprised at the illogical "findings." Many mistakes can be avoided by taking the time to work slowly and carefully through the parameterization. With this caveat in mind, I believe that PROC MIXED represents a valuable addition to the statistical toolkit for fitting individual growth models to data.
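As promised above, here is a minimal sketch of the heterogeneous variance-covariance specification. The data set and variable names (PP, ID, FEMALE, Y, TIME) are illustrative assumptions; the GROUP= option itself is the feature under discussion.

proc mixed data=pp covtest;
   class id female;
   model y = time female / solution ddfm=bw;
   random intercept time / type=un sub=id group=female;  /* a distinct covariance matrix per group */
run;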
ACKNOWLEDGMENTS

Portions of this chapter were previously published in Judith D. Singer (1998), Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models, Journal of Educational and Behavioral Statistics, 24(4), 323-355. Thanks are due to Russ Wolfinger of SAS Institute, who read and commented upon a previous version of this paper.
REFERENCES

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Bryk, A. S., Raudenbush, S. W., & Congdon, R. T. (1996). HLM: Hierarchical linear and nonlinear modeling with the HLM/2L and HLM/3L programs. Chicago, IL: Scientific Software International, Inc.

Goldstein, H. (1995). Multilevel statistical models (2nd ed.). New York: Halstead Press.

Goldstein, H., Healy, M. J. R., & Rasbash, J. (1994). Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13, 1643-1655.

Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. London: Sage Publications.

Kreft, I. G. G., de Leeuw, J., & Aiken, L. S. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30, 1-21.

Littell, R. C., Milliken, G. A., Stroup, W. W., & Wolfinger, R. D. (1996). SAS system for mixed models. Cary, NC: SAS Institute.

McLean, R. A., Sanders, W. L., & Stroup, W. W. (1991). A unified approach to mixed linear models. The American Statistician, 45, 54-64.

Prosser, R., Rasbash, J., & Goldstein, H. (1996). MLwiN user's guide. London: Institute of Education.

Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-163.

SAS Institute. (1992). Technical report P-229, SAS/STAT software: Changes and enhancements (Tech. Rep.). Cary, NC: SAS Institute, Inc.

SAS Institute. (1996). SAS/STAT software: Changes and enhancements through release 6.12 (Tech. Rep.). Cary, NC: SAS Institute.

Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 24, 323-355.

Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods and Research, 22, 342-363.

VanLeeuwen, D. M. (1997). A note on the covariance structure in a linear model. The American Statistician, 51, 140-144.

Willett, J. B. (1988). Questions and answers in the measurement of change. In E. Rothkopf (Ed.), Review of research in education (pp. 345-422). Washington, DC: American Educational Research Association.
Wolfinger, R. D. (1993). Covariance structure selection in general mixed models. Communications in Statistics, Simulation and Computation, 22, 1079-1106.

Wolfinger, R. D. (1996). Heterogeneous variance-covariance structures for repeated measures. Journal of Agricultural, Biological, and Environmental Statistics, 1, 205-230.
Chapter 7
Multilevel Modeling of Longitudinal and Functional Data

Terry E. Duncan, Susan C. Duncan, Fuzhong Li, and Lisa A. Strycker
Oregon Research Institute, Eugene, Oregon

In recent years a number of statistical techniques and programs for the analysis of longitudinal multilevel data have become available, allowing researchers the opportunity to more adequately address questions related to development and change and to simultaneously study behavior at different levels (e.g., individual, family, school, and neighborhood levels). Although many of these statistical approaches are widely available, they are still relatively underused, and there is uncertainty among researchers as to the appropriateness and usefulness of different approaches for analyzing longitudinal data. The purpose of this chapter is to compare three analytic approaches to answering longitudinal multilevel research questions using unbalanced adolescent and family alcohol use data. The analytic approaches include (a) a full information maximum likelihood (FIML) latent growth modeling (LGM) approach using an extension of a factor-of-curves model, (b) a limited information multilevel LGM (MLGM) approach using Muthén's ML-based estimator (MUML), and (c) a full information hierarchical linear modeling (HLM) approach. Data are from the National Youth Survey (NYS) and comprise 888 adolescents (443 males and 445 females; mean age = 13.86 years) from 369 households. Results demonstrate similarity in outcomes and interpretations derived from the three analytic techniques. Discussion includes comparison of the three techniques, including advantages and limitations.
Researchers have struggled for some time with the special analytic problems posed by hierarchically structured data. The analysis of data sets that contain measurements from different levels of a hierarchy requires techniques matched to the hierarchical structure. Traditional fixed effects analytical methods (e.g., analysis of variance [ANOVA]) are limited in their treatment of the technical difficulties presented by nested designs and in the questions they are able to address. Unlike these traditional fixed-effects methods, random-effects models are more suited to the hierarchical data structure generally found in behavioral and social science data. If repeated observations are collected on a group of individuals in which measurement occasions are not the same for all individuals, the observations may be conceived of as nested within individuals. The longitudinal data may have no multilevel structure beyond the longitudinal and individual levels; however, each individual might also be nested within another social unit, such as a family. Within the random-effects model of a hierarchical structure, each of the levels in the data structure (e.g., repeated observations within individuals or individuals within families) is represented by its own submodel, which represents the structural relations and variability occurring at that level. Hierarchical models represent a useful extension of the traditional variance component models discussed by Winer (1971) and Searle et al. (1992). They make use of within-cluster differences in parameter estimates, treating these differences as a meaningful source of variance rather than as within-group error or a nuisance (Kreft, 1992). Various analytic techniques better suited to the hierarchical data structure have recently emerged under the labels of hierarchical, or multilevel, models (see, e.g., Aitkin & Longford, 1986; Burstein, 1980; de Leeuw & Kreft, 1986; Duncan, Duncan, Hops, & Stoolmiller, 1995; Goldstein, 1986; Longford, 1987; Mason, Wong, & Entwistle, 1984; Raudenbush & Bryk, 1988). Muthén and Satorra (1989) point out that such models account for correlated observations and observations from heterogeneous populations with varying parameter values, gleaned from hierarchical data. Hierarchical data analysis techniques are now available for standard regression and analysis of variance situations as well as for covariance structure models such as factor analysis, structural equation modeling (SEM), and latent growth modeling (LGM). In this chapter, we describe three useful statistical approaches to modeling data that are both longitudinal (e.g., repeated measures of individuals) and hierarchical (e.g., individuals nested within larger social groups): (a) a FIML estimation approach, (b) a limited information MLGM using Muthén and Muthén's (1998) MUML, and (c) HLM. In particular, we illustrate the application of these three techniques for analyzing longitudinal multilevel substance use data, comparing their substantive yields.
FULL INFORMATION MAXIMUM LIKELIHOOD (FIML)

McArdle (1988) presented two FIML methods that are appropriate for hierarchical analyses with longitudinal data. Originally formulated to model growth for multiple variables or scales over multiple occasions, these two methods are easily extended to modeling growth for multiple informants over multiple occasions (e.g., longitudinal and hierarchically nested data). These methods are termed the factor-of-curves and curve-of-factors models. The factor-of-curves model can be used to examine whether a higher-order factor adequately describes relationships among lower-order developmental functions (e.g., intercept and rate of change). The curve-of-factors method, on the other hand, can be used to fit a growth curve to factor scores representing what the lower-order factors have in common at each point in time. An application of these two methods can be found in Duncan and Duncan (1996). When there are many clusters of different size (e.g., unbalanced data), FIML estimation can be accomplished using a model-based extension of the multiple-groups framework. In this chapter, we provide an example of the FIML model most generalizable to the other multilevel methods presented, the factor-of-curves model.
LIMITED INFORMATION MULTILEVEL LATENT GROWTH MODELING (MLGM)

Although FIML approaches can be used for multilevel longitudinal data, they can be computationally heavy, and input specifications can be very tedious if group sizes are large. Muthén (1991, 1994) proposed an ad hoc estimator within the SEM framework, using a limited information estimation approach, that is simpler to compute than FIML. Muthén (1994) showed that the estimator provides full ML estimation for balanced data (e.g., hierarchical clusters of the same size) and gives similar results to full ML for data that are not too badly unbalanced. Within the Mplus SEM program (Muthén & Muthén, 1998), Muthén's ad hoc approach greatly simplifies model specification for unbalanced hierarchically nested longitudinal data. This suggests that, with large groups of different sizes, little may be gained by the extra effort of FIML computation. Taken together, these developments make possible the construction, estimation, and testing of a variety of complex models involving hierarchically structured longitudinal data.
HIERARCHICAL LINEAR MODELING (HLM)

One of the most common approaches used to estimate random effects models is HLM (Bryk & Raudenbush, 1992). This approach allows for: (a) improved estimation of effects within individual units, (b) the formulation and testing of hypotheses about cross-level effects, and (c) the partitioning
of variance and covariance components among levels. Using a two-level approach, data can be nested within subjects to create a one-way random-effects ANOVA. However, if data are additionally nested within a higher order, the effects of the larger influences can be explicitly identified using a three-level modeling procedure (Bryk & Raudenbush, 1992). The simplest three-level model is fully unconditional (i.e., no predictor variables are specified at any level) and allocates variation in an outcome measure across the three levels. Conditional models include predictors and can specify a general structural model at each level. Three-level models have the capability to: (a) include predictors at each level, (b) test fixed, nonrandomly varying, or random effects at each level, and (c) specify alternative models for the variance-covariance components. With the availability of different statistical approaches comes the burden of weighing their advantages and limitations to select the most appropriate technique for a given research question. Thus, a comparison of techniques performed with the same data set to answer the same research question provides a useful framework for understanding the relative applicability and practicality of the different statistical approaches. Therefore, the purpose of the present study was to model hierarchically nested longitudinal adolescent alcohol use data using three different statistical techniques (FIML factor-of-curves LGM, MLGM, and HLM) to provide a brief overview of each method, and to compare these approaches and discuss their advantages and limitations. In addition to the methodological considerations, this study addressed substantive questions related to developmental changes in alcohol use among adolescent siblings over a 4-year period.
HIERARCHICAL NATURE OF ADOLESCENT ALCOHOL USE

Over the last 3 decades, we have witnessed a gradual increase in the complexity of theoretical models that attempt to explain problem behavior, such as alcohol use and abuse in children and adolescents. With respect to alcohol use behaviors, intrafamilial influences that may play a role in increased problem behavior among adolescents include social modeling (Akers & Cochran, 1985; Bandura & Walters, 1963) by parents through tacit approval, and marital problems and divorce (Farrington, 1987). There is general consensus that separation and divorce place children at risk for adjustment problems. At least in the short term, separated and divorced families (e.g., single-parent families) seem to have more negative outcomes than do intact families (Capaldi, 1989; Capaldi & Patterson, 1991). This has serious implications in that it is estimated that nearly half of all children will live in a single-parent household for some time before age 18, most of this occurring as a result of divorce or separation (Sweetser, 1985). Moreover, single-parenting has been associated with increased substance use (Byram & Fly, 1984), even when controlling for age, race, and gender
(Flewelling & Bauman, 1990). The relative influence of parental attitudes and norms regarding adolescent alcohol use is also of theoretical and practical importance in determining adolescent alcohol use and development. Findings in the extant literature are inconsistent. Some research suggests parental modeling effects on alcohol initiation or use (Kandel & Andrews, 1987; Thompson & Wilsnack, 1984), whereas other studies suggest that "permissive" parental attitudes exert a stronger influence than actual parental alcohol use behaviors (Barnes & Welte, 1986; Brook, Gordon, Whiteman, & Cohen, 1986; McDermott, 1984). Thus, it was hypothesized that growth in adolescent alcohol use would occur over time and that the development of alcohol use among siblings would be homogeneous. It was also expected that alcohol use among families would be heterogeneous (Duncan et al., 1998). Thus, two family-level variables, parent marital status and parental tacit approval of alcohol use, were included in the models as predictors of family-level alcohol use, to determine whether the heterogeneity among families could be explained by these variables.
METHOD

Research Participants

The National Youth Survey (NYS; Elliott, 1976)¹ is a random sample of 1,725 adolescents selected from across the United States. For these analyses, 888 adolescents (443 males and 445 females) from 369 households with complete data were assessed annually for 4 years. Of the 369 families, there were 29 four-sibling, 92 three-sibling, and 248 two-sibling families. The age of participants ranged from 11 to 17 years with a mean of 13.86 years (SD = 1.89). Of the adolescents, 76.8% were White, 17.3% African American, 4.0% Hispanic, 0.7% American Indian, and 1.2% Asian. Parents of the adolescents were also assessed. The parent data were used for the predictor variables: parent marital status and parental tacit approval of alcohol use. Of the 369 families, 96.2% of the reporting adults were the biological parent of at least one adolescent in the family, and 91.9% of the participating adults were female.
Measures

Within-Level Measures

Adolescent alcohol use was measured via siblings' self-reports of the frequency of alcohol use over the past year, which was coded 1 = "Never,"

¹Elliott, D. National Youth Survey [United States]: Wave 1, 1976 [Computer file]. ICPSR version. Boulder, CO: University of Colorado, Behavioral Research Institute [producer], 1977. Ann Arbor, MI: Inter-University Consortium for Political and Social Research [distributor], 1994.
Table 7.1
Descriptive Statistics for Within- and Between-Level Variables

                        Mean    SD     N
Within Level
  Alcohol use time 1    2.72    1.92   888
  Alcohol use time 2    3.17    2.11   888
  Alcohol use time 3    4.14    2.30   888
  Alcohol use time 4    4.42    2.25   888
Between Level
  Marital status         .83     .38   369
  Tacit approval        1.35     .53   369
2 = "Once or twice a year," 3 = "Once every 2-3 months," 4 = "Once a month," 5 = "Once every 2-3 weeks," 6 = "Once a week," 7 = "2-3 times a week," 8 = "Once a day," or 9 = "2-3 times a day."
Between-Level Measures
Parents reported on their marital status, coded 0 = “Not married” or 1 = “Married.” Parents were also asked how wrong they felt it was for their children to use alcohol. This variable was labeled “tacit approval” and was coded 4 = “Not wrong at all,” 3 = “A little bit wrong,” 2 = “Wrong,” or 1 = “Very wrong.” These predictors were assessed at the first time point; thus, their effects are contemporaneous with initial status and precede growth. Descriptive statistics for the variables of interest are presented in Table 7.1.
STATISTICAL MODELING APPROACHES

The following section gives a brief overview of the modeling specifications for the three approaches used in this study: (a) a FIML LGM approach, (b) a limited information MLGM, and (c) HLM. Models were specified across the three approaches so that each would answer the same research question, involving predictors of change in family-level alcohol use, and results from the different techniques could be directly compared.
Figure 7.1: Factor-of-curves model (FIML LGM).
Full Information Maximum Likelihood

To test the degree to which relations among the growth factors for different family members could be described by a higher-order family construct, the latent growth curve model was parameterized as a factor-of-curves LGM (see Figure 7.1). This higher-order model follows a structure similar to a first-order multivariate associative LGM (e.g., Duncan et al., 1996; Muthén, 1990), but the covariances among the first-order factors are hypothesized to be explained by the higher-order latent growth factors. To represent the system of relationships in the model shown in Figure 7.1, three sets of equations are needed: (a) an equation modeling level 1 within-person data, (b) an equation modeling level 2 between-person within-family data, and (c) an equation modeling level 3 between-families data. Elaboration of these equations is presented subsequently. The equation for modeling level 1 within-person data can be expressed as

$$v_{jg} = \Lambda_v \eta_{jg} + \epsilon_{jg} \qquad (7.1)$$
where $v_{jg}$ is a vector of repeated measures representing the empirical growth record for individual $j$ in family $g$, $\Lambda_v$ is a fixed basis matrix containing a column of ones and a column of constant time values, $\eta_{jg}$ is a vector containing the latent growth factors, $\eta_{jg}' = (\eta_{ijg}, \eta_{sjg})$, where $\eta_{ijg}$ represents initial status and $\eta_{sjg}$ represents rate of change over time, and $\epsilon_{jg}$ is a vector containing measurement errors, where it is assumed, for simplicity, that $\mathrm{Cov}(\epsilon_{jg})$ is a diagonal matrix. Under this equation, the repeated measures data are represented as

$$\begin{bmatrix} v_{1jg} \\ v_{2jg} \\ v_{3jg} \\ v_{4jg} \end{bmatrix} = \begin{bmatrix} 1 & \lambda_{12} \\ 1 & \lambda_{22} \\ 1 & \lambda_{32} \\ 1 & \lambda_{42} \end{bmatrix} \begin{bmatrix} \eta_{ijg} \\ \eta_{sjg} \end{bmatrix} + \begin{bmatrix} \epsilon_{1jg} \\ \epsilon_{2jg} \\ \epsilon_{3jg} \\ \epsilon_{4jg} \end{bmatrix} \qquad (7.2)$$
where in the $\Lambda_v$ matrix, column 1 defines initial status by fixing all factor loadings at 1 and column 2 defines the rate of change. Because previous findings have demonstrated relatively linear growth in alcohol use during adolescence (e.g., Duncan & Duncan, 1996), a basic growth model hypothesizing linear growth in alcohol use was tested by fixing the elements for the slope factor to known values (i.e., $\lambda_{12} = 0$, $\lambda_{22} = 1$, $\lambda_{32} = 2$, $\lambda_{42} = 3$) for each person. Setting $\lambda_{12} = 0$ on the slope factor allows interpretation of the latent growth factors, $\eta_{ijg}$ and $\eta_{sjg}$, as initial status at $t = 0$ and rate of change across time. The model can also accommodate nonlinear growth. For example, an unspecified model (i.e., freeing the $\lambda_{32}$ and $\lambda_{42}$ parameters) enables the developmental trajectory for each person to be freely estimated, allowing for possible nonlinearity and maximal fit to the data (e.g., Duncan et al., 1997; Meredith & Tisak, 1990). The level 2 between-person equation considers individual variation within family as a function of family-level variation. As such, the model equation at this level can be expressed as

$$\eta_{jg} = \Lambda_g \eta_g + \zeta_{jg} \qquad (7.3)$$
In this equation, $\eta_{jg}$ is defined as before, $\Lambda_g$ is an identity matrix of within-family factor basis terms, and $\eta_g$ contains the $g$th family's mean growth parameters, $\eta_{ig}$ and $\eta_{sg}$. For model identification purposes, the latent means for the $g$th family contained in $\eta_g$ are fixed to zero (a null vector); these means are estimated at level 3. $\zeta_{jg}$ is a vector containing disturbance terms (or residuals) associated with the regression of each of the lower-order factors, $\eta_{ijg}$ and $\eta_{sjg}$, with the variance-covariance matrix, $\Psi$, of latent residual parameters of the following form:

$$\Psi = \begin{bmatrix} \psi_{11} & \\ \psi_{21} & \psi_{22} \end{bmatrix} \qquad (7.4)$$
where the element $\psi_{21}$ in $\Psi$ represents the covariance between the latent residual factors. The level 2 equation model just described is further accounted for by the level 3 higher-order latent growth factors, which have the following form:

$$\eta_g = \alpha + \zeta_g \qquad (7.5)$$
where $\alpha$ is a vector containing the latent growth means of the between-family latent growth parameters (i.e., $\alpha' = (\alpha_i, \alpha_s)$) and $\zeta_g$ is a vector of deviation scores reflecting the family-level variation (i.e., $\zeta_g' = (\eta_{ig} - \alpha_i,\ \eta_{sg} - \alpha_s)$). Specifically, this model has the following latent growth vector form:

$$\begin{bmatrix} \eta_{ig} \\ \eta_{sg} \end{bmatrix} = \begin{bmatrix} \alpha_i \\ \alpha_s \end{bmatrix} + \begin{bmatrix} \zeta_{ig} \\ \zeta_{sg} \end{bmatrix} \qquad (7.6)$$
where $\alpha_i$ and $\alpha_s$ denote overall family mean values in initial status and rate of change, and $\zeta_{ig}$ and $\zeta_{sg}$ represent deviations of individual families' growth parameters from the overall family means as defined earlier. The vector $\zeta_g$ is of special interest in this multilevel analysis because it represents family-level variation. $\zeta_g$ is distributed with zero mean vector and covariance matrix $\Phi$ with the following parameter matrix:

$$\Phi = \begin{bmatrix} \phi_{11} & \\ \phi_{21} & \phi_{22} \end{bmatrix} \qquad (7.7)$$

where the element $\phi_{21}$ in $\Phi$ denotes the covariation, at the family level, between initial status and rate of change over time. To summarize, in Equations 7.1 through 7.7, the growth modeling provides for level 1 (within-individual; e.g., repeated observations), level 2 (between-individual, within-family), and level 3 (between-families) models that represent hypotheses about the growth structure underlying the repeated measures family data. An extension of this model involves incorporating exogenous predictors of the growth factors at level 3, where such predictors may be measured or latent. For example, to incorporate exogenous predictors $x$, the model is written as

$$x = \tau_x + \Lambda_x \xi + \delta \qquad (7.8)$$
where $x$ is the vector of exogenous predictors, $\tau_x$ contains the mean vector, $\Lambda_x$ is an identity matrix, $\xi$ contains the exogenous predictors deviated from their means, and $\delta$ is a null vector. Centering the predictors of change ensures that the $\alpha$ vector in the growth portion of the model will retain its interpretation as the family mean vector of initial status and growth (see Willett & Sayer, 1994).
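A one-line check of the centering point (a sketch; the regression matrix $\Gamma$ linking the growth factors to the centered predictors is notation introduced here, not taken from the chapter): writing the conditional level 3 model as $\eta_g = \alpha + \Gamma \xi_g + \zeta_g$, with $\xi_g = x_g - \bar{x}$, gives

$$E(\eta_g) = \alpha + \Gamma\, E(x_g - \bar{x}) = \alpha,$$

so $\alpha_i$ and $\alpha_s$ keep their interpretation as the overall family means of initial status and rate of change even after predictors are added.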
Unbalanced Data

In the present example, the family clusters are of varying size (i.e., unbalanced). Therefore, the analysis can be viewed as a regular multiple-group LGM with data missing for some families. Although unbalanced data can be accommodated in SEM analyses by expanding the usual structural equation model to include means, or regression intercepts, and partitioning the sample into subgroups with distinct data patterns (Duncan, Duncan, Strycker, Li, & Alpert, 1999), a FIML procedure, as implemented in many SEM software packages such as Mplus (Muthén & Muthén, 1998), Amos (Arbuckle, 1995), and Mx (Neale, 1991), represents a much simpler and more direct approach to handling unbalanced continuous data. The current example contains only one form of missingness (i.e., unbalanced data), but the raw ML approach used for the factor-of-curves model also allows for data that can be considered missing by omission or missing by attrition, as long as the missing data mechanism can be considered ignorable. Thus, unlike any of the other multilevel methods presented here, the raw ML approach allows for unbalanced and/or missing data at each level of the hierarchy without the need for prior or two-stage multiple imputation procedures (see Duncan et al., 1999).
Limited Information Multilevel Latent Growth Modeling Analysis

The multilevel LGM approach of Muthén (1997) involves two generalizations of the factor-of-curves SEM model: (a) SEM growth modeling as generalized to cluster data and (b) SEM multilevel modeling as generalized to mean structures. Within the latent variable modeling of longitudinal and multilevel data, the total covariance matrix, $S_T$, is decomposed into two independent components, a between-families covariance matrix, $S_B$, and a pooled within-families covariance matrix, $S_{PW}$, or $S_T = S_B + S_{PW}$. The multilevel latent growth model (MLGM) makes use of $S_{PW}$ and $S_B$ simultaneously.
Latent Growth Formulation

Muthén (1997) presented a latent variable formulation that can be applied to the three-level data case shown earlier. To express this hierarchical model in latent variable notation similar to that of Equation 7.1, we have

$$v_{jg} = \Lambda \eta_{jg} + \epsilon_{jg} \qquad (7.9)$$
where $v_{jg}$ is a $4 \times 1$ vector of observed outcomes for the four time points, $\Lambda$ a $4 \times 2$ matrix of factor loadings, $\eta_{jg}$ a $2 \times 1$ vector of latent variables representing the growth parameters, and $\epsilon_{jg}$ a $4 \times 1$ vector of error terms. Figure 7.2 shows a path diagram describing the model specification of Muthén's (1997) latent variable growth model formulation for two-level, $T = 4$ data on the observed variables $v$.
Figure 7.2: Multilevel latent growth model (MUML MLGM).
Again, assuming linear growth, there are two latent growth factors, $\eta_i$ and $\eta_s$, representing initial status and rate of growth, respectively. Following structural modeling conventions, squares in Figure 7.2 denote observed variables and circles denote latent variables. The top part of the model refers to the between structure; the part below refers to the within structure. The MLGM model with between- and within-level decomposition can be written in matrix notation as

$$v_{jg} = \Lambda \alpha + \Lambda \zeta_{Bg} + \epsilon_{Bg} + \Lambda \eta_{Wjg} + \epsilon_{Wjg} \qquad (7.10)$$
where $v_{jg}$ represents the repeated measures vector for the $j$th person ($j = 1, 2, \ldots, J$) in the $g$th family ($g = 1, 2, \ldots, G$). The first term on the right side of the equation represents the mean of the initial status and the mean of the linear growth rate. The second and third terms correspond to between-family variation. The fourth and fifth terms correspond to within-family variation. Writing out Equation 7.10 yields the following matrix representation:
$$v_{jg} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} \alpha_i \\ \alpha_s \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} \zeta_{iBg} \\ \zeta_{sBg} \end{bmatrix} + \epsilon_{Bg} + \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} \eta_{iWjg} \\ \eta_{sWjg} \end{bmatrix} + \epsilon_{Wjg} \qquad (7.11)$$

On the between-level component, we note that the $\zeta_{iB}$ and $\zeta_{sB}$ factors correspond to the family-level (between-families) variation in initial status and rate of change captured in $S_B$.
Model Estimation

As can be seen from Figure 7.2, the MLGM approach significantly simplifies model specification. Rather than specifying separate LGMs for each family member, a single LGM represents growth for each person within the family. For actual estimation using the Mplus SEM program (Muthén & Muthén, 1998), a model must be specified for each level of data. Mplus is able to estimate models with two levels of data. However, for longitudinal designs, the repeated measures part of the model is not considered to be one of the two levels. Rather, it is treated as a multivariate measurement model in the form of a factor-analytic, multiple-indicator model for the repeated measures. Therefore, the model is set up with $S_B$ used as input for the between part of the model and $S_{PW}$ used as input for the within part of the model. The two sample covariance matrices $S_{PW}$ and $S_B$ are expressed as

$$S_{PW} = (N - G)^{-1} \sum_{g=1}^{G} \sum_{j=1}^{N_g} (v_{jg} - \bar{v}_g)(v_{jg} - \bar{v}_g)' \qquad (7.12)$$

and

$$S_B = (G - 1)^{-1} \sum_{g=1}^{G} N_g (\bar{v}_g - \bar{v})(\bar{v}_g - \bar{v})' \qquad (7.13)$$
Muthén (1994) demonstrated that the pooled within matrix, $S_{PW}$, is a consistent and unbiased estimator of $\Sigma_W$, while the between matrix, $S_B$, is a consistent and unbiased estimator of $\Sigma_W + c\,\Sigma_B$, where $c$ reflects the family size, computed as

$$c = \frac{N^2 - \sum_{g=1}^{G} N_g^2}{N(G - 1)} \qquad (7.14)$$
For unbalanced data, $c$ is close to the mean of the family sizes. Note that the between-family matrix, $S_B$, is the covariance matrix of family means weighted by the family size. Therefore, the ML estimate of $\Sigma_W$ is $S_{PW}$, whereas the ML estimate of $\Sigma_B$ is

$$\hat{\Sigma}_B = c^{-1}\,(S_B - S_{PW}) \qquad (7.15)$$
The fitting function that is minimized is

$$F = G\left\{\ln\left|\Sigma_W + c\,\Sigma_B\right| + \mathrm{tr}\!\left[(\Sigma_W + c\,\Sigma_B)^{-1} S_B\right]\right\} + (N - G)\left\{\ln\left|\Sigma_W\right| + \mathrm{tr}\!\left[\Sigma_W^{-1} S_{PW}\right]\right\} \qquad (7.16)$$
The top line of Equation 7.16 corresponds to the between part of the model setup, which captures the between-level contribution to the total variation, weighted by $G$, the number of families. The bottom line of the equation corresponds to the within part of the model, which captures the within-level contribution to the total variation, and is weighted by $N - G$, the total sample size minus the number of families. Note that for balanced data, $c$ is the common family size and the two-group approach above is equivalent to FIML estimation. The mean structure for the MLGM arises from the four observed variable means for alcohol use, expressed as functions of the latent factor means of the $\eta_i$ and $\eta_s$ between-level factors. The means of the within-level growth factors are fixed at zero. For a complete exposition and computational details concerning ad hoc estimation procedures, see Muthén (1994). The FIML and MUML LGM analyses in the present study were performed using the Mplus SEM computer program (Muthén & Muthén, 1998).
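As a numerical check on Equation 7.14, worked here from the family sizes reported in the Method section (the chapter itself does not report this figure): with $G = 369$ families (248 two-sibling, 92 three-sibling, and 29 four-sibling) and $N = 888$,

$$c = \frac{888^2 - (248 \cdot 2^2 + 92 \cdot 3^2 + 29 \cdot 4^2)}{888(369 - 1)} = \frac{788{,}544 - 2{,}284}{326{,}784} \approx 2.41,$$

which is indeed close to the mean family size, $888/369 \approx 2.41$, as the text notes.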
Hierarchical Linear Modeling

A three-level HLM model was used for this example to account for longitudinal and family relations. The three-level model consists of three submodels, one for each level. When defining the HLM model of interest, the level 1 equations model the observed person data for each time point, the level 2 equations model the individual level, and the level 3 equations model the family level. Adopting notation from Bryk and Raudenbush (1992), a three-level, unconditional growth model (no predictors) for estimating change in alcohol use is described. For the level 1 model, the equation is

$$v_{tjg} = \pi_{0jg} + \pi_{1jg}\, a_t + \epsilon_{tjg} \qquad (7.17)$$
where $v_{tjg}$ is the observed score at time $t$ for person $j$ in family $g$, $\pi_{0jg}$ is the initial status for person $jg$, $\pi_{1jg}$ is the growth rate for person $jg$, $a_t$ is a specified basis term (e.g., set $a' = [0\ 1\ 2\ 3]$ to correspond to the interpretation of the measurement occasion for person $j$ in family $g$ at time $t = 0$ as the initial status), and $\epsilon_{tjg}$ is within-person random error, assumed to be independent and normally distributed with mean 0 and constant variance $\sigma^2$. Thus, change in alcohol use for each person $j$ consists of two growth parameters: $\pi_{0jg}$, initial status defined at $t = 0$, and $\pi_{1jg}$, rate of change across the four measurement occasions. It should be noted that in HLM the basis terms must be specified by the user. This is different from both the FIML factor-of-curves and MLGM models, in which unspecified growth functions (i.e., basis parameters freely estimated by the software program) can be estimated. The person-specific change parameters in Equation 7.17 become outcomes in the level 2 model for variation between persons. The simplest level 2 model enables us to estimate the mean trajectory and the extent of variation around the mean, and has the following equations:

$$\pi_{0jg} = \beta_{00g} + r_{0jg} \qquad \text{and} \qquad \pi_{1jg} = \beta_{10g} + r_{1jg} \qquad (7.18)$$
where $\beta_{00g}$ is the mean initial status within family $g$ at $t = 0$, $\beta_{10g}$ is the mean rate of change within family $g$ across time, and $r$ denotes the person-specific random effect, with $r_{0jg}$ the deviation of person $j$'s initial status within family $g$ at $t = 0$ from the mean of the $g$th family, and $r_{1jg}$ the deviation of person $j$'s rate of change within family $g$ from the mean of the $g$th family growth rate. These random effects are assumed bivariate normally distributed with a $2 \times 2$ variance-covariance matrix, $T_\pi$, of the following form:

$$T_\pi = \begin{bmatrix} \tau_{\pi 00} & \tau_{\pi 01} \\ \tau_{\pi 10} & \tau_{\pi 11} \end{bmatrix} \qquad (7.19)$$
where $\tau_{\pi 00}$ is the variance in initial status at $t = 0$, $\tau_{\pi 11}$ is the variance of the rate of change, and $\tau_{\pi 10}$ is the covariance between status at $t = 0$ and rate of change. For the level 3 model, the growth parameters within family $g$ become the outcomes of level 3 parameters, as expressed by the following equations:

$$\beta_{00g} = \gamma_{000} + u_{00g} \qquad \text{and} \qquad \beta_{10g} = \gamma_{100} + u_{10g} \qquad (7.20)$$
where $\gamma_{000}$ is the overall mean initial status, $\gamma_{100}$ is the overall mean growth rate, and $u$ denotes the random effect reflecting family variability, with $u_{00g}$ the family variation in status at $t = 0$ and $u_{10g}$ the family variation in growth rate across time. The random effects at level 3 are assumed normally distributed with variance-covariance matrix

$$T_\beta = \begin{bmatrix} \tau_{\beta 00} & \tau_{\beta 01} \\ \tau_{\beta 10} & \tau_{\beta 11} \end{bmatrix} \qquad (7.21)$$

where $\tau_{\beta 00}$ is the variance in initial status at $t = 0$, $\tau_{\beta 11}$ is the variance of the rate of change, and $\tau_{\beta 10}$ is the covariance between status at $t = 0$ and rate of change. The specification of Equations 7.17, 7.18, and 7.20 can be further extended to incorporate predictors of variability associated with levels 2 and 3, persons and families. Consider, for example, a single time-invariant predictor of initial status and growth for person $j$ in the $g$th family, denoted $x_{jg}$. Then, the level 2 model can be written as

$$\pi_{0jg} = \beta_{00g} + \beta_{01g}\, x_{jg} + r_{0jg} \qquad \text{and} \qquad \pi_{1jg} = \beta_{10g} + \beta_{11g}\, x_{jg} + r_{1jg} \qquad (7.22)$$
One can also consider that $x_g$, a level 3 predictor, is related to the family-level variation (e.g., in our example $x_g$ represents marital status or tacit approval), giving rise to the following level 3 model:

$$\beta_{00g} = \gamma_{000} + \gamma_{001}\, x_g + u_{00g} \qquad \text{and} \qquad \beta_{10g} = \gamma_{100} + \gamma_{101}\, x_g + u_{10g} \qquad (7.23)$$
The actual estimation procedure for the three-level model uses a full ML procedure to estimate both covariance components and the fixed effects (levels 2 and 3 coefficients). The HLM analyses in the present study were performed using the HLM program (Bryk, Raudenbush, & Congdon, 1996).
RESULTS
Full Information Maximum Likelihood

In accommodating analyses with unbalanced data, Mplus (Muthén & Muthén, 1998) treats the unbalanced nature of the data as an incomplete or missing data problem and produces model parameter estimates using a ML fitting function. Fitting the basic factor-of-curves LGM produced the following chi-square fit statistic: $\chi^2(143) = 372.776$, $p < .001$, RMSEA = .066. Mplus program input for the unconditional factor-of-curves model is shown in Appendix A. Parameter estimates for the factor-of-curves model are shown in Table 7.2. Significant mean levels were evident for the common second-order intercept, $\alpha_i$, and common slope, $\alpha_s$, indicating growth in family levels of alcohol use over time. Between-level factor ($\zeta_{ig}$ and $\zeta_{sg}$) and within-level factor ($\zeta_{ijg}$ and $\zeta_{sjg}$) variances for the intercept and slope were also significant, indicating variation in the intercept and slope of alcohol use across families and individuals. The between- (family-level) to total factor variance ratios for the intercept and slope suggest that approximately 31% and 42% of the individual variation in the intercepts and slopes, respectively, could be accounted for by family membership. A factor-of-curves LGM with the inclusion of family-level predictors was also tested,² resulting in the following chi-square fit statistic: $\chi^2(171) = 414.945$, $p < .001$, RMSEA = .062. Examination of the parameter estimates for this model indicated that the family-level predictor, parental tacit approval, was a significant predictor of the between-level intercept, $\beta = .551$, $se = .142$, $t = 3.875$. Greater tacit approval was associated with more pronounced levels of family alcohol use. The effects of marital status on the between-level intercept, $\beta = -.085$, $se = .198$, $t = -.427$, and the effects of marital status, $\beta = -.008$, $se = .071$, $t = -.116$, and tacit approval, $\beta = .041$, $se = .051$, $t = .806$, on the between-level slope were not significant.
²In estimating the factor-of-curves model, the same growth function is fit to the repeated measures for each member of a particular family (e.g., linear growth), and the contribution of the family-level factors is the same for each family member. To provide comparable results to the other methods, and given the restrictions of fixed basis terms and regression parameters relating the first- and second-order factors, four within-level parameters, specified to be equal across family members (the latent within-level disturbances, $\zeta_{ijg}$ and $\zeta_{sjg}$, within-level errors, $\epsilon_{tjg}$, and within-level covariances, $\psi_{21}$), and five between-level parameters ($\zeta_{ig}$, $\zeta_{sg}$, $\phi_{21}$, $\alpha_i$, and $\alpha_s$) were estimated in the test of the hypothesized factor-of-curves model.
Limited Information Multilevel Latent Growth Modeling Analysis

The MLGM model tested incorporated a two-factor latent growth structure for both within and between levels. Model-fitting procedures produced a chi-square test statistic of $\chi^2(14) = 151.255$, $p < .001$, RMSEA = .105. Mplus program input for the unconditional MLGM is presented in Appendix B. Parameter estimates for the MLGM are shown in Table 7.2.³ Significant mean levels were evident for the common second-order intercept, $\alpha_i$, and common slope, $\alpha_s$, indicating growth in family levels of alcohol use over time. Similar to those from the FIML analysis, between-level factor ($\zeta_{ig}$ and $\zeta_{sg}$) and within-level factor ($\zeta_{ijg}$ and $\zeta_{sjg}$) variances for the intercept and slope were also significant, indicating variation in the intercept and slope of alcohol use across families and individuals. The significant between-level variances indicated that substantial heterogeneity existed among typical families. The between- (family-level) to total factor variance ratios for the intercept and slope suggest that approximately 28% and 32% of the individual variation in the intercepts and slopes, respectively, could be accounted for by family membership. In the MLGM with family-level predictors, the family-level variables can be seen as influencing the family-level part of the individual family members' slopes and intercepts indirectly through the family-level factor components. Although the within part of the model is that of a regular two-factor LGM, the between part allows for correlations among the between components of the alcohol use variables that are explained not only by the two common factors but also by correlations via the predictors and their direct and indirect effects. Fitting the two-factor structural latent growth model to the data resulted in a chi-square test statistic of $\chi^2(18) = 163.168$, $p < .001$, RMSEA = .095. The family-level predictor, parental tacit approval, was a significant predictor, $\beta = .529$, $se = .141$, $t = 3.754$, of the between-level intercept. The effects of marital status on the between-level intercept, $\beta = -.084$, $se = .196$, $t = -.426$, and the effects of marital status, $\beta = -.012$, $se = .070$, $t = -.070$, and tacit approval, $\beta = .040$, $se = .050$, $t = .804$, on the between-level slope were not significant.
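For reference, the between-to-total factor variance ratios reported for both the FIML and MLGM solutions follow the standard intraclass-type computation; in the notation of Equations 7.4 and 7.7 (a sketch, since the chapter reports only the resulting percentages),

$$\rho_{\text{intercept}} = \frac{\phi_{11}}{\phi_{11} + \psi_{11}}, \qquad \rho_{\text{slope}} = \frac{\phi_{22}}{\phi_{22} + \psi_{22}},$$

where the $\phi$ terms are the family-level (between) factor variances and the $\psi$ terms are the within-family factor variances.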
³In the specific growth model shown in Figure 7.2, the mean structure arises from the four observed variable means being expressed as functions of the mean parameters $\alpha_i$ and $\alpha_s$, here applied on the between side as shown in Equation 7.10. In the model estimation, the means are included in the between level with a scaling constant (of family size), whereas the means on the within component are fixed at zero. This implies that dummy zero means are entered for the within-family component.

Hierarchical Linear Modeling

Parameter estimates for the unconditional HLM three-level model using the HLM hierarchical linear modeling software program (Bryk et al.,
1996) are presented in Table 7.2. HLM program input for the unconditional model is presented in Appendix C. Significant mean levels were evident for the intercept and slope, indicating growth in family levels of alcohol use over time. Approximately 31% and 42% of the individual variation in the alcohol intercept and slope scores, respectively, is between families (level 3). The significant level 3 variances indicated that there was significant variability in the families' intercept and slope scores. These parameters can be treated as outcomes. With the inclusion of predictors at the third level, family deviations can be predicted from the grand mean. Parameter estimates for this intercept- and slope-as-outcomes HLM resulted in similar parameter estimates, with parental tacit approval emerging as a significant predictor, $\beta = .550$, $se = .141$, $t = 3.891$, of the between-level intercept. The effects of marital status on the between-level intercept, $\beta = -.087$, $se = .193$, $t = -.449$, and the effects of marital status, $\beta = -.008$, $se = .068$, $t = -.121$, and tacit approval, $\beta = .042$, $se = .050$, $t = .847$, on the between-level slope were not significant.
DISCUSSION

The aim of the present study was to compare results across three different statistical approaches to modeling multilevel longitudinal alcohol use data. The research question involved development in family-level alcohol use over a 4-year period and the effects of the family-level predictors of marital status and parental tacit approval on adolescent alcohol use. From a substantive point of view, several results are worth noting. First is the significant upward trend in the development of alcohol use among families. This finding is consistent with other developmental studies (e.g., Duncan, Duncan, & Hops, 1996) that have assessed the developmental nature of alcohol use among adolescents. Moreover, significant variation in alcohol use and development existed across families. Given this variation in alcohol use, it was of interest to determine whether the family variation in alcohol use scores could be accounted for by family-level covariates. Thus, contextual variables were used to explain the heterogeneity in alcohol use among families. The most interesting finding of this study was the effect of parental tacit approval of alcohol use on the between-level intercept for alcohol use. Results of all analyses indicated that the more parents viewed alcohol use as an acceptable behavior for their children, the greater the initial level of family alcohol use. This finding adds support to other studies that suggest "permissive" parental attitudes are an important influence on adolescent alcohol use behaviors (Barnes & Welte, 1986; Brook et al., 1986; McDermott, 1984). The effects of marital status on the between-level intercept, and the effects of marital status and tacit approval on the between-level slope, were not significant in any of the analyses. The similarity in results of the three approaches suggests that in many
cases there may be no one "correct" way to analyze multilevel longitudinal data. All three of the analytic techniques were able to answer the research questions regarding homogeneity in alcohol use development among siblings and predictors of alcohol use across families. Models were specified so that the growth functions (e.g., linear growth) were the same and, as a result, findings were similar across these techniques. Researchers should ultimately select an analytic technique based on the benefits and limitations of the approach as well as its appropriateness to the research question. The following sections outline some of the advantages and limitations (summarized in Table 7.3) of each of the three analytic approaches presented in this chapter.
Full Information Maximum Likelihood Factor-of-Curves LGM

The FIML LGM approach presented was an extension of higher-order factor analytic methods. The factor-of-curves model used in the present study describes individual differences within separate univariate series and forms a common factor model to describe individual differences among these basic growth curves. In practice, this multivariate extension offers researchers opportunities for evaluating the dynamic structure of both intra- and interindividual change. The higher-order LGM approach using FIML estimation has several attractions in the analysis of longitudinal multilevel data. It takes into account all available data and produces consistent and efficient estimates. The approach is also favored for its simplicity in program specification and flexibility in handling unbalanced data. The ability to handle missing data at each level of the hierarchy is especially noteworthy. Although the current example incorporated only unbalanced data (i.e., data that might be considered missing by design), the FIML approach allows for data that can also be considered missing by omission (i.e., a subject fails to complete an item within a survey or fails to complete a survey) or missing by attrition (i.e., some participants drop out of the study and are not remeasured, a common problem for longitudinal studies), as long as the missing data mechanism can be considered ignorable (see Graham, Hofer, & Piccinin, 1994, for a complete discussion). The FIML approach takes into consideration all available causes of missingness and employs the same statistical model to handle the missing data that is used to perform the desired analysis. Standard errors are a convenient by-product of the analysis. The factor-of-curves approach has other benefits. Conducted within the SEM framework, the factor-of-curves model can include latent variables and thus account for measurement error. This technique also can handle some types of nonlinear trajectories (unspecified growth functions) for each family member. Rather than focusing solely on quantitative change, using invariant growth functions across all family members, this approach has the potential to allow for modeling qualitative differences among family
Table 7.3
Advantages and Limitations of Three Longitudinal Multilevel Analytic Techniques

Factor-of-Curves (FIML Estimation)
  Advantages: 1. Simple program specifications. 2. Accommodates latent variables. 3. Accounts for error. 4. Unspecified growth allowed. 5. Uses readily available software. 6. Accounts for qualitative and quantitative change.
  Limitations: 1. Requires large samples. 2. Requires SEM expertise. 3. Tedious input specifications and heavy computations with large clusters and many time points.
  Data characteristics: Works with large samples (> 100) having small to medium-sized clusters (2-6), few time points (3-6), measurement error, different patterns of change across individuals and clusters, known or unknown growth curves, normal distributions.

Multilevel Latent Growth Modeling (MLGM)
  Advantages: 1. Flexible. 2. Accommodates latent variables. 3. Accounts for error. 4. Unspecified growth allowed. 5. Models between- and within-level structures.
  Limitations: 1. Requires large samples. 2. Requires SEM expertise. 3. Requires many groups. 4. Requires invariance across between/within growth functions. 5. Separate between/within analyses.
  Data characteristics: Works with large samples (> 100) having many groups (> 50), similar patterns of growth across individuals and clusters, known or unknown growth curves, measurement error, normal distributions.

Hierarchical Linear Modeling (HLM)
  Advantages: 1. Results simple to interpret. 2. Allows for nonnormal outcomes. 3. Models up to three levels. 4. Works within wide range of cluster numbers and sizes.
  Limitations: 1. No latent variables. 2. Cannot account for measurement error. 3. No unspecified growth functions. 4. No goodness-of-fit indices. 5. Cannot use systems of equations.
  Data characteristics: Works with samples with a range of cluster numbers and sizes, up to three hierarchical levels, nonnormal outcomes, little or no measurement error, linear or known growth.
members. For example, Zajonc and Mullally (1997) introduce the concept of collective potentiation, which specifies collective side effects of birth order. They argue that, in contrast to genetic theories that have repeatedly downsized and outsourced environmental factors such as family influences, their confluence model quantifies the differential environmental contributions to the development of successive siblings. The assumption is that the qualitative differences are quantifiable and that they would define an emerging second-order family growth factor (see Patterson, 1995, for an example of analyses of qualitative shifts in factor patterns over time). Finally, the FIML model can be estimated using a number of currently available SEM software programs. The FIML approach also has limitations. This technique requires large samples and some SEM expertise. With large group sizes, the input specifications can be tedious and estimation tenuous. In the present example, model specification included 4 repeated measures for each of a maximum of 4 family members, resulting in 16 observed dependent variables. With the inclusion of covariates, a total of 18 variables were modeled. As the number of time points and number of family members increase, the total number of variables to be modeled increases, as does model complexity. Increased complexity can decrease the likelihood of achieving an acceptable solution. Although model-fitting procedures in the present study revealed similarities between the factor-of-curves and other multilevel approaches, this may not be the case with different data structures and assumptions concerning factorial invariance of the growth parameters across individuals. Because both empirical and substantive differences may be critical for correct interpretation of the dynamics and influences of change, further applications of these approaches investigating both qualitative and quantitative change should be pursued.
Limited Information Multilevel Latent Growth Modeling Analysis

The MLGM uses a two-stage approach to model estimation. In the first stage, the total variance-covariance matrix is decomposed into within, $S_{PW}$, and between, $S_B$, components. In the second stage, the MLGM model is specified using both the $S_B$ and $S_{PW}$ matrices within the SEM framework. The developers of Mplus have made this two-stage approach inconspicuous for the user. In doing so, the MLGM, as operationalized within Mplus, greatly simplifies model specification. Rather than dealing with a model incorporating 16 observed variables (4 repeated measures on 4 family members), the model is specified using only the 4 repeated measure variables. The flexibility of the basic MLGM approach makes it an attractive analytic tool for a variety of SEM analyses with longitudinal multilevel data. Like the full ML model, the limited information approach has the capacity to estimate and test relationships among latent variables and account for measurement error. It explicitly models the within- and between-level
covariance matrices while avoiding the problems of FIML when group sizes are large. Muthén (1994) shows that this simpler estimator is consistent, is identical to the FIML estimator for balanced data, and gives results close to those of FIML for data that are not too badly unbalanced. Limitations of this technique include the necessity of SEM experience, large samples, and a sizable number of groups. Although the MLGM allows for the specification of nonlinear trajectories, unlike the FIML approach it requires that these trajectories be invariant across all family members, and requires that the contribution of the common between-level factor to the within-level growth functions be equal across individuals. At present, the MLGM approach is also unable to handle data that are missing by omission or attrition without separate imputation procedures being employed, because of its limited information approach.
Hierarchical Linear Modeling

HLM efficiently examines questions about the hierarchical nature of longitudinal alcohol use data within the regression format. Its use of regression notation makes interpretation and dissemination of results relatively simple. At present, HLM can explicitly model up to three levels of data, which is an advantage few programs offer. Three kinds of parameter estimation are available in a three-level HLM model: empirical Bayes estimates of randomly varying level 1 and level 2 coefficients, ML estimates of the level 3 coefficients (there are also generalized least squares estimates), and ML estimates of the variance-covariance components. HLM can handle nonnormal outcomes and incorporates some forms of missing data. The HLM approach is able to handle missingness at level 1 only, through pairwise and listwise deletion of cases. These approaches follow the conventional routines used in standard GLM procedures found in most statistical packages. Clearly, with substantial missingness, these approaches should be used with caution, given the likelihood of resulting nonpositive definite covariance matrices. Another plus is that HLM functions well within a wide range of cluster numbers and sizes. Drawbacks of the HLM approach include its inability to readily allow for the specification of systems of structural equations at the within and between levels of clustering (i.e., variables cannot be specified as both independent and dependent in the same analysis; Kaplan & Elliott, 1997). Unlike both the FIML LGM and MLGM approaches, HLM is unable to fit unspecified growth functions (basis parameters freely estimated by the software program), although specific nonlinear functions can be designated with known values by the user. Qualitative differences among family members, therefore, cannot be accommodated using this method. Also, unlike the SEM-based approaches (i.e., FIML LGM and MLGM), HLM provides no overall goodness-of-fit indices, although HLM does allow the user to easily conduct residual analyses for checking model accuracy.
Summary

Three approaches for the analysis of clustered and/or longitudinal data that may contain multiple levels were compared in this chapter. These procedures are sometimes referred to as random effects (e.g., Stiratelli, Laird, & Ware, 1984), hierarchical, mixed effects, multilevel, or cluster-specific models. Results show that all three analytic procedures emerged as viable methods for use with multilevel, longitudinal data and that they represent different ways of modeling data with repeated observations. Researchers should weigh the advantages and limitations of the different approaches in relation to their research questions when selecting a technique. Decisions will necessarily be influenced by aspects such as normality of the data, sample size, latent variable format needs, missing data, factorial variance/invariance, and the number of levels to be modeled, as well as accessibility, ease, and experience. The flexibility of these techniques makes them attractive analytic tools for a variety of analyses investigating growth and development among variables of interest with multilevel data. Within the specific area of adolescent behavior, hierarchical models allow for potentially greater insight into the developmental nature, antecedents, and sequelae of a variety of adolescent problem behaviors.
APPENDIX A

TITLE: Multivariate growth model for alcohol use observed over four
  time points for each family member. Families of size 2, 3, and 4.
  Linear growth. Same growth model functions for each family member.
  Missing data analysis.
  Growth process: v1, v2, v3, v4     - family member 1
                  v5, v6, v7, v8     - family member 2
                  v9, v10, v11, v12  - family member 3
                  v13, v14, v15, v16 - family member 4
DATA: FILE IS focms.dat;
VARIABLE: NAMES ARE v1-v16;
  MISSING = ALL (99.00);
  USEVAR = v1-v16;
ANALYSIS: TYPE = MEANSTRUCTURE MISSING H1;
  COVERAGE = .05;
MODEL:
  wint1 BY v1-v4@1;               ! intercept factor, family member 1
  wslp1 BY v1@0 v2@1 v3@2 v4@3;   ! linear slope factor, family member 1
  wint2 BY v5-v8@1;
  wslp2 BY v5@0 v6@1 v7@2 v8@3;
  wint3 BY v9-v12@1;
  wslp3 BY v9@0 v10@1 v11@2 v12@3;
  wint4 BY v13-v16@1;
  wslp4 BY v13@0 v14@1 v15@2 v16@3;
  [v1-v16@0];                     ! observed-variable intercepts fixed at zero
  bint BY wint1@1 wint2@1 wint3@1 wint4@1;   ! family-level intercept factor
  bslp BY wslp1@1 wslp2@1 wslp3@1 wslp4@1;   ! family-level slope factor
  [bint bslp];                    ! family-level factor means estimated
  [wint1@0 wint2@0 wint3@0 wint4@0];
  [wslp1@0 wslp2@0 wslp3@0 wslp4@0];
  bint WITH bslp;
  v1-v16 (1);                     ! residual variances held equal
  wslp1 wslp2 wslp3 wslp4 PWITH wint1 wint2 wint3 wint4 (5);
  wint1-wint4 (6);
  wslp1-wslp4 (7);
OUTPUT: STANDARDIZED;
APPENDIX B

TITLE: Growth model for alcohol use observed over four time points.
  Linear growth. Complex sample analysis, multilevel modeling
  (disaggregated modeling). Same growth model functions for between-
  and within-level model parts.
  Growth process: v1, v2, v3, v4
DATA: FILE IS ms.dat;
  FORMAT IS 9F8.2;
VARIABLE: NAMES ARE g1 g2 v1 v2 v3 v4 x1 x2 cluster;
  USEVAR = v1-v4;
  CLUSTER = cluster;
ANALYSIS: TYPE = TWOLEVEL;
  ITERATIONS = 100;
  ESTIMATOR = ML;
MODEL:
  %BETWEEN%
  bint BY v1-v4@1;                ! between-level intercept factor
  bslp BY v1@0 v2@1 v3@2 v4@3;    ! between-level linear slope factor
  [v1-v4@0];                      ! observed-variable intercepts fixed at zero
  [bint bslp];                    ! factor means estimated
  bint WITH bslp;
  v1 (1);                         ! between-level residual variances held equal
  v2 (1);
  v3 (1);
  v4 (1);
  %WITHIN%
  wint BY v1-v4@1;                ! within-level intercept factor
  wslp BY v1@0 v2@1 v3@2 v4@3;    ! within-level linear slope factor
  wint WITH wslp;
  v1 (2);                         ! within-level residual variances held equal
  v2 (2);
  v3 (2);
  v4 (2);
OUTPUT: SAMPSTAT STANDARDIZED;
APPENDIX C

HLM Setup                                Comment
Level1: ALC=INTRCPT1+LIN                 Level 1 model (see Equation 7.17)
Level2: INTRCPT1=INTRCPT2+random/        Level 2 model for π_{0jg} in Equation 7.18
Level2: LIN=INTRCPT2+random/             Level 2 model for π_{1jg} in Equation 7.18
Level3: INTRCPT2=INTRCPT3+random/        Level 3 model for β_{00g} in Equation 7.20
Level3: INTRCPT2=INTRCPT3+random/        Level 3 model for β_{10g} in Equation 7.20
Nonlin: N                                A nonlinear analysis is not requested
Numit: 100                               Request for a maximum of 100 iterations
Stopval: 0.000001                        0.000001 is the convergence criterion
Fixtau2: 3                               Option 3, "automatic fixup", invoked if the
                                         variance-covariance matrix at Level 2 is
                                         not positive definite
Fixtau3: 3                               Option 3, "automatic fixup", invoked if the
                                         variance-covariance matrix at Level 3 is
                                         not positive definite
Accel: 5                                 Controls iteration acceleration. Default is 5
Resfil2: N                               No Level 2 residual file will be output
Resfil3: N                               No Level 3 residual file will be output
Hypoth: N                                No optional hypothesis testing requested
Constrain: N                             No equality constraints imposed on fixed effects
Title: Growth curve modeling using HLM   Title of the setup
Output: HLM.out                          Results will be saved in HLM.out
ACKNOWLEDGMENTS

Preparation of this manuscript was supported by Grant DA11942 and Grant DA09548 from the National Institute on Drug Abuse and Grant AA11510 from the National Institute on Alcohol Abuse and Alcoholism. The National Youth Survey was supported by Grant MH27552 from the National Institute of Mental Health.
REFERENCES

Aitkin, M., & Longford, N. (1986). Statistical modeling issues in school effectiveness studies. Journal of the Royal Statistical Society, 149, 1-43.
Akers, R. L., & Cochran, J. E. (1985). Adolescent marijuana use: A test of three theories of deviant behavior. Deviant Behavior, 6, 323-346.
Arbuckle, J. L. (1995). Amos user's guide. Chicago, IL.
Bandura, A., & Walters, R. H. (1963). Social learning and personality development. New York: Holt, Rinehart and Winston.
Barnes, G. M., & Welte, J. W. (1986). Patterns and predictors of alcohol use among 7-12th grade students in New York state. Journal of Studies on Alcohol, 47, 53-62.
Brook, J. S., Gordon, A. S., Whiteman, M., & Cohen, P. (1986). Some models and mechanisms for explaining the impact of maternal and adolescent characteristics on adolescent stage of drug use. Developmental Psychology, 22, 460-467.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Bryk, A. S., Raudenbush, S. W., & Congdon, R. T. (1996). HLM: Hierarchical linear and nonlinear modeling with the HLM/2L and HLM/3L programs. Chicago, IL: Scientific Software International.
Burstein, L. (1980). The analysis of multilevel data in educational research and evaluation. Review of Research in Education, 8, 158-233.
Byram, O. W., & Fly, J. W. (1984). Family structure, race, and adolescents' alcohol use: A research note. American Journal of Drug and Alcohol Abuse, 10, 467-478.
Capaldi, D. M. (1989). The relation of family transitions and disruptions to boys' adjustment problems. Paper presented at the conference of the Society for Research in Child Development, Kansas City, MO.
Capaldi, D. M., & Patterson, G. R. (1991). Relation of parental transitions to boys' adjustment problems: I. A linear hypothesis. II. Mothers at risk for transitions and unskilled parenting. Developmental Psychology, 27, 489-504.
de Leeuw, J., & Kreft, I. (1986). Random coefficient models for multilevel analysis. Journal of Educational Statistics, 11, 57-85.
Duncan, S. C., & Duncan, T. E. (1996). A multivariate latent growth curve analysis of adolescent substance use. Structural Equation Modeling, 3, 323-347.
Duncan, S. C., Duncan, T. E., & Hops, H. (1996). Analysis of longitudinal data within accelerated longitudinal designs. Psychological Methods, 1, 236-248.
Duncan, T. E., Duncan, S. C., Alpert, A., Hops, H., Stoolmiller, M., & Muthén, B. (1997). Latent variable modeling of longitudinal and multilevel substance use data. Multivariate Behavioral Research, 32, 275-318.
Duncan, T. E., Duncan, S. C., & Hops, H. (1998). Latent variable modeling of longitudinal and multilevel alcohol use data. Journal of Studies on Alcohol, 59, 399-408.
Duncan, T. E., Duncan, S. C., Hops, H., & Stoolmiller, M. (1995). An analysis of the relationship between parent and adolescent marijuana use via generalized estimating equation methodology. Multivariate Behavioral Research, 30, 317-339.
Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Elliott, D. (1976). National Youth Survey [United States]: Wave I [Computer file]. ICPSR version. Boulder, CO: University of Colorado, Behavioral Research Institute [producer], 1977. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 1994.
Farrington, D. P. (1987). Early precursors of frequent offending. In J. Q. Wilson & G. C. Loury (Eds.), From children to citizens: Families, schools, and delinquency prevention (Vol. III, pp. 27-51). New York: Springer-Verlag.
Flewelling, R. L., & Bauman, K. E. (1990). Family structure as a predictor of initial substance use and sexual intercourse in early adolescence. Journal of Marriage and the Family, 52, 1106-1111.
Goldstein, H. (1986). Multilevel mixed linear model analysis using iterative generalized least squares. Biometrika, 73, 43-56.
Graham, J. W., Hofer, S. M., & Piccinin, A. M. (1994). Analysis with missing data in drug prevention research. In L. M. Collins & L. Seitz (Eds.), Advances in data analysis for prevention intervention research. National Institute on Drug Abuse Research Monograph Series (No. 142). Washington, DC: National Institute on Drug Abuse.
Kandel, D. B., & Andrews, K. (1987). Processes of adolescent socialization by parents and peers. The International Journal of the Addictions, 22, 319-342.
Kaplan, D., & Elliott, P. R. (1997). A didactic example of multilevel structural equation modeling applicable to the study of organizations. Structural Equation Modeling, 4, 1-24.
Kreft, I. G. (1992). Multilevel models for hierarchically nested data: Potential applications in substance abuse prevention research. In L. Collins & L. Seitz (Eds.), Technical review panel on advances in data analysis for prevention intervention research. NIDA Research Monograph, 108.
Longford, N. T. (1987). A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested effects. Biometrika, 74, 817-827.
Mason, W. M., Wong, G., & Entwistle, B. (1984). Contextual analysis through the multilevel linear model. In S. Leinhardt (Ed.), Sociological methodology (pp. 72-103). San Francisco, CA: Jossey-Bass.
McArdle, J. J. (1988). Dynamic but structural equation modeling of repeated measures data. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed.). New York: Plenum Press.
McDermott, D. (1984). The relationship of parental drug use and parents' attitude concerning adolescent drug use to adolescent drug use. Adolescence, 19, 89-97.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.
Muthén, B. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376-398.
Muthén, B. (1997). Latent variable modeling of longitudinal and multilevel data. In A. Raftery (Ed.), Sociological methodology (pp. 453-480). Boston, MA: Blackwell.
Muthén, B., & Satorra, A. (1989). Multilevel aspects of varying parameters in structural models. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 87-99). San Diego, CA: Academic Press.
Muthén, B. O. (1990). Mean and covariance structure analysis of hierarchical data. Paper presented at the Psychometric Society meeting, Princeton, NJ.
Muthén, B. O. (1991). Analysis of longitudinal data using latent variable models with varying parameters. In L. C. Collins & J. L. Horn (Eds.), Best methods for the analysis of change (pp. 1-17). Washington, DC: American Psychological Association.
Muthén, L. K., & Muthén, B. O. (1998). Mplus user's guide. Los Angeles, CA: Muthén & Muthén.
Neale, M. C. (1991). Mx: Statistical modeling (Tech. Rep.). Box 3 MCV, Richmond, VA: Department of Human Genetics.
Patterson, G. R. (1995). Orderly change in a stable world: The antisocial trait as a chimera. In J. M. Gottman (Ed.), The analysis of change (pp. 83-101). Mahwah, NJ: Lawrence Erlbaum Associates.
Raudenbush, S. W., & Bryk, A. S. (1988). Methodological advances in studying effects of schools and classrooms on student learning. In E. Z. Rothkopf (Ed.), Review of research in education (pp. 423-475). Washington, DC: American Educational Research Association.
Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York: Wiley.
Stiratelli, R., Laird, N., & Ware, J. H. (1984). Random-effects models for serial observations with binary response. Biometrics, 40, 961-971.
Sweetser, D. A. (1985). Broken homes: Stable risk, changing reasons, changing forms. Journal of Marriage and the Family, 47, 709-715.
Thompson, K., & Wilsnack, R. (1984). Drinking problems among female adolescents: Patterns and influences. In S. Wilsnack & L. Beckman (Eds.), Alcohol problems in women (pp. 37-65). New York: Guilford.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381.
Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.
Zajonc, R. B., & Mullally, P. R. (1997). Birth order: Reconciling conflicting effects. American Psychologist, 52, 685-699.
Chapter 8
Time Series Regressions

Steven Hillmer
University of Kansas

It is often of interest to study the impact of a known intervention upon a second variable. Examples include the impact of an increase in tobacco taxes on teenage smoking, the impact of an environmental regulation on pollution, or the impact of an advertising campaign upon sales. Denoting the variable that is believed to be affected by the known intervention by $y_t$, the relationship

$$y_t = X_t'\beta + e_t \quad \text{for } t = 1, \ldots, T \tag{8.1}$$
can be used not only to model the impact of the intervention but also to capture the impact of other secondary factors affecting $y_t$. In Equation 8.1, $X_t'$ is a row vector of independent variables including a dummy variable representing the impact of the intervention, $\beta$ is a column vector of parameters, and the $e_t$ are assumed to be independent and identically distributed random variables with mean 0 and variance $\sigma^2$. Thus, Equation 8.1 is the standard multiple regression model, including the use of dummy variables to capture the effect of the known intervention. It is often the case that $y_t$ is a time series, a sequence of data arranged sequentially in time. Experience with time series data suggests that the data for any time point are often correlated with their own past values; in other words, $y_t$ is often correlated with $y_{t-k}$, where $k$ is a positive integer. This property is termed autocorrelation. When $y_t$ is autocorrelated, it is likely that the errors $e_t$ in Equation 8.1 will also be autocorrelated, violating one of the assumptions of the standard multiple regression model. Thus, when Equation 8.1 is used with time series data, it is prudent to be prepared for the possibility of autocorrelated error terms. The most common approach for dealing with the possibility of autocorrelated errors is to estimate the parameters, $\beta$, assuming that the errors are
uncorrelated and then to perform diagnostic checks on the residuals to identify whether autocorrelation is a problem. Under this approach, problems with autocorrelated errors may not be detected in the diagnostic checking of the residuals. In addition, when the errors are autocorrelated, it has been established that the standard errors of the ordinary least squares estimators are incorrectly computed, which can lead to false conclusions regarding the significance of the regression coefficients (Anderson, 1971). When dealing with time series data, a more realistic approach is to specifically allow for autocorrelation in $e_t$. It is easy to specify a model for $e_t$ that not only allows for the possibility of autocorrelation but also includes the assumption of independent errors as a special case. Under this approach the model parameters are estimated based upon the assumption of autocorrelated errors unless there is evidence from the data to the contrary. The possibility of autocorrelated errors, $e_t$ in Equation 8.1, can be accommodated by assuming that $e_t$ follows some form of common time series model.
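To illustrate this point concretely, the following small simulation, written in Python with purely artificial data (none of the values correspond to any example in this chapter), fits an ordinary regression to data whose errors follow a first-order autoregressive scheme and compares the standard error reported by the usual least squares formula with the actual sampling variability of the slope estimate.

import numpy as np

# Hypothetical simulation: OLS on a trend regressor with AR(1) errors.
# The usual OLS standard error assumes independent errors and understates
# the true sampling variability of the slope when phi is large.
rng = np.random.default_rng(0)
T, reps, phi = 200, 2000, 0.8
x = np.linspace(0.0, 1.0, T)
X = np.column_stack([np.ones(T), x])

slopes, reported_ses = [], []
for _ in range(reps):
    a = rng.normal(size=T)
    e = np.zeros(T)
    for t in range(1, T):                  # e_t = phi * e_{t-1} + a_t
        e[t] = phi * e[t - 1] + a[t]
    y = 1.0 + 2.0 * x + e
    beta, ssr, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = ssr[0] / (T - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)  # standard OLS covariance formula
    slopes.append(beta[1])
    reported_ses.append(np.sqrt(cov[1, 1]))

print("mean OLS-reported SE of slope:", np.mean(reported_ses))
print("actual sampling SD of slope:  ", np.std(slopes))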
TIME SERIES MODELS

An important set of models that describes one form of probabilistic structure for time series data is the class of stationary processes. Stationary processes follow a specific form of statistical equilibrium in which the probabilistic properties of the process are unaffected by a change in the time origin. In other words, the joint probability distribution of any set of observations $e_{t_1}, e_{t_2}, \ldots, e_{t_m}$ is the same as the joint probability distribution of the observations $e_{t_1+k}, e_{t_2+k}, \ldots, e_{t_m+k}$. Thus, the joint probability distribution of any set of observations is unaffected by shifting all the observations forward or backward by any $k$ time units. This property implies that the mean and variance of a stationary process are both constant. If the assumption of normality is made, then it follows that the probability distribution for any stationary process is completely determined once the mean, variance, and correlation coefficients between $e_t$ and $e_{t+k}$, for each $k$, are specified. The mean will be denoted by $\mu$ and the variance will be denoted by $\sigma_e^2$. The covariance between $e_t$ and $e_{t+k}$ is called the autocovariance at lag $k$ and is defined by

$$\gamma_k = E[(e_t - \mu)(e_{t+k} - \mu)] \tag{8.2}$$

The autocorrelation at lag $k$ is

$$\rho_k = \frac{\gamma_k}{\gamma_0} \tag{8.3}$$

Because the time series, $e_t$, is the error in a regression model, it follows that $\mu = 0$.
LINEAR STATIONARY MODELS

Although in theory there are many stationary time series models, experience has suggested that a relatively small set of models can be used to approximate the types of autocorrelation structure that often occur. One class of linear stationary models is the so-called autoregressive models. If $a_t$ denotes a sequence of independent normal random variates whose mean is zero and whose variance is $\sigma_a^2$, then the first-order autoregressive model for the time series $e_t$ is

$$e_t = \phi_1 e_{t-1} + a_t \tag{8.4}$$

In Equation 8.4, $\phi_1$ is a parameter that is restricted to the range $-1 < \phi_1 < 1$ so that the model is stationary. This model has a form like a regression equation, with the lag one value of the time series in the role of the independent variable. Box, Jenkins, and Reinsel (1994) show that the variance of $e_t$ in Equation 8.4 is $\sigma_e^2 = \sigma_a^2/(1 - \phi_1^2)$ and that the autocorrelations follow the difference equation $\rho_k = \phi_1 \rho_{k-1}$ for all integers $k \geq 1$. Because $\rho_0 = 1$, it follows that $\rho_k = \phi_1^k$ for $k \geq 0$. A plot of the $\rho_k$ versus the lag $k$ is called the autocorrelation function of the time series. Because $\rho_k = \rho_{-k}$, the autocorrelation function is symmetric around zero and it is only necessary to plot the values of $\rho_k$ for positive values of $k$. The autocorrelation function is important because, for any stationary time series, knowledge of the autocorrelations and the variance of the time series is necessary and sufficient for specification of the model associated with that time series. Thus, the autocorrelation function provides one means to specify the associated time series model. In the case of the first-order autoregressive model, the autocorrelation function will exhibit a pattern of exponential decay starting with $\rho_0$. A more complex stationary model is the second-order autoregressive model

$$e_t = \phi_1 e_{t-1} + \phi_2 e_{t-2} + a_t \tag{8.5}$$

The parameters $\phi_1$ and $\phi_2$ in Equation 8.5 must satisfy the conditions $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$, and $-1 < \phi_2 < 1$ for this model to be stationary. Box et al. (1994) show that when $e_t$ follows Equation 8.5, $\sigma_e^2 = \sigma_a^2(1 - \phi_2)/\{(1 + \phi_2)[(1 - \phi_2)^2 - \phi_1^2]\}$ and $\rho_k$ follows the second-order difference equation $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}$ for $k > 0$ with starting values $\rho_0 = 1$ and $\rho_1 = \phi_1/(1 - \phi_2)$. This implies that the autocorrelation function of Equation 8.5 will exhibit one of the following general patterns: (i) the autocorrelations remain positive while they decay to zero in an exponential type pattern; (ii) the autocorrelations alternate in sign as they decay to zero; (iii) the autocorrelations display a type of periodic behavior that is commonly characterized as a damped sine wave. In this last case the autocorrelations follow the pattern of a plot of the sine function; however, the autocorrelations also decay toward zero as they fluctuate according to a sine pattern.
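The three patterns can be verified directly from the difference equation for $\rho_k$. The following Python sketch, with parameter values chosen only to satisfy the stationarity conditions, computes the theoretical autocorrelation function of a second-order autoregressive model.

import numpy as np

def ar2_acf(phi1, phi2, nlags=24):
    # rho_0 = 1, rho_1 = phi1/(1 - phi2), and
    # rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} for k >= 2.
    rho = np.empty(nlags + 1)
    rho[0] = 1.0
    rho[1] = phi1 / (1.0 - phi2)
    for k in range(2, nlags + 1):
        rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
    return rho

print(ar2_acf(0.5, 0.3)[:6])   # pattern (i): positive, exponential-type decay
print(ar2_acf(-0.5, 0.3)[:6])  # pattern (ii): alternating signs
print(ar2_acf(1.0, -0.5)[:6])  # pattern (iii): damped sine wave (complex roots)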
A general order autoregressive model, the so-called AR(p) model, is

$$e_t = \phi_1 e_{t-1} + \phi_2 e_{t-2} + \cdots + \phi_p e_{t-p} + a_t \tag{8.6}$$

The parameters $\phi_1, \phi_2, \ldots, \phi_p$ in Equation 8.6 must satisfy a complex set of conditions for this model to be stationary (Box et al., 1994). Although it is possible to define the general AR(p) model, in practice most time series can be appropriately modeled by a low-order AR model or one of the other models discussed later. Although in theory the autocorrelation function of the first-order autoregressive model is different from the autocorrelation function for the second-order autoregressive model, in some cases it is difficult to tell the difference between these two autocorrelation functions from this plot. This has led to the introduction of a second plot to help distinguish between different order autoregressive models, the partial autocorrelation function. The partial autocorrelation function is based upon the fact that, whereas an AR(p) model has an autocorrelation function that is infinite in extent, it is really determined by p nonzero functions of the autocorrelations. In other words, for an AR(1) model all the autocorrelations are a function of only one parameter, $\phi_1$, and for an AR(2) model all the autocorrelations are functions of the two parameters, $\phi_1$ and $\phi_2$. The partial autocorrelation for lag $k$ is equal to the correlation between $e_t$ and $e_{t-k}$ after removing the effect of the intermediate variables $e_{t-1}, \ldots, e_{t-k+1}$. Box et al. (1994) show that for an AR(p) model the partial autocorrelations at lags greater than $p$ will all be zero. In other words, the partial autocorrelation function for an AR(p) model will exhibit a cutoff after lag $p$. This feature makes it easy to recognize that an autoregressive model is associated with a given set of partial autocorrelations. Although the class of autoregressive models is one type of model associated with stationary time series, there are a number of other stationary models that have proven useful. One class is called moving average models. The first-order moving average model is

$$e_t = a_t - \theta_1 a_{t-1} \tag{8.7}$$
In Equation 8.7 the at's are a sequence of independent normal random variables with mean 0 and variance C T ~and 0 1 is a parameter in the range -1 < 01 < 1. Box et al. (1994) show that, when et follows Equation 8.7, CT," = (1 + O ~ ) C T ; ,=P -~@ ~ / ( l + @ and) pk , = 0 for k 2 2. Thus, the autocorrelation function for the first-order moving average model cuts off after lag 1. This makes it easy to recognize the first-order moving average model from its autocorrelation function. Box et al. (1994) show that the partial autocorrelation function of the first-order moving average model is dominated by exponential decay. The second order moving average model is
with the parameters $\theta_1$ and $\theta_2$ satisfying the constraints $\theta_1 + \theta_2 < 1$, $\theta_2 - \theta_1 < 1$, and $-1 < \theta_2 < 1$. Box et al. (1994) show that when $e_t$ follows Equation 8.8, $\sigma_e^2 = (1 + \theta_1^2 + \theta_2^2)\sigma_a^2$, $\rho_1 = -\theta_1(1 - \theta_2)/(1 + \theta_1^2 + \theta_2^2)$, $\rho_2 = -\theta_2/(1 + \theta_1^2 + \theta_2^2)$, and $\rho_k = 0$ for $k \geq 3$. Thus, the autocorrelation function of a second-order moving average model cuts off after the second lag, making it easy to recognize this model from its autocorrelation function. The general order moving average model, the MA(q) model, is

$$e_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{8.9}$$

As was the case for the general AR model, the parameters in Equation 8.9 must satisfy a complex set of conditions. It can be shown that the autocorrelation function of the MA(q) model cuts off after lag $q$; thus, the autocorrelation function is a very useful device for recognizing when a moving average model is appropriate. The final simple stationary time series model that will be reviewed combines autoregressive and moving average model components, the ARMA(1,1):

$$e_t - \phi_1 e_{t-1} = a_t - \theta_1 a_{t-1} \tag{8.10}$$

In this model, the parameters must satisfy the constraints $-1 < \phi_1 < 1$ and $-1 < \theta_1 < 1$. Box et al. (1994) show that for the ARMA(1,1) model $\sigma_e^2 = (1 + \theta_1^2 - 2\phi_1\theta_1)\sigma_a^2/(1 - \phi_1^2)$, $\rho_1 = (1 - \phi_1\theta_1)(\phi_1 - \theta_1)/(1 + \theta_1^2 - 2\phi_1\theta_1)$, and $\rho_k = \phi_1\rho_{k-1}$ for $k \geq 2$. Thus, the autocorrelation function of an ARMA(1,1) model will decay exponentially from the value $\rho_1$. It can also be shown that the partial autocorrelation function for the ARMA(1,1) model consists of a single initial value with subsequent values characterized by a decaying exponential term, such as the partial autocorrelation function of a pure MA(1) model. Thus, for the ARMA(1,1) model both the autocorrelation function and the partial autocorrelation function exhibit behavior that decays exponentially after one initial value. All of the models that have been considered are part of the class of stationary models generally referred to as autoregressive moving average or ARMA models. Before an expression for the general ARMA model is given, it is useful to define the backshift operator $B$. The operator $B$ applied to any value indexed by time $(t)$ results in that value shifted back one unit of time; thus, $Be_t = e_{t-1}$. Furthermore, application of the operator $B^k$ means applying the operator $B$ $k$ times, so that $B^k a_t = a_{t-k}$. Then $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ and $\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$ are both linear operators, with $\phi(B)e_t = e_t - \phi_1 e_{t-1} - \cdots - \phi_p e_{t-p}$ and $\theta(B)a_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$. The general ARMA(p, q) model is

$$\phi(B)e_t = \theta(B)a_t \tag{8.11}$$
The parameters in the ARMA(p, q) model must be chosen so that the solutions to the equation $\phi(B) = 0$ (with $B$ treated as a variable) all have modulus greater than one and the solutions of $\theta(B) = 0$ also all have modulus greater than one. Notice that all the models discussed earlier were special cases of the general ARMA(p, q) model.
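For readers who wish to experiment, the statsmodels library in Python represents ARMA models through the lag polynomials $\phi(B)$ and $\theta(B)$, with the leading 1 included. The sketch below, with arbitrary illustrative parameters, checks stationarity and invertibility, evaluates the theoretical autocorrelation function, and generates a simulated realization.

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# phi(B) = 1 - 0.7B and theta(B) = 1 - 0.4B, an ARMA(1,1).
proc = ArmaProcess(ar=[1.0, -0.7], ma=[1.0, -0.4])
print("stationary:", proc.isstationary, " invertible:", proc.isinvertible)
print("theoretical acf:", np.round(proc.acf(6), 3))
e = proc.generate_sample(nsample=500)  # one simulated realization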
TIME SERIES MODELS FOR NONSTATIONARY DATA

Many times actual time series data do not appear to have one fixed mean, as is required for a stationary model to be appropriate. On the other hand, these series do exhibit a kind of homogeneous behavior in that, except for the fact that the level of the series is changing, each part of the series behaves very much like the other parts. Box and Jenkins (1970) were among the first authors to propose that a useful way to model this type of data is to suppose that one or more differences of the original time series may be modeled by a stationary model. This led to the autoregressive integrated moving average (ARIMA) models. If $e_t$ is a sequence of values, then the first difference of this sequence is $e_t - e_{t-1}$, or in terms of the backshift operator the first difference is $(1 - B)e_t$. The second difference of the sequence $e_t$ is obtained by taking the difference of the data that has been differenced. In other words, the second difference is $(1 - B)(1 - B)e_t$ or $(1 - B)^2 e_t$. Then, if the $d$th difference of a time series, $e_t$, follows an ARMA(p, q) model, the time series is said to follow an ARIMA(p, d, q) model. This model is written

$$\phi(B)(1 - B)^d e_t = \theta(B)a_t \tag{8.12}$$

In practice, it is usually the case that the number of differences required to convert $e_t$ to a stationary time series is $d = 1$ or 2, and the values of $p$ and $q$ are usually less than 2.
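In a matrix language the differencing operations are applied directly; a brief Python sketch with a simulated random walk:

import numpy as np

rng = np.random.default_rng(1)
e = np.cumsum(rng.normal(size=120))  # a random walk: nonstationary in level
d1 = np.diff(e)                      # (1 - B)e_t, the first difference
d2 = np.diff(e, n=2)                 # (1 - B)^2 e_t, the second difference
print(len(e), len(d1), len(d2))      # each difference shortens the series by one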
A STRATEGY FOR BUILDING TIME SERIES MODELS FOR OBSERVED DATA

One of the most important contributions that Box and Jenkins (1970) made was the strategy they advocated for building a model for a given time series. Box and Jenkins began by defining a class of models rich enough to characterize the behavior of a wide variety of actual time series. For the purpose of this discussion, suppose that class is the ARIMA models. Box and Jenkins then suggested a three-stage model building process: (a) tentative identification of an ARIMA model, (b) estimation of the model, and (c) diagnostic checking of the model. In practice this process is not always sequential and may take several iterations as the model builder gains insight from each step in the process. A brief review of each of these model building steps follows.
MODEL IDENTIFICATION

Box et al. (1994) suggest that "identification methods are rough procedures applied to a set of data to indicate the kind of representational model that is worthy of further investigation. The specific aim here is to obtain some idea of the values of p, d, and q needed in the general ARIMA model" (p. 183). The first step in identification is to determine whether or not the time series being modeled needs to be differenced in order to render it stationary and, if so, what particular value of $d$ is most appropriate. Two plots that are useful in determination of the degree of differencing are a sequence plot of the time series and a plot of the autocorrelation function of the time series. A time series whose difference is stationary can be thought of as an initial value plus the sum of a finite number of terms of a series fluctuating around a fixed mean. This sum will not have a fixed level; it will either meander over time or increase over time. Thus, one of the distinguishing features of a sequence plot of a series that requires differencing is the absence of a constant level. When data exhibit a meandering level, an increasing level, or a decreasing level, the need for a difference is suggested. The autocorrelations for a stationary time series were defined in Equation 8.3. In practice these values are unknown and need to be estimated from observed data. The most common estimator of the autocorrelations of a stationary time series is $r_k = c_k/c_0$, where $c_k = \sum_{t=1}^{T-k}(e_t - \bar{e})(e_{t+k} - \bar{e})$ is the estimator of the autocovariance $\gamma_k$ and $\bar{e}$ is the mean of the time series. The $r_k$ values are called the sample autocorrelations, and a plot of these is called the sample autocorrelation function. Fuller (1976) has shown that $r_k$ and $c_k$ are consistent estimators when the data come from a stationary model and satisfy some general conditions that are met for ARMA models. Although in theory the values of $\rho_k$ do not exist for a nonstationary time series, it is still possible to compute the values of $r_k$ when the time series is nonstationary. Box et al. (1994) argue that a distinguishing characteristic of the sample autocorrelation function of a time series that requires differencing is that the autocorrelations fail to die out rapidly or decrease in a linear fashion. Thus, this pattern is indicative of the need to difference the time series being modeled. Once the value of $d$, the differencing, has been determined, the general appearance of the sample autocorrelation function and the sample partial autocorrelation function of the appropriately differenced data are examined to provide insight about the proper choice of the autoregressive order, $p$, and the moving average order, $q$. Pure autoregressive models are most easily identified by consideration of the sample partial autocorrelation function, which should have only "small" values after lag $p$. Pure moving average models are most easily identified from the sample autocorrelation function, which should have only "small" values after lag $q$. Mixed ARMA models will have sample autocorrelation functions and sample partial autocorrelation functions that exhibit behavior that decays out rather rapidly. It is important to note that, whereas the sample autocorrelation and
sample partial autocorrelation functions will exhibit behavior that is similar to their population counterparts, the sample versions are based upon estimates. Thus, in interpreting these sample functions, the individual values should be viewed in light of their associated sampling variation. In addition, it is known that large correlation can exist between neighboring values in the sample autocorrelation function; thus, individual values in these sample functions should be interpreted with care. Although the sample autocorrelation function and sample partial autocorrelation function are very useful in identification of pure AR or MA models, in the case of some mixed models these sample functions do not provide unambiguous results. In part because of this issue, other tools have been developed to help in the identification of time series models. These tools include the R array and S array approach proposed by Gray, Kelley, and McIntire (1978), the extended sample autocorrelation function of Tsay and Tiao (1984), and the use of canonical correlations by Akaike (1976), Cooper and Wood (1982), and Tsay and Tiao (1985). Another approach that has been used in model selection is to make use of a general model selection value based upon information criteria, such as the AIC advocated by Akaike (1974) or the BIC of Schwarz (1978). Although these information criterion approaches can be helpful, they are best used as supplementary guidelines that are part of a more detailed approach to model identification.
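As a supplement to these remarks, the sample autocorrelation and sample partial autocorrelation functions are available in standard software. In the small Python sketch below, a simulated AR(2) series stands in for observed data; the sample partial autocorrelations should cut off after lag 2, within sampling error, whereas the sample autocorrelations decay gradually.

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

# Simulated AR(2) with phi1 = 0.5, phi2 = 0.3 stands in for observed data.
y = ArmaProcess(ar=[1.0, -0.5, -0.3], ma=[1.0]).generate_sample(nsample=300)
r = acf(y, nlags=12)        # sample autocorrelations r_k
phi_kk = pacf(y, nlags=12)  # sample partial autocorrelations
print(np.round(r[:6], 2))
print(np.round(phi_kk[:6], 2))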
MODEL ESTIMATION

Once a model for a given time series has been tentatively identified, it is possible to formally estimate the parameters in this model. The most common approach to parameter estimation is to assume that the errors, $a_t$, have a normal distribution and to compute the maximum likelihood estimates of the parameters. It has been shown that these estimates are consistent and asymptotically normally distributed even if the distribution of the errors is not normal. Because the likelihood function is complex, it is not possible to derive closed-form expressions for the maximum likelihood estimates; however, there are many computer programs that numerically maximize the likelihood function and compute the parameter estimates along with their standard errors. The theoretical work on parameter estimation in time series models is important in establishing properties of the estimates being computed; however, for practical purposes, what is important is an easy-to-use computer software program. The software program should have the option to maximize the exact likelihood function because it is known that maximization of approximations to the likelihood function can result in estimates of moving average parameters that are biased under some circumstances (Hillmer & Tiao, 1982).
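As one concrete possibility (not the program used in this chapter), the statsmodels library in Python computes exact maximum likelihood estimates of ARIMA parameters through a state-space representation; a minimal sketch on simulated data follows.

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Simulated ARMA(1,1) data play the role of an observed series.
y = ArmaProcess(ar=[1.0, -0.7], ma=[1.0, -0.4]).generate_sample(nsample=400)
fit = ARIMA(y, order=(1, 0, 1)).fit()  # exact likelihood via the state-space form
print(fit.params)  # estimated constant, phi_1, theta_1, and error variance
print(fit.bse)     # the corresponding standard errors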
MODEL DIAGNOSTIC CHECKING

After the parameters of a particular time series model have been estimated, it is important to examine whether or not that model adequately fits the observed data. If there is evidence of an important model inadequacy, adjustments to the estimated model should be made before conclusions are drawn or the model is used. The process of performing diagnostic checks on the fitted model is an important part of determining the usefulness of that model. "If diagnostic checks, which have been thoughtfully devised, are applied to a model fitted to a reasonably large body of data and fail to show serious discrepancies, then we shall rightly feel more comfortable about using that model" (Box et al., 1994, p. 309). On the other hand, if diagnostic checks reveal a serious problem with the model, then the model builder is warned to modify the fitted model to address the problems. Although there are many different types of checks that can be used, much can be learned from some relatively simple procedures. Two general methods will be reviewed: checking the autocorrelation function of the residuals from the fitted model and using a popular lack-of-fit test. Most diagnostic checks for a particular fitted model are based upon the residuals from the fitted model. To illustrate how the residuals are computed, suppose the fitted model is Equation 8.10 and the parameter estimates are $\hat{\phi}_1$ and $\hat{\theta}_1$; then, taking $\hat{a}_1 = 0$, the residuals $\hat{a}_t$ can be computed recursively from the observed time series, $e_t$, by

$$\hat{a}_t = e_t - \hat{\phi}_1 e_{t-1} + \hat{\theta}_1 \hat{a}_{t-1} \quad \text{for } t = 2, \ldots, T \tag{8.13}$$
The residuals for other models can be computed in a similar manner. Notice that the residuals are in effect estimates of the error terms $a_t$, which are assumed to be independent random variables drawn from a common normal distribution. Thus, if the model being fit is approximately correct, the residuals should exhibit properties like independent normal random variates. Conversely, if the residuals do not exhibit these properties, there is an indication of a problem with the fitted model. In particular, the sample autocorrelation function of the residuals should not have values that are larger than twice their standard errors for lags $k \geq 1$. Another common method to more formally test whether or not there are significant nonzero values in the sample autocorrelations of the residuals makes use of a whole set of these values. If $r_k(\hat{a})$ denotes the sample autocorrelation of the residuals at lag $k$, then an overall test of the fitted model's adequacy was first proposed by Box and Pierce (1970) and then modified by Ljung and Box (1978). Ljung and Box show that, if the model fitted for any ARIMA process is appropriate, then the statistic
$$Q = n(n+2)\sum_{k=1}^{K}(n-k)^{-1}r_k^2(\hat{a}) \tag{8.14}$$
is approximately distributed as $\chi^2(K - p - q)$, where in Equation 8.14 $n = T - d$ is the number of values available to estimate the parameters after the data have been differenced. Thus, the hypothesis that the fitted model is adequate will be rejected if, for a given value of $K$, the statistic $Q$ exceeds $\chi^2_\alpha(K - p - q)$, the value of a $\chi^2$ random variable with $K - p - q$ degrees of freedom that corresponds to a probability of $\alpha$ in the upper tail. If this hypothesis is rejected, efforts should be made to modify the fitted model.
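The statistic in Equation 8.14 is available in standard software. The following Python sketch, again on simulated data, applies it to the residuals of a fitted model; the model_df argument supplies $p + q$ so that the degrees of freedom become $K - p - q$.

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

y = ArmaProcess(ar=[1.0, -0.7], ma=[1.0]).generate_sample(nsample=400)
fit = ARIMA(y, order=(1, 0, 0)).fit()

# Ljung-Box Q at K = 12, 24, and 36 with the chi-squared reference
# distribution adjusted for the single estimated AR parameter.
lb = acorr_ljungbox(fit.resid, lags=[12, 24, 36], model_df=1)
print(lb)  # columns lb_stat and lb_pvalue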
Seasonal Time Series Models

It often happens that time series data exhibit a distinct "seasonal pattern with period s" in which similarities in the series occur every $s$ time intervals. For instance, many time series of monthly sales for retail establishments in the United States tend to have significantly larger values in November and December each year due to the Christmas holiday. These time series would have a period of $s = 12$; however, in other examples the period could be some other value. One of the important contributions of Box and Jenkins (1970) was to introduce a class of models that has over time proven to do an excellent job in the modeling of seasonal time series. It is instructive to review the original rationale that Box and Jenkins (1970) used to develop their models for seasonal time series. Suppose a monthly time series exhibits periodic behavior with $s = 12$. Then the time series may be written in the form of a two-way table categorized one way by month and the other way by year. This arrangement emphasizes the fact that for a periodic time series there are two intervals of importance. It would be expected that relationships would occur between observations for the same month in successive years and between observations of successive months in the same year. This is similar to a simple two-way analysis of variance model. Box and Jenkins (1970) assume that a model that captures the relationship between the time series values of the same month in successive years is

$$\Phi(B^s)(1 - B^s)^D e_t = \Theta(B^s)\alpha_t \tag{8.15}$$

where the seasonal frequency is $s = 12$, $D$ is the number of seasonal differences, and $\Phi(B^s)$ and $\Theta(B^s)$ are polynomials in the variable $B^s$ of degrees $P$ and $Q$, respectively. Equation 8.15 has the general form of the ARIMA model introduced earlier; however, the model applies to data separated by $s$ time units. Box and Jenkins (1970) make the assumption that the same model applies to all the months and that the parameters contained in each of these monthly models are approximately the same for each month. The error components $\alpha_t, \alpha_{t-1}, \ldots$ in Equation 8.15 would not in general be uncorrelated; thus, it would be necessary to specify a model that captured the relationships between successive values in the time series. Box and Jenkins (1970) assume the model
$$\phi(B)(1 - B)^d \alpha_t = \theta(B)a_t \tag{8.16}$$

describes the relationship between successive values. In Equation 8.16 the $a_t$ are independent identically distributed normal random variables, and $\phi(B)$ and $\theta(B)$ are polynomials in the variable $B$ of degree $p$ and $q$, respectively. Combining Equations 8.15 and 8.16 by multiplying both sides of Equation 8.15 by $\phi(B)(1 - B)^d$ and substituting Equation 8.16 into the right-hand side of the resulting expression yields the Box and Jenkins general multiplicative seasonal model

$$\phi(B)\Phi(B^s)(1 - B)^d(1 - B^s)^D e_t = \theta(B)\Theta(B^s)a_t \tag{8.17}$$
The process involved in building a model for a seasonal time series is similar to that discussed earlier: a model is identified, the parameters of the tentatively identified model are estimated, and diagnostic checking is carried out. The main difference for seasonal models is that the autocorrelation functions and the partial autocorrelation functions for seasonal models are more complex than those for nonseasonal models. As an example, suppose that after appropriate differencing the model for a time series was

$$e_t = (1 - \theta_1 B)(1 - \theta_{12} B^{12})a_t \tag{8.18}$$

Then it can be shown that the autocorrelations for Equation 8.18 are equal to zero except $\rho_0 = 1$, $\rho_1 = -\theta_1/(1 + \theta_1^2)$, $\rho_{12} = -\theta_{12}/(1 + \theta_{12}^2)$, and $\rho_{11} = \rho_{13} = \rho_1\rho_{12}$. Thus, the autocorrelation function has the appearance of one "nonseasonal" spike that cuts off after lag one, one "seasonal" spike at lag 12 that cuts off after one seasonal period, and the interaction between the nonseasonal and seasonal values that occurs at lags 11 and 13. For models of the general multiplicative type in Equation 8.17, the autocorrelation function can often be viewed as a nonseasonal part, which follows the patterns reviewed previously, combined with a seasonal part that follows the same patterns but at the seasonal periods, and some additional interaction factors. Some of the patterns for the autocorrelation functions of additional seasonal models can be found in Box et al. (1994).
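These spikes at lags 1, 11, 12, and 13 can be checked numerically by expanding the product $(1 - \theta_1 B)(1 - \theta_{12}B^{12})$ into a single moving average polynomial; a short Python sketch with illustrative parameter values:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

theta1, theta12 = 0.3, 0.6
ma = np.zeros(14)
ma[0] = 1.0
ma[1] = -theta1             # expansion of (1 - theta1*B)(1 - theta12*B**12):
ma[12] = -theta12           # coefficients at B, B**12, and B**13
ma[13] = theta1 * theta12
rho = ArmaProcess(ar=[1.0], ma=ma).acf(15)
for k in (1, 11, 12, 13):
    print(k, round(rho[k], 3))  # nonzero only at these lags (and lag 0)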
NONLINEAR TRANSFORMATION OF THE ORIGINAL TIME SERIES DATA

Sometimes the variation in a time series changes as the level of the series changes; in this case there is nonstationarity not only in the level but also in the variance. The usefulness of the general ARIMA and seasonal multiplicative ARIMA models can be expanded if the model builder is aware of the possibility of nonlinear transformation. In other words, it can happen that, whereas Equation 8.17 may not provide an adequate representation for a given time series $e_t$, it may be approximately
correct for some nonlinear transformation, say, $\ln(e_t)$. A simple sequence plot of the original time series can often alert the model builder to the fact that the variance is changing, and sequence plots of various nonlinear transformations can suggest the most appropriate metric of analysis. The task is to determine the metric for which the amplitude of the local variation is independent of the level of the time series. One way to define a class of nonlinear transformations and to estimate an appropriate transformation from the data is given in Box and Cox (1964).

Figure 8.1: Time series plot of males aged 16 to 19.
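The Box-Cox family of power transformations can be estimated routinely. The following Python sketch uses a hypothetical positive-valued series whose variation grows with its level; an estimated $\lambda$ near 0 corresponds to the log transformation, and $\lambda$ near 1 to no transformation at all.

import numpy as np
from scipy import stats

# Hypothetical series in which the local variation grows with the level.
rng = np.random.default_rng(2)
y = 100.0 * np.exp(rng.normal(scale=0.05, size=200).cumsum())

y_bc, lam = stats.boxcox(y)  # maximum likelihood estimate of lambda
print("estimated lambda:", round(lam, 2))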
EXAMPLE

As an illustration of modeling a seasonal time series, consider the monthly time series of employed males aged 16 to 19 in nonagricultural industries from January 1965 to August 1979. These data were modeled in Hillmer, Bell, and Tiao (1983) and are plotted in Figure 8.1. Judging from this plot, the series is seasonal, the level is changing over time, and the variability over time remains relatively stable. The failure of the sample autocorrelations, plotted in Figure 8.2, to die out rapidly and the large autocorrelations at multiples of 12 reinforce the observation that the data are nonstationary and seasonal. This suggests the need to difference the data. The sample autocorrelation function of the first difference of the data is plotted in Figure 8.3. That the autocorrelation pattern is repeated nearly exactly every 12 lags suggests the need for a 12th difference. The sample autocorrelation function for the 12th difference of the data is plotted in Figure 8.4. The failure of these sample autocorrelations to die out rapidly suggests that both a 1st and a 12th difference are necessary.
Figure 8.2: Sample autocorrelation function of males aged 16 to 19.
Figure 8.3: Sample autocorrelation function of the 1st difference of males aged 16 to 19.
Figure 8.4: Sample autocorrelation function of the 12th difference of males aged 16 to 19.
Figure 8.5: Sample autocorrelation function of the 1st and 12th differences of males.

Table 8.1
Parameter Estimates for Equation 8.19

Parameter        Estimate    Standard Error    T Ratio
$\theta_1$       .2640       .0767             3.44
$\theta_{12}$    .7206       .0569             12.65
The sample autocorrelation function of the 1st and 12th differenced data is plotted in Figure 8.5. The most predominant feature of this plot is the large autocorrelations at lags 1 and 12, with another small autocorrelation at lag 11. This pattern is suggestive of the theoretical autocorrelations for Equation 8.18; thus, the model

$$(1 - B)(1 - B^{12})e_t = (1 - \theta_1 B)(1 - \theta_{12} B^{12})a_t \tag{8.19}$$
is tentatively identified as appropriate for this data set. The next step in the model building process is to estimate the parameters in Equation 8.19; using the SCA-PC program (Hudak & Liu, 1991), the results are given in Table 8.1. Both of the parameters appear to be statistically significant. The sample autocorrelation function of the residuals is plotted in Figure 8.6, and a sequence plot of the residuals is plotted in Figure 8.7. These plots reveal no model inadequacies; further, the Ljung-Box Q based on 36 lags equals 39.7, which is less than 48.6, the α = .05 chi-squared critical value with 34 degrees of freedom. Thus, the model in Equation 8.19 appears to be an adequate representation of this data set. The SCA program was originally a command-driven interactive program. The SCA-PC version has a Windows interface that essentially converts the user's responses to the Windows prompts into commands. Once the user becomes familiar with the commands, it is probably easier to type
Figure 8.6: Sample autocorrelation function of residuals from Equation 8.19.
Figure 8.7: Time series plot of residuals from Equation 8.19.
Table 8.2
Commands to Produce the Output in the Employed Males Example

Command to create a time series plot of the variable DATA:
  GRAPH DATA. TYPE TPLOT.
Command to plot the autocorrelations of the variable DATA:
  GRAPH DATA. TYPE ACF.
Command to plot the partial autocorrelations of the variable DATA:
  GRAPH DATA. TYPE PACF.
Command to specify the model in Equation 8.19:
  TSMODEL NAME MODEL1. MODEL DATA(1,12) = (1 - TH1*B)(1 - TH2*B**12)NOISE.
Command to estimate the parameters for the specified model:
  ESTIM MODEL1. METH EXACT. HOLD RESIDUALS(RES1).
the appropriate command into the program's command window. The commands to produce the figures and the output for the employed males example are given in Table 8.2. These commands are written based upon the assumption that the data have been read into the program and are stored in a variable named DATA.
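For readers working in Python rather than SCA, the same model could be specified as an ARIMA(0,1,1)(0,1,1)12. The sketch below is offered only as a point of comparison, and the file name is a placeholder for however the employed males series is stored locally.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Placeholder: load the employed males series as a one-dimensional array.
data = np.loadtxt("males16to19.txt")  # hypothetical file name

fit = ARIMA(data, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)).fit()
print(fit.params)  # theta_1 and theta_12, comparable to Table 8.1
print(acorr_ljungbox(fit.resid, lags=[36], model_df=2))  # overall lack-of-fit check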
REGRESSION MODELS WITH TIME SERIES ERRORS

Now that the class of ARIMA and multiplicative ARIMA models has been reviewed, attention can return to the model in Equation 8.1 and consider the case in which $e_t$ is not characterized by independent normal random variables but rather follows some type of ARIMA model. If the vector $y' = (y_1, \ldots, y_T)$, the vector $e' = (e_1, \ldots, e_T)$, and the matrix $X' = (X_1, \ldots, X_T)$, then Equation 8.1 can be written as

$$y = X\beta + e \tag{8.20}$$

In Equation 8.20 the covariance matrix of $e$, $\mathrm{Cov}(e) = V$, is determined by the time series model for the error component. It is well known that the generalized least squares estimator for the parameters in Equation 8.20 is $\hat{\beta} =$
$(X'V^{-1}X)^{-1}X'V^{-1}y$, with $\mathrm{Cov}(\hat{\beta}) = (X'V^{-1}X)^{-1}$. However, if a researcher proceeds as if the errors in Equation 8.20 are independent, then standard regression programs will compute the estimates $\hat{\beta} = (X'X)^{-1}X'y$, for which $\mathrm{Cov}(\hat{\beta}) = (X'X)^{-1}(X'VX)(X'X)^{-1}$. Thus, not only will the parameter estimates be inefficient, but more importantly the standard errors of the parameter estimates reported by standard computer programs, and the associated t statistics used in hypothesis testing, will be incorrect. Experience has suggested (see Box & Newbold, 1971) that this can cause extreme problems when the error terms in Equation 8.20 are nonstationary. Thus, it will be assumed that an observed time series $y_t$ is related to $m$ independent variables $X_t = (x_{1,t}, \ldots, x_{m,t})'$ so that

$$y_t = X_t'\beta + e_t \quad \text{for } t = 1, \ldots, T \tag{8.21}$$

with $e_t$ following the model

$$\phi(B)\Phi(B^s)(1 - B)^d(1 - B^s)^D e_t = \theta(B)\Theta(B^s)a_t \tag{8.22}$$
The three-stage model building strategy outlined previously (identification, estimation, and diagnostic checking) can be used to develop a model of the form in Equations 8.21 and 8.22 for an observed time series $y_t$. The main changes in this three-stage process from the steps described earlier occur at the identification stage. In practice, there often is information about independent variables that are likely to be related to $y_t$ and that can be used to tentatively specify the regression portion of the model. As in the case of pure ARIMA models, the sample autocorrelation function and the time series plots play an important role in model identification. The first task in identification of a model for $e_t$ is to identify the differencing needed to transform $e_t$ to a stationary series. It is often the case (see Bell & Hillmer, 1983) that examination of the sample autocorrelation function of the original time series, $y_t$, is useful in determination of the degree of differencing in the noise term, $e_t$. This is because the impact of the nonstationarity in $e_t$ on the computed sample autocorrelations usually dominates the impact of the regression variables. This property has been demonstrated for some particular types of regression variables by Salinas (1983). In other words, one should look for evidence of the need for differencing in the sequence plot of $y_t$ and in the sample autocorrelation function of $y_t$. After $y_t$ (and thus $e_t$) have been appropriately differenced, the effect of the differenced $e_t$ on the computed sample autocorrelations no longer dominates the effect of the differenced regression portion, $X_t'\beta$. Thus, after the appropriate degree of differencing has been determined, the sample autocorrelation function and the sample partial autocorrelation function are determined by a combination of the effect of the differenced $e_t$ and the differenced $X_t'\beta$. To identify the model for $e_t$, apart from the differencing, the effects of the differenced $X_t'\beta$ must be approximately removed from the differenced $y_t$. To achieve this the model
$$(1 - B)^d(1 - B^s)^D y_t = (1 - B)^d(1 - B^s)^D X_t'\beta + e_t \tag{8.23}$$

is fit by least squares regression. In Equation 8.23 the term $(1 - B)^d(1 - B^s)^D X_t'$ is a row vector whose $i$th element is the differenced independent variable, $(1 - B)^d(1 - B^s)^D x_{i,t}$ for $i = 1, \ldots, m$. The sample autocorrelation function and sample partial autocorrelation function of the residuals from the model in Equation 8.23 are examined to tentatively identify the autoregressive and moving average parts of the noise model in Equation 8.22. This approach can be justified because Fuller (1976) has shown that the sample autocorrelations (and thus the sample partial autocorrelations) of the residuals from the least squares fit of Equation 8.23 differ from those of $(1 - B)^d(1 - B^s)^D e_t$ by an amount that converges in probability to zero. This procedure will be illustrated in a subsequent example. Once the regression and time series components of the model have been tentatively identified, it is necessary to estimate the parameters. The model being estimated can be written as

$$(1 - B)^d(1 - B^s)^D y_t = (1 - B)^d(1 - B^s)^D X_t'\beta + \frac{\theta(B)\Theta(B^s)}{\phi(B)\Phi(B^s)}a_t \tag{8.24}$$

Therefore, the differenced $y_t$ values are regressed on the differenced independent variables assuming a known stationary ARMA noise model. To estimate the parameters in Equation 8.24, access to software that computes maximum likelihood estimates for this model is needed. Although there are many programs that fit ARIMA time series models, not all of these fit models of the form of Equation 8.24. One of the advantages of the SCA-PC program is that it is capable of estimating the parameters in models such as Equation 8.24. After the parameters have been estimated, the residuals from the estimated model should be checked for model inadequacies in the same manner outlined previously.
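Software that estimates the regression and time series parameters jointly by exact maximum likelihood is now widely available. One possibility (not the program used in this chapter) is sketched below in Python, with simulated stand-in data replacing an actual series and regressors.

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated stand-ins: X is a T x m matrix of regressors and y has a
# nonstationary error component, as in Equation 8.24 with d = 1.
rng = np.random.default_rng(3)
T = 240
X = rng.normal(size=(T, 2))
y = X @ np.array([1.5, -0.8]) + rng.normal(size=T).cumsum()

# Regression with ARIMA(0,1,1) errors: the betas and the moving average
# parameter are estimated jointly by exact maximum likelihood.
fit = SARIMAX(y, exog=X, order=(0, 1, 1)).fit(disp=False)
print(fit.params)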
Example

In Canada, retail trade stores used to be prohibited from being open on Sundays; however, this prohibition has been lifted in some provinces in recent years. For example, in the province of New Brunswick the prohibition was lifted in 1991 for the months of November and December, was further lifted in 1992 to add the months of September through December, and in 1996 the month of August was included. There are several issues that may be important to policy makers as a result of the partial lifting of the prohibition of Sunday sales. In particular, there is interest in whether or not overall sales increased following the lifting of the prohibition. In addition, there is interest in whether or not there was a redistribution of sales among the days of the week to Sunday. The time series of total department store sales in New Brunswick from 1981 to 1996 (taken from
Quenneville, Cholette, & Morry, 1999) can be analyzed to provide partial answers to these types of policy questions. The issue is to determine the degree to which these interventions affected the department store sales time series. This can be determined by building a regression model that incorporates dummy variables corresponding to the times at which the policy interventions occurred. However, because the data form a time series, it is important to specifically allow for the possibility of autocorrelated errors. One way to assess the impact of the policy change allowing stores to be open on some Sundays is to define the dummy variable $I_t$, which takes the value 1 if $t$ is a month including a Sunday opening and is otherwise equal to 0; $I_t$ indicates the months in which Sunday sales were possible. Assuming that the effect of stores being open on Sunday was approximately the same for each month, this effect can be estimated by including a term $\eta I_t$ in the regression portion of the model. It is known that time series of retail sales are frequently affected by what has become known as trading day variation. Trading day variation occurs when the activity of a business varies with the days of the week, so that the results for a particular month partially depend upon which days of the week occur five times. In addition, accounting and reporting practices can create trading day effects in monthly time series. For instance, stores that perform their bookkeeping activities on Fridays tend to report higher sales in months with five Fridays than in months with four Fridays. Because it is likely that the time series being modeled is affected by trading day effects, it is important to include factors in the model that will allow for this phenomenon. There are a number of ways to model trading day effects by inclusion of terms in the regression portion of the model. Let $X_{it}$, $i = 1, \ldots, 7$, denote the number of Mondays, Tuesdays, and so on in month $t$. Then Bell and Hillmer (1983) suggest that a useful way to model the impact of trading day variation on a monthly time series is by inclusion of the regression terms $TD_t = \sum_{i=1}^{7}\beta_i T_{it}$, where $T_{it} = X_{it} - X_{7t}$ for $i = 1, \ldots, 6$ is the difference between the number of occurrences of each day of the week and the number of Sundays in month $t$, and $T_{7t} = \sum_{i=1}^{7} X_{it}$ is the length of the month $t$. Salinas and Hillmer (1987) show that this parameterization has less of a multicollinearity problem than a parameterization involving the variables $X_{it}$. In the expression for $TD_t$, the parameters $\beta_i$, $i = 1, \ldots, 6$, measure the differences between the Monday, Tuesday, ..., Saturday effects and the average of the daily effects, which is estimated by $\beta_7$. The difference between the Sunday effect and the average of the daily effects can be shown to be $-\sum_{i=1}^{6}\beta_i$. Because $\beta_7$ represents the average daily effect, which may be small in many time series, this term is often dropped from the model. Policy makers were also interested in whether or not allowing stores to be open on Sunday during some months would shift sales from other days of the week to Sunday. One way to answer this question is to include the terms $TC_t = \sum_{i=1}^{6}\delta_i I_t T_{it}$ in the regression portion of the model. The parameters $\delta_i$, $i = 1, \ldots, 6$, represent the shift in the impact of Monday, ...,
Saturday during the months when stores were open on Sunday. It is also known that retail sales of department stores may be affected by what is known as the Easter effect, the increased buying that sometimes occurs before Easter. The placement of Easter is different from the placement of other holidays, such as Christmas, because most holidays occur in the same month each year, and thus their effect is included in the seasonal factors. In contrast, the date of Easter falls in March in some years and in April in others. Bell and Hillmer (1983) propose that the effect of Easter can be modeled by a term $\alpha E_t$, where the variable $E_t$ approximates the percent increase in sales each month associated with Easter. Bell and Hillmer (1983) assume that sales uniformly increase in the 15-day period before Easter Sunday. Thus, the value of $E_t$ is distributed between March and April proportionally to the fraction of the 15-day period falling in the respective months. The dates of Easter Sunday and the values of $E_t$ for the years 1981 to 1996 are given in Table 8.3. A final consideration in specifying the regression portion of a model for the department store sales is that a goods and services tax was initiated in January of 1991. This could have affected the level of sales in department stores, which can be modeled by including a term $\lambda S_t$ in the regression model. The variable $S_t$ has value 0 for the months before January 1991 and 1 for the months afterward. Thus, the parameter $\lambda$ represents the change in the level of sales due to the institution of the tax. In summary, a model to represent the monthly time series of department store sales in New Brunswick is

$$y_t = \eta I_t + \sum_{i=1}^{6} \beta_i T_{it} + \sum_{i=1}^{6} \delta_i I_t T_{it} + \alpha E_t + \lambda S_t + e_t \tag{8.25}$$
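For readers who want to experiment with this kind of specification, the regressors in Equation 8.25 are straightforward to construct. The following is a minimal Python sketch; the date range follows the example, but the Sunday-opening months and the data themselves are hypothetical placeholders rather than the actual New Brunswick series.

import numpy as np
import pandas as pd

# Monthly index spanning the series analyzed in the example (1981-1996)
months = pd.period_range("1981-01", "1996-12", freq="M")

# X_it: number of Mondays (i = 1), ..., Sundays (i = 7) in each month
def day_counts(month):
    days = pd.date_range(month.start_time, month.end_time, freq="D")
    return np.bincount(days.dayofweek, minlength=7)  # Monday = 0, ..., Sunday = 6

X = np.array([day_counts(m) for m in months])

# T_it = X_it - X_7t for i = 1, ..., 6 (the Bell & Hillmer parameterization);
# the length-of-month term T_7t is dropped, as discussed in the text
T = X[:, :6] - X[:, 6:]

# S_t: goods and services tax step variable (0 before January 1991)
S = (months >= pd.Period("1991-01", freq="M")).astype(int)

# I_t: 1 in months permitting Sunday openings; the start date used here is
# a placeholder -- substitute the actual policy months from the source
I_t = (months >= pd.Period("1992-01", freq="M")).astype(int)

# Interaction regressors I_t * T_it capture the shift in trading day effects
IT = T * I_t[:, None]

The Easter variable $E_t$ is simplest to enter directly from Table 8.3, since its March/April proportions are already tabulated there.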
In Equation 8.25 the error term, $e_t$, follows some type of time series model that will be assumed to be in the ARIMA class of models. It remains to tentatively identify the model for the error term. A good first step in identification of a model for $e_t$ in Equation 8.25 is to examine the sequence plot of the original time series in Figure 8.8. The seasonality of the data is evident in the plot, as is the generally increasing level over time. In addition, the variability of the data increases as the level of the data becomes larger. The changing variability is a type of nonstationarity that cannot be dealt with by differencing; however, as indicated previously, changing variability can often be corrected by considering a nonlinear transformation of the data. One nonlinear transformation that often stabilizes the variability is the log; thus, the natural logs of the data are plotted in Figure 8.9. The variability of the data appears to be relatively constant throughout; thus, the subsequent analysis will be performed on $\ln(y_t)$. The sample autocorrelation function of $\ln(y_t)$ is plotted in Figure 8.10. The failure of the autocorrelations to die out suggests that the data need to be differenced. This is consistent with the appearance of the sequence plot in Figure 8.9.
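These identification steps are easy to reproduce with statsmodels; a minimal sketch, assuming the monthly sales are held in a pandas Series called sales (a stand-in name, not the actual data file):

import numpy as np
from statsmodels.graphics.tsaplots import plot_acf

log_sales = np.log(sales)        # stabilizes the increasing variability
plot_acf(log_sales, lags=36)     # slowly dying autocorrelations -> difference

d1 = log_sales.diff().dropna()   # (1 - B) ln(y_t)
plot_acf(d1, lags=36)            # spikes at lags 12, 24, 36 -> seasonal difference

d12 = d1.diff(12).dropna()       # (1 - B)(1 - B^12) ln(y_t)
plot_acf(d12, lags=36)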
Table 8.3 Dates of Easter and $E_t$ for the years indicated. ($E_t = 0$ for all other months.)

Year    Date of Easter    March Value    April Value
1981    April 19          0              1
1982    April 11          5/15           10/15
1983    April 3           13/15          2/15
1984    April 22          0              1
1985    April 7           9/15           6/15
1986    March 30          1              0
1987    April 19          0              1
1988    April 3           13/15          2/15
1989    March 26          1              0
1990    April 15          1/15           14/15
1991    March 31          1              0
1992    April 19          0              1
1993    April 11          5/15           10/15
1994    April 3           13/15          2/15
1995    April 16          0              1
1996    April 7           9/15           6/15
Figure 8.8: Time series plot of department store sales for New Brunswick.
Figure 8.9: Time series plot of log department store sales for New Brunswick.
Figure 8.10: Sample autocorrelation function of log department store sales.
Figure 8.11: Sample autocorrelation function of the 1st difference of log sales.
The sample autocorrelations of the first difference of $\ln(y_t)$ are plotted in Figure 8.11. The large values at lags 12, 24, and 36, together with the repeating pattern of autocorrelations every 12 lags, suggest that an additional 12th difference is necessary. The sample autocorrelation of $(1-B)(1-B^{12})\ln(y_t)$ is plotted in Figure 8.12. The pattern is rather confusing and reflects the fact that the effects of the trading day variables, Easter, and the intervention variables in Equation 8.25 are confounded with the effect of the stationary time series model. One way to approximately eliminate the effect of the regression variables on the sample autocorrelations is to fit the regression model
Figure 8.12: Sample autocorrelation function of the 1st and 12th differences of log sales.
$$(1-B)(1-B^{12})\ln(y_t) = \eta(1-B)(1-B^{12})I_t + \sum_{i=1}^{6}\beta_i(1-B)(1-B^{12})T_{it} + \sum_{i=1}^{6}\delta_i(1-B)(1-B^{12})I_tT_{it} + \alpha(1-B)(1-B^{12})E_t + \lambda(1-B)(1-B^{12})S_t + e_t \tag{8.26}$$
and consider the sample autocorrelation function of the residuals from this model to help specify the autoregressive and moving average part of the error model. The sample autocorrelation function of the residuals from this model is plotted in Figure 8.13. The most important features of this sample autocorrelation function are the large autocorrelations at lags 1 and 12. These suggest a multiplicative moving average model, $(1-\theta_1 B)(1-\theta_{12}B^{12})a_t$, for the differenced error term. This example illustrates that, when attempting to specify a model that contains both regression terms and a nonstationary ARIMA time series model, the general approach is to determine the order of differencing from the sample autocorrelation function of the original data and to determine the form of the time series model from the sample autocorrelation function of the residuals from a regression model with the variables appropriately differenced so that the error term is presumably stationary. Thus, the final form of the tentatively identified model is

$$(1-B)(1-B^{12})\ln(y_t) = \eta(1-B)(1-B^{12})I_t + \sum_{i=1}^{6}\beta_i(1-B)(1-B^{12})T_{it} + \sum_{i=1}^{6}\delta_i(1-B)(1-B^{12})I_tT_{it} + \alpha(1-B)(1-B^{12})E_t + \lambda(1-B)(1-B^{12})S_t + (1-\theta_1 B)(1-\theta_{12}B^{12})a_t \tag{8.27}$$
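One way to carry out the identification regression of Equation 8.26 in Python is to apply the differencing operator to the response and to every regressor, fit ordinary least squares, and inspect the residual autocorrelations. A sketch continuing the hypothetical names from the earlier fragments; E is assumed to hold the monthly Easter values of Table 8.3:

import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf

def dd(x):
    # apply (1 - B)(1 - B^12): first difference, then a lag-12 difference
    x = np.diff(np.asarray(x, dtype=float))
    return x[12:] - x[:-12]

Z = np.column_stack([I_t, T, IT, E, S])   # regressors of Equation 8.26
y_dd = dd(np.log(sales))
Z_dd = np.column_stack([dd(Z[:, j]) for j in range(Z.shape[1])])

ols = sm.OLS(y_dd, Z_dd).fit()   # no intercept: differencing removes the level
plot_acf(ols.resid, lags=36)     # large lags 1 and 12, as in Figure 8.13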
Figure 8.13: Sample autocorrelation function of the residuals from the regression model.
The next step in the model-building process is to estimate this model. Because the model specified in Equation 8.27 is quite complicated, it is not possible to estimate the parameters by using many of the available time series programs. One of the major advantages of the SCA-PC program is that it allows for the estimation of regression-type models that have ARIMA-type errors. The parameters are estimated by computing maximum likelihood estimators, assuming that the errors have a normal distribution. The estimators and their standard errors were computed by numerical methods in the SCA-PC program and are given in Table 8.4. There are a large number of parameters in the proposed model and, judging from Table 8.4, not all of these parameters are statistically different from zero. To eliminate those parameters whose values might as well be taken as zero, the parameters for which the absolute values of the t ratios were less than 2.00 were eliminated from the model and the reduced model was re-estimated. During this process, the parameter $\eta$ was retained because part of the reason for building the model was to evaluate the impact of the known intervention, and the parameter $\eta$ represents an important part of this intervention. This process of elimination of "nonsignificant" parameter estimates was repeated until all the remaining parameters (except possibly for the estimate of $\eta$) had t ratios whose absolute value was greater than 2. The parameter estimates of the resulting model are given in Table 8.5. Before these values are interpreted, diagnostic checks should be performed on the estimated model. The sample autocorrelation function of the residuals from the estimated model is plotted in Figure 8.14; because all of the values are very small, there is no evidence in this plot of any model inadequacy. The time series plot of the residuals in Figure 8.15 also suggests that there is no problem with the estimated model. From Table 8.5, the parameters $\beta_4$, $\beta_5$, and $\alpha$ are clearly nonzero; this verifies that it is important to allow for trading day and Easter effects in the model. In addition, the need for differencing the data and the fact that the parameters $\theta_1$ and $\theta_{12}$ are both nonzero reflect the need to include the ARIMA time series component in the model.
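SCA-PC is not widely available today. As a rough modern analogue (not the author's procedure), a regression with multiplicative moving average errors of the form in Equation 8.27 can be fitted with statsmodels' SARIMAX, and the t-ratio pruning rule can be scripted around it. The variable names continue the hypothetical ones used previously, and the intervention column is assumed to be labeled "I_t":

import numpy as np
import pandas as pd
import statsmodels.api as sm

exog = pd.DataFrame(Z, columns=["I_t"] + [f"T{i}" for i in range(1, 7)]
                              + [f"TC{i}" for i in range(1, 7)] + ["E_t", "S_t"])

def fit(cols):
    # (0,1,1)x(0,1,1,12): (1-B)(1-B^12) differencing with
    # (1 - theta1*B)(1 - theta12*B^12) errors, mirroring Equation 8.27
    return sm.tsa.SARIMAX(np.log(sales), exog=exog[cols],
                          order=(0, 1, 1),
                          seasonal_order=(0, 1, 1, 12)).fit(disp=False)

cols = list(exog.columns)
res = fit(cols)
while True:
    tvals = res.tvalues[cols].drop("I_t")   # always retain the intervention term
    weakest = tvals.abs().idxmin()
    if abs(tvals[weakest]) >= 2.00:
        break
    cols.remove(weakest)                    # drop the weakest regressor, refit
    res = fit(cols)
print(res.summary())                        # compare with Tables 8.4 and 8.5

The state-space estimates will not be numerically identical to SCA's exact maximum likelihood, but the identification and pruning logic is the same.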
Table 8.4 Parameter Estimates for Equation 8.27 and Their Standard Errors

Parameter    Estimate    Standard Error    t Ratio
η            .0175       .0101             1.73
β1           .0069       .0035             1.97
β2           -.0062      .0037             -1.68
β3           .0010       .0035             .27
β4           .0101       .0036             2.84
β5           .0112       .0036             3.12
β6           .0043       .0035             1.21
δ1           -.0238      .0155             -1.53
δ2           .0074       .0106             .70
δ3           -.0146      .0122             -1.20
δ4           .0173       .0119             1.46
δ5           -.0454      .0139             -3.28
δ6           .0344       .0124             2.78
α            .0456       .0074             6.15
λ            -.1310      .0221             -5.94
θ1           .3773       .0696             5.42
θ12          .7336       .0581             12.62
σ̂a          .0245
Figure 8.14: Sample autocorrelation function of the residuals from the reduced regression-time series model.
Table 8.5 Parameter Estimates for the Reduced Equation 8.27 and Their Standard Errors

Parameter    Estimate    Standard Error    t Ratio
η            .0195       .0101             1.93
β4           .0091       .0024             3.83
β5           .0134       .0029             4.66
δ5           -.0370      .0108             -3.42
δ6           .0363       .0119             3.05
α            .0463       .0073             6.35
λ            -.1374      .0222             -6.19
θ1           .4042       .0685             5.90
θ12          .7418       .0576             12.87
σ̂a          .0252
Figure 8.15: Time series plot of the residuals from the reduced model.
Table 8.6 Additional Commands to Produce the Output in the Retail Sales Example

Command to create the variable $I_t$
GENE IT. NROW 192. VALUE 0 FOR 130,1,1,0 FOR 8,1,1,1,1,@
0 FOR 8,1,1,1,1,0 FOR 8,1,1,1,1,0 FOR 7,1,1,1,1,1.

Command to create the variable $S_t$
GENE ST. NROW 192. VALUE 0 FOR 120, 1 FOR 72.

Command to create the trading day variables
DAYS VARI T1 TO T7. BEGIN 1981,1. END 1996,12. TRANSFORM.

Command to create the Easter variable
EASTER VARI EW. BEGIN 1981,1. END 1996,12. DURATION 15.

Command to create $I_tT_{it}$
TC1 = IT*T1. TC2 = IT*T2. ... TC6 = IT*T6.

Command to create the natural log of the variable DATA
NLOG = LN(DATA).

Command to specify the model in Equation 8.27
TSMODEL NAME MODEL2. MODEL NLOG(1,12) = @
(W1)IT(BINARY,1,12) + (B1)T1(1,12) + (B2)T2(1,12) + (B3)T3(1,12) @
+ (B4)T4(1,12) + (B5)T5(1,12) + (B6)T6(1,12) + (D1)TC1(1,12) @
+ (D2)TC2(1,12) + (D3)TC3(1,12) + (D4)TC4(1,12) @
+ (D5)TC5(1,12) + (D6)TC6(1,12) + (W2)EW(1,12) @
+ (W3)ST(BINARY,1,12) + (1 - TH1*B)(1 - TH2*B**12)NOISE.

Command to estimate the parameters in the model in Equation 8.27
ESTIM MODEL2. METH EXACT. HOLD RESIDUALS(RES1).
Finally, the fact that the parameter $\lambda$ is nonzero suggests that the effect of the goods and services tax initiated in January of 1991 was a real level change. Although these conclusions are interesting, the reason for building this model was to assess the impact on department store sales of allowing stores to be open on some Sundays. This impact is reflected by the parameters $\eta$, $\delta_5$, and $\delta_6$ in the model. The t ratio for the parameter $\eta$ is 1.93, which suggests that the evidence that there was an increase in sales due to the opening of stores on Sunday is very slight. In contrast, the t ratios for $\delta_5$ and $\delta_6$ are both large enough in magnitude to suggest that there was a shift in the impact of the trading day effects: the impact of an extra Friday was decreased, and the impact of an extra Saturday was increased. The SCA program commands to specify the model in Equation 8.27 and estimate the parameters in that model are provided in Table 8.6. This example illustrates that it is possible to evaluate the impact of known policy decisions in cases where the data are affected in a complex manner by external extraneous factors and when the data are autocorrelated. The example shows how to make use of external knowledge about the time series being modeled to help specify the factors that are important to include in the model beyond the hypothesized intervention. In this case these factors included trading day and Easter factors as well as a secondary intervention. The example also illustrates that an understanding of the properties of time series models plays an important role in the model building.
REFERENCES

Akaike, H. (1976). Canonical correlation analysis of time series and the use of information criteria. In R. K. Mehra & D. G. Lainiotis (Eds.), System identification: Advances and case studies (pp. 27-96). New York: Academic Press.

Anderson, T. W. (1971). The statistical analysis of time series. New York: John Wiley and Sons.

Bell, W. R., & Hillmer, S. C. (1983). Modeling time series with calendar variation. Journal of the American Statistical Association, 78, 526-534.

Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252.

Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. San Francisco: Holden-Day.

Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis: Forecasting and control (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.

Box, G. E. P., & Newbold, P. (1971). Some comments on a paper of Coen, Gomme, and Kendall. Journal of the Royal Statistical Society, Series A, 134, 229-240.

Box, G. E. P., & Pierce, D. A. (1970). Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association, 65, 1509-1526.

Cooper, D. M., & Wood, E. F. (1982). Identifying multivariate time series models. Journal of Time Series Analysis, 3, 153-164.

Fuller, W. A. (1976). Introduction to statistical time series. New York: John Wiley and Sons.

Gray, H. L., Kelley, G. D., & McIntire, D. D. (1978). A new approach to ARMA modeling. Communications in Statistics, 7, 1-77.

Hillmer, S. C., Bell, W. R., & Tiao, G. C. (1983). Modeling considerations in the seasonal adjustment of economic time series. In A. Zellner (Ed.), Applied time series analysis of economic data. Washington, DC: U.S. Department of Commerce.

Hillmer, S. C., & Tiao, G. C. (1982). Likelihood function of stationary multiple autoregressive moving average models. Journal of the American Statistical Association, 71, 63-70.

Hudak, G., & Liu, L. (1991). Forecasting and time series analysis using the SCA Statistical System (Vol. 1). Oak Park, IL: Scientific Computing Associates.

Ljung, G., & Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297-303.

Quenneville, B., Cholette, P., & Morry, M. (1999). Should stores be open on Sunday? The impact of Sunday openings on the retail trade sector in New Brunswick. Journal of Official Statistics, 15, 449-463.

Salineas, T. (1983). Modeling time series with trading day variation. Unpublished doctoral dissertation, University of Kansas.

Salineas, T., & Hillmer, S. C. (1987). Multicollinearity problems in modeling time series with trading day variation. Journal of Business and Economic Statistics, 5, 431-436.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Tsay, R. S., & Tiao, G. C. (1984). Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary ARMA models. Journal of the American Statistical Association, 79, 84-96.

Tsay, R. S., & Tiao, G. C. (1985). Use of canonical analysis in time series model identification. Biometrika, 72, 299-315.
Chapter 9
Dynamic Factor Analysis Models for Representing Process in Multivariate Time-Series

John R. Nesselroade and John J. McArdle
The University of Virginia

Steven H. Aggen
Virginia Commonwealth University

Jonathan M. Meyers
The University of Virginia

The collection of multivariate time-series and their analysis with mathematical models is necessary if we are to represent process and change effectively and fully. Among the promising applications currently available are variations of the common factor model that integrate factor and time series modeling features in a common analytic framework. We highlight some differences and similarities between two kinds of time series models for common factors: a direct autoregressive factor score (DAFS) model and a white-noise factor score (WNFS) model. Particular specifications of these models are fitted to data reflecting short-term changes in an intensively measured individual's self-reported affect. Results of the model fitting underscore the importance of explicit differences in model specifications that define one's view of the nature of process and change.

In various guises, the common factor model has been applied to multivariate time-series data for more than 50 years in an effort to represent process
and other kinds of change (Cattell, Cattell, & Rhymer, 1947; Hershberger, Molenaar, & Corneal, 1996; McArdle, 1982; Molenaar, 1985; Wood & Brown, 1994). Despite the fact that particular applications of the model have been controversial (Anderson, 1963; Cattell, 1963b; Holtzman, 1963; Molenaar, 1985; Steyer, Ferring, & Schmitt, 1992), the logic underlying its use seems to be sound (e.g., Bereiter, 1963), and the results have been instrumental in the development of important lines of behavioral research such as the trait-state distinction (Cattell, 1957, 1961; Horn, 1972; Kenny & Zautra, 1995; Nesselroade & Ford, 1985; Singer & Singer, 1972; Steyer et al., 1992). These applications have helped to fuel a long-standing interest in intraindividual variability as a source of measurable individual differences (Baltes, Reese, & Nesselroade, 1977; Cattell, 1957; Eizenman, Nesselroade, Featherman, & Rowe, 1997; Fiske & Maddi, 1961; Fiske & Rice, 1955; Flugel, 1928; Kim, Nesselroade, & Featherman, 1996; Larsen, 1987; Magnusson, 1997; Nesselroade & Boker, 1994; Valsiner, 1984; Wessman & Ricks, 1966; Woodrow, 1932, 1945; Zevon & Tellegen, 1982). In this chapter, we briefly examine some of the history and key issues of factor analyzing multivariate time-series and, to exemplify the methods, present analyses of some promising recent developments aimed at further improving such applications. In the broader context of applying covariation designs to the study of behavioral phenomena, the application of the common factor model to the data obtained when one individual is measured many times on many variables (multivariate time-series) has been called P-technique factor analysis (Cattell, 1952, 1961; Jones & Nesselroade, 1990; Luborsky & Mintz, 1972; Nesselroade & Ford, 1985). The analytical focus is on building a structural representation of patterns of within-person fluctuation of the variables over time. The intention of Cattell et al. (1947) in introducing this method of analysis was to discover "source traits" at the individual level. Cattell (1966) argued for some congruence between the way people change and the way they differ from each other. He declared that "we should be very surprised if the growth pattern in a trait bore no relation to its absolute pattern, as an individual differences structure" (p. 358), thus arguing for a similarity of patterns of intraindividual change and interindividual differences (Hundleby, Pawlik, & Cattell, 1965). Bereiter (1963) noted that correlations between measures over individuals should bear some correspondence to "correlations between measures for the same or randomly equivalent individuals over varying occasions, and the study of individual differences may be justifiable as an expedient substitute for the more difficult P-technique" (p. 15). The flip side of this interpretation, which is not so often played, is that some of what are interpreted to be individual differences structures are actually intraindividual variability patterns that are asynchronous across persons and (perhaps erroneously) frozen in time by limiting the observations to a single measurement occasion. A key concern in either case is the degree of convergence between patterns of within-person change and among-person differences. Other authors have
discussed this general topic under the label ergodicity (e.g., Jones, 1991; Nesselroade & Molenaar, 1999). The essential point is that investigation of variation (and covariation) in the individual over time is a meaningful and necessary enterprise, the results of which need to be integrated into the larger framework of behavioral research and theory. Since 1947 a large number of P-technique studies have been conducted (see Luborsky & Mintz, 1972, and Jones & Nesselroade, 1990, for reviews). By the early 1960s, neither the proponents of P-technique factor analysis, such as Cattell, nor its critics, such as Holtzman (1963), were satisfied with its ability to model the subtleties of intraindividual change. Consider, for example, the matter of influence exerted on observed variables by the unobserved factors. The common factor model as traditionally applied to individual differences information (e.g., ability test scores) implies that individual differences in the underlying factors are responsible for individual differences in the observed variables. In P-technique applications, however, there are no individual differences because only one person is measured. Rather, the differences are in that individual's scores from one occasion to another; that is, they are changes. Changes in the underlying factors are modeled as producing changes in the observed variables. The original P-technique model implies that the total influence of a factor on an observed variable is exerted instantaneously. Restricting the coupling between factors and variables in this way implies that, on those occasions when the factor score is extreme, the variable score will also tend toward the extreme and, on those occasions when the factor score is moderate, the variable score will also tend to be moderate. The model does not afford explicit representation of more intricate (read realistic) patterns of influence of factors on variables such as persistence over time (e.g., the gradual dissipation or strengthening of the effects of extreme factor scores on one occasion on the variables at a later occasion). Moreover, the pattern of effect gradients may differ with different observed variables. Statements of the type "I'm okay now, I just can't seem to stop shaking" illustrate the differences in the rates at which various components of a response pattern (e.g., self-reported internal state and objectively verifiable physical manifestations) return to equilibrium after the organism experiences an extreme level of anxiety or fear. The basic P-technique model simply does not have the ability to represent the rich variety of relationships that we tend to associate with notions of process. Cattell (1963b) himself called for refinements in the P-technique model that would allow the effects exerted on the variables by the factors to dissipate or strengthen gradually over time rather than to be merely concurrent. It was not until the 1980s, however, that some key attempts to elaborate the P-technique factor model appeared that improved its capacity to represent change processes more veridically (e.g., Engle & Watson, 1981; Geweke & Singleton, 1981; McArdle, 1982; Molenaar, 1985). It is only in the past decade that the implementation of more promising, rigorous approaches to the study of intensively measured intraindividual
variability in the single case via multivariate modeling has begun seemingly in earnest. In the remainder of this chapter we focus on two of these approaches, labeled the DAFS (direct autoregressive factor score) and WNFS (white noise factor score) models. We provide descriptions of the models and an example of fitting them to data. In so doing, we draw further attention to the evolving interest in intraindividual variability phenomena in a wide variety of content domains and identify some research tools that seem particularly promising for rapid advance in these areas. To illustrate the applications concretely, the factor models will be presented, discussed, and compared in the context of fitting them to real data using standard structural equation modeling software (e.g., LISREL 8 by Joreskog & Sorbom, 1993).
LINEAR STRUCTURAL EQUATION MODELS FOR TIME-SERIES

We begin the presentation of these linear structural equation models for multivariate time-series with a brief review and critique of the basic P-technique factor analysis model. The alternative specifications are then presented and fitted to empirical data, the results are described and discussed, and some implications for future research are drawn.
Basic P-Technique Factor Analysis Model

The essential novelty of the initial application of the common factor model to P-technique data (Cattell et al., 1947) was the fitting of the factor model to the covariation of multiple variables measured across time on only one individual. The P-technique application of the common factor model contrasts sharply with the usual one of fitting the model to the pattern of covariation of multiple variables measured across a sample of persons on a single occasion (R-technique factor analysis). Although in both cases one analyzes a symmetric, "variables × variables" covariance matrix, in P-technique the elements signify the extent to which the variables covary with each other over time (within one individual), whereas in R-technique the covariance elements indicate the extent to which the variables covary across persons (based on a single occasion of measurement). For the traditional P-technique application, no accommodations are made in the basic factor model for time-related dependencies in the data. The traditional common factor model is applied straightforwardly to the multivariate time-series data. This factor model can be represented as
$$y(t) = \Lambda f(t) + u(t) \tag{9.1}$$
where $y(t)$ is a p-variate observed time-series measured at time t (t = 1, 2, ..., T), $\Lambda$ is a p × k matrix of factor loadings, $f(t)$ is a k-variate time-series of (unobserved) factor scores at time t (t = 1, 2, ..., T), and $u(t)$ is a
p-variate time-series of unique parts (specificity plus error) of the observed scores at time t (t = 1, 2, ..., T) that may have an autocorrelational but no cross-correlational structure. In fitting the common factor model to observed multivariate time-series, time-dependent properties of the common factor scores are ignored and the unique parts of the observed scores are assumed to be "white noise." A path diagram of the model is represented in Figure 9.1. As the time index t indicates, the factors are modeled as exerting their influence on the variables, but only concurrently in relation to the successive occasions of measurement. Thus, on a given occasion, a given observed score is constituted as a linear combination of the factor score(s) at that occasion of measurement and the "unique part" of that variable at that occasion. No lagged effects (e.g., the influence of yesterday's factor score on today's observed variable score) are represented in this model. Thus, the model's ability to represent explicitly aspects of process is limited to concurrent relationships. Similarly, no time-dependent structure is recognized for the unique parts of the variables. In a covariance or correlation metric, the expectation for a unique part on a given day is zero, regardless of the magnitude of the unique part on the previous occasion. As was noted earlier, this initial P-technique factor model was criticized by a number of writers (Holtzman, 1963; Molenaar, 1985; Steyer et al., 1992) and was acknowledged by Cattell (1963a) as possibly inadequate for representing some kinds of time-series data. Cattell's (1963a) call for improvements in the factor analytic representation to allow for lags in the action of the factors in driving the observed variables went largely unanswered for two decades. In the early 1980s a number of writers attempted to fit the factor model to multivariate time-series data in ways that implicitly, if not explicitly, would improve on the original P-technique applications (Engle & Watson, 1981; Geweke & Singleton, 1981; McArdle, 1982; Molenaar, 1985). In the next two sections, the particular adaptations presented by McArdle (1982) and Molenaar (1985) are summarized and discussed. The differences that the two models signify in terms of the conceptual meaning of the nature of process dimensions will be pointed out.
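The "no lagged structure" character of the basic P-technique model is easy to see in simulation. The following minimal Python sketch (all numbers illustrative, not from the chapter) generates data from Equation 9.1 with white-noise factors, so the observed series inherit no autocorrelation:

import numpy as np

rng = np.random.default_rng(0)
T, p, k = 103, 6, 2                     # occasions, variables, factors

L = np.zeros((p, k))                    # Lambda: three markers per factor
L[:3, 0] = [.8, .9, .7]
L[3:, 1] = [.7, .9, .9]

f = rng.standard_normal((T, k))         # white-noise factor scores
u = 0.5 * rng.standard_normal((T, p))   # white-noise unique parts
y = f @ L.T + u                         # Equation 9.1: y(t) = Lambda f(t) + u(t)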
Direct Autoregressive Factor Score Model

The DAFS model was proposed by Engle and Watson (1981) and used as a SEM with psychological variables by McArdle (1982). The model is depicted graphically in Figure 9.2. It explicitly incorporates lagged effects of factors on variables by allowing the factor scores to manifest time-related dependencies in the form of autocorrelations. Yesterday's factor scores, for instance, can directly influence today's factor scores and thereby have an impact on today's observed variable scores. Earlier factor scores do not
¹Although it is possible to include them as parameters in the model, to simplify this presentation, cross-correlations among the common factors have been omitted.
Figure 9.1: P-technique factor model.
exert influence directly on later manifest variable scores. Rather, they exert only indirect influence on later variable scores through their effect on later factor scores. The DAFS model specification can be written as the pair of equations

$$y(t) = \Lambda f(t) + u(t) \tag{9.2}$$

and

$$f(t) = B_1 f(t-1) + B_2 f(t-2) + \dots + B_s f(t-s) + v(t) \tag{9.3}$$
where, as defined previously, $y(t)$ is a p-variate observed time-series measured at time t (t = 1, 2, ..., T), $\Lambda$ is a p × k matrix of factor loadings, $f(t-w)$ (t = 1, 2, ..., T; w = 0, 1, 2, ..., s) is a k-variate time-series of (unobserved) factor scores w occasions prior to occasion t, $u(t)$ is a p-variate time-series of unique parts of the observed scores that may have an autocorrelational but (for simplicity here) no cross-correlational structure, $B_k$ is a weight matrix whose element $\beta_{kij}$ in general reflects the magnitude of influence of the ith factor from k occasions earlier on the current value of the jth factor, and $v(t)$ is a disturbance term signifying concurrent contributions to the factor scores that are not part of the direct autoregressive structure. By substitution we have

$$y(t) = \Lambda[B_1 f(t-1) + \dots + B_s f(t-s)] + \Lambda v(t) + u(t) \tag{9.4}$$
As noted previously, for the purposes of this discussion, $\beta_{kij} = 0$ for $i \neq j$ (no cross-regressions). Equation 9.4 mandates that the influence of prior factor scores on the observed variables at time t is mediated through a given $B_k$. Precluding earlier factor scores from directly influencing later variable scores is a key point of difference between the DAFS model and that of Molenaar (1985) presented subsequently. The distinguishing features of the DAFS model are (a) the factor loadings are invariant with respect to amount of lag (only one $\Lambda$ is defined in the model), (b) possible autocorrelational structure in the common factor scores, and (c) possible autocorrelational structure in the uniquenesses of the variables. Thus, in this model the influence of the factor on the observed variables follows the same pattern regardless of the amount of lag, but the magnitude of the influence is diminished (or possibly enhanced) with increasing lag. For example, an extreme score on a factor at a given occasion would tend to produce an extreme score on a highly loaded variable on that occasion. For the score on that same observed variable one occasion later, however, that extreme factor score's influence might be effectively diminished by a weight, say .5. Suppose the value of .5 held for all lags of one occasion. Then the effects of a lag of two occasions ought to
Figure 9.2: Direct autoregressive factor score (DAFS) model. An alternative representation involves replacing the variance on the common factors (circles labeled with F's) with another circle representing an unmeasured innovation with a loading of unity and a variance of v. The more compact path modeling notation used here is consistent with that of Horn and McArdle (1980, p. 521).
be representable by a weighting of .5 × .5 = .25. Indeed, the invariance of the scaling weights across different amounts of lag is a statistically testable proposition. To reiterate, in the DAFS model, the effect of yesterday's factor score on today's observed scores is mediated both by the level of time-related dependency residing in the factor scores and by the elapsed time (number of lags). Although for many applications a simple autoregressive model would seem appropriate, in the case of strong delayed effects, for example, the data would be better accounted for by a large scaling value placed on the factor scores at some prior occasion of measurement. This does not necessarily pose a problem for the estimation of the model parameters unless there are insufficient degrees of freedom due to an excessive number of free parameters in the model. Note that the model as estimated applies to the entire time-series. A lag of 2 applies equally to occasions 1 versus 3 and occasions 101 versus 103. Therefore, unique occurrences (e.g., a one-time "sleeper effect" such as an occurrence at occasion 5 having an influence on the organism at occasion 50) are "errors" in regard to both this model and that of Molenaar presented subsequently.
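To make the autoregressive mechanism concrete, here is a minimal simulation of a first-order DAFS process (Equations 9.2 and 9.3) with illustrative weights; nothing here comes from the chapter's data:

import numpy as np

rng = np.random.default_rng(1)
T, p, k = 500, 6, 2
L = np.zeros((p, k))
L[:3, 0] = [.8, .9, .7]
L[3:, 1] = [.7, .9, .9]
B1 = np.diag([0.5, 0.1])                 # diagonal: no cross-regressions

f = np.zeros((T, k))
for t in range(1, T):
    f[t] = B1 @ f[t - 1] + rng.standard_normal(k)   # f(t) = B1 f(t-1) + v(t)

y = f @ L.T + 0.5 * rng.standard_normal((T, p))     # y(t) = Lambda f(t) + u(t)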
White Noise Factor Score Model (WNFS)

An early version of a WNFS model was presented by Geweke and Singleton (1981). Molenaar's (1985, 1994; see also Hershberger et al., 1996; Wood & Brown, 1994) proposal for fitting the common factor model to multivariate time-series data (dynamic factor analysis) involved allowing earlier factor scores to influence directly later values of the manifest variables. Thus, today's observed scores are influenced both by today's factor scores and by yesterday's factor scores. To identify the time series of $\eta(t)$ within the estimation framework he was using, Molenaar specified the common factor scores as a "white noise" time series. In general, other specifications could be used for the purpose of identification. Molenaar's specification can be represented in the following way

$$y(t) = \Lambda(0)\eta(t) + \Lambda(1)\eta(t-1) + \dots + \Lambda(s)\eta(t-s) + \varepsilon(t) \tag{9.5}$$
where $y(t)$ is a p-variate observed time-series measured at time t (t = 1, 2, ..., T), $\eta(t)$ is a k-variate time-series of (unobserved) factor scores at time t, $\varepsilon(t)$ is a vector of uniquenesses at time t, and $\Lambda(s)$ is a p × k matrix of factor loadings at lag s. Note that, in contrast to the DAFS model, in the specification of the WNFS model $\Lambda$ carries a lag identifier, e.g., $\Lambda(1)$. The WNFS model is shown graphically in Figure 9.3. Distinguishing features of the WNFS model include (a) the factor loadings (regression-like weights that link observed variables to factors) differ according to the amount of lag, (b) as was noted above, instead of possible autocorrelational structure in
the common factor scores over time, they are assumed to behave as "white noise," and (c) possible autocorrelational structure in the uniquenesses of the variables. In this model, the current value of an observed variable is jointly influenced by today's factor scores, yesterday's factor scores, etc., today's uniquenesses, and, possibly, yesterday's specificity. Thus, the WNFS model represents such time-defined patterns of influence of factors on variables as decay, latency, and so on, via differences in the magnitude of the factor's loadings on the variables as a function of time (lag).

Figure 9.3: White noise factor score (WNFS) model.
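An analogous simulation of a lag-1 WNFS process (Equation 9.5) differs only in where the lag enters: the shocks are white noise, and the lagged influence runs through a second, freely patterned loading matrix. Again, all numbers are illustrative:

import numpy as np

rng = np.random.default_rng(2)
T, p, k = 500, 6, 2
L0 = np.zeros((p, k))
L0[:3, 0] = [.8, .9, .7]
L0[3:, 1] = [.7, .9, .9]
L1 = 0.3 * L0            # lag-1 loadings; in general these are free, not proportional

eta = rng.standard_normal((T, k))        # white-noise factor scores
y = np.zeros((T, p))
y[0] = eta[0] @ L0.T
for t in range(1, T):
    y[t] = eta[t] @ L0.T + eta[t - 1] @ L1.T   # Equation 9.5 with s = 1
y += 0.5 * rng.standard_normal((T, p))         # add white-noise unique parts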
Differences of the Two Models

In the previous section, the essential nature of the two model specifications under consideration was presented. The purpose of this section is to compare these two specifications for modeling multivariate time-series at the conceptual level. A more technical comparison and examination of the differences between the two models is provided at the end of the chapter in
Table 9.1 Comparison of Model Elements: Summary

Elements              WNFS Model                           DAFS Model
Factor loadings       Vary with amount of lag              Invariant with amount of lag
Factor correlations   Can be correlated within lags        Can be correlated within lags
                      but not across lags                  and across lags
Unique parts          Uncorrelated across variables but    Uncorrelated across variables but
                      may have autoregressive character    may have autoregressive character
the Technical Appendix A. For easy reference, the similarities and differences of the two specifications are summarized in Table 9.1. The two models differ fundamentally in the presumed nature of the common factors (especially the nature of the loading patterns at different lags) and thus represent process in distinctly different ways within the constraints dictated by the common factor model. Both models allow for a representation of continuity despite changes over time (a key feature of process), but the mechanisms by which factors drive variables are notably different in the two model specifications. The basic differences between the two models can be illustrated substantively, using state of food deprivation as the underlying condition (factor) that drives two measurable variables: blood sugar level and self-reported feelings of hunger. An appropriate time frame to consider here is hours instead of days or weeks. According to the DAFS model, current blood sugar level and self-reported hunger level are directly dependent on one's current state of deprivation. Blood sugar level and self-reported hunger level an hour ago were dependent on the state of deprivation an hour ago in exactly the same way that current blood sugar level and self-reported hunger level are dependent on current deprivation state. To the extent that deprivation state an hour ago influences current deprivation state, deprivation state an hour ago also influences current blood sugar and self-reported hunger levels. According to the WNFS model, current blood sugar level and self-reported hunger level are dependent both on current deprivation state and deprivation state an hour ago. Suppose the individual ate one half hour ago, thus attenuating the relationship between current deprivation state and deprivation state that held an hour ago. Given the WNFS specification, deprivation state an hour ago can have some lingering influence on
blood sugar level, self-reported hunger level, or both. By contrast, given the DAFS specification, deprivation state an hour ago will influence both current blood sugar and self-reported hunger levels in an amount inversely proportional to the amount of attenuation in the relationship between current deprivation state and that of an hour ago. For example, if, after eating, blood sugar level and self-reported hunger level return to "baseline" at different rates, the WNFS model can represent the situation more flexibly but, as will be noted subsequently, the increased flexibility may have an accompanying cost of fewer degrees of freedom for evaluating model fit to the data.
Fitting the WNFS and DAFS Models to Data

To provide some concrete guidance for readers who would like to fit either or both of the WNFS and DAFS models to data, we now illustrate a set of procedures by which this can be done. The data to which the two models will be fitted for this demonstration were published originally by Lebo and Nesselroade (1978).² The subset of the Lebo data used here consists of repeated measurements of one subject who rated her moods daily on a variety of adjective rating scales. A series of 103 days of reports on six scales was selected for these analyses: active, lively, and peppy to define an energy factor, and sluggish, tired, and weary to define a fatigue factor. The correlations among these six variables are presented for lag 0, lag 1, and lag 2 in Table 9.2.
Estimating the Model Parameters

Following McArdle (1982), Molenaar (1985), and Wood and Brown (1994), the models were specified and their parameters estimated using LISREL 8 (Joreskog & Sorbom, 1993). To incorporate the lagged information that is needed to estimate the model parameters in the form of a symmetric matrix for input to LISREL 8, a block-Toeplitz matrix as shown in Figure 9.4 was constructed from the data described earlier (for more details on block-Toeplitz matrices see Wood & Brown, 1994, and Nesselroade & Molenaar, 1999). The submatrices comprising the block-Toeplitz matrix were constructed individually by lagging the observed data on themselves by the appropriate number of lags. For example, the correlation for x lagging y by one lag was obtained by pairing observation $x_{i2}$ with observation $y_{i1}$, $x_{i3}$ with $y_{i2}$, ..., and $x_{iT}$ with $y_{i,T-1}$. The correlation for y lagging x by one lag was obtained by pairing observation $y_{i2}$ with observation $x_{i1}$, $y_{i3}$ with $x_{i2}$, ..., and $y_{iT}$ with $x_{i,T-1}$. Obviously, at lag 1, the last observation of the lead variable and the first observation of the lagging variable are unmatched, so there is a consequent decrease in the functional N on which the correlation is based.

²We are grateful to Dr. Michael A. Lebo for permission to use these data.
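A minimal sketch of building the lagged correlation blocks and assembling the symmetric block-Toeplitz matrix from a T × p data matrix Y (names hypothetical):

import numpy as np

def lagged_corr(Y, lag):
    # correlations of Y(t) with Y(t - lag); the functional N shrinks by `lag`
    T, p = Y.shape
    joint = np.corrcoef(Y[lag:], Y[:T - lag], rowvar=False)  # (2p x 2p)
    return joint[:p, p:]             # cross-block corr(Y_t, Y_{t-lag})

def block_toeplitz(Y, max_lag):
    blocks = [lagged_corr(Y, l) for l in range(max_lag + 1)]
    n = max_lag + 1
    rows = []
    for i in range(n):
        row = [blocks[i - j] if i >= j else blocks[j - i].T for j in range(n)]
        rows.append(np.hstack(row))
    return np.vstack(rows)           # symmetric, as LISREL 8 requires

The redundancy the authors describe is visible here: each lag block appears once below the diagonal and once, transposed, above it.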
Table 9.2 Correlations Among Scales for Lags 0, 1, and 2

          Active  Lively  Peppy  Sluggish  Tired  Weary
Lag 0
Active     1.00
Lively      .76    1.00
Peppy       .64     .71   1.00
Sluggish   -.36    -.28   -.20     1.00
Tired      -.34    -.28   -.22      .69    1.00
Weary      -.39    -.39   -.32      .65     .81   1.00

Lag 1
Active      .21     .03   -.00     -.05    -.00   -.03
Lively      .23     .10    .05     -.03     .04    .02
Peppy       .08     .07    .04      .12     .27    .19
Sluggish   -.01     .04    .19      .15     .04    .15
Tired       .12     .15    .25      .04     .02    .09
Weary       .06     .09    .22      .07     .01    .11

Lag 2
Active      .36     .28    .28     -.19    -.04   -.09
Lively      .33     .21    .30     -.12    -.02   -.09
Peppy       .27     .27    .32     -.08    -.01   -.04
Sluggish   -.04     .01    .10      .03     .01    .12
Tired      -.07     .07    .12      .10     .03    .06
Weary      -.10     .04    .15      .14     .09    .10
Figure 9.4: Schematized block-Toeplitz, lagged covariance matrix of Lags 0, 1, and 2.

Additional sample size is lost with the taking of additional lags. Moreover, the lagged correlation matrices are not symmetric because the correlation between x and y when y leads x is most likely different from the correlation between x and y when x leads y. To fit either of these two models using programs such as LISREL 8, the input covariance matrix must be symmetric. Therefore, in order to include the asymmetry of the lagged portions of the covariance matrix yet produce a symmetric matrix to be fitted, the block-Toeplitz matrix exhibits considerable redundancy, as shown in Figure 9.4. To cope with the "false" degrees of freedom generated by the redundancy of the constructed matrix, the portions of the block-Toeplitz lagged covariance matrix shown as lightly shaded are estimated as free parameters in the model. Only some (or all, for the 0, 1, 2 lag model) of the left-most column blocks are actually fitted by the various models. The general strategy followed was to fit the DAFS and WNFS models as they are represented in Figures 9.2 and 9.3 to the data of Table 9.2.³ This means that, in the lag 2 situation, the DAFS model allows for one factor to influence the score on another from both one and two occasions earlier. The lag 2 WNFS model allows for an observed variable to be influenced by a factor concurrently and from one and two occasions earlier. Both models were specified to have the same uniqueness structure, that shown in Figures 9.2 and 9.3. The models were fitted to the full, lag 2 covariance matrix (see bottom panel of Figure 9.5). Note that, if one were to fit this 2-lag matrix with models of fewer lags of factors on variables than 2, some blocks of the

³The LISREL code that was used is presented in Technical Appendix B.
Figure 9.5: Schematized block-Toeplitz, lagged covariance matrix of Lags 0, 1, and 2, showing which lagged covariances are deliberately not fitted by the models (light shading) and which lagged covariances are forced to a value of zero by the models (unshaded).
covariance matrix might not be fitted at all. Figure 9.5 gives one illustration of which portions of the lagged covariance matrix might be fitted by each of three models (lag 0, lag 1, and lag 2). To illustrate, consider the P-technique model situation shown in panel a of Figure 9.5. One of the major criticisms of P-technique factor analysis has been that it ignores auto- and cross-correlation in the data. As panel a shows, only the lag-0 information is accounted for by the model. To the extent that there is statistically significant information in the lagged portions of the block-Toeplitz matrix, it is ignored by the P-technique model. Alternative specifications one might use include "forcing" the lagged portions of the left-most column blocks to be zero versus "freeing" them, as was done in our demonstration case with the elements in the lightly shaded portions of Figure 9.5.
To fit the DAFS model, we used the WNFS model specification to which we added several constraints. The first-order autoregressive effects were estimated by constraining the lag 1 factor loadings to $\Lambda(1) = \Lambda(0) \cdot B_1$ (see Technical Appendix A). The second-order autoregressive effects were estimated by constraining the lag 2 loadings to $\Lambda(2) = \Lambda(0) \cdot (B_1^2 + B_2)$. By constraining the lagged factor loadings, we were able to impose stationarity on the solution by estimating a disturbance on the concurrent factor (which was then the total variance at each lag). Because the autoregressive components were actually estimated by constraining the lagged loadings instead of as direct paths between the lagged factors, there was no "residual" component in the variance estimates of the lagged factors (see Technical Appendix A). An alternative specification would be to estimate the direct effects of the lagged factors on each other with regression-like weights from factor to factor. Under that specification, the variance or residual variance on each lagged factor would be constrained so that the total variance at each lag was the same (to achieve stationarity). We did not use such a specification because we found that LISREL 8 was better able to handle the constraints on the loadings. Either procedure requires that the user observe the rules of path analysis (or the matrix equivalent) to specify the direct and indirect effects for each component in the model. For example, the lag 2 loading represents a lag 2 direct effect plus an indirect effect composed of the product of the lag 1 effects. Thus, when constraining the lag 2 loadings, both of these components were used to define the lag 2 loading. Finally, as indicated earlier, in the interest of keeping the model simple, cross-factor regressions were not estimated. Such estimations can be made by adding additional constraints on lagged loadings.
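The constraints just described can be written out directly. A small sketch (illustrative values, not the fitted estimates) of the lagged loadings implied by Λ(0) and diagonal autoregression weights:

import numpy as np

L0 = np.zeros((6, 2))
L0[:3, 0] = [.8, .9, .7]
L0[3:, 1] = [.7, .9, .9]
B1 = np.diag([0.5, 0.1])     # first-order autoregression weights
B2 = np.diag([0.2, 0.0])     # second-order autoregression weights

L1 = L0 @ B1                 # lag 1: Lambda(1) = Lambda(0) B1
L2 = L0 @ (B1 @ B1 + B2)     # lag 2: direct B2 effect plus the B1*B1 path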
Results

To complete the illustration of fitting the WNFS and DAFS models to the subject's data for lags of 0, 1, and 2, summary information is presented in Tables 9.3 and 9.4, respectively. Both models appear to fit the data adequately but, more important for our purposes, the distinct features of both models are clearly manifested in the outcomes. Neither the Fatigue factor nor the manifest variables it loads appear to exhibit autoregressive characteristics. This is not the case for Energy and its indicators, however. The WNFS model (Table 9.3) shows the autoregressive nature of the system with statistically significant lagged factor loadings, whereas the DAFS model⁴ (Table 9.4) displays the analogous information by the statistically significant prediction of the Energy factor scores at time t by their values

⁴In his 1985 paper, Molenaar indicated that what is here referred to as the DAFS model is a state-space model and hence is a specific version of his dynamic factor model. Molenaar went on to indicate that if the loadings in a dynamic factor model are only nonzero at lag zero, the dynamic factor model reduces to a generalized state-space model and proved that in that case the covariance function of the latent factor series is identified and thus can be uniquely estimated.
Table 9.3 WNFS Model Fit Outcomes (χ² = 77.54; df = 56; RMSEA = .054; probability of close fit = .4)

Factor Loadings
            Energy                      Fatigue
Variable    Lag 0   Lag 1   Lag 2       Lag 0   Lag 1   Lag 2
Active       .79     .18     .36
Lively       .88     .09     .36
Peppy        .67    -.01     .39
Sluggish                                 .73     .08     .12
Tired                                    .89     .08     .12
Weary                                    .87     .13     .10

Note: Energy and Fatigue factors are correlated -.37 within occasions.
at time t - 2. Highly similar values were estimated for the correlation between Energy and Fatigue within occasions in the two cases (-.37 and -.38). Although not presented here, the patterns of autocorrelations of the unique parts were also very similar in the two cases.
DISCUSSION

One of the general benefits for the study of behavior of applying the WNFS and DAFS models to multivariate time series to represent process is the forced confrontation with having to define process explicitly. The term process is widely used and easy to say but, to rigorously fit and evaluate different representations, it has to be given a specific operational expression. By means of the WNFS and the DAFS models, and within the constraints of the common factor model, we have rendered process operational in two distinctive ways. The WNFS and DAFS models rest on markedly different interpretations of the nature of the factor variables. In the WNFS model, the factors themselves represent unobserved, unpredictable "shocks" to the system of manifest variables. Thus, process in this case is defined in terms of the systematic, lagged effects on the manifest variables. In the DAFS model, the factor scores can be relatively stable and predictable, with changes in those factor scores caused by unobserved, unpredictable "shocks" acting on the factors. The manifest variables in the DAFS model are multiple indicators of the underlying processes represented by the factors. Some readers will no doubt question the efficacy of the WNFS and DAFS
Table 9.4 DAFS Model Fit Outcomes (χ² = 87.19; df = 64; RMSEA = .046; probability of close fit = .55)

Factor Loadings
Variable    Energy   Fatigue
Active       .77
Lively       .86
Peppy        .67
Sluggish              .71
Tired                 .86
Weary                 .87

Factor Score Autoregressions
             Energy (t-1)   Fatigue (t-1)   Energy (t-2)   Fatigue (t-2)
Fatigue (t)                      .09                            .13

Note: Energy and Fatigue factors are correlated -.38 within occasions.
models for representing process, and it is clear that many other possibilities exist, should an investigator wish to make use of them. Nevertheless, the specifications we have presented have behind them a half-century of concern with representing process in quantitative models by means of multivariate, latent variable specifications. Fitting these models to data underscores the level of precision that an investigator must be willing to provide to model process in quantitatively rigorous terms. The results of these model-fitting exercises hint at some of the gains that can accrue when one does so. Under the constraints outlined in Technical Appendix A, the DAFS model is a restrictive version of the WNFS model, which, due to additional constraints on the factor structure, may not fit the time-series data as well. For a given problem size (number of variables, number of factors, and number of lags) and three or more indicators per factor, the DAFS model generally has fewer parameters than the WNFS model, the exact numbers depending on the actual specifications. This can also be seen by counting the paths among latent variables in Figures 9.2 and 9.3. In cases where the data meet the proportionality constraints of the DAFS model (see Technical Appendix A), the DAFS specification will have smaller error of estimation because it has more degrees of freedom. Thus, the flexibility of the WNFS model to portray differentials in the rates at which variables change in response to changes in the factor scores (a situation that violates the proportionality constraints) costs degrees of freedom. The cost may well be worth it, however, in cases where changes in variables loaded by the same factor show quite different lag patterns. As a "restrictive" version of the more general WNFS model, the DAFS model is scientifically very useful. This restrictive relationship among alternative multivariate models parallels the mathematical basis and scientific use of a "common psychometric factor" model with behavioral genetics data (McArdle & Goldsmith, 1990). Obviously many variations on the WNFS and DAFS models described previously are possible. One can argue that conceptually the two models represent very different notions of process and thus should not necessarily be seen merely as alternative models for fitting a given set of data. Nevertheless, to make this parallel application of the two models as informative as possible, we specified both of them in a way that highlighted their similarities. For example, we specified the same uniqueness structure for both models. If the two models account in a similar way for the same data, then the argument of parsimony would favor the model with the fewest parameters. If the two differ in the way they account for the same data, then the evaluation has to take into account the parsimony of the model as well as other features. Ultimately, it will be necessary to compare alternative specifications using many different kinds of data representing different kinds of processes if the further development and use of these models is to be maximally effective. At a more specific level, the models of process we presented illustrate how refinements and subtleties in representations can be evaluated within
the context of a general quantitative model (e.g., the common factor model). We demonstrated the underlying kinship of what, on the surface, appear to be distinct representations of process information. By showing that one of the models is a more restrictive version of the other, one can enter the scientifically highly productive arena of testing and evaluating the appropriateness of substantively meaningful restrictions imposed on rigorous, quantitative models. To illustrate some of the subtleties of the process modeling issues, consider the importance of being able to model differential rates of return to equilibrium of different manifest variables after some extraordinary event. Disparate temporal paths to the restoration of equilibrium would seem, at least on the surface, to favor the representational flexibility of the WNFS model. However, it is likely to be the case that the mechanism of self-report "smooths" differences in the rates of change (reported) at the manifest variable level, thus tending to favor the more parsimonious DAFS representation over the WNFS one. Thus, media of observation (self-report, ratings by others, objective performance measures), for instance, may well interact with the model specifications in the attempt to represent process at the latent variable level. Neither the WNFS nor the DAFS model specification should be regarded as a "winner" in some modeling context. It is entirely possible that a DAFS model is a reasonable representation for factor 1 but not for factor 2, and so on. The choice of which one to fit should be made primarily on substantive considerations. Thus, it is important to continue to explore alternative model specifications using data drawn from a variety of content domains. Finally, it is our view that never before in the study of behavior has the time to focus on modeling process and change been so ripe (Nesselroade & Schmidt McCollam, 2000). Historically, we are at a confluence of several substantive and methodological influences that are drawing more and more attention to intraindividual variability phenomena (Nesselroade & Featherman, 1997). Success in capitalizing on this historical opportunity will rest heavily on our ability to proceed with precision and rigor as the cherished and familiar concepts called "processes" are rendered more operational.
TECHNICAL APPENDIX A. DIFFERENCES AND SIMILARITIES OF WNFS AND DAFS MODELS To examine the WNFS and DAFS models at a somewhat more technical level, consider these simplified (first order lags only) equations
+
y ( t ) = A(0) . ~ ( t )A ( l ) . q(t - 1) of the WNFS model and
+ +)
(9.6)
255
Dynamic Factor Models and
+
+
f ( t ) = B . f ( t - 1) + ~ ( t=)~ ( t )B . ~ (- t1) B2 . f ( t - 2)
(9.8)
of the DAFS model. A closer inspection of the relations between these two time series based factor models shows their equivalence under some constraints on the WNFS model. Starting with the DAFS model,
+
~ ( t=)A . [B . f ( t - 1) ~ ( t + ) ]~ ( t ) ,
(9.9)
or
~ ( t=)A . [ ~ ( + t )B . ~ (- t1)+ B2 . f(t - 2)] + ~ ( t )
(9.10)
Rearranging terms, we have y(t) = [A.~
( t+ ) ]A . [ B . ~ (- t1)+ B2 . f(t
- 2)]
+~ ( t )
(9.11)
By stationarity, f(t - 2) has a zero mean and the same variance as f ( t ) . Redefining
yields
. ~ ( t+)A(1) . ~ (- t1)+ ~ ( t+)4 (9.12) where q is a residual term = A . B2 . f(t - 2). The q term signifies an error y ( t ) = A(0)
of approximation random variable. Thus, for the WNFS and DAFS models to be equivalent, the pattern of first-order white noise factor loadings, A(1) is required t o be “proportional” to A(0) [i.e., A(0) = A and A(1) = A . B = A(0) . B] and A . B2. f(t - 2) must have a negligible covariance matrix. Consider the second-order lag equations of the same two model specifications. The WNFS model for Lag-2 effects can be written as
~ ( t=)A(0) . V ( t ) + A(1) . ~ (- t1) + A(2) . ~ (- t2) + E ( t )
(9.13)
and the DAFS model for lag-2 effects can be written as (9.14)
Nesselroade et al.
256
f(t) =
. f ( t- 1) + Bz . f(t
+ w(t)
(9.15) Bz . v(t - 2) B: . ~ (- t2 ) (9.16) +B,3.f ( t - 3) 2 B 1 . B 2 . f ( t - 3) +B; . Bz . f(t - 4) Bg . f ( t - 4)
B1
+
= ~ ( t )B1 . ~ (- t1)
+
- 2)
+
+
’
+
Combining the two terms involving w(t-2), we have

f(t) = w(t) + B1·w(t-1) + (B1^2 + B2)·w(t-2) + (B1^3 + 2·B1·B2)·f(t-3) + (B1^2·B2 + B2^2)·f(t-4).    (9.17)
Similar to the previous development,

y(t) = Λ·[w(t) + B1·w(t-1) + (B1^2 + B2)·w(t-2)] + Λ·[(B1^3 + 2·B1·B2)·f(t-3) + (B1^2·B2 + B2^2)·f(t-4)] + ε(t).    (9.18)
Redefining Λ(0) = Λ, Λ(1) = Λ·B1, and Λ(2) = Λ·(B1^2 + B2), and substituting, yields

y(t) = Λ(0)·η(t) + Λ(1)·η(t-1) + Λ(2)·η(t-2) + ε(t) + q,    (9.19)

where q, a residual term, represents

Λ·[(B1^3 + 2·B1·B2)·f(t-3) + (B1^2·B2 + B2^2)·f(t-4)].
This is the WNFS model with lags of 0, 1, and 2 plus the residual term. Thus, for the WNFS and DAFS models to be equivalent in the lag-2 case, the patterns of first- and second-order white noise factor loadings, Λ(1) and Λ(2), are required to be "proportional" to the autoregression weights, B1 and B2 [i.e., Λ(0) = Λ, Λ(1) = Λ·B1 = Λ(0)·B1, and Λ(2) = Λ·(B1^2 + B2) = Λ(0)·(B1^2 + B2)], and Λ·[(B1^3 + 2·B1·B2)·f(t-3) + (B1^2·B2 + B2^2)·f(t-4)] must have a negligible covariance matrix. This process repeats in similar fashion for higher orders of lags. We return to Molenaar's specification of the WNFS model (Equation 9.5) for further consideration:
y(t) = Λ(0)·η(t) + Λ(1)·η(t-1) + ... + Λ(s)·η(t-s) + ε(t)
The WNFS model is a general model for any time-series data with a factor analytic structure when s = ∞. When the proportionality constraints described previously hold and B^(s+1) is "approximately" a null matrix, the WNFS model "approximately" fits a time series with a DAFS structure. To the extent that the covariance matrices implied by the residual terms do not vanish, the number of lags needed by the WNFS model will tend to be greater than the order of the DAFS model.
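As a numerical check on these closed forms (our sketch, not from the chapter), the moving-average weights of a second-order DAFS factor series can be generated by the recursion Psi(0) = 1, Psi(1) = B1, Psi(k) = B1·Psi(k-1) + B2·Psi(k-2). Scalar autoregression weights are assumed, matching the per-factor parameters PA(1) through PA(4) used in the LISREL code of Appendix B; the numerical values are hypothetical.

# Moving-average expansion of f(t) = B1*f(t-1) + B2*f(t-2) + w(t).
B1, B2 = 0.5, 0.2  # assumed scalar autoregression weights

psi = [1.0, B1]  # Psi(0), Psi(1)
for k in range(2, 5):
    psi.append(B1 * psi[-1] + B2 * psi[-2])

print(psi[2], B1**2 + B2)           # weight on w(t-2): matches B1^2 + B2 in Eq. 9.17
print(psi[3], B1**3 + 2 * B1 * B2)  # matches the f(t-3) weight B1^3 + 2*B1*B2
print(psi[4])                       # higher-order weights keep shrinking for a stationary series

Both printed pairs agree, which is the proportionality pattern imposed by the CO statements in the DAFS program of Appendix B.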
TECHNICAL APPENDIX B. LISREL CODE FOR RUNNING WNFS AND DAFS MODELS

The WNFS Model: Lebo Data, 2 Factors, 0, 1, and 2 Lags

da ni=18 no=103 ma=km
km sy fi=lebo.dat
mo ny=18 ne=10 ly=fu,fi ps=sy,fi te=sy,fi be=fu,fi
la
active0 lively0 peppy0 slugish0 tired0 weary0
active1 lively1 peppy1 slugish1 tired1 weary1
active2 lively2 peppy2 slugish2 tired2 weary2
le
ENERGY0 FATIGUE0 ENERGY1 FATIGUE1 ENERGY2 FATIGUE2
ENERGY3 FATIGUE3 ENERGY4 FATIGUE4
!LAG 0 FACTOR LOADINGS
FR LY(1,1) LY(2,1) LY(3,1) LY(4,2) LY(5,2) LY(6,2)
EQ LY(1,1) LY(7,3) LY(13,5)
EQ LY(2,1) LY(8,3) LY(14,5)
EQ LY(3,1) LY(9,3) LY(15,5)
EQ LY(4,2) LY(10,4) LY(16,6)
EQ LY(5,2) LY(11,4) LY(17,6)
EQ LY(6,2) LY(12,4) LY(18,6)
!LAG 1 FACTOR LOADINGS
FR LY(1,3) LY(2,3) LY(3,3) LY(4,4) LY(5,4) LY(6,4)
EQ LY(1,3) LY(7,5) LY(13,7)
EQ LY(2,3) LY(8,5) LY(14,7)
EQ LY(3,3) LY(9,5) LY(15,7)
EQ LY(4,4) LY(10,6) LY(16,8)
EQ LY(5,4) LY(11,6) LY(17,8)
EQ LY(6,4) LY(12,6) LY(18,8)
!LAG 2 FACTOR LOADINGS
FR LY(1,5) LY(2,5) LY(3,5) LY(4,6) LY(5,6) LY(6,6)
EQ LY(1,5) LY(7,7) LY(13,9)
EQ LY(2,5) LY(8,7) LY(14,9)
EQ LY(3,5) LY(9,7) LY(15,9)
EQ LY(4,6) LY(10,8) LY(16,10)
EQ LY(5,6) LY(11,8) LY(17,10)
EQ LY(6,6) LY(12,8) LY(18,10)
!UNIQUE VARIANCES AND COVARIANCES
FR TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)
FR TE(7,1) TE(8,2) TE(9,3) TE(10,4) TE(11,5) TE(12,6)
FR TE(13,1) TE(14,2) TE(15,3) TE(16,4) TE(17,5) TE(18,6)
FR TE(7,7)
FR TE(8,7) TE(8,8)
FR TE(9,7) TE(9,8) TE(9,9)
FR TE(10,7) TE(10,8) TE(10,9) TE(10,10)
FR TE(11,7) TE(11,8) TE(11,9) TE(11,10) TE(11,11)
FR TE(12,7) TE(12,8) TE(12,9) TE(12,10) TE(12,11) TE(12,12)
FR TE(13,13)
FR TE(14,13) TE(14,14)
FR TE(15,13) TE(15,14) TE(15,15)
FR TE(16,13) TE(16,14) TE(16,15) TE(16,16)
FR TE(17,13) TE(17,14) TE(17,15) TE(17,16) TE(17,17)
FR TE(18,13) TE(18,14) TE(18,15) TE(18,16) TE(18,17) TE(18,18)
FR TE(13,7) TE(13,8) TE(13,9) TE(13,10) TE(13,11) TE(13,12)
FR TE(14,7) TE(14,8) TE(14,9) TE(14,10) TE(14,11) TE(14,12)
FR TE(15,7) TE(15,8) TE(15,9) TE(15,10) TE(15,11) TE(15,12)
FR TE(16,7) TE(16,8) TE(16,9) TE(16,10) TE(16,11) TE(16,12)
FR TE(17,7) TE(17,8) TE(17,9) TE(17,10) TE(17,11) TE(17,12)
FR TE(18,7) TE(18,8) TE(18,9) TE(18,10) TE(18,11) TE(18,12)
!SCALING CONSTRAINTS
VA 1.0 PS(1,1) PS(3,3) PS(5,5) PS(7,7) PS(9,9)
VA 1.0 PS(2,2) PS(4,4) PS(6,6) PS(8,8) PS(10,10)
!FACTOR COVARIANCES
FR PS(2,1)
EQ PS(2,1) PS(4,3) PS(6,5) PS(8,7) PS(10,9)
ST 1.0 TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)
ST 0.01 LY(1,1) LY(2,1) LY(3,1) LY(4,2) LY(5,2) LY(6,2)
ou me=ml xm ns so se tv ss nd=2 it=500 ad=off
The DAFS Model: Lebo Data, 2 Factors, 0, 1, and 2 Lags

da ni=18 no=103 ma=km
km sy fi=lebo.dat
mo ny=18 ne=10 ly=fu,fi ps=sy,fi te=sy,fi be=fu,fi ap=4
la
active0 lively0 peppy0 slugish0 tired0 weary0
active1 lively1 peppy1 slugish1 tired1 weary1
active2 lively2 peppy2 slugish2 tired2 weary2
le
ENERGY0 FATIGUE0 ENERGY1 FATIGUE1 ENERGY2 FATIGUE2
ENERGY3 FATIGUE3 ENERGY4 FATIGUE4
!FACTOR LOADINGS
FR LY(1,1) LY(2,1) LY(3,1) LY(4,2) LY(5,2) LY(6,2)
EQ LY(1,1) LY(7,3) LY(13,5)
EQ LY(2,1) LY(8,3) LY(14,5)
EQ LY(3,1) LY(9,3) LY(15,5)
EQ LY(4,2) LY(10,4) LY(16,6)
EQ LY(5,2) LY(11,4) LY(17,6)
EQ LY(6,2) LY(12,4) LY(18,6)
!FACTOR LOADING CONSTRAINTS (TO DEAL WITH LAG 1)
FR LY(1,3) LY(2,3) LY(3,3) LY(4,4) LY(5,4) LY(6,4)
EQ LY(1,3) LY(7,5) LY(13,7)
EQ LY(2,3) LY(8,5) LY(14,7)
EQ LY(3,3) LY(9,5) LY(15,7)
EQ LY(4,4) LY(10,6) LY(16,8)
EQ LY(5,4) LY(11,6) LY(17,8)
EQ LY(6,4) LY(12,6) LY(18,8)
!FACTOR LOADING CONSTRAINTS (TO DEAL WITH LAG 2)
FR LY(1,5) LY(2,5) LY(3,5) LY(4,6) LY(5,6) LY(6,6)
EQ LY(1,5) LY(7,7) LY(13,9)
EQ LY(2,5) LY(8,7) LY(14,9)
EQ LY(3,5) LY(9,7) LY(15,9)
EQ LY(4,6) LY(10,8) LY(16,10)
EQ LY(5,6) LY(11,8) LY(17,10)
EQ LY(6,6) LY(12,8) LY(18,10)
!UNIQUE VARIANCES AND COVARIANCES
FR TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)
FR TE(7,1) TE(8,2) TE(9,3) TE(10,4) TE(11,5) TE(12,6)
FR TE(13,1) TE(14,2) TE(15,3) TE(16,4) TE(17,5) TE(18,6)
FR TE(7,7)
FR TE(8,7) TE(8,8)
FR TE(9,7) TE(9,8) TE(9,9)
FR TE(10,7) TE(10,8) TE(10,9) TE(10,10)
FR TE(11,7) TE(11,8) TE(11,9) TE(11,10) TE(11,11)
FR TE(12,7) TE(12,8) TE(12,9) TE(12,10) TE(12,11) TE(12,12)
FR TE(13,13)
FR TE(14,13) TE(14,14)
FR TE(15,13) TE(15,14) TE(15,15)
FR TE(16,13) TE(16,14) TE(16,15) TE(16,16)
FR TE(17,13) TE(17,14) TE(17,15) TE(17,16) TE(17,17)
FR TE(18,13) TE(18,14) TE(18,15) TE(18,16) TE(18,17) TE(18,18)
FR TE(13,7) TE(13,8) TE(13,9) TE(13,10) TE(13,11) TE(13,12)
FR TE(14,7) TE(14,8) TE(14,9) TE(14,10) TE(14,11) TE(14,12)
FR TE(15,7) TE(15,8) TE(15,9) TE(15,10) TE(15,11) TE(15,12)
FR TE(16,7) TE(16,8) TE(16,9) TE(16,10) TE(16,11) TE(16,12)
FR TE(17,7) TE(17,8) TE(17,9) TE(17,10) TE(17,11) TE(17,12)
FR TE(18,7) TE(18,8) TE(18,9) TE(18,10) TE(18,11) TE(18,12)
!SCALING CONSTRAINTS
VA 1.0 PS(1,1) PS(3,3) PS(5,5) PS(7,7) PS(9,9)
VA 1.0 PS(2,2) PS(4,4) PS(6,6) PS(8,8) PS(10,10)
!FACTOR COVARIANCES
FR PS(2,1)
EQ PS(2,1) PS(4,3) PS(6,5) PS(8,7) PS(10,9)
ST 1.0 TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)
ST 0.01 LY(1,1) LY(2,1) LY(3,1) LY(4,2) LY(5,2) LY(6,2)
!Factor One
!First-Order Autoregression F1[t-1] -> F1[t]
FR PA(1)
CO LY(1,3) = LY(1,1)*PA(1)
CO LY(2,3) = LY(2,1)*PA(1)
CO LY(3,3) = LY(3,1)*PA(1)
EQ LY(1,3) LY(7,5) LY(13,7)
EQ LY(2,3) LY(8,5) LY(14,7)
EQ LY(3,3) LY(9,5) LY(15,7)
!Second-Order Autoregression F1[t-2] -> F1[t]
FR PA(3)
CO LY(1,5) = LY(1,1)*PA(1)*PA(1) + LY(1,1)*PA(3)
CO LY(2,5) = LY(2,1)*PA(1)*PA(1) + LY(2,1)*PA(3)
CO LY(3,5) = LY(3,1)*PA(1)*PA(1) + LY(3,1)*PA(3)
EQ LY(1,5) LY(7,7) LY(13,9)
EQ LY(2,5) LY(8,7) LY(14,9)
EQ LY(3,5) LY(9,7) LY(15,9)
!Factor Two
!First-Order Autoregression F2[t-1] -> F2[t]
FR PA(2)
CO LY(4,4) = LY(4,2)*PA(2)
CO LY(5,4) = LY(5,2)*PA(2)
CO LY(6,4) = LY(6,2)*PA(2)
EQ LY(4,4) LY(10,6) LY(16,8)
EQ LY(5,4) LY(11,6) LY(17,8)
EQ LY(6,4) LY(12,6) LY(18,8)
!Second-Order Autoregression F2[t-2] -> F2[t]
FR PA(4)
CO LY(4,6) = LY(4,2)*PA(2)*PA(2) + LY(4,2)*PA(4)
CO LY(5,6) = LY(5,2)*PA(2)*PA(2) + LY(5,2)*PA(4)
CO LY(6,6) = LY(6,2)*PA(2)*PA(2) + LY(6,2)*PA(4)
EQ LY(4,6) LY(10,8) LY(16,10)
EQ LY(5,6) LY(11,8) LY(17,10)
EQ LY(6,6) LY(12,8) LY(18,10)
!Starting Values for all regression parameters
ST -.1 PA(1) PA(2) PA(3) PA(4)
ou me=ml xm ns so se tv ss nd=2 it=500 ad=off
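A note of context for the km sy fi=lebo.dat input lines above: the 18 analyzed variables are the six mood measures at lagged positions 0, 1, and 2, so the matrix being read is a lagged (block-Toeplitz) correlation matrix of the kind discussed by Nesselroade and Molenaar (1999). The sketch below is our illustration, not part of the chapter; it uses random placeholder data rather than the Lebo mood series, and all names and dimensions are assumed for demonstration.

import numpy as np

rng = np.random.default_rng(0)
T, p, max_lag = 103, 6, 2             # occasions, variables, highest lag (assumed)
series = rng.standard_normal((T, p))  # placeholder for an observed p-variate time series

# Stack lagged copies side by side; the column blocks are x(t), x(t+1), x(t+2).
rows = T - max_lag
lagged = np.hstack([series[k:k + rows, :] for k in range(max_lag + 1)])

lagged_corr = np.corrcoef(lagged, rowvar=False)
print(lagged_corr.shape)  # (18, 18): the form of symmetric matrix the programs read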
ACKNOWLEDGMENTS

This work was supported by the Institute for Developmental and Health Research Methodology at the University of Virginia. An earlier version was presented at the Annual Meeting of the American Psychological Association, San Francisco, August 1997. The final manuscript was completed while JRN was a Senior Guest Scientist at the Max Planck Institute for Human Development, Berlin, Germany. The authors acknowledge with gratitude the helpful comments of Michael Browne and Peter C. M. Molenaar on an earlier version of this chapter.
REFERENCES

Anderson, T. W. (1963). The use of factor analysis in the statistical analysis of multiple time series. Psychometrika, 28, 1-24.
Baltes, P. B., Reese, H. W., & Nesselroade, J. R. (1977). Life-span developmental psychology: Introduction to research methods. Monterey, CA: Brooks/Cole.
Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. W. Harris (Ed.), Problems in measuring change. Madison: University of Wisconsin Press.
Cattell, R. B. (1952). Factor analysis. New York: Harper.
Cattell, R. B. (1957). Personality and motivation structure and measurement. New York: World.
Cattell, R. B. (1961). Factor analysis. New York: Harper.
Cattell, R. B. (1963a). The interaction of hereditary and environmental influences. The British Journal of Statistical Psychology, 16, 191-210.
Cattell, R. B. (1963b). The structuring of change by P-technique and incremental R-technique. In C. W. Harris (Ed.), Problems in measuring change (pp. 167-198). Madison: University of Wisconsin Press.
Cattell, R. B. (1966). Patterns of change: Measurement in relation to state dimension, trait change, lability, and process concepts. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology (pp. 355-402). Chicago, IL: Rand McNally.
Cattell, R. B., Cattell, A. K. S., & Rhymer, R. M. (1947). P-technique demonstrated in determining psychophysical source traits in a normal individual. Psychometrika, 12, 267-288.
Eizenman, D. R., Nesselroade, J. R., Featherman, D. L., & Rowe, J. W. (1997). Intra-individual variability in perceived control in an elderly sample: The MacArthur successful aging studies. Psychology and Aging, 12, 489-502.
Engle, R., & Watson, M. (1981). A one-factor multivariate time series model of metropolitan wage rates. Journal of the American Statistical Association, 76, 774-781.
Fiske, D. W., & Maddi, S. R. (Eds.). (1961). Functions of varied experience. Homewood, IL: Dorsey Press.
Fiske, D. W., & Rice, L. (1955). Intra-individual response variability. Psychological Bulletin, 52, 217-250.
Flugel, J. C. (1928). Practice, fatigue, and oscillation. British Journal of Psychology, 4, 1-92.
Geweke, J. F., & Singleton, K. J. (1981). Maximum likelihood "confirmatory" factor analysis of economic time series. International Economic Review, 22, 37-54.
Hershberger, S. L., Molenaar, P. C., & Corneal, S. E. (1996). A hierarchy of univariate and multivariate time series models. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 159-194). Mahwah, NJ: Lawrence Erlbaum Associates.
Holtzman, W. H. (1963). Statistical models for the study of change in the single case. In C. W. Harris (Ed.), Problems in measuring change (pp. 199-211). Madison: University of Wisconsin Press.
Horn, J. L. (1972). State, trait, and change dimensions of intelligence. The British Journal of Educational Psychology, 42, 159-185.
Horn, J. L., & McArdle, J. J. (1980). Perspectives on mathematical and statistical model building (MASMOB) in research on aging. In L. Poon (Ed.), Aging in the 1980's: Psychological issues (pp. 203-541). Washington, DC: American Psychological Association.
Hundleby, J. D., Pawlik, K., & Cattell, R. B. (1965). Personality factors in objective test devices. San Diego, CA: R. Knapp.
Jones, C. J., & Nesselroade, J. R. (1990). Multivariate, replicated, single-subject designs and P-technique factor analysis: A selective review of the literature. Experimental Aging Research, 16, 171-183.
Jones, K. (1991). The application of time series methods to moderate span longitudinal data. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 75-87). Washington, DC: American Psychological Association.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kenny, D. A., & Zautra, A. (1995). The trait-state error model for multiwave data. Journal of Consulting and Clinical Psychology, 63, 52-59.
Kim, J. E., Nesselroade, J. R., & Featherman, D. L. (1996). The state component in self-reported world views and religious beliefs in older adults: The MacArthur successful aging studies. Psychology and Aging, 11, 396-407.
Larsen, R. J. (1987). The stability of mood variability: A spectral analysis approach to daily mood assessments. Journal of Personality and Social Psychology, 52, 1195-1204.
Lebo, M. A., & Nesselroade, J. R. (1978). Intraindividual differences dimensions of mood change during pregnancy identified by five P-technique factor analyses. Journal of Research in Personality, 12, 205-224.
Luborsky, L., & Mintz, J. (1972). The contribution of P-technique to personality, psychotherapy, and psychosomatic research. In R. M. Dreger (Ed.), Multivariate personality research: Contributions to the understanding of personality in honor of Raymond B. Cattell (pp. 387-410). Baton Rouge, LA: Claitor's Publishing Division.
Magnusson, D. (1997). The logic and implications of a person approach. In R. B. Cairns, L. R. Bergman, & J. Kagan (Eds.), The individual as a focus in developmental research. New York: Sage.
McArdle, J. J. (1982). Structural equation modeling of an individual system: Preliminary results from "A case study in episodic alcoholism". Unpublished manuscript, Department of Psychology, University of Denver.
McArdle, J. J., & Goldsmith, H. H. (1990). Some alternative structural equation models for multivariate biometric analyses. Behavior Genetics, 20, 569-608.
Molenaar, P. C. M. (1985). A dynamic factor model for the analysis of multivariate time series. Psychometrika, 50, 181-202.
Molenaar, P. C. M. (1994). Dynamic latent variable models in developmental psychology. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 155-180). Newbury Park, CA: Sage.
Nesselroade, J. R., & Boker, S. M. (1994). Assessing constancy and change. In T. Heatherton & J. Weinberger (Eds.), Can personality change? (pp. 121-147). Washington, DC: American Psychological Association.
Nesselroade, J. R., & Featherman, D. L. (1997). Establishing a reference frame against which to chart age-related change. In M. A. Hardy (Ed.), Studying aging and social change: Conceptual and methodological issues (pp. 191-205). Thousand Oaks, CA: Sage.
Nesselroade, J. R., & Ford, D. H. (1985). P-technique comes of age: Multivariate, replicated, single-subject designs for research on older adults. Research on Aging, 7, 46-80.
Nesselroade, J. R., & Molenaar, P. C. M. (1999). Pooling lagged covariance structures based on short, multivariate time-series for dynamic factor analysis. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 224-251). Newbury Park, CA: Sage.
Nesselroade, J. R., & Schmidt McCollam, K. M. (2000). Putting the process in developmental processes. International Journal of Behavioral Development, 24, 295-300.
Singer, J. L., & Singer, D. G. (1972). Personality. Annual Review of Psychology, 23, 375-412.
Steyer, R., Ferring, D., & Schmitt, M. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8, 79-98.
Valsiner, J. (1984). Two alternative epistemological frameworks in psychology: The typological and variational modes of thinking. The Journal of Mind and Behavior, 5, 449-470.
Wessman, A. E., & Ricks, D. F. (1966). Mood and personality. New York: Holt, Rinehart, and Winston.
Wood, P., & Brown, D. (1994). The study of intraindividual differences by means of dynamic factor models: Rationale, implementation, and interpretation. Psychological Bulletin, 116, 166-186.
Woodrow, H. (1932). Quotidian variability. Psychological Review, 39, 245-256.
Woodrow, H. (1945). Intelligence and improvement in school subjects. Journal of Educational Psychology, 36, 155-166.
Zevon, M., & Tellegen, A. (1982). The structure of mood change: Idiographic/nomothetic analysis. Journal of Personality and Social Psychology, 43, 111-122.