Linear Models: A Mean Model Approach
This is a volume in PROBABILITY AND MATHEMATICAL STATISTICS Z. W. Birnbaum, founding editor David Aldous, Y. L. Tong, series editors A list of titles in this series appears at the end of this volume.
Linear Models: A Mean Model Approach Barry Kurt Moser Department of Statistics Oklahoma State University Stillwater, Oklahoma
Academic Press San Diego Boston New York London Sydney Tokyo Toronto
This book is printed on acid-free paper, ^y Copyright © 1996 by ACADEMIC PRESS All Rights Reserved. No part of this publication may be reproduced or transmitted-in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc. 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com Academic Press Limited 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/ Library of Congress Cataloging-in-Publication Data Moser, Barry Kurt. Linear models : a mean model approach / by Barry Kurt Moser. p. cm. — (Probability and mathematical statistics) Includes bibliographical references and index. ISBN 0-12-508465-X (alk. paper) 1. Linear models (Statistics) I. Title. II. Series. QA279.M685 1996 519.5'35--dc20 96-33930 CIP PRINTED IN THE UNITED STATES OF AMERICA 96 97 98 99 00 01 BC 9 8 7 6 5
4
3 2 1
To my three precious ones.
This page intentionally left blank
Contents
xi
Preface Chapter 1 1.1 1.2 1.3
Elementary Matrix Concepts Kronecker Products Random Vectors
Chapter 2 2.1 2.2 2.3
Multivariate Normal Distribution
Multivariate Normal Distribution Function Conditional Distributions of Multivariate Normal Random Vectors Distributions of Certain Quadratic Forms
Chapter 3 3.1 3.2
Linear Algebra and Related Introductory Topics
Distributions of Quadratic Forms
Quadratic Forms of Normal Random Vectors Independence VII
1 1 12 16
23 23 29 32
41 41 45
Contents
viii
3.3 3.4
Chapter 4 4.1 4.2 4.3 4.4 4.5
Complete, Balanced Factorial Experiments
53
Models That Admit Restrictions (Finite Models) Models That Do Not Admit Restrictions (Infinite Models) Sum of Squares and Covariance Matrix Algorithms Expected Mean Squares Algorithm Applications
53 56 58 64 66
Chapter 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7
6.4 6.5
Maximum Likelihood Estimation and Related Topics
Maximum Likelihood Estimators of /3 and a Invariance Property, Sufficiency, and Completeness ANOVA Methods for Finding Maximum Likelihood Estimators The Likelihood Ratio Test for H/3 = h Confidence Bands on Linear Combinations of (3
81 86 87 89 91 94 97
105 105 108 111 119 126
Unbalanced Designs and Missing Data
131
Replication Matrices Pattern Matrices and Missing Data Using Replication and Pattern Matrices Together
131 138 144
Chapter 8 8.1 8.2 8.3
81
2
Chapter 7 7.1 7.2 7.3
Least-Squares Regression
Ordinary Least-Squares Estimation Best Linear Unbiased Estimators ANOVA Table for the Ordinary Least-Squares Regression Function Weighted Least-Squares Regression Lack of Fit Test Partitioning the Sum of Squares Regression The Model Y = X/3 + E in Complete, Balanced Factorials
Chapter 6 6.1 6.2 6.3
47 49
The t and F Distributions Bhat's Lemma
Balanced Incomplete Block Designs
General Balanced Incomplete Block Design Analysis of the General Case Matrix Derivations of Kempthorne's Interblock and Intrablock Treatment Difference Estimators
149 149 152 155
ix
Contents Chapter 9 9.1 9.2 9.3 9.4 9.5
Model Assumptions and Examples The Mean Model Solution Mean Model Analysis When cov(E) = a2ln Estimable Functions Mean Model Analysis When cov(E) = cr2V
Chapter 10 10.1 10.2 10.3 10.4 10.5 10.6
Less Than Full Rank Models
The General Mixed Model
The Mixed Model Structure and Assumptions Random Portion Analysis: Type I Sum of Squares Method Random Portion Analysis: Restricted Maximum Likelihood Method Random Portion Analysis: A Numerical Example Fixed Portion Analysis Fixed Portion Analysis: A Numerical Example
161 161 164 165 168 172
177 177 179 182 183 184 186
Appendix 1 Computer Output for Chapter 5
189
Appendix 2
193
A2.1 A2.2
Computer Output for Chapter 7
Computer Output for Section 7.2 Computer Output for Section 7.3
193 201
Appendix 3
Computer Output for Chapter 8
207
Appendix 4
Computer Output for Chapter 9
209
Appendix 5
Computer Output for Chapter 10
213
A5.1 A5.2 A5.3
Computer Output for Section 10.2 Computer Output for Section 10.4 Computer Output for Section 10.6
References and Related Literature Subject Index
213 216 218 221 225
This page intentionally left blank
Preface
Linear models is a broad and diversified subject area. Because the subject area is so vast, no attempt was made in this text to cover all possible linear models topics. Rather, the objective of this book is to cover a series of introductory topics that give a student a solid foundation in the study of linear models. The text is intended for graduate students who are interested in linear statistical modeling. It has been my experience that students in this group enter a linear models course with some exposure to mathematical statistics, linear algebra, normal distribution theory, linear regression, and design of experiments. The attempt here is to build on that experience and to develop these subject areas within the linear models framework. The early chapters of the text concentrate on the linear algebra and normal distribution theory needed for a linear models study. Examples of experiments with complete, balanced designs are introduced early in the text to give the student a familiar foundation on which to build. Chapter 4 of the text concentrates entirely on complete, balanced models. This early dedication to complete, balanced models is intentional. It has been my experience that students are generally more comfortable learning structured material. Therefore, the structured rules that apply to complete, balanced designs give the student a set of leamable tools on which to
xi
xii
Preface
build confidence. Later chapters of the text then expand the discussion to more complicated incomplete, unbalanced and mixed models. The same tools learned for the balanced, complete models are simply expanded to apply to these more complicated cases. The hope is that the text progresses in an orderly manner with one topic building on the next. I thank all the people who contributed to this text. First, I thank Virgil Anderson for introducing me to the wonders of statistics. Special thanks go to Julie Sawyer, Laura Coombs, and David Weeks. Julie and Laura helped edit the text. Julie also contributed heavily to the development of Chapters 4 and 8. David has generally served as a sounding board during the writing process. He has listened to my ideas and contributed many of his own. Finally, and most importantly, I thank my wife, Diane, for her generosity and support.
1
Linear Algebra and Related Introductory Topics
A summary of relevant linear algebra concepts is presented in this chapter. Throughout the text boldfaced letters such as A, U, T, X, Y, t, g, u are used to represent matrices and vectors, italicized capital letters such as 7, £7, T, E, F are used to represent random variables, and lowercase italicized letters such as r, s, t, n, c are used as constants.
1.1
ELEMENTARY MATRIX CONCEPTS
The following list of definitions provides a brief summary of some useful matrix operations. Definition 1.1.1 Matrix: An r x s matrix A is a rectangular array of elements with r rows and s columns. An r x 1 vector Y is a matrix with r rows and 1 column. Matrix elements are restricted to real numbers throughout the text. Definition 1.1.2 Transpose: If A is an n x s matrix, then the transpose of A, denoted by A', is an s x n matrix formed by interchanging the rows and columns of A.
1
2
Linear Models
Definition 1.1.3 Identity Matrix, Matrix of Ones and Zeros: In represents an n x n identity matrix, Jw is an n x n matrix of ones, ln is an n x 1 vector of ones, and O mxrt is an m x n matrix of zeros. Definition 1.1.4 Multiplication of Matrices: Let atj represent the 17* element of an r x s matrix A with i = 1 , . . . , r rows and j = 1 , . . . , s columns. Likewise, let bjk represent the jk0* element of an s x t matrix B with j = 1,..., s rows and k = 1,..., t columns. The matrix multiplication of A and B is represented by AB = C where C is an r x t matrix whose ffc* element c(* = X)/=i atjbjk- If the r x s matrix A is multiplied by a scalar d, then the resulting r x s matrix d\ has /7th element da,y. Example 1.1.1
The following matrix multiplications commonly occur.
Definition 1.1.5 Addition of Matrices: The sum of two r x s matrices A and B is represented by A + B = C where C is the r x s matrix whose 17* element
Definition 1.1.6 Inverse of a Matrix: An n x n matrix A has an inverse if AA"1 = A"1 A = !„ where the n x n inverse matrix is denoted by A"1. Definition 1.1.7 Singularity: If an n x n matrix A has an inverse then A is a nonsingular matrix. If A does not have an inverse then A is a singular matrix. Definition 1.1.8 Diagonal Matrix: Let a,, be the im diagonal element of an n x n matrix A. Let a^ be the /7th off-diagonal element of A for i / 7. Then A is a diagonal matrix if all the off-diagonal elements a,; equal zero. Definition 1.1.9 Trace of a Square Matrix: The trace of an n x n matrix A, denoted by tr(A), is the sum of the diagonal elements of A. That is, tr(A) =
1
Linear Algebra
3
It is assumed that the reader is familiar with the definition of the determinant of a square matrix. Therefore, a rigorous definition is omitted. The next definition actually provides the notation used for a determinant. Definition 1.1.10 Determinant of a Square Matrix: Let det(A) = |A| denote the determinant of an n xn matrix A. Note det(A) = 0 if A is singular. Definition 1.1.11
Symmetric Matrix: Annxn matrix A is symmetric if A = A'.
Definition 1.1.12 Linear Dependence and the Rank of a Matrix: Let A be an n x s matrix (s < n) where a\,..., as represent the s n x 1 column vectors of A. The 5 vectors a i , . . . , a5 are linearly dependent provided there exists s elements ki,...,ks, not all zero, such that k&i -\ h ksas = 0. Otherwise, the s vectors are linearly independent. Furthermore, if there are exactly r < s vectors of the set a i , . . . , as which are linearly independent, while the remaining s — r can be expressed as a linear combination of these r vectors, then the rank of A, denoted by rank (A), is r. The following list shows the results of the preceding definitions and are stated without proof: Result LI:
Let A and B each ben xn nonsingular matrices. Then
Result 1.2: Let A and B be any two matrices such that AB is defined. Then Result 1.3: Let A be any matrix. The A'A and AA' are symmetric. Result 1.4: Let A and B each be n x n matrices. Then det(AB) = [det(A)][det(B)]. Result 1.5: Let A and B be m x n and n x m matrices, respectively. Then tr(AB) = tr(BA). Quadratic forms play a key role in linear model theory. The following definitions introduce quadratic forms. Definition 1.1.13 Quadratic Forms: A function f ( x 1 , . . . , x n ) is a quadratic form if /(*i,..., jc.) = £?=i Znj=i atjxtxj = X'AX where X = ( x l . . . , xn)' is an n x 1 vector and A is an n x n symmetric matrix whose //* element is a,;.
4
Linear Models
The symmetric matrix A is constructed by setting a/,- and a-}i equal to one-half the coefficient on the ;c,jc; term for / / ./. Example 1.1.3 Quadratic forms are very useful for defining sums of squares. For example, let
where the n x 1 vector X = (x\,..., xn)'. The sum of squares around the sample mean is another common example. Let
Definition 1.1.14 Orthogonal Matrix: Ann x n matrix P is orthogonal if and only if P-1 = P'. Therefore, PP7 = P'P = In. If P is written as (pi, p 2 , . . . , pB) where p, is an n x 1 column vector of P for / = 1,..., n, then necessary and sufficient conditions for P to be orthogonal are
1
Linear Algebra
Example 1.1.4
5
Let the n x n matrix
where PP = P'P = Iw. The columns of P are created as follows:
The matrix P' in Example 1.1.4 is generally referred to as an n-dimensional Helmert matrix. The Helmert matrix has some interesting properties. Write P as P = (PJ |PM) where the n x 1 vector pi = (\/*Jn)ln and the n x (n — 1) matrix P« = (P2, P3, Pn) then
The (n — 1) x n matrix P^ will be referred to as the lower portion of an n -dimensional Helmert matrix. If X is an n x 1 vector and A is an n x n matrix, then AX defines n linear combinations of the elements of X. Such transformations from X to AX are very useful in linear models. Of particular interest are transformations of the vector X that produce multiples of X. That is, we are interested in transformations that
6
Linear Models
satisfy the relationship
where A. is a scalar multiple. The above relationship holds if and only if
But the determinant of XL. — A is an n* degree polynomial in X. Thus, there are exactly n values of X that satisfy |XIn — A| =0. These n values of A. are called the n eigenvalues of the matrix A. They are denoted by Xi, X 2 , . . . , Xn. Corresponding to each eigenvalue X, there is an n x 1 vector X, that satisfies where X, is called the Ith eigenvector of the matrix A corresponding to the eigenvalue A,. Example 1.1.5 Find the eigenvalues and vectors of the 3 x 3 matrix A = 0.61s + 0.4J3. First, set |XI3 — A| = 0. This relationship produces the cubic equation
Therefore, the eigenvalues of A are Xi = 1.8,X2 = X3 = 0.6. Next, find vectors X, that satisfy (A — X,-I3)X,- = 0 3x i for each / = 1, 2, 3. For Xi = 1.8, (A - 1.8I3)Xi = 0 3x i or (-1.2I3 + 0.4J3)Xi = 0 3x i. The vector Xi = (l/\/3)I3 satisfies this relationship. For X2 = X3 = 0.6, (A — 0.6I3)X, = 0 3x i or 13X« = 0 3x i for i = 2,3. The vectors X2 = (l/\/2, -1/V5,0)' and X3 = (l/\/6,1/V6, -2/V6)' satisfy this condition. Note that vectors Xi,X 2 ,X 3 are normalized and orthogonal since XjXi = X2X2 = X3X3 = 1 and XjXi = x;x3 = x2x3 = o. The following theorems address the uniqueness or nonuniqueness of the eigenvector associated with each eigenvalue. Theorem 1.1.1 There exists at least one eigenvector corresponding to each eigenvalue. Theorem 1.1.2 If an n xn matrix A has n distinct eigenvalues, then there exist exactly n linearly independent eigenvectors, one associated with each eigenvalue. In the next theorem and corollary a symmetric matrix is defined in terms of its eigenvalues and eigenvectors.
1
Linear Algebra
7
Theorem 1.1.3 Let A be an n x n symmetric matrix. There exists an n x n orthogonal matrix P such that P'AP = D where D is a diagonal matrix whose diagonal elements are the eigenvalues of A and where the columns ofP are the orthogonal, normalized eigenvectors of A. The ith column of P (i.e., the ith eigenvectors of A.) corresponds to the ith diagonal element of D for / = !,...,«. Example 1.1.6
Let A be the 3 x 3 matrix from Example 1.1.5. Then P'AP = D or
Theorem 1.1.3 can be used to relate the trace and determinant of a symmetric matrix to its eigenvalues.
The number of times an eigenvalue occurs is the multiplicity of the value. This idea is formalized in the next definition. Definition 1.1.15 Multiplicity: The n x n matrix A has eigenvalue A* with multiplicity m < n if m of the eigenvalues of A equal A.*. Example 1.1.7 All the n eigenvalues of the identity matrix In equal 1. Therefore, In has eigenvalue 1 with multiplicity n. Example 1.1.8 Find the eigenvalues and eigenvectors of the n xn matrix G = (a - b)ln + bjn. First, note that
8
Linear Models
Therefore, a+(n — 1 }b is an eigenvalue of matrix G with corresponding normalized eigenvector (\^/n)\n. Next, take any n x 1 vector X such that l^X = 0. (One set of n — 1 vectors that satisfies l^X = 0 are the column vectors p2, p s , . . . , pn from Example 1.1.4.) Rewrite G = (a — b)\n + blnl'n. Therefore,
and matrix G has eigenvalue a — b. Furthermore,
Therefore, eigenvalue a + (n — \)b has multiplicity 1 and eigenvalue a — b has multiplicity n — 1. Note that the 3 x 3 matrix A in Example 1.1.5 is a special case of matrix G with a = 1, b = 0.4, and n = 3. It will be convenient at times to separate a matrix into its submatrix components. Such a separation is called partitioning. Definition 1.1.16 Partitioning a Matrix: If A is an m x n matrix then A can be separated or partitioned as
Most of the square matrices used in this text are either positive definite or positive semidefinite. These two general matrix types are described in the following definitions. Definition 1.1.17 semidefinite if
Positive Semidefinite Matrix: Ann x n matrix A is positive
(i) A = A',
(ii) Y'AY > 0 for all n x 1 real vectors Y, and (iii)
Y'AY = 0 for at least one n x I nonzero real vector Y.
Definition 1.1.18 nite if
Positive Definite Matrix: Ann x n matrix A is positive defi-
1
Linear Algebra
9
(i) A = A' and
(ii) Y'AY > 0 for all nonzero n x 1 real vectors Y. Example 1.1.10 The n x n identity matrix IM is positive definite because In is symmetric and Y'In Y > 0 for all nonzero n x 1 real vectors Y. Theorem 1.1.5
Let A. be an n x n positive definite matrix. Then
(i)
there exists an n x n matrix B of rank n such that A = BB' and
(ii)
the eigenvalues of A. are all positive.
The following example demonstrates how the matrix B in Theorem 1.1.5 can be constructed. Example 1.1.11 Let A be an n x n positive definite matrix. Thus, A = A' and by Theorem 1.1.3 there exists n x n matrices P and D such that P'AP = D where P is the orthogonal matrix whose columns are the eigenvectors of A, and D is the corresponding diagonal matrix of eigenvalues. Therefore, A = PDP' = PD1/2D1/2P/ = BB' where D1/2 is an n x n diagonal matrix whose Ith diagonal element is x}'2 and B = PD1/2. Certain square matrices have the characteristic that A2 = A. For example, let
Matrices of this type are introduced in the next definition. Definition 1.1.19
Idempotent Matrices: Let A be an n xn matrix. Then
(i) A is idempotent if A2 = A and (ii) A is symmetric, idempotent if A = A2 and A = A'. Note that if A is idempotent of rank n then A = ln. In linear model applications, idempotent matrices generally occur in the context of quadratic forms. Since the matrix in a quadratic form is symmetric, we generally restrict our attention to symmetric, idempotent matrices.
10
Linear Models
Theorem 1.1.6 Let H be ann xn symmetric, idempotent matrix of rank r < n. Then B is positive semidefinite. The next theorem will prove useful when examining sums of squares in ANOVA problems.
The eigenvalues of the matrix !„ — £ Jn are derived in the next example. Example 1.1.12 The symmetric, idempotent matrix !„ — ^Jn takes the form (a — b)\n + bjn with a = 1 — £ and b = — £. Therefore, by Example 1.1.8, the eigenvalues of In - Jjn are a + (n - l)b = (!-£) + («- !)(-£) = 0 with multiplicity 1 and a — b = (I — £) — (— £) = 1 with multiplicity n — 1. The result that the eigenvalues of an idempotent matrix are all zeros and ones is generalized in the next theorem. Theorem 1.1.8 The eigenvalues of an n x n symmetric matrix A of rank r
The proof is given by Graybill (1976, p. 39).
•
The following theorem relates the trace and the rank of a symmetric, idempotent matrix. Theorem 1.1.9 rank(\). Proof:
If A is an n x n symmetric, idempotent matrix then tr(\) =
The proof is left to the reader.
•
In the next example the matrix In — £ Jn is written as a function of n — 1 of its eigenvalues. Example 1.1.13 The n x n matrix In — £ Jn takes the form (a — b)ln + bjn with a = 1 — £ and b = —£. By Examples 1.1.8 and 1.1.12, the n — 1 eigenvalues equal to 1 have corresponding eigenvectors equal to the n — 1 columns of Pn where P^, is the (n — 1) x n lower portion of an n-dimensional Helmert matrix. Further, I — ITJn — Pn P' "
n
~
n-
1
Linear Algebra
11
This representation of a symmetric idempotent matrix is generalized in the next theorem. Theorem 1.1.10 I fA is an n x n symmetric, idempotent matrix of rank r then A = PP7 where P is an n x r matrix whose columns are the eigenvectors of A associated with the r eigenvalues equal to 1. Proof: 1.1.3,
Let A be an n x n symmetric, idempotent matrix of rank r. By Theorem
where R = [P|Q] is the n x n matrix of eigenvectors of A, P is the n x r matrix whose r columns are the eigenvectors associated with the r eigenvalues 1, and Q is the n x (n — r) matrix of eigenvectors associated with the (n — r) eigenvalues 0. Therefore,
Furthermore, P'P = L because P is an n x r matrix of orthogonal eigenvectors.
•
If X'AX is a quadratic form with n x 1 vector X and n x n symmetric matrix A, then X'AX is a quadratic form constructed from an n-dimensional vector. The following example uses Theorem 1.1.10 to show that if A is an n x n symmetric, idempotent matrix of rank r < n then the quadratic form X'AX can be rewritten as a quadratic form constructed from an r-dimensional vector. Example 1.1.14 Let X'AX be a quadratic form with n x 1 vector X and n x n symmetric, idempotent matrix A of rank r < n. By Theorem 1.1.10, X'AX = X'PP'X = Z'Z where P is an n x r matrix of eigenvectors of A associated with the eigenvalues 1 and Z = P'X is an r x 1 vector. For a more specific example, note that X'(In - ±J n )X = X'PMP'nX = Z'Z where Fn is the (n - 1) x n lower portion of an n-dimensional Helmert matrix and Z = P^Xisan(n — l ) x l vector. Later sections of the text cover covariance matrices and quadratic forms within the context of complete, balanced data structures. A data set is complete if all combinations of the levels of the factors contain data. A data set is balanced if the number of observations in each level of any factor is constant. Kronecker product notation will prove very useful when discussing covariance matrices and quadratic forms for balanced data structures. The next section of the text therefore provides some useful Kronecker product results.
Linear Models
12
1.2
KRONECKER PRODUCTS
Kronecker products will be used extensively in this text. In this section the Kronecker product operation is defined and a number of related theorems are listed without proof. Definition 1.2.1 Kronecker Product. If A is an r x s matrix with ij* element atj for i = 1,..., r and j = 1,..., s, and B is any t x v matrix, then the Kronecker product of A and B, denoted by A
B, is the r t x s v matrix formed by multiplying each atj element by the entire matrix B. That is,
Theorem 1.2.1
Let A and B be any matrices. Then (A B)' = A' B'.
Theorem 1.2.2
Let A, B, and C be any matrices and let a be a scalar. Then
Theorem 1.2.3
Let A and B be any square matrices. Then tr(\ <8> B) =
Theorem 1.2.4
Let A. be an r x s matrix, B be a t x u matrix, C be an s x v
Theorem 1.2.5
Let A and ^bemxm and n x n nonsingular matrices, respec-
1
Linear Algebra
13
Theorem 1.2.6 Let A and B be m x n matrices and let C be a p x q matrix.
Theorem 1.2.7 Let A be an m x m matrix with eigenvalues ct\,..., am and let B be an n x n matrix with eigenvalues f $ \ , . . . , /3n. Then the eigenvalues of A. B
Example 1.2.7 From Example 1.1.8, the eigenvalues of !„ — ^Jn are 1 with multiplicity n — 1 and 0 with multiplicity 1. Likewise, Ia — £ Ja has eigenvalues 1 with multiplicity a — 1 and 0 with multiplicity 1. Therefore, the eigenvalues of (Ia - ija) ® (!„ - ±J W ) are 1 with multiplicity (a - l)(n - 1) and 0 with multiplicity a + n — 1. Theorem 1.2.8 Let A. be an m x n matrix of rank r and let B be a p x q matrix
Theorem 1.2.9 Let Abe an m xm symmetric, idempotent matrix of rank r and let B be an n x n symmetric, idempotent matrix of rank s. Then A <8> B is an mn x mn symmetric, idempotent matrix where tr(A B) = rank (A ® B) = rs.
The following example demonstrates that Kronecker products are useful for describing sums of squares in complete, balanced ANOVA problems. Example 1.2.10 Consider a one-way classification where there are r replicate observations nested in each of the t levels of a fixed factor. Let y,y represent the 7* replicate observation in the /* level of the fixed factor for / = 1,..., t and j = 1,..., r. Define the tr x 1 vector of observations Y = (yn, ••••, y\r, • • •, ^1...., ytrY- The layout for this experiment is given in Figure 1.2.1. The ANOVA table is presented in Table 1.2.1. Note that the sums of squares are written in summation notation and as quadratic forms, Y'AmY, for m = 1,..., 4. The objective is to demonstrate that the tr x tr matrices Am can be expressed as Kronecker products. Each matrix Am is derived later. Note
14
Linear Models
Figure 1.2.1
One-Way Layout.
Table 1.2.1 One-Way ANOVA Table Source
df
SS
Mean
1
v L^j=\ y^ y ••v z~/i=\
Fixed factor
t- 1
Nested replicates
t(r-l)
Total
tr
r
2
= Y'AiY
£U£;=i(?<. -?-->22 =
Y'A2Y
ELiEy-i^-v-^.)
= Y'A3Y
V 5^r vyij2 Z^;=i LJJ=\
= Y'A4Y
Therefore, the sum of squares due to the mean is given by
where the tr x tr matrix AI = yj, <8> £j r . The sum of squares due to the fixed factor is
1
Linear Algebra
15
where the tr x tr matrix A2 = (If — ~3t) ® 7 Jr- The sum of squares due to the nested replicates is
where the tr x tr matrix AS = I, ® (Ir — £ Jr). Finally, the sum of squares total is
where the fr x tr matrix A4 = lt <S> lrThe derivations of the sums of squares matrices Am can be tedious. In Chapter 4 an algorithm is provided for determining the sums of squares matrices for complete, balanced designs with any number of main effects, interactions, or nested factors. This algorithm makes the calculation of sums of squares matrices Am very simple. This section concludes with a matrix operator that will prove useful in Chapter 8. Definition 1.2.2 BIB Product: If B is a c x d matrix and A is an a x b matrix where each column of A has c < a nonzero elements and a — c zero elements, then the BIB product of matrices A and B, denoted by A D B, is the a x bd matrix formed by multiplying each zero element in the Ith column of A by a 1 x d row
Linear Models
16
vector of zeros and multiplying the j * nonzero element in the /* column of A by
Example 1.2.11 Let the 3 x 3 matrix A = J3 - I3 and the 2 x 2 matrix B = Ii — \$2- Then the 3 x 6 BIB product matrix
Theorem 1.2.10 If A is anaxb matrix with c < a nonzero elements per column and a — c zeros per column; BI , 82, and B are each c x d matrices; D is a b x b diagonal matrix of rank b; Z is a d x 1 vector; and Y is a c x 1 vector, then
Example 1.2.12
1.3
Let A be any 3 x 3 matrix with two nonzero elements per
RANDOM VECTORS
Let the n x 1 random vector Y = (Yi, ¥2,..., Yn)' where 7, is a random variable fon = !,...,«. The vector Y is a random entity. Therefore, Y has an expectation; each element of Y has a variance; and any two elements of Y have a covariance
1
Linear Algebra
17
(assuming the expectations, variances, and covariances exist). The following definitions and theorems describe the structure of random vectors. Definition 1.3.1 Joint Probability Distribution: The probability distribution of the n x 1 random vector Y = (Y\,..., ¥„)' equals the joint probability distribution
Definition 1.3.2
Expectation of a Random Vector. The expected value of the
Definition 1.3.3 Covariance Matrix of a Random Vector Y: The n x I random vector Y = (Fi,..., ¥„)' has n x n covariance matrix given by
Definition 1.3.4 Linear Transformations of a Random Vector Y: If B is an m x n matrix of constants and Y is an n x 1 random vector, then the m x 1 random vector BY represents m linear transformations of Y. The following theorem provides the covariance matrix of linear transformations of a random vector. Theorem 1.3.1 If B is an m x n matrix of constants, Y is an n x 1 random vector, and cov(Y) is the n x n covariance matrix of y, then the m x I random vector BY has an m x m covariance matrix given by B[cov(Y)]B'.
The next theorem provides the expected value of a quadratic form. Theorem 1.3.2 Let Y be an n x 1 random vector with mean vector IJL = E(Y) andn x n covariance matrix £ = cov(Y) then E(Y'AY) = tr(AE) + //A/z where A is any n x n symmetric matrix of constants.
18
Linear Models
Therefore,
The moment generating function of a random vector is used extensively in the next chapter. The following definitions and theorems provide some general moment generating function results. Definition 1.3.5 Moment Generating Function (MGF) of a Random Vector Y: The MGF of an n x 1 random vector Y is given by
where the n x 1 vector of constants t = ( t 1 , . . . , tny if the expectation exists for
There is a one-to-one correspondence between the probability distribution of Y and the MGF of Y, if the MGF exists. Therefore, the probability distribution of Y can be identified if the MGF of Y can be found. The following two theorems and corollary are used to derive the MGF of a random vector Y.
Theorem 1.3.4 If Y is an n x l random vector, gisannxl vector of constants, and c is a scalar constant, then
1
Linear Algebra
19
Corollary 1.3.4 Let my, (.),..., wym (.) represent the MGFs of the independent random variables Y\,..., Ym, respectively. If Z = ^™=i YI then the MGF of Z is given by
Moment generating functions are used in the next example to derive the distribution of the sum of independent chi-square random variables. Example 1.3.1 Let Y1,..., Ym be m independent central chi-square random variables where 7, and n, degrees of freedom for i = 1,..., m. For any i
Therefore, Y^T=i ^ ^s distributed as a central chi-square random variable with Z^/li ni degrees of freedom. The next theorem is useful when dealing with functions of independent random vectors.
Linear Models
20
Theorem 1.3.5 Let gi(Yi), —g m (¥m} be m functions of the random vectors YI, ..., Ym, respectively. If y 1 , . . . , y m are mutually independent, then gi,...,gm we mutually independent. The next example demonstrates that the sum of squares of n independent Ni(0, 1) random variables has a central chi-square distribution with n degrees of freedom. Example 1.3.2 Let Z 1 , . . . , Zn be a random sample of normally distributed random variables with mean 0 and variance 1. Let F, = Z2 for i = !,...,«. The moment generating function of F, is
That is, each F/ has a central chi-square distribution with one degree of freedom. Furthermore, by Theorem 1.3.5, the F;'s are independent random variables. Therefore, by Example 1.3.1, Y^=i ^ = ]C?=i Z? *sa central chi-square random variable with n degrees of freedom.
EXERCISES 1. Find an mn x (n - 1) matrix P such that PP7 = ^Jm <8> (ln - £j n ). 2. Let Si = XJ,i Ui (Vi - V), 52 = ^Li Wi - U)\ and 53 = ELi W - V)2. If A = (!„ - ijJO, - VV)(In - iJn), U = 0/i,..., t/J', and V = (Vi,..., V*)', is the statement 52 - (5f/53) = U'AU true or false? If the statement is true, verify that A is correct. If the statement A is false, find the correct form of A.
3. Let V = lm [ofL, + a22J«], A, = (lm - ±Jm) £JB, and A2 = lm ® (!„ -
iJ-)(a) Show AiA 2 = O mnxWM . (b) Define an mn x mn matrix C such that Imn = \\ + A2 + C.
1
Linear Algebra
21
(c) Let the mn x 1 vector Y = (Yu,..., Ymn)' and let C be defined as in part b. Is the following statement true or false?: Y'CY = [^T=i E"=i Ytjf/ (mn). If the statement is true, verify it. If the statement is false, redefine Y'CY in terms of the r,/s. (d) Define constants k.\, ki, and mn x mn idempotent matrices Ci and €2 such that AiV = k\Ci and AaV = ki£,i. Verify that Ci and €2 are idempotent. 4. Find the inverse of the matrix 0.4L; + 0.6J4. 5. Show that the inverse of the matrix (!„ + VV) is ( ^""^y ) where V is an n x 1 vector. 6. Use the result in Exercise 5 to find the inverse of matrix (a — b)ln + bjn where a and b are positive constants.
(a) Find the 2n eigenvalues of V. (b) Find a nonsingular 2n x 2n matrix Q such that V = QQ'.
9. Define n x n matrices P and D such that (a — b)\n + b3n = PDF' where D is a diagonal matrix and a and b are constants. 10. Let
where S is a symmetric matrix and the S(; are n, x HJ matrices. Find BSB'. 11. Let A, = ^m ® iJB, A2 = X(X'X)-!X', A3 = lm ® O, - Jj«) and X = X+ (g> ln where X+ is an m x p matrix such that 1^,X+ = Oi x p . Find the mn x mn matrix A4 such that IOT <8> Iw = Y^t=i ^«- Express A4 in its simplest form. 12. Is the matrix A4 in Exercise 11 idempotent? 13. Let Y be an n x 1 random vector with n x n covariance matrix cov(Y) = (Tj2In + or|Jn- Define Z = P'Y where the n x n matrix P = (ln|P«) and the n x (« — 1) matrix Pn are defined in Example 1.1.4. Find cov(P'Y).
22
Linear Models
14. Let Y b e a f o x l random vector with E(Y) = lb®(ni,..., /z,)'andcov(Y) = ffjtlfe ® Jf] + olT[\b <8> (Ir - yJ/)]- Define A! = ^b <8> }jr, A2 = (Ife ^J fc )
Derive btr x for matrices Am for m = 1 , . . . , 5 where Sm = Y'AmY.
2
Multivariate Normal Distribution
In later chapters we will investigate linear models with normally distributed error structures. Therefore, this chapter concentrates on some important concepts related to the multivariate normal distribution.
2.1
MULTIVARIATE NORMAL DISTRIBUTION FUNCTION
Let Z i , . . . , Zn be independent, identically distributed normal random variables with mean 0 and variance 1. The marginal distribution of Z, is
for / = !,...,«. Since the Z/'s are independent random variables, the joint probability distribution of the n x 1 random vector Z = ( Z i , . . . , Zn)' is
23
24
Linear Models
for i = !,...,«. Let the n x 1 vector Y = GZ+^i where G is an n x n nonsingular matrix and p, is an n x 1 vector. The joint distribution of the n x 1 random vector Yis
where £ = GG' is an n x n positive definite matrix and the Jacobian for the transformation Z = G^CY - /x) is |GGT1/2 = ISr 1 / 2 . The function /y(y) is the multivariate normal distribution of an n x 1 random vector Y with n x 1 mean vector p, and n x n positive definite covariance matrix S. The following notation will be used to represent this distribution: the n x 1 random vector Y ~ N n (/x, £). The moment generating function of an n-dimensional multivariate normal random vector is provided in the next theorem.
Proof: Let the n x 1 random vector Z = (Z\,..., Zn)' where Z, are independent, identically distributed Ni(0, 1) random variables. The n x 1 random vector Y = GZ + p, ~ N B (/i, S) where £ = GG'. Therefore,
We will now consider distributions of linear transformations of the random vector Y when Y ~ N rt (/x, £). The following theorem provides the joint probability distribution of the m x 1 random vector BY + b where B is an m x n matrix of constants and b is an m x 1 vector of constants. Theorem 2.1.2 If'Y is an n x 1 random vector distributed N n (/i, £), B is an m x n matrix of constants with m < n, and b is an m x 1 vector of constants, then them x I random vector BY + b ~ Nm (B/x + b, BSB').
2
Multivariate Normal Distribution
25
The MGF of BY + b takes the form of a multivariate normal random vector with dimension m, mean vector Bfj, + b and covariance matrix BSB' and the proof is complete. •
Example 2.1.2 Let the n x 1 random vector Y be defined as in Example 2.1.1. Find the distribution of the (n — 1) x 1 random vector U = (l/i, ...,Un-\)' — (l/or)P^Y where P^ is the (n — 1) x n lower portion of an n-dimensional Helmert matrix. By Theorem 2.1.2 with (n - 1) x n matrix B = (l/a)P^ and (n - 1) x 1 vector b = 0 ( B _i ) x i, U ~ N n _i(0, I n _i) since
The n x 1 random vector Y can be partitioned as Y = (Yj, Y2)' where Y, is an n, x 1 vector for / = 1,2 and n = n\ + HI. Theorem 2.1.2 is used to derive the marginal distributions of the w, x 1 random vectors Y,. Theorem 2.1.3 Let then x I random vector Y = (Yj, \'2)' ~ NB (fi, E) where fj, = (IJL[ , At^)' is the n x 1 mean vector,
is the n x n covariance matrix, Y, and fj,f are n, x 1 vectors, £,•_/ is an n, x n; matrix for i, j = 1,2 and n — n\ + HI- The marginal distribution of the «,- x 1 random vector Y, is Nn. (/x,, £,-,-). Proof: By Theorem 2.1.2 with n\ x n matrix B = [!„, |0M, X n 2 l and ni x 1 vector b = O n , x i'Yi = BY + b ~ N Wl (/x 1? Sn). The marginal distribution of Y2 is
26
Linear Models
derived in the same way with «2 x 1 matrix B = [O n?X ni I In,] and HI x 1 vector
The results of Theorem 2.1.3 are generalized as follows. If the n x 1 random vector Y ~ Nn(/z, £), then any subset of elements of Y has a multivariate normal distribution where the mean vector of the subset is obtained by choosing the corresponding elements of /x, and the covariance matrix of the subset is obtained by choosing the corresponding rows and columns of S. Two normally distributed random vectors have the unique characteristic that the two vectors are independent if and only if they are uncorrelated. This result is given in the next theorem. Theorem 2.1.4 Let the n x I random vector Y = (Y' t ,..., Y^)' ~ N B (/i, S) where p, = (/Lt',,..., jj,' )' is the n x 1 mean vector,
is the n x n covariance matrix, Y, and fa are n,- x 1 vectors, S(J is an nz x n; matrix for i, j = 1 , . . . , m and n = ^™-\ n/. The random vectors YI , . . . , Ym are independent if and only if E,7 = 0M( xrlj for all i ^ j. Proof: ofYis
First assume E,; = 0W(.xn; for all/ / j. The moment generating function
where t = (t' l 5 ..., t^,)' with HI x 1 vector t; for i = 1,..., m. Therefore, by Theorem 1.3.3, the vectors Y,- are mutually independent. Now assume the vectors Y, are mutually independent. For any i ^ j,
2
Multivariate Normal Distribution
27
In the following examples mean vectors and covariance matrices are derived for a few common problems. Example 2.1.3 Let Y\,..., Yn be independent, identically distributed NI (a, a2) random variables. By Theorem 2.1.4, cov(y,, Yj) = 0 for i ^ j. Furthermore, E(Y,) = a and the var(y,) = a2 for all i = !,...,«. Therefore, the n x 1 random vector Y = ( y l f . . . , y n )' ~ N n («l n , a2!,,). Example 2.1.4 Consider the one-way classification described in Example 1.2.10. Let Yij be a random variable representing the 7* replicate observation in the /* level of the fixed factor for / = 1 , . . . , t and j = 1,..., r. Let the tr x 1 random vector Y = Y\\,..., Y\r,..., Yt\,..., Ytr)' where the 7,/s are assumed to be independent, normally distributed random variables with E(Y,y) = /x, and var(Yij) = cr2. This experiment can be characterized with the model
where the R(T)(f)j are independent, identically distributed normal random variables with mean 0 and variance cr2. The letter R signifies replicates and the letter T signifies the fixed factor or fixed treatments. Therefore, R(T) represents the effect of the random replicates nested in the fixed treatment levels. The parentheses around T identify the nesting. By Theorems 2.1.2 and 2.1.4, Y ~ N, r (/x, E) where the tr x 1 mean vector p, is given by
and using Definition 1.3.3 the elements of the tr x tr covariance matrix S are
The preceding model is composed of two parts: a fixed portion represented by the t fixed constants /^ and a random portion represented by the tr random variables R(T)^j. In models of this type, each constant in the fixed portion equals
28
Linear Models
the expected value of observations in particular combinations of the levels of the fixed factor(s). Models whose fixed portions are represented in this way are called mean models. Numerous examples of mean models are presented in this text and a specific discussion on mean models is provided in Chapters 9 and 10. Example 2.1.5 Consider a two-way cross classification where both factors are random. Let y,; be a random variable representing the observation in the ith level of the first random factor S and the y-th level of the second random factor T for i = 1,..., 5 and j = I,... ,t. Let the st x 1 random vector Y = FII, . . . , FI,, . . . , Ysi,..., Yst)'. This experiment can be characterized with the model
where a is a constant representing the overall mean; the random variable 5, represents the effect of the Ith level of the first random factor; the random variable 7) represents the effect of the yth level of the second random factor; and the random variable 57}; represents the interaction effect of the Ith level of factor S and the /h level of factor T. Furthermore, Si,..., Ss, T\,..., Tt and ST\ \,..., STst are assumed to be independent normal random variables with zero expectations and variances given by var(S1) = zcr|, var(7}) = &%, and var[5T}y] = ajT for i = 1 , . . . , s and j = 1 , . . . , t. Therefore, by Theorems 2.1.2 and 2.1.4, Y ~ N jr (/i, S) where the st x 1 mean vector p is given by
and the elements of the st x st covariance matrix £ are
Direct derivation of the covariance matrix £ can be difficult even for balanced design structures. In Chapter 4 a simple algorithm is provided for determining
2
Multivariate Normal Distribution
29
the covariance matrix for complete, balanced designs with any number of fixed or random main effects, interactions, and nested factors.
2.2
CONDITIONAL DISTRIBUTIONS OF MULTIVARIATE NORMAL RANDOM VECTORS
In this section conditional multivariate normal distributions are discussed.
is thennx n positive definite covariance matrix, Yi, and u, are n, x 1 vectors, £,y is an HI x HJ matrix for i, j = 1, 2 and n = n1 + n2. The conditional distribution of the n\ x 1 random vector YI given the n2 x 1 vector of constants y2 = c2 is N Bl [/ii + SwS^te - M 2 )> En - £i2S2~2 £21].
By Theorem 2.1.2 with the n x 1 vector b = O n x i and the n x n matrix
the n x 1 random vector (V,, Vi)' ~ N n (u*, S*) where the n x 1 mean vector
and the n x n covariance matrix S* is
Linear Models
30
Thus, Vj and ¥2 are independent multivariate normal random vectors. That is, the joint distribution of Vi and ¥2 can be written as the product of the two marginal distributions
where /v,( v i) i& a N ni (A*i ~~ ^12^22 fa* ^n ~~ ^12^22 S 2 i) distribution and f\2 (V2) is a NM2 (/Lt2, £22) distribution. The conditional distribution of Y} | Y2 = €2 is derived by utilizing the transformation from YI, Y2 to Vi, V2 and noting that the Jacobian of this transformation is 1.
But the distribution of /v,(vi + Si 2 £ 22 c 2 ) is the distribution of Vi plus the constant vector Si2E2~21c2). By Theorem 2.1.2 with B = !„, and b = Ei2S2"21c2, the proof is complete. • Theorem 2.2.1 is applied in the next few examples.
Then by Theorem 2.2.1 the conditional distribution of YI , Y2 \ Y3 = 1 is N2 (/zc, Sc) where the 2 x 1 conditional mean vector /zc is given by
and the 2 x 2 conditional covariance matrix Sc is given by
Example 2.2.2 Use the distribution of Y = (Yi, Y2, Y3)' from Example 2.2.1 to find the conditional distribution of 2Yi + Y2\Yi + 2Y2 + 373 = 2. First, let the 2x1 vector b = 0 2x i and let the 2 x 3 matrix
2
Multivariate Normal Distribution
31
By Theorem 2.1.2, the joint distribution of
and the 2 x 2 covariance matrix S* is
where p, and S are given in Example 2.2.1. Applying Theorem 2.2.1 to the distribution of BY, the conditional distribution of 2Y{ + Y2\Yi + 2Y2 + 3F3 = 2 is NI (fjic, ac2) where the conditional mean /zc is and the conditional variance err is given by
Example 2.2.3 Let the n x 1 random vector Y = (Y\,..., Yn)' ~ Nn(aln, a2\n). Find the conditional distribution of Y\,..., Yn-\\Y = y. By Theorem 2.1.2 with n x I vector b = O n x i and « x n matrix
and the n x n covariance matrix S* is
Linear Models
32
Applying Theorem 2.2.1 to the distribution of BY, the conditional distribution of YI, ..., Yn-i\Y = y is N n _i(/x c , Sc) where the (n — 1) x 1 conditional mean vector /xc is
Applying Theorem 2.1.2 [with scalar b = 0 and 1 x (n — 1) matrix B = (1,0,..., 0)] to the conditional distribution of Y\,..., Yn-\ \Y = y, the conditional distribution of Y\ \Y = y is NI(« C , a2} where the conditional mean ac is and the conditional variance a2 is
The distribution theory of quadratic forms is presented in Chapter 3. However, a number of interesting quadratic form problems are solved in the next section before the general distribution theory is developed.
2.3
DISTRIBUTIONS OF CERTAIN QUADRATIC FORMS
The distributions of several quadratic forms are derived in the following examples. Example 2.3.1 Letthenx 1 random vector Y = (Y\,..., Yny ~ N n (al n , a2\n). Define U = YZ=\(Yi ~ ^2/a2 and V = n(Y - a}2/a2 where Y = (\/n)l'n\. Find the distributions of U and V and show these two random variables are independent. First, note ^/n(Y - a)/a = [\/(a Jn}}l'n\ -(a*fn)/a. By Theorem 2.1.2 with 1 xn matrix B = [\/(a*Jn)}\'n and scalar & = ~((x^/n)/a, *Jn(Y — a)/a ~ Ni(0, 1) since
2
Multivariate Normal Distribution
33
Therefore, by Example 1.3.2, n(Y — a)2/a2 is distributed as a central chi-square random variable with 1 degree of freedom. Next, rewrite U as
where the (n-l)xl vector X = [(l/a)P;]Y = (Xi,..., X n _i)'andthe(rt-l)x« matrix P^ is the lower portion of an n -dimensional Helmert matrix with P«P^, = (!„ -1JB), P;PW = !„_! and l'nPn = OI X ( W _D. By Theorem 2.1.2 with (n - 1) x n matrix B = (l/
Therefore, U equals the sum of squares of n — 1 independent Ni(0, 1) random variables. By Example 1.3.2, U has a central chi-square distribution with n — 1 degrees of freedom. Finally, by Theorem 2.1.2 with n x n matrix B and n x I vector b given by
By Theorem 2.1.4, ^/n(Y — ot)/<j and X are independent. Therefore, by Theorem 1.3.5, U and V are independent.
Linear Models
34
Figure 2.3.1
Two-Way Layout.
Example 2.3.2 Consider the two-way cross classification described in Example 2.1.5 where the st x 1 random vector
The layout for this experiment is given in Figure 2.3.1 and the ANOVA table is provided in Table 2.3.1, where the sums of squares are written in summation notation and as quadratic forms, Y'A^Y, for m = 1,..., 5. The matrices AI, A2, and AS for the sums of squares due to the mean, random factor S, and the total, respectively, were already derived in Example 1.2.10. Note that (Y.1,..., Y.,)' = [(1/5)1^ ® I/]Y. Therefore, the sum of squares due to the random factor T is
Table 2.3.1 Two-Way ANOVA Table Source
df
ss
Mean
1
\p* y f2 Z^i=i LJJ=\ I "
Factor 5
s- 1
Factor T
t- 1
£;=,£;=, <*.-?-->22
S-, £;•-,(?.,• -r->
Interaction ST
(*-!)(/-!)
E;=I S=. My- fc. -?.; +
Total
st
s V V' Yij2 Z-
= Y'A 1 Y
= Y'A2Y = Y'A3Y 2 ?•.) = Y'A4Y
= Y'A5Y
2
Multivariate Normal Distribution
35
where the st x st matrix AS = [jJ5 <8> (I/ — 7!?)]. The matrix A4 for the sum of squares due to the random interaction can be solved by subtraction, A4 =
To find the distribution of Y'A2 Y, note that A2 is an idempotent matrix of rank s -1. Thus, Y'A2Y = Y'PP'Y = X'X where the (s - 1) x 1 vector X = ( X i , . . . , X5-i)' = P'Y, the st x (s - 1) matrix P = P, (1/V/)1,, and the (s — 1) x s matrix P's is the lower portion of an ^-dimensional Helmert matrix with P,P; = a, - jJ,), liP, = OI X ( J _D, and P;P, = I,_L By Theorem 2.1.2 with (5 — 1) x st matrix B = P' and (s — 1) x 1 vector b = 0( 5 _i) X i, X ~ N s _i(0, (ajT + taj)ls_i) since
Linear Models
36
degrees of freedom. The distributions of Y'AaY and Y^Y of Y'A2Y, Y'A3Y, and Y'A4Y are left to the reader.
and the independenc
The preceding examples suggest the following general technique for findin the distribution of the quadratic form Y'AY when Y ~ Nw (//, E) and A is an n x idempotent matrix of rank r. 1. Set A = PP7 where P is an n x r matrix of eigenvectors corresponding to th r eigenvalues of A equal to 1. 2. Let Y'AY = Y'PP'Y = X'X = £/=i X? where the r x 1 vector X = ( X i , . . . , Xr)' = P'Y. Then find the distribution of X. 3. Find the distribution of Y'AY using the distribution of X. This technique is used in the next chapter to prove some general theorems aboi the distributions of quadratic forms.
EXERCISES
Find a 2 x 2 triangular matrix T such that TY ~ N 2 (0,I 2 ).
Find the distribution of the 3 x 1 random vector Y = (Y\, F2, Y^V where
2
Multivariate Normal Distribution
37
(a) Find the distribution of Y*. Rewrite Y* in terms of the n x 1 vector
4. Let the 9 x 1 random vector Y ~ Ng(fj,, S) where the mean vector p, = (/Ail,, f^2^A, A^l'i)' and covariance matrix
(a) Define a 9 x 6 matrix P such that \ = PP'. (b) Find the distribution of P' Y. (c) Find the distribution of Y'AY.
(a) Find the distribution of Y. (b) Find the distribution of U. (c) Are ?. and U independent? Prove your answer.
38
Linear Models
(a) Find a linear combination of Y1 , ¥2., and F3 that is independent of Y\ — Y2_ • Prove your result. (b) What is the distribution of the linear combination you found in part a?
10. Let Y = X/3 + E where Y is an n x 1 random vector, X is an n x p matrix of known constants, f3 is a p x 1 vector of unknown constants, and E is an n x I random vector. Assume E ~ N n (0,
2
(c) Find a constant c such that c[5F + 2K2 + has a central chi-square distribution.
5F32
Y2+2Y2Y3.
- 4Fi Y2 + 2Y{ Y3 +4Y2Y3]
12. Let the 4 x 1 random vector Y = (Yi, Y2, F3, F4)' ~ N4(/z, E) where ^ = (2, 3,0, -I)' and
2
Multivariate Normal Distribution
39
(a) Find the conditional distribution of Y\ + Y2\Yj + Y4 = 1. (b) Show that the conditional mean of Y^1Y\ = y\,Y*i = y2 has the form A) + P\y\ + fay2 and find the values of the /Ts.
Find the conditional distribution of Y\ + Y2 + Y^\Y2 + Y3 = 1.
(a) Find the marginal distribution of (Y\, Y^)'. (b) Find the partial (i.e., conditional) correlation of Y\ and Y2 given YI = 2. (c) Find the distribution of 4Y\ -Y2-2. (d) Find a normally distributed random variable Z which is a (nontrivial) function of YI and Y2 and is independent of 4Y\ — Y2 — 2. 15. Let the n x 1 random vector Y = (Yi,..., ¥„)' ~ N n (al n , a2In) for n > 3. (a) Find the conditional expectation of \(Y2 + Y3)\Y where Y = J]"=i thatis,findE[i(y 2 + l3)l?].
^/ n »
(b) Show that the variance of E[ ^ (72 + ^3) I Y] is smaller than the variance of 1(^2 + ^3). 16. Let F and G be independent normal random variables with 0 means and variances a2 and 1 — a2, respectively, 0 < a2 < 1. Find the conditional distribution of F given F + G = c. 17. Let the 3 x 1 random vector Y = (Yi, Y2, Y3)' ~ N3(/x, £). Show that the variance of Y\ in the conditional distribution of Y\ given F2 is greater than or equal to the variance of Y\ in the conditional distribution of Y\ given Y2 and ^3.
This page intentionally left blank
3
Distributions of Quadratic Forms
The distribution of the quadratic form Y'AY is now derived when Y ~ N w (0, !„). Later, the distribution of Y'AY is developed when Y ~ Nrt (0, S) for any positive definite n x n matrix S.
3.1
QUADRATIC FORMS OF NORMAL RANDOM VECTORS
A chi-square random variable with n degrees of freedom and the noncentrality parameter A, will be designated by x^(A.). Therefore, a central chi-square random variable with n degrees of freedom is denoted by /^(A = 0) or Xn (0)Theorem 3.1.1 Let the n x I random vector Y ~ Nn(0, In) then Y'AY ~ Xp(A = 0) if and only if\ is ann x n idempotent matrix of rank p. Proof: First assume A is an n x n idempotent matrix of rank p. By Theorem 1.1.10, A = PF where P is an n x p matrix of eigenvectors with P'P = \p. Let the p x 1 random vector X = P'Y. By Theorem 2.1.2 with p x n matrix B = P' 41
42
Linear Models
and p x 1 vector b = O p x i, X = ( X i , . . . , Xp)' ~ Np(0, Ip). Therefore, by Example 1.3.2, Y'AY = Y'PP'Y = X'X = £f=1 x? ~ xfa = 0). Next assume that Y'AY ~ x^(A = 0). Therefore, the moment generating function of Y'AY is (1 —2t)~p/2. But the moment generating function of Y'AY is also defined as
The final equality holds since the last integral equation is the integral of a multivariate normal distribution (without the Jacobian \ln — 2tA.\1/2) with mean vector Ora x i and covariance matrix (!„ — It A) ~l. The two forms of the moment generating function must be equal for all t in some neighborhood of zero. Therefore,
Let Q be the n x n matrix of eigenvectors and D be the n x n diagonal matrix of eigenvalues of A where the eigenvalues are given by A . i , . . . , A.n. By Theorem 1.1.3, O'AO = D, O'O = L, and
Therefore, (1—20p = n" =1 (l-2/A r ). The left side of the equation is a polynomial in 2t with highest power p. The right side of the equation therefore must have highest power p in It also, implying that (n — p) of the A.,- 's are zero. Thus, the equation becomes (1 — It )p = nf =1 (1 — 2? A.,). Taking logarithms of each side and equating coefficients produces 1 — It = 1 — 2fA., for / = 1,...,/?. The solution to these equations is AI = • • • = Xp = 1. Therefore, by Theorem 1.1.8, A is an idempotent matrix. •
3
Distributions of Quadratic Forms
43
Thus far we have concentrated on central chi-square random variables (i.e., A. = 0). However, in general, if the n x 1 random vector Y ~ Nw (/i, !„) then Y'AY ~ Xp W where p, is any n x I mean vector and the noncentrality parameter is given by The next theorem considers the distribution of Y'AY when Y ~ Nn (/i, S) and S is a positive definite matrix of rank n. Theorem 3.1.2 Let the n x l random vector Y ~ Nn (/x, S) where Jlisannxn positive definite matrix of rank n. Then Y'AY ~ Xp(^ = A*'A/x/2) if and only if any of the following conditions are satisfied: (1)AS (or SA) is an idempotent matrix of rank p or (2) AS A = A and A has rank p. Proof: Let Z = T~!(Y - //) where S = TT'. By Theorem 2.1.2 with n x n matrix B = T"1 and n x I vector b = — T"1//* Z ~ Nn(0, In). Furthermore, Y'AY = (TZ + /z)'A(TZ + /i) = (Z + T-1/z)'T'AT(Z + T^/u) = V'RV where V = Z + T~V and R = T'AT. By Theorem 2.1.2 with n x n matrix B = ln and n x 1 vector b = T"1^. V ~ Nn (I""1/*»!«)• BY Theorem 3.1.1, V'RV ~ Xp(^) if and only if R is idempotent of rank p. But R is idempotent if and only if (T'AT) (T'AT) = T'AT or equivalently AS = AS AS. Therefore, Y'AY ~ Xp(^) if and only if AS is idempotent. Finally, p = rank(R) = rank(T'AT) = rank(ATT') = rank(AS) since T is nonsingular. Also, A = (T-V)/R(T~1M)/2 = M'T^'T'ATT-1/! = /i'Aji/2. The proofs for EA idempotent of rank p or AS A = A of rank p are left to the reader. • It is convenient to make an observation at this point. In most applications it is more natural to show that AS is a multiple of an idempotent matrix. We therefore state the following two corollaries which are direct consequences of Theorem 3.1.2. Corollary 3.1.2(a) Let the n x I random vector Y ~ Nn (u, S) where S is an n x n positive definite matrix of rank n. Then Y'AY ~ cx^(A. = /x'A/x/(2c)) if and only if(l) AS(or SA) is a multiple of an idempotent matrix of rank p where the multiple is c or (2) ASA = cA and A has rank p. Corollary 3.1.2(b) If the n x I random vector Y ~ Nn(0, cr2V) where V is an n x n positive definite matrix of known constants then Y'V"1 Y/
44
Linear Models
are independent x\(&i) random variables for i = 1,...,/?; and AI, . . . , Ap are the nonzero eigenvalues 0/A£. Proo/; Let Z = T-1(Y - /u) where £ = TT'. By Theorem 2.1.2 with n x n matrix B = T"1 and n x 1 vector b = -T~V>Z ~ NB(0, !„). Furthermore, Y'AY = (TZ + /i)'A(TZ + /x) = (Z + T~V)'T'AT(Z + T'1^) = (Z + T-1/Lt)TDr'(Z + I"1/*) where TAT is an n x n symmetric matrix, T is the n x w orthogonal matrix of eigenvectors of T'AT, and D is the n x n diagonal matrix of eigenvalues of T'AT such that T'AT = TDF'. The eigenvalues of T'AT are AI, . . . , Ap, 0 , . . . , 0 and rank (T'AT) = p. Let W = (Wi,..., Wn)' = T'(Z + T-1//)- Therefore, Y'AY = W'DW = £f = i ^^- BY Theorem 2.1.2 with n x n matrix B = T' and n x 1 vector b = T'T"1//' W ~ NnCF'T-1/*, I*)Therefore, W/2 are independent x\(8i > 0) random variables for i = 1,...,/?. Furthermore, /? = rank(T'AT) = rank(ATT') = rank(AE) because T is nonsingular. Finally, the eigenvalues of T'AT are found by solving the polynomial equation Premultiplying the above expression by |T' :| and postmultiplying by |T'| we obtain
Thus, the eigenvalues iT'AT are the eigenvalues of AE. We now reexamine the distributions of a number of quadratic forms previously derived in Section 2.3. Example 3.1.1 From Example 2.3.1 let the n x 1 random vector Y ~ Nn («!„,
which is a multiple of an idempotent matrix of rank n — 1. Furthermore,
3
Distributions of Quadratic Forms
45
which is a multiple of an idempotent matrix of rank 1. Example 3.1.2 Consider the two-way cross classification from Example 2.3.2 where the st x 1 random vector Y = (Y\\,..., Y\t,..., Ysi,..., Yst)' ~ Nj,(fi, E), /x = als <8> lt, and S = a|ls <8> Jf + a^3s <8> I, + or|rly I,. The sum of squares due to the random factor S is Y'A2Y where A2 = (I* — jJ5) <8> 7 J/. By Corollary 3.1.2(a), Y'A2Y ~ (cr|r + ta2s)xt-\&2 = 0) since A2E = faftCI, Jj,) ®7 Jr] + o-lrtO, - jJ.) ®7J,] = (a|r + 'o-,2)A2,
and A2 is an idempotent matrix of rank 5 — 1. The sum of squares due to the mean is Y'A t Y where A! = ±JS 7 J f . By Corollary 3.1.2(a), Y'AiY ~ (ta$ + sa} + a|r)xi2(A.i) since A]S = (/
The distributions of the other quadratic forms in Example 2.3.2 can be derived in a similar manner and are left to the reader.
3.2
INDEPENDENCE
The independence of two quadratic forms is examined in the next theorem. Theorem 3.2.1 Let A and B be n x n constant matrices. Let then x 1 random vector Y ~ N w (/x, £). The quadratic forms Y'AY and Y'BY are independent if and only i/AEB = 0 (or BE A = 0). Proof: The matrices A, E, and B are symmetric. Therefore, AEB = 0 is equivalent to BE A = 0. Assume AEB = 0. Since £ is positive definite, by
46
Linear Models
Theorem 1.1.5, there exists an n x n nonsingular matrix S such that SES' = ln. Then Z = SY ~ N n (S/x,I n ). Let G = (S^'AS"1 and H = (S^XES"1. Therefore, Y'AY = Z'GZ, Y'BY = Z'HZ, and AEB = S'GSSS'HS = S'GHS. Thus, the statement AEB = 0 implies Y'AY and Y'BY are independent and the statement GH = 0, implies Z'GZ and Z'HZ are independent, are equivalent. Since G is symmetric, there exists an orthogonal matrix P such that G = P'DP, where a = rank(A) = rank(G) and D is a diagonal matrix with a nonzero diagonal elements. Without loss of generality, assume that
where Da is the a x a diagonal matrix containing the nonzero elements of D. Let X = PZ ~ Nn (PS/x, ln) and partition X as [X^, X'2]', where Xi is a x 1. Note that Xi and X2 are independent. Then Z'GZ = Z'P'DPZ = X'DX = x;DaXi and Z'HZ = X'PHP'X = X'CX where the symmetric matrix C = PHP'. If GH = 0 then P'DPP'CP = 0, which implies DC = 0. Partitioning C to conform with D,
which implies Cn = 0 and €12 = 0. Therefore,
which implies X'CX = X'2^22^2> which is independent of XjD a Xi. Therefore, Z'GZ and Z'HZ are independent. The proof of the converse statement is supplied bySearle(1971). • The following theorem considers the independence of a quadratic form and linear combinations of a normally distributed random vector. Theorem 3.2.2 Let A and Bbenxn and mxn constant matrices, respectively. Let the n x I random vector Y ~ N rt (/x, £). The quadratic form Y'AY and the set of linear combinations BY are independent if and only j/BSA = 0 (or AEB' = 0). Proof: The "if" portion can be proven by the same method used in the proof of Theorem 3.2.1. The proof of the converse statement is supplied by Searle (1971). • In the following examples the independence of certain quadratic forms and linear combinations is examined.
3
Distributions of Quadratic Forms
47
Example 3.2.1 Consider the one-way classification described in Examples 1.2.10 and 2.1.4. The sum of squares due to the fixed factor is Y'A2Y where A2 = (lt — yJr) <8> pJr is an idempotent matrix of rank t — 1. Furthermore, A2E = [(I, - yj,) ® ^J r ] [a2!, <S> Ir] = o-2A2. The sum of squares due to the nested replicates is Y'A3 Y where A3 = I, (Ir — £ Jr) is an idempotent matrix of rankt(r - 1). Likewise, A3E = [I, <8> (Ir - pJr)] [o-2!, <8>Ir] = cr2A3. Therefore, by Corollary 3.1.2(a), Y'A2Y ~ or 2 x f 2 _ 1 (X 2 ) and Y'A3Y ~ CT2x2(r_1)(X3) where A2 =
[(Ml, • • • , HA' 2
<8> Ir]' [(I/ -
}J,)
<8> ±Jr] [(Ml, • • • , M/)' ® lr]/(2a 2 ) =
2
'ELO*.- - A-) /(2a ) with A- = EU^i'A and A3 = [ ( / m , . . . , /*,)' 0 ilrl' [Ir ® dr - fJr)] [ ( ^ i , . . . , ^ X ® lrl/(2or2) = 0. Finally, by Theorem 3.2.1, Y'A2Y and Y'A3Y are independent since A2EA3 = dr - f Jr)] = O rrx ,r-
Example 3.2.2 Reconsider Example 2.3.1 where Y = (Y\,..., 7n)' ~ Nn (ctln, a 2 I w ), 17 = S?=1(yi- - ?)2/a2 = y'td/a 2 )^ - ijB)]Y and ? = (l/n)i;Y. By Theorem 3.2.2, f and t/ are independent since (l/n)l^[cr 2 I n ] [(l/a 2 )(I n -
iX,)] = 0 1XB .
3.3
THE ^ AND F DISTRIBUTIONS
The normal and chi-square distributions were discussed at length in the previous sections. We now examine the distributions of certain functions of chi-square and normal random variables. Definition 3.3.1 Noncentral t Random Variable: Let the random variable Y ~ NI (a, cr 2 ) and the random variable U ~ x^(0)- If Y and U are independent, then the random variable T = (Y/a)/^/U/n is distributed as a noncentral t random variable with n degrees of freedom and noncentrality parameter A = «2/2. Denote this noncentral t random variable as f rt (A). Definition 3.3.2 Noncentral F Random Variable: Let the random variable U\ ~ X^ (A.) and the random variable f/2 ~ Xn 2 (^)- ^ ^1 an(^ ^2 are independent, then the random variable F = (U\ /n \)/ (t/2/«2) is distributed as a noncentral F random variable with n \ and n2 degrees of freedom and noncentrality parameter X. Denote this noncentral F random variable as F MliW2 (A). A t random variable with n degrees of freedom and a noncentrality parameter equal to zero [i.e., tn(k = 0)] has a central t distribution. Likewise, an F random variable with n\ and n2 degrees of freedom and a noncentrality parameter equal to zero [i.e., Fn^n2(k = 0)] has a central F distribution.
48
Linear Models
In recent years Smith and Lewis (1980, 1982), Pavur and Lewis (1983), Scariano, Neill, and Davenport (1984) and Scariano and Davenport (1984) have developed the theory of the corrected F random variable. The definition of the corrected F random variable is given next. Definition 3.3.3 Noncentral Corrected F Random Variable: Let the random variable U\ ~ cix^(X) and the random variable £/2 ~ C2X 2 2 (0). If U\ and t/2 are independent, then the random variable Fc = (c 2 /ci)[(f/i/ni)/(£/ 2 /n 2 )] ~ ^/Ji,n 2 (^) is called a corrected F random variable where the ratio c2/ci is the correction factor. In practice, we often encounter independent random variables U\ and (72, which are distributed as multiples of chi-square random variables (t/2 being a multiple of a central chi square). The random variable F = (£/i/wi)/(£/ 2 /n 2 ) in this case will be distributed as a noncentral F random variable if and only if c\ = c2 (i.e., ci/c\ = 1). Generally, c\ and c2 will be linear combinations of unknown variance parameters. In the following examples a number of central and noncentral t and F random variables are derived. Example 3.3.1 Letthenx 1 random vector Y =(Yi,..., Yn)' ~ N n (al n , a2ln). By Example 2.1.1, Y ~ N^a 2 /")- By Example 3.1.1, E"=i(y< ~ ?)2 = Y'[OU - ±J W )]Y ~
Example 3.3.2 Consider the one-way classification described in Example 3.2.1. It was shown that the sum of squares due to the fixed factor Y'A2Y = Y'[(I, — j-Jf) ® ^Jr]Y ~ cr 2 x, 2 _i(A, 2 ) and the sum of squares due to the nested replicates Y'A3Y = Y'[I, (8) (Ir - ^J r )]Y ~ cr2x}(r-i)(V). Furthermore, Y'A2Y and Y'A3Y are independent. Therefore, the statistic
where X2 = r£^ =1 (//,,- — /^-) 2 /(2a 2 ). The hypothesis H0 : X2 = 0 versus HI : A > 0 is equivalent to the hypothesis HO : Mi = At2 = • • • = fJLt versus HI : the /x/'s are not all equal. Thus, under HO, the statistic F* has a central F distribution with t — I and t (r — 1) degrees of freedom. A y level rejection region for the hypothesis HO versus HI is as follows: Reject HO if F* > F^_l t ^ r _ l ) where
3
Distributions of Quadratic Forms
49
Ff_i t(r-\) 1S me 100 (1 — y)* percentile point of a central F distribution with t — 1 and t(r — 1) degrees of freedom. Example 3.3.3 Consider the two-way cross classification described in Example 3.1.2. The sums of squares due to the random factor 5 and due to the random interaction S T are given by Y'A2Y and Y'A4Y, respectively. It was shown that Y'A2Y = Y'[(I5 - ij,) ij,]Y ~ (ajT + ra|)x,2_i(0). Furthermore, A4S = [(I. - jJs) <8> (If - jjf)] [ J/ + or|j, (g> I, + ajTls I,] = (I, - 7J,)]Y ~
Under the hypothesis HO : er| = 0, the statistic F* has a central F distribution with s — 1 and (s — !)(t — 1) degrees of freedom. A y level rejection region for the hypothesis HO : crj = 0 versus HI : aj > 0 follows; Reject H0 if JT* ^
3.4
ff?Y
s-l,(s-l)(t-l)-
BHATS LEMMA
The following lemma by Bhat (1962) is applicable in many ANOVA and regression problems. The lemma provides necessary and sufficient conditions for sums of squares to be distributed as multiples of independent chi-square random variables. Lemma 3.4.1 Let k and n denote fixed positive integers such that 1 < k < n. Suppose ln = XL=i AM where each A, is an n x n symmetric matrix of rank «, with Y^i=i ni = n. If the n x I random vector Y ~ Nn(/x, E) and the sum of squares Sf = Y'A, Y/or / = 1,..., k, then
Proof: This proof is due to Scariano et al (1984). Assume that the quadratic forms S? satisfy (a) and (b) given in Lemma 3.4.1. By Theorems 3.1.2 and 3.2.1, (i) the matrices (1/c,) A, S are idempotent for i = 1 , . . . , £ and (ii) A,SA; = 0 n x n for i / j,i,j = !,...,£. Furthermore, by Theorem 1.1.7, A, = A? and
Linear Models
50
A/Ay = Qnxnfori / 7, i, 7 = 1 , . . . , fc. But (i) and (ii) imply that £f =1 (l/c,-) A,;£ is idempotent of rank n and thus equal to In. Hence, E = Q^-iO/CiOAj]"1 = £*=i C J, + a^Js <8> I, + crjTls <8> UAi = Jj, ® }J,,A2 = 0, - ij a ) ® }J,,A3 = l-Js ® (I, - 7J/), A4 =
a-iJ,)®a-}jr),^ = Ei=1rank(AJB) = i+(j-i)+(/-i)+(*-i)a-i),
and Is ® I, = Y^n=i Am- Furthermore, AmS = cmAm for m = 1 , . . . , 4 where d = ajT+taj+sa^,C2 = a|r + ?cr|, C3 = ajT +sa%, and04 = cr|r. Therefore, £ = (El=i AW)S = £1=1 (A«£) = El=i^A m . Thus, by Bhat's lemma, the quadratic forms Y'AmY are distributed as independent cmxrgnk(A }{A.m = (orlj l,)'A IB (ofl J ® l,)/(2cm)} for m = 1 , . . . , 4.
EXERCISES 1. Use Corollary 3.1.2(a) to find the distribution of £"=1 w, y,2 from Exercise 3b in Chapter 2. 2. Use Corollary 3.1.2(a) to find the distribution of Y'AY from Exercise 4c in Chapter 2. 3. Consider the model presented in Exercise 5 of Chapter 2. (a) Find the distribution of V, = C=i £;=i £Li(*V ~ ^-) 2 (b) Find the distribution of V2 = £?=i £;=i Ei=i(% ~ ?<>)2(c) Find the distribution of {Vi/[a(s - l)]}/(V2/[as(t - 1)]}. 4. Consider Exercise 6 of Chapter 2. (a) Use Corollary 3.1.2(a) to find the distribution of U. (b) Use Theorem 3.2.2 to show that U and ?. are independent. 5. Prove Theorem 3.1.2, part (2). 6. Use Corollary 3.1.2(a) to find the distribution of (Y\. — Y2.)2 from Exercise 9b in Chapter 2. 7. Consider Exercise 11 of Chapter 2.
3
51
Distributions of Quadratic Forms
8. Derive the distributions of Y'AsY and Y^Y
from Example 2.3.2.
9. Calculate the noncentrality parameters X\,..., ^4 in Example 3.4.1.
(a) Find the distributions of Z\ and Z2. (b) Find the E(Zf) for / = 1, 2 and any positive integer k.
12. Let the n x 1 random vector Y ~ NM (/x, I M ). Let X = AY where A is an n x n orthogonal matrix whose first row is /Lt'/V/^'A*- Let V = X\(X\ is the first element of vector X) and U = (X'X - V). (a) Find the distributions of U and V. (b) Are U and V independent? Prove your answer. 13. Let Uj ~ X 2 (A./) for / = 1,2 where U\ and Ui are independent. Let a and b be two positive constants. Under what conditions is aU\ + bUi ~ cx 2 (A)? Provide the values of c and A.
15. (Paired f-Test Problem) Consider an experiment with n experimental units. Suppose two observations are made on each unit. The first observation corresponds to the first level of a fixed factor, the second observation to the second level of the fixed factor. Let 7,y be a random variable representing the
52
Linear Models 7th observation on the Ith experimental unit for i = !,...,« and j = 1,2. Let the 2n x 1 random vector Y = (7n, Y\2, Y2\, 722, . . . , 7 n i, Yn2)'. Let E(Yy) = 11 j and var(ly) = a2 for i = ! , . . . , « and j = 1,2; and let cov(7n, 7/2) = a2p for all i = 1,..., n. Assume Y ~ N2n(p,, £). (a) Define p, and £ in terms of /jy, a2, and p. (Hint: Use Kronecker products.) (b) Let T = bl(SDlJn) where A = 7n - Yi2 for i = 1 , . . . , n; D = E"=i A7«; and S2D = Ef =1 (A - £>)2/(« - !)• Find the distribution of T. [//mr: Start by finding the distribution of D = (D},..., Dn)'.]
i. Let the 6n x 1 random vector Y = (Y\\\,..., Y\\n, Y\2\,..., Y\2n,..., FBI , . . . , FB,,, 7211, • • • > 72in, . . . , 7221, • • • , 722n, 7231, • • • , 723,,)' ~
N 6n (l2 <S>
Oi, /i2, Ms)' <S> 1«, S) where
(b) Find the distribution of 7i — 7.2.. (c) Find the distribution of Y'[(I2 - |J2) ® (I3 - |J3) <8> JJB]Y. 7. Let the (n\ + n2) x 1 random vector Y = ( y n , . . . , 7 ln ,, 7 2 ! ,.,., 7 N ni+n2 (M, S) where ^t = (ii\\'n^ H2l'n2)' and
(a) Find the distribution of
(b) Describe the distribution of V when a2 / a2..
4
Complete, Balanced Factorial Experiments
The main objective of this chapter is to provide sum of squares and covariance matrix algorithms for complete, balanced factorial experiments. The algoorithm rules are dependent on the model used in the analysis and on the model assumptions. Therefore, before the algorithms are presented we will discuss two different model formulations, models that admit restrictions on the random variables and models that do not admit restrictions.
4.1
MODELS THAT ADMIT RESTRICTIONS (FINITE MODELS)
We begin our model discussion with an example. Consider a group of btr experimental units. Separate the units into b homogeneous groups with tr units per group. In each group (or random block) randomly assign r replicate units to each of the t fixed treatment levels. The observed data for this two-way mixed experiment with replication are given in Figure 4.1.1.
53
Linear Models
54
Figure 4.1.1
Two-Way Mixed Experimental Layout with Replication.
A model for this experiment is
for i = 1,..., b, j = 1,..., t, and k = 1 , . . . , r where Ytjk is a random variable representing the £* replicate value in the 17* block treatment combination; //; is a constant representing the mean effect of the 7th fixed treatment; B{ is a random variable representing the effect of the Ith random block; BTij is a random variable representing the interaction of the Ith random block and the 7* fixed treatment; and R(BT)aj)k is a random variable representing the effect of the fc* replicate unit nested in the 17* block treatment combination. We now attempt to develop a reasonable set of distributional assumptions for the random variables Bt, BTij, and R(BT)aj)k. Start by considering the btr observed data points in the experiment as a collection of values sampled from an entire population of possible values. The population for this experiment can be viewed as a rectangular grid with an infinite number of columns, exactly t rows, and an infinite number of possible observed values in each row-column combination (see Figure 4.1.2). The infinite number of columns represents the infinite number of blocks in the population. Each block (or column) contains exactly t rows, one for each level of the fixed treatments. Then the population contains an infinite number of replicate observed values nested in each block treatment combination. The btr observed data points for the experiment are then sampled from this infinite population of values in the following way. Exactly b blocks are selected at random from the infinite number of blocks in the population. For each block selected, all t of the treatment rows are then included in the sample. Finally, within the selected block treatment combinations, r replicate observations are randomly sampled from the infinite number of nested population replicates. Since the r blocks are selected at random from an infinite population of possible blocks, assume that the block variables B,- for/ = 1 , . . . , b are independent. If the
4
Factorial Experiments
55
Figure 4.1.2 Finite Model Population Grid.
r blocks have been sampled from the same single population of blocks, then the variables #, are identically distributed. Furthermore, assume that across the entire population of blocks the average influence of fi/ is zero, that is, E(fi,) = 0 for all / = 1 , . . . , b. If the random variables Bf are assumed to be normally distributed, then the assumptions above are satisfied when the b variables #, ~ iid NI (0, erg). Now consider the random variables BT^ that represent the block by treatment interaction. Recall that the population contains exactly t treatment levels for each block. Therefore, in the /th block the population contains exactly t possible values for the random variable BTij. If the average influence of the block by treatment interaction is assumed to be zero for each block, then E[fl7)y] = 0 for each /. But for each i, E[57};] = Y?J=I BTij/t since the population contains exactly / values of BTij for each block. Therefore, £]'•_! BTij = 0 for each i, implying that the variables BTi\,..., BTjt are dependent, because the value of any one of these variables is determined by the values of the other t — I variables. Although the dependence between the BTij variables occurs within each block, the dependence does not occur across blocks. Therefore, assume that the b vectors (BTn,..., BTlt)',..., (BTb\,..., BTbt)' are mutually independent. If the random variables BTij are assumed to be normally distributed, then the assumptions above are satisfied when the b(t — 1) x 1 random vector (Ib P',)(flr H ,..., BTit, ...,BTbi,..., BTbty ~ Nfc(,_o[0, alTlb I,_j] where P't is the (t — 1) x t lower portion of a f-dimensional Helmert matrix. Finally, consider the nested replicate variables R(BT)^ij)ic. Within each block treatment combination, the r replicate observations are selected at random from the infinite population of nested replicates. If each block treatment combination of the population has the same distribution, then the random variables R(BT)aj)k are independent, identically distributed random variables. Furthermore, within each block treatment combination, assume that the average influence of R(BT\ij)k is zero, that is, E[R(BT)(ij)k] = 0 for each ij pair. If the random variables R(BT)(ij)k are also assumed to be normally distributed, then the assumptions
Linear Models
56
above are satisfied when thebtr random variables R(BT)(ij)k ~ ndNi(0, cr|(fir)). Furthermore, assume that random variables fi,, the t x 1 random vectors (B T,^,..., BTit)f, and the random variables R(BT\ij)k are mutually independent. In the previous model formulation, the random variables BT^ contain a finite population of possible values for each /. If the variables are assumed to have zero expectation, then the finite population induces restrictions and distributional dependencies. Note that variables representing interactions of random and fixed factors are the only types of variables that assume these restrictions. Furthermore, the dependencies occurred because of the assumed population structure of possible observed values. Kempthorne (1952) called such models finite models, because the fixed by random interaction components were restricted to a finite population of possible values. In the next section we discuss models where the population is assumed to have a structure where no variable dependencies occur.
4.2
MODELS THAT DO NOT ADMIT RESTRICTIONS (INFINITE MODELS)
Consider the same experiment discussed in Section 4.1. Use a model with the same variables
where all variables and constants represent the same effects as previously stated. In this model formulation, the population has an infinite number of random blocks. For each block, an infinite number of replicates of each of the t treatment levels exists. Each of these treatment level replicates contains an infinite number of experimental units (see Figure 4.2.1). The btr observed values for the experiment are sampled from the population by first choosing b blocks at random from the infinite number of blocks in the population. For each selected block, one replicate of each of the t treatment levels is selected. Finally, within the selected block treatment combinations, r replicate observations are randomly sampled. Since the blocks are randomly selected from one infinite population of blocks, assume the random variables fi, are independent, identically distributed. With a normality and zero expectation assumption, let the b block random variables B, ~ iid NI (0, cr|). Since the t observed treatment levels are randomly chosen from an infinite population of treatment replicates, an infinite number of possible values are available for the random variables BTij. Assume that the average influence of BT^ is zero for each block. But now E[B7}y] = 0 does not imply 5D;=i ^^j = ® f°r eacn * since the variables BTij have an infinite population. Therefore, a zero expectation does not imply dependence. With a normality assumption, let the bt random variables BTij ~ iid Nj(0, o^T}. Finally,
4
Factorial Experiments
Figure 4.2.1
57
Infinite Model Population Grid.
within each block treatment combination, the nested replicates are assumed to be sampled from an infinite population. With a normality and zero expectation assumption, let the btr random variables R(BT)(ij)k ~ iid Ni(0, cr^BT)). Furthermore, assume that random variables fi,, the random variables BTtj, and the random variables R(BT\ij)k are mutually independent. Hence, in models that do not admit restrictions, all variables on the right side of the model are assumed to be independent. Kempthorne (1952) called such models infinite models, because it is assumed that all of the random components are sampled from infinite populations. In passing, we raise one additional topic. Consider the previous experiment, with one replicate unit within each block treatment combination (r = 1). Observing only one replicate unit within each block treatment combination does not change the fact that different experimental units are intrinsically different. The random variables R(BT)(tj)k represents this experimental unit difference. Hence, there is some motivation for leaving the random variable R(BT)(ij)k in the model. However, the variance
Linear Models
58
In the next section we establish the sum of squares and covariance matrix algorithm rules for models that admit restrictions. We then show how the covariance matrix algorithm can be easily modified to accommodate models that do not admit restrictions. Finally, for completeness, we discuss how the algorithms can be modified to accommodate models that contain variables with nonestimable variance parameters. Therefore, the algorithms can be applied to any complete, balanced experiment, using finite or infinite models, and using models with or without nonestimable variance parameters.
4.3
SUM OF SQUARES AND COVARIANCE MATRIX ALGORITHMS
In Section 4.1 a finite model was presented for an experiment with r replicate observations nested in bt block treatment combinations. This experiment is now used to establish the sum of squares and covariance matrix algorithm rules for finite models. Let the btr x 1 random vector Y = ( K m , . . . , F i l r , . . . , Ybti,..., Ybtr)'. The covariance matrix E = cov(Y) for the finite model is given by
The rules for constructing the matrix E are given in the following paragraphs. The matrix E is constructed in tabular form. The first rule describes the construction of the table, and the second rule describes the matrix terms that fill the table. Rule £1 List the variances of all random factors and interactions, one variance in each row. Construct column headings where the first column heading designates the main factor letters and the second heading designates the number of levels of the factor. Place brackets ([ ]), Kronecker product symbols (<8>), and plus signs (+) in each row, as described in Example 4.3.1.
4
Factorial Experiments
59
Rule £2 Compare the factor letters in the first column heading with each subscript letter on the variance (i.e., B from
Rule £2.2 If a factor letter corresponding to a fixed non-nested factor matches a subscript letter on the variance, place I/ — yj/ in the Kronecker product. Example 4.3.1
Rule £2.3
(continued) Rule £2.2
Place I/ elsewhere.
Example 4.3.1
(continued) Rule Z2.3
60
Linear Models
The covariance algorithm can be applied to complete, balanced finite models with any number of fixed and random main effects, interactions, or nested factors. For infinite models, the same covariance matrix algorithm can be used, except that Rule 22.2 is omitted. That is, for infinite models that do not contain restrictions on variables representing interactions of random and fixed factors, the covariance matrix is constructed following Rules 21, 22, 22.1, and 22.3. The next example illustrates the construction of covariance matrices in such models. Example 4.3.2 Consider the experiment in Example 4.3.1, but now use an infinite model that does not admit restrictions. Using Rules 21, 22, 22.1, and 22.3, the covariance matrix is given by
Although finite and infinite models are motivated in different ways and produce different covariance structures, algebraically the two covariance structures are simply reparameterizations of each other. To illustrate this point, consider the previous experiment with b random blocks, t fixed treatments, and r random replicates per treatment. First, rewrite the covariance matrix for the finite model in the following equivalent form.
Now to distinguish the parameters in the finite and infinite models, rename the variance parameters crj, respectively. Then the covariance matrix for the infinite model becomes
Note that the finite and infinite model covariance matrices are equal with oj* = CT| — -tG\T, &BT* = 0gT and cr|(Br), = o^BT). Therefore, the two covariance structures are simply reparameterizations of each other. An algorithm is now given for constructing the matrices As in the sums of squares Y'AS Y. This sum of squares algorithm applies to finite and infinite models.
4
Factorial Experiments
61
The subscripts s on the matrices A.s are numbered sequentially with AI being associated with the sum of squares for the overall mean /AO, ^2 being associated with the sum of squares for the next factor, etc. In the example considered in this section, \2 is associated with the blocks B, AS with the treatments T, A.* with interaction of B and T, and AS with the nested replicates R(BT). The sum of squares matrices A.s are constructed in tabular form. The first rule describes the table; the second rule describes the matrix terms in the table. Rule Al Construct two row headings where the first designates the letters of the factors and interactions, while the second designates the matrices \s associated with those factors and interactions. Construct two column headings where the first is the factor letter and the second is the number of levels, /, of the factor. Place brackets ([ ]), Kronecker product symbols (<8>), and equal signs (=) in each row, as described in Example 4.3.3.
Rule A2 Compare the factor letters in the first row heading with the factor letters in the first column heading. Rule A2.1 If a row factor letter does not match the column factor heading, place a j J/ in the Kronecker product. Note that /ZQ does not match any column factor heading; hence, the Kronecker product for AI will be comprised of terms of the form }j/. Example 4.3.3
(continued) Rule A2.1
Linear Models
62
Rule A2.2 If a non-nested factor in the row heading matches the column factor heading, place an I/ — j- J/ in the Kronecker product. Example 4.3.3
(continued) Rule A2.2
Rule A2.3 Place I/ elsewhere. Example 4.3.3
(continued) Rule A2.3
4
Factorial Experiments
63
The sums of squares for each factor and interaction can be written as quadratic forms, Y'ASY, using these Kronecker product matrices \s. For example, the sum of squares for the factor B is Y'A2Y = \'[(lb - £ Jfc) <8> }J, ® ±Jr]Y. In Examples 4.3.1, 4.3.2, and 4.3.3 all the variance parameters in the model were estimable. In the next example we apply the algorithms to a model that contains a nonestimable variance parameter. Example 4.3.4 Consider an experiment with b random blocks, / fixed treatments, and r = 1 replicate observations per block treatment combination. Assume the model
for i = 1,..., b, j = 1,..., f, and k = 1. Thus, ff^(BT) *s nonestimable and R(BT)(ij)k could be dropped from the model. However, if R(BT)(ij)k is left in the model, then the sum of squares and covariance matrix algorithms are applied just as they were in Examples 4.3.1,4.3.2, and 4.3.3 with r replaced by 1. Therefore, for the finite model, the covariance algorithm follows Rules £1, £2, £2.1, E2.2, and £2.3. With r = 1 the covariance matrix £ is given by
For the infinite model, the covariance algorithm follows Rules £1, £2, E2.1, and £2.3. With r = 1 the covariance matrix E is
For both finite and infinite models, the sum of squares algorithm follows Rules Al, A2, A2.1, A2.2, and A2.3. With r = 1 the sum of squares matrices
64
Linear Models
Since A5 is a bt x bt matrix of zeros, the sum of squares due to effect R(BT) equals zero and no estimate of
4.4
EXPECTED MEAN SQUARES
The mean square of an effect is the sum of squares of that effect divided by the corresponding degrees of freedom. For complete, balanced designs the mean
4
Factorial Experiments
65
square is [l/fr(A s )]Y'A 5 Y, since the degrees of freedom equal the tr(A.s). The expected value of the mean square, usually called the expected mean square (EMS), is a function of the mean vector \JL = E(Y) and of the variance parameters in £ = cov(Y). The expected mean square indicates how the mean squares can be used to obtain unbiased estimates of functions of the variance parameters. The expected mean square in complete, balanced designs is defined in the following theorem. The proof of the theorem is a direct result of Theorem 1.3.2. Theorem 4.4.1 Let Ybeannxl random vector associated with the observations of a complete, balanced factorial experiment with annxl mean vector p = £(Y) and n x n covariance matrix 52 = cov(Y). The expected mean square associated with the sum of squares Y'AY is E{[l/fr(As)]Y'AY} = [fr(AE) + jz'A/i]/fr(A) where ?r(A) equals the degrees of freedom associated with Y'AY. Example 4.4.1 Consider the experiment described in Examples 4.3.1 and 4.3.3 in which a finite model was assumed. The sums of squares due to the random effect B and the fixed effect T are Y'A2Y and Y'AaY, respectively, where A2 = (Ifr - £jfe) <8> }J, ® Jj r , A3 = £jfc ® (I, - JJ,) ® ij r , and the btr x 1 random vector Y = (Y\\\,..., Y\\r,..., Ybti,..., Ybtr)'. The mean vector /z = E(Y) = E(y 1 1 1 ,..., r , l r , . . . , n , i , . . . , Ybtry = lb® (^ ..., /*,)' ® lr. Note that A2S = [trag + ali(BT)\^i anc^ AsS = [rcr|r + or^(gr)]A3. Therefore, by Theorem 4.4.1, the expected mean square of the random effect B equals
and the expected mean square of the fixed factor T equals
Linear Models
66
Thus, the expected mean square of the random effect B provides an unbiased estimate of tra\ + cr^(fir). Likewise, the expected mean square of the fixed factor T provides an unbiased estimate of rag +
4.5
ALGORITHM APPLICATIONS
The sum of squares and covariance matrix algorithms are now applied to a series of complete, balanced factorial experiments. As mentioned in Section 4.3, finite and infinite model covariance structures are reparameterizations of each other. Therefore, the choice between the finite and infinite model is somewhat arbitrary. For the remainder of this text, the finite model is assumed. Therefore, unless specifically stated, subsequent covariance matrices for complete, balanced factorial experiments will always be constructed with Rules El, E2, E2.1, E2.2, andE2.3. Example 4.5.1 Consider a two-way cross classification where the first two factors 5 and T are fixed with / = 1 , . . . , s and j = 1 , . . . , t levels and the third factor is a set of k = 1,..., r random replicates nested in the st combinations of the first two factors. Let Yijk be a random variable representing the klh replicate observation in the i/'* combination of the two fixed factors. Let the st r x 1 random
where //,,•; are constants representing the mean effect of the //* combination of the two fixed factors and R(ST)(ij)k are str random variables representing the effect of the nested replicates. Assume that the str random variables R(ST)(ij)k ~ iid Ni(0, cr|(57)). Therefore, the str x 1 random vector Y ~ N Jfr (^t, S) where the st r x 1 mean vector
and the str x str covariance matrix
In addition, by the sum of squares algorithm is Section 4.3, Y'AiY,..., Y'A5Y are the sums of squares due to the overall mean, the fixed factor 5, the fixed
4
Factorial Experiments
67
factor T, the fixed interaction of S and T, and the nested replicates, respectively, where
Furthermore, AWS = cr|(sr)Am for m = 1 , . . . , 5 where each Am is idempotent with rank(Ai) = 1, rank(A2) =5 — 1, rank(A3) = t — 1, rank(A4) = (s - l)(t - 1), and rank(A5) = st(r - 1). Thus, by Corollary 3.1.2(a), Y'A^Y ~ ^(SD/rW™)^) forTO= 1 , . . . . 5 where
68
Linear Models
By Theorem 3.2.1, the random variables Y'Am Y are mutually independent since A m EA w = crR(ST)AmA.n = Qstrxstr for m, n = 1 , . . . , 5 and m ^ n. The EMSs associated with each sum of squares are calculated using Theorem 4.4.1:
Bhat's lemma 3.4.1 also applies since S^rank^A^) = 1 + (s — 1) + (t — 1) + (s - !)(/ - 1) + r(s - l)(f - 1) = str, I, I, Ir = £5 =1 A m , and S = (Ei ==1 A m )S = E^^A^S) = sLi^ ( 5r)A w . Therefore, (i) Y'A m Y ~ a R(ST)^Tai± (A.m)(^m) and (ii) Y'AWY are mutually independent for m = 1 , . . . , 5.
4
Factorial Experiments
69
Example 4.5.2 Consider the same two-way layout as in Example 4.5.1 except now let S and T both be random factors. The model is
where a is a constant representing the overall mean effect; si are random variables representing the effect of the first random factor; Tj are the random variables representing the second random factor; STfj are random variables representing the interaction between S and T; and /?(Sr )(//)* are random variable defined as in Example 4.5.1. Assume the s random variables 5, ~ iid Ni(0, cr|); the t random variables Tj ~ iid NI (0, cr^); the st random variables ST,y ~ iid NI (0, crjT); and the str random variables /?(Sr)(,-_,-)* ~ iid Ni(0,
and, by the covariance algorithm, the str x str covariance matrix
The sum of squares matrices are not dependent on whether the factors are fixed or random. Therefore, the sum of squares Y'AWY for m = 1,..., 5 are the same as those given in Example 4.5.1. Furthermore, AWE = cmAm form = 1,..., 5 where
70
Linear Models
Examples 4.4.1, 4.5.1, and 4.5.2 provide the EMSs for three factorial experiments. Snedecor and Cochran (1978, p. 367) provide EMSs for the same three experiments in their Table 12.11.1. Snedecor and Cochran's EMSs are the same as the EMSs presented in this text, although they use different notation. Snedecor and Cochran's A, B, AB, error, a2, a\, G\,O\%, K2A, Kg, K2AB,a, b, and n are equivalent to our T, S,ST, /?(Sr),a^ (5r) ,a£,af,a| r , £)y=1 (A.j•• - f i . . f / ( t - 1), £?=i (A/. - A--) 2 /(* -1). Ef=i E',=i toy - A.-. - A.y + A») 2 /[(J - DC - 01, f, 5, and r, respectively. Example 4.5.3 Consider the split plot experiment discussed by Kempthorne (1952, sixth printing 1967, pp. 374-375). The experiment has a random replicate factor, R, with / = 1 , . . . , r levels; a set of fixed whole plot treatments, T, with j = 1,..., t levels; and a set of fixed split plot treatments, 5, with k = 1,..., s levels. In Kempthorne's Table 19.1 he lists the following sources of variation: replicates R, whole plot treatments T, replicate by whole plot interaction RT, split plot treatments S, split by whole plot interaction ST, and remainder. The remainder is equal to the interaction RS plus the interaction RST, or equivalently, the replicate by split plot treatment interaction nested in whole plot treatments RS(T). The fixed portions of the experiment are designated by 7\ 5, and ST with subscripts j and k. The random portions of the experiment are designated by R, RT, and RS(T). A model that identifies these sources of variation is
where the r random variables /?, ~ Ni(0, cr^); the r(t — 1) x 1 random vector (L P;)(/?r n ,..., RTrt}' ~ N r( ,_i)(0, alTlr <8> I,_i); and the rt(s - 1) x 1 random vector (I, ®ItP's)(RS(T)mi,..., RS(T)r(t)sY ~ N r , (j _i)(0, a^Ir <8> I, <8> L,_i) where P't and P^ are (t — 1) x t and (s — 1) x s lower portions of t- and s -dimensional Helmert matrices, respectively. Furthermore, these three sets of variables are mutually independent. Thus, the rts x 1 random vector Y = ( F i n , . . . , ^115,..., Yrti,..., YrtsY ~ Nr,5(/x, E) where the rts x 1 mean vector
and, by the covariance matrix algorithm, the rts xrts covariance matrix is
4
Factorial Experiments
71
By the sum of squares algorithm, the matrices AI, . . . , A.7 are
Furthermore, Am E = cmAm for m = 1,..., 7 where c1 = c2 = sta%, c-$ = c* = sa%T, and c5 = c6 = c7 = erJ5(r); Ir $ I, ® I, = E^ =1 A m ; E7=1rank (A m ) = l + (r-l) + a-l) + ( r - l ) ( r - l ) + (j-l) + (/-l)(5-l) + ( r - l ) / ( j - l ) = rrs; and E = (ELi A-)s = (ELi A - E > = EL=i^A w . Therefore, by Bhat's Lemma 3.4.1, (i) Y'AmY ~ cm/4ik(Am)(^) and (ii) Y'AWY are mutually independent for m = 1,..., 7 where
Linear Models
72
By Theorem 4.4.1 the expected mean squares are EMS (overall mean) = sta^ + rsf/l..2 EMS (replicate R) = sta^ t EMS (whole plot T) = sa%T + rs J^(A;. - A--) 2 /C - 1) 7=1
EMS (RT) = salT
4
Factorial Experiments
73
Kempthorne (1952) provides the EMSs for T, RT, S, ST, and RS(T) in his Table 19.2. Kempthorne's EMSs are the same as the EMSs given here, although Kempthorne uses different notation. Kempthorne's a2, tj,
where //.y* are st constants representing the mean effect of the jk^ combination of factors S and T; the /?/ are random variables representing the random effect of blocks; and the /?/,•* are random variables representing the random residual or remainder. It is our intention to write a covariance matrix that contains two variance components, one associated with the variance of the random variables Bj and one associated with the variance of the random variables /?,y* • However, to use the covariance algorithm, we must first rewrite the variables /?//* in terms of the factor letters B, S, and T. Note that Rfjk can be equivalently written as
where flS,y, B 7}*, and BSTtjk are random variables representing the interactic of B with 5, B with T, and B with ST, respectively. We could proceed from hei by putting the last two equations together to produce a model
However, the last model has four sets of random variables and would thus require a definition with four, not two, random components. The solution to the problem
74
Linear Models
is to view the st combinations of fixed factors S and T as one fixed factor, say, V, with st levels. If we let v = 1 , . . . , st designate the levels of factor V then the residual /?/,-* can be rewritten as
Now assume the random variables B, ~ Ni(0, aj) and the b(st — 1) x 1 random vector (lb <8> P;,) (BVn, • • •, BVb,st)' ~ N fr( ,,_i)(0, a\v\b <8> Is,_0 where P'st is the (5? — 1) x st lower portion of an st-dimensional Helmert matrix. Thus, the bst x 1 random vector Y = (Ym,..., YU(,..., Ybsi,..., Ybst)' ~ N^,(/x, E) where the ft^/ x 1 mean vector
The covariance matrix algorithm can be applied using the two factor letters B and V where the number of levels is b and st, respectively. The bst x bst covariance matrix, S, is constructed here:
The sum of squares algorithm is now used to calculate matrices AI , . . . , A$. Recall that the sum of squares remainder equals the sum of the sum of squares due to the interactions BS, BT, and BST. Thus, Ae equals the sum of the matrices associated with the sum of squares due to BS, BT, and BST.
4
Factorial Experiments
75
Again Am£ = cmAm for m = 1 , . . . , 6 where c\ = 02 = stcrg, c^ = c<\ = 05 = ce = orj v , Ib® ls <8> lt = Sj=1Aw, Shrank (A,,) = 1 + (b - 1) + (s - 1) + (t - 1) + (5 - l)(r - 1) + (b - l)[(s - 1) + (t - 1) + (s - \)(t - 1)] = bst, £ = (ELi A m)£ = (£LiA»E) = ^ =1 c m A m . Therefore, by Bhat's lemma 3.4.1, (i) Y'AmY ~ CmxZuk(\m)&m) and (ii) Y'AmY are mutually independent for m = 1,..., 6 where
Linear Models
76
By Theorem 4.4.1, the expected mean squares are EMS (overall mean) = stag + bstfl2.. EMS (random block 5) = stag
4
Factorial Experiments
77
EXERCISES 1. Consider an experiment where r replicate units are nested in each of the s levels of a fixed factor s. The t levels of a second fixed factor T are then applied to each of the sr experimental units. A model for this experiment is
for i = 1,..., 5, j = 1,..., r, and k = 1,..., t where R represents the replicate units nested in the s levels of factor 5. Assume the str x 1 random vector Y = (Y\n, • • • i Ynt,..., Yiri,..., Y\rt,..., Ysn,..., Ysit,..., Ysri,..., y f ,,)'~N, rr (/i,E). (a) Derive /x and E. (b) Write out the ANOVA table and define the matrices Am used in the sums of squares Y'Am Y for m = 1,..., 7 where A? is the matrix corresponding to the sum of squares total. (c) Derive the distributions of Y'Am Y for m = 1 , . . . , 6. (d) Calculate all the expected mean squares. (e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions. 2. Consider a factorial experiment with three fixed factors 5, T, and U where the factors have s, t, and u levels, respectively. Within the stu combinations of the three main factors, r random replicate observations are observed. Let R represent the random nested replicate factor. Assume the stur x 1 random vector Y = (y i m ,..., Ymr, • •-, Ystui,..., Ystury ~ N srttr (^, E). (a) Write a model for this experiment and derive the mean vector p, and the covariance matrix S. (b) Write out the ANOVA table and define the matrices Am used in the sums of squares Y'AmY for m = 1,..., 10 where AIQ is the matrix corresponding to the sum of squares total. (c) Derive the distributions of Y'AmY for m = 1,..., 9.
Linear Models
78
(d) Calculate all the expected mean squares. (e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions. 3. Redo Exercise 2 above when S is a fixed factor and T and U are random factors. 4. Redo Exercise 2 when 5, 7, and U are all random factors. 5. Consider the paired Mest problem introduced in Exercise 15 of Chapter 3. However, now view the problem as an experiment where two levels of a fixed factor T are applied to each of the n levels of a random factor B. Thus, the n levels of B are the n experimental units. The model for this problem is
where /zy are constants representing the average effect of the 7* level of fixed factors 7\ B, are random variables representing the effect of the zth experimental unit, and BTij are random variables representing the interaction effect of factors B and T. Assume the 2n x 1 random vector Y = (a) Derive fj, and use the covariance algorithm to define the covariance matrix E in terms of a\ and a\T. (b) Write out the ANOVA table and define the matrices Am used in the sums of squares Y'AmY for m = 1 , . . . , 5 where A5 is the matrix corresponding to the sum of squares total. (c) Derive the distributions of Y'AmY for m = 1 , . . . , 4. (d) Calculate all the expected mean squares. (e) Construct a statistic to test HO : n\ =1^2 versus HI : /MI ^ /Z2- Does the statistic you constructed have an F distribution? Prove your answer. (f) Prove that the mean square due to factor T divided by the mean square due to the BT interaction is equal to T2 where T is defined in Exercise 15b from Chapter 3. 6. An animal scientist was interested in evaluating the effect of five different types of feed on the weight gain of cattle. An experiment was run where 60 cattle were randomly assigned to 15 pens with 4 cattle in each pen. Each pen then received a certain type of feed. The assignment of feeds to pens was done randomly with 3 pens in each of the five feed types. Weight observations on each head of cattle were then taken at three different times. Let F,^/ be a random variable representing the weight gain observed at the 7th time period, on the fc* head of cattle, in the y* pen, and being fed the /* feed type for
4
Factorial Experiments
79
/ = 1,2,3,4,5,; = l , 2 , 3 , f c = 1, 2, 3,4, and/ = 1, 2, 3. Assume the 180x1 random vector Y = (Ymi,.. •, ^1113, • • •, *534i, • • •, ^5343)' ~ Ni 8 o(M> s)- A model for this problem is
where //,,-/ are constants representing the average effect of the zm feed type at the /* time and P(F)(0;, C(FP)(ij)k, PT(F)(i)jl, and CT(FP)(ij)kl are random variables representing the effects of pens nested in feed types, cattle nested in feed pens, the interaction of time by pens nested in feeds, and the interaction of time and cattle nested in feed pens, respectively. (a) Derive fi and E. (b) Write out the ANOVA table and define the matrices \m used in the sums of squares Y'AmY for m = 1 , . . . , 9 where Ag is the matrix corresponding to the sum of squares total. (c) Derive the distributions of Y'AWY for m = 1,..., 8. (d) Calculate all the expected mean squares. (e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions. 7. A pump manufacturer wanted to evaluate how well his assembled pumps performed. He ran the following experiment. He randomly selected 10 people to assemble his pumps, randomly dividing them into two groups of 5. He then trained both groups to assemble pumps, but one group received more rigorous instruction. The two groups were therefore identified to be of two skill levels. Each person then assembled two pumps, one pump by one method of assembly and a second pump by a second method of assembly. Each assembled pump then pumped water for a fixed amount of time and then repeated the operation later for the same length of time. The amount of water pumped in each time period was recorded. The order of the operation (first time period or second) was also recorded. Let F,^/ be a random variable representing the amount of water pumped during the /* time period or order, on a pump assembled by the £* method and the y* person in the /* skill level for i = 1, 2, j = 1, 2, 3,4, 5, k = 1, 2, and/ = 1, 2. Assume the 40x 1 random vector Y =
(Fun, Ym2, Fn2i, Fn22, • • • , Y25U, ^2512* ^2521» ^2522)' ~ N4o(//., £).
A model for this problem is
where //,*/ are constants representing the average effect of the / order in the A;* method with the Ith skill level, P(S)(i)j, PM(S)(i)jk, and P0(SM)(0;W are
Linear Models
80
random variables representing the effects of people nested in skill levels, the interaction of methods and people nested in skill levels, and the interaction of order and people nested in skill levels and methods. (a) Derive /u and E. (b) Write out the ANOVA table and define the matrices \m used in the sums of squares Y'AmY for m = 1 , . . . , 12 where Y'A^Y is the sum of squares total. (c) Derive the distributions of Y'A^Y for m = 1 , . . . , 11. (d) Calculate all the expected mean squares. (e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions. 8. Consider Exercise 7. (a) Calculate the standard error of ?i... — ?2... where f,-... = £)y=i Z^=i ]C/=i r y u /20fori = l,2. (b) Find an unbiased estimator of /11.12 — £2.12 where £/.*/ = ]C;-=i fajki/5 and calculate the standard error of the estimator. 9. Prove that the necessary and sufficient conditions of Bhat's Lemma 3.4.1 are satisfied in the following situation. Consider any complete, balanced factorial experiment with n observations where the n x 1 random vector Y ~ N n (/x, £). The covariance matrix algorithm rules £ 1, £2, £2.1, £2.2, and £2.3 are used to derive £. The sum of squares algorithm rules Al, A2, A2.1, A2.2, and A2.3 are used to derive the k sum of squares matrices AI ..., A* with X)«=i Az- = ln •
5
Least-Squares Regression
In this chapter the least-squares estimation procedure is examined. The topic is introduced through a regression example. Later in the chapter the regression model format is applied to a broad class of problems, including factorial experiments.
5.1
ORDINARY LEAST-SQUARES ESTIMATION
We begin with a simple example. An engineer wants to relate the fuel consumption of a new type of automobile to the speed of the vehicle and the grade of the road traveled. He has a fleet of n vehicles. Each vehiclOe is assigned to operate at a constant speed (in miles per hour) on a specific grade (in percent grade) and the fuel consumption (in ml/sec) is recorded. The engineer believes that the expected fuel consumption is a linear function of the speed of the vehicle and the speed of the vehicle times the grade of the road. Let Y, be a random variable that represents the observed fuel consumption of the Ith vehicle, operating at a fixed speed, on a road with a constant grade. Let jc,i represent the speed of the Ith vehicle and let Xj2 represent the speed times the grade of the z'th vehicle. The expected fuel
81
Linear Models
82
consumption of the Ith vehicle can be represented by
where fa, ft\, and fa are unknown parameters. Due to qualities intrinsic to each vehicle, the observed fuel consumptions differ somewhat from the expected fuel consumptions. Therefore, the observed fuel consumption of the Ith vehicle is represented by
where £, is a random variable representing the difference between the observed fuel consumption and the expected fuel consumption of the Ith vehicle. An example data set for this fuel, speed, grade experiment is provided in Table 5.1.1. In a more general setting consider a problem where the expected value of a random variable 7, is assumed to be a linear combination of p — 1 different variables
Adding a component of error, £,-, to represent the difference between the observed value of Yt and the expected value of F, we obtain
By taking expectations on the right and left sides of the preceding two equations, we obtain E(£/) = 0 for all / = !,...,«. Table 5.1.1 Fuel, Speed, Grade Data Set i l 2 3 4 5 6 7 8 9 10
Fuel Yi
Speed xi \
Grade
Speed x Grade x12
1.7 2.0 1.9 1.6 3.2 2.0 2.5 5.4 5.7 5.1
20 20 20 20 20 50 50 50 50 50
0 0 0 0 6 0 0 6 6 6
0 0 0 0 120 0 0 300 300 300
5
Least-Squares Regression
83
The model just discussed can be expressed in matrix form by noting that
where the n x 1 random vector Y = (Y\,..., Yn)', the p x 1 vector {3 = (£o, P\ • • •, Pp-\Y, the n x 1 random vector E = (E\,..., £„)' and the n x p matrix
Furthermore, E(£,) = 0 for all / = 1 , . . . , n implies E(E) = 0 M X i. Therefore E(Y) = X(3. For the present, assume that the E,'s are independent, identically distributed random variables where var(E/) = a2 for all i = 1 , . . . , n. Since the EI'S are independent, co\(Ei, £/) = 0 for all i ^ j. Therefore, the covariance matrix of E is given by XI = cov(E) = a2ln. In later sections of this chapter more complicated error structures are considered. Note that £ has been used to represent the covariance matrix of the n x 1 random error vector E. However, S is also the covariance matrix of the n x 1 random vector Y since
Since the *,•_/ values are known for / = ! , . . . , « and j = I,..., p — l,xj = Z)Li xij/n can be calculated for any j. Therefore, the preceding model can be equivalently written as
84
Linear Models
and Xc is an n x (/?—!) matrix such that 1^XC = Oi X (p-i). This later model form is called a centered model. Without loss of generality, a model can always be assumed to be centered since any model Y = X/3 + E can be written as Y = X*/3* + E. The asterisks on the centered model are subsequently dropped since X can always be considered a centered matrix if necessary. In the next example, the 10 x 3 centered matrix X is derived for the example data set. Example 5.1.1 For the example data given in Table 5.1.1, the average speed is x.i = [5(20) + 5(50)]/10 = 35 and the average value of speed x grade is jc.2 = [6(0) + (1)120 + 3(300)]/10 = 102. Therefore, the 10 x 3 centered matrix X = (lio|Xc) where
The main objective to this section is to develop a procedure to estimate the p unknown parameters fii, fii,..., fip-\. One method that provides such estimators is called the ordinary least-squares procedure. The ordinary least-squares estimators of /So, j # i , . . . , fip-i are obtained by minimizing the quadratic form Q with respect to the p x 1 vector ft where
To derive the estimators, take the derivative of Q with respect to the vector ft, set the resulting expression equal to zero, and solve for ft. That is,
or X'X/3 = X'Y. If X'X is nonsingular [i.e., rank (X'X) = p] then the leastsquares estimator of (3 is
5
Least-Squares Regression
85
Thus, the ordinary least-squares estimator of the p x 1 vector /? is a set of linear transformations of the random vector Y where (X'X)"^' is the p x n transformation matrix. If E(Y) = X/3, /3 is an unbiased estimator of /3 since
Furthermore, the p x p covariance matrix of /3 is given by
It is also generally of interest to estimate the unknown parameter a2. The quadratic form
provides an unbiased estimator of a2 when E = cr2ln since
In the next example the ordinary least-squares estimates of /3 and a2 are calculated for the example data set. The IML procedure in SAS has been used to generate all the example calculations in this chapter. The PROC IML programs and outputs for this chapter are presented in Appendix 1. Example 5.1.2 For the example data given in Table 5.1.1, the least-squares estimate of j3 is given by
Therefore, the prediction equation is
Linear Models
86
The ordinary least-squares estimator of a2 is given by
5.2
BEST LINEAR UNBIASED ESTIMATORS
In many problems it is of interest to estimate linear combinations of $ > , . . . , Pp-i, say, t'/3, where t is any nonzero p x 1 vector of known constants. In the next definition the "best" linear unbiased estimator of t'/3 is identified. Definition 5.2.1 Best Linear Unbiased Estimator (BLUE) oft'(3: The best linear unbiased estimator of t'/3 is (i)
a linear function of the observed vector Y, that is, a function of the form a'Y + ao where a is an n x 1 vector of constants and flo is a scalar and
(ii) the unbiased estimator of t'/3 with the smallest variance. In the next important theorem t'0 = t'CX'X^X'Y is shown to be the BLUE of t'/3 when E(E) = 0 and cov(E) = a2ln. The theorem is called the Gauss-Markov theorem. Theorem 5.2.1 Let Y = Xp + E where E(E) = 0 and cov(E) = cr2In. Then the least-squares estimator oft'(3 is given by t'/3 = t'CX'X^X'Y and t'j3 is the BLUE oft'(3. Proof: First, the least-squares estimator of t'(3 is shown to be t'/3. Let T be a p x p nonsingular matrix such that T = (t|To) where t is a p x 1 vector and TO is a p x (p - 1) matrix. If R = T'"1 then
The least-squares estimate of u is given by
5
Least-Squares Regression
87
Therefore, t'/3 is the least-squares estimator of t'fl. Next, t'/3 is shown to be the BLUEof t'ft. Linear estimators of intake the form a'Y+a0- Since t'CX'X)"1^ is known, without loss of generality, let a' = t'CX'X^X'+b'. Then linear unbiased estimators of t'/3 satisfy the relationship
Therefore, in the class of linear unbiased estimators b'X/3 + a0 = 0 for all /3. But for this expression to hold for all /3, b'X = Oi x p and ao = 0. Now calculate and minimize the variance of the estimator a'Y -f ao within the class of unbiased estimators of t'/3, (i.e., when b'X = Oi x p and ao = 0).
But a2 and t'(X'X) 4 are constants. Therefore, var(a'Y + ao) is minimized when b'b = 0 or when b = O p x i- Therefore, the BLUE of t'/3 has varianceCT2t'(X'X)-4. But t'/3 is a linear unbiased estimator of t'/3 with variance a 21' (X'X) ~! t. Therefore, f4istheBLUEoff/3. • Example 5.2.1 Consider the example data set given in Table 5.1.1. By the Gauss-Markov theorem, the best linear unbiased estimate of fi\ — fa is t'/3 = (0,1,-1)(3.11,0.01348, 0.01061)'= 0.00287. 5.3
ANOVA TABLE FOR THE ORDINARY LEAST-SQUARES REGRESSION FUNCTION
An ANOVA table can be constructed that partitions the total sum of squares into the sum of squares due to the overall mean, the sum of squares due to fi\,..., ftp-\, and the sum of squares due to the residual. The ANOVA table for this model is given in Table 5.3.1. The sum of squares under the column "SS" can be applied to any form of the n x p matrix X. The sum of squares under the column "SS Centered" can be applied to centered matrices X = [1W|XC] where l'nXc = OI X ( P -I).
Linear Models
88
Table 5.3.1 Ordinary Least-Squares ANOVA Table Source
df
SS
Overall mean
1
V'il 1
SS Centered V nj"1
— - iV n-J1/ iY »
Regression (fi\ , . . . , Pp-\)
p-l
Y'[X(X'X)->X' - ijn]Y = Y'XC(X;XC)-' X;Y
Residual
n-p
¥'[!„- X(X'X)-1X']Y = YU - ijn
-xc(x;x>- 'X^]Y Total
n
Y'Y
The expected mean squares for each effect are calculated using Theorem 1.3.2:
and EMS (residual) = E(
5
Example 5.3.1
The ANOVA table for the data in Table 5.1.1 is given here:
Source Overall mean Regression (P , fa) Residual Total
5.4
89
Least-Squares Regression
df
ss
1 2 7
96.721 24.070 0.419
10
121.210
WEIGHTED LEAST-SQUARES REGRESSION
In the first three sections of this chapter the model was confined to Y = Xp + E where E(E) = 0 and cov(E) = o 2 I n . In this section, the model Y = XB + E is considered when E(E) = 0, cov(E) = cr2V, and V is an n x n symmetric, positive definite matrix of known constants. Because V is positive definite, there exists an n x n nonsingular matrix T such that V = TT. Premultiplying both sides of the model Y = Xp + E by T-1 we obtain
where Yw = T - 1 Y,X W = T - 1 X, and Ew = T -1 E. Therefore, E(EW) = T -1 E(E) = Opx1 and cov(Ew) = cov(T -1 E) = T -1 (a 2 V)T- 1/ = a2In. The weighted least-squares estimators of B and cr2 are derived using the ordinary leastsquares estimator formulas with the model Yw = X w B+E w . That is, the weighted least-squares estimators of B and cr2 are given by
and
The Gauss-Markov theorem can also be generalized for the model Y = XB+E where E(E) = 0 and cov(E) = cr2V. For this model, the weighted least-squares estimator of t'B is given by t'Bw = t'(X'V-1Xr'X'V-1 Y and t'/3w is the BLUE of t'/3. The proof is left to the reader.
Linear Models
90
Table 5.4.1 Weighted Least-Squares ANOVA Table Source
df
SS
Overall mean
1
Y'V-1 ln(l'nV-lln)-1
•i;v -i Y
U^v-'ij-'i^v-1 Y X(X'V-»X) -IX/V-IJY
Regression (/?!,. ..,0 p _i)
p-1
Y'V [X(X'V-'X)-1 X'-
Residual
n- p
Y'[V-
Total
n
Y'V-1 Y
1
-V -1
The ANOVA table for weighted least-squares regression functions can be constructed using Table 5.3.1 and substituting T-1X for X and T-1Y for Y. The weighted least-squares ANOVA table is provided in Table 5.4.1. In the next example a weighted least-squares ANOVA table is derived for the fuel, speed, grade data set.
Example 5.4.1 Consider the data in Table 5.1.1. Suppose the fuel consumption observations are independent but the variance of the observations at speed 50 mph is twice the variance of observations at speed 20 mph. Then cov(E) = σ²V where the 10 × 10 matrix V = diag(1, 1, 1, 1, 1, 2, 2, 2, 2, 2).
The weighted least-squares estimates of ft and a2 are
Therefore, the prediction equation is
or
The weighted least-squares estimator of cr2 is given by
The weighted least-squares ANOVA table is

Source                 df    SS
Overall mean           1     57.4083
Regression (β₁, β₂)    2     14.5813
Residual               7     0.2654
Total                  10    72.2550
Thus far in Chapter 5 no assumptions have been made about the functional form of the distribution of the n × 1 random vector E (and hence about the distribution of Y). However, if we want to test hypotheses or construct confidence bands on model parameters, then we need to make an assumption about the functional form of the distribution of E. It is common to assume that the n × 1 random vector E has a multivariate normal distribution where E(E) = 0 and cov(E) = Σ. In the simplest case the E_i's are independent, identically distributed normal random variables such that E ~ N_n(0, σ²I_n). In more complicated problems, it is assumed that E ~ N_n(0, Σ) where Σ is an n × n matrix whose elements are functions of a series of unknown variance components. In Section 5.5 model adequacy is discussed when E ~ N_n(0, σ²I_n). In Section 5.6 least-squares regression is developed for complete, balanced factorial experiments where the n × n covariance matrix Σ is a function of a series of unknown variance components. Later in Chapter 6 a general discussion on confidence bands and hypothesis testing is provided.
5.5
LACK OF FIT TEST
In this section assume that the n × 1 random error vector E ~ N_n(0, σ²I_n). It is of interest to check whether the proposed model adequately fits the data. This lack of fit test requires replicate observations at one or more of the combinations of the x₁, x₂, ..., x_{p−1} values. Since the elements of the n × 1 random vector Y = (Y₁, ..., Y_n)' can be listed in any order, we adopt the convention that sets of Y_i values that share the same x₁, ..., x_{p−1} values are listed next to each other in the Y vector. For example, in the data set from Table 5.1.1, the 10 × 1 vector Y = (1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1)' with Y₁-Y₄ sharing a speed equal to 20 and a speed × grade equal to 0, Y₅ having a speed equal to 20 and a speed × grade equal to 120, Y₆-Y₇ sharing a speed equal to 50 and a speed × grade equal to 0, and Y₈-Y₁₀ sharing a speed equal to 50 and a speed × grade equal to 300. When replicate observations exist within combinations of the x₁, ..., x_{p−1} values, the residual sum of squares can be partitioned into a sum of squares due to pure error plus a sum of squares due to lack of fit. The pure error component is a
measure of the variation between Y_i observations that share the same x₁, ..., x_{p−1} values. In the example data set from Table 5.1.1, the pure error sum of squares is the sum of squares of Y₁-Y₄ around their mean, plus the sum of squares of Y₅ around its mean (zero in this case), plus the sum of squares of Y₆-Y₇ around their mean, plus the sum of squares of Y₈-Y₁₀ around their mean. In general the sum of squares pure error is given by Y'A_peY
where A_pe is an n × n block diagonal matrix with the jth block equal to I_{r_j} − (1/r_j)J_{r_j} for j = 1, ..., k, where k is the number of combinations of x₁, ..., x_{p−1} values that contain at least one observation and r_j is the number of Y_i values in the jth combination with n = Σ_{j=1}^k r_j. Note that A_pe is an idempotent matrix of rank n − k and J_nA_pe = 0_{n×n}. Furthermore, the first r₁ rows of the matrix X are the same, the next r₂ rows of X are the same, etc. Therefore, A_peX = 0_{n×p}. In balanced data structures r₁ = r₂ = ⋯ = r_k = r, n = rk, and the n × n pure error matrix A_pe can be expressed as the Kronecker product I_k ⊗ (I_r − (1/r)J_r). For the fuel, speed, grade data set, the 10 × 10 pure error sum of squares matrix A_pe is derived in the next example. Example 5.5.1 From the Table 5.1.1 data set, four groups of Y_i's share the same speed and grade values. Therefore, k = 4, r₁ = 4, r₂ = 1, r₃ = 2, r₄ = 3, and A_pe is given by
In this example, r₂ = 1, so I_{r₂} − (1/r₂)J_{r₂} = 1 − 1 = 0. Thus, the fifth diagonal element of A_pe equals the scalar 0, indicating that observation Y₅ does not contribute to the pure error sum of squares. The rank(A_pe) = 10 − 4 = 6. The sum of squares lack of fit is calculated by subtraction. Therefore, SS(lack of fit) = SS(residual) − SS(pure error).
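A small sketch of this construction follows; it uses the group sizes of Example 5.5.1 and the Table 5.1.1 responses quoted earlier in this section, and it checks the idempotency and rank properties claimed above.

```python
import numpy as np
from scipy.linalg import block_diag

r = [4, 1, 2, 3]                                   # replicate counts r_1,...,r_k
blocks = [np.eye(rj) - np.ones((rj, rj)) / rj for rj in r]
A_pe = block_diag(*blocks)                         # n x n pure error matrix
n, k = A_pe.shape[0], len(r)

assert np.allclose(A_pe @ A_pe, A_pe)              # idempotent
assert round(np.trace(A_pe)) == n - k              # rank 10 - 4 = 6 here

y = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
ss_pure_error = y @ A_pe @ y                       # Y'A_pe Y
print(ss_pure_error)
```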
The sums of squares due to the overall mean, regression, lack of fit, pure error, and total are provided in Table 5.5.1. Note that [I_n − X(X'X)⁻¹X' − A_pe]σ²I_n = σ²[I_n − X(X'X)⁻¹X' − A_pe], where I_n − X(X'X)⁻¹X' − A_pe is an idempotent matrix of rank k − p. Likewise, [A_pe]σ²I_n = σ²A_pe where A_pe is an idempotent matrix of rank n − k. Therefore, by
Table 5.5.1 ANOVA Table with Pure Error and Lack of Fit

Source                         df     SS                               SS Centered
Overall mean                   1      Y'(1/n)J_nY                      Y'(1/n)J_nY
Regression (β₁, ..., β_{p−1})  p−1    Y'[X(X'X)⁻¹X' − (1/n)J_n]Y       Y'X_c(X'_cX_c)⁻¹X'_cY
Lack of fit                    k−p    Y'[I_n − X(X'X)⁻¹X' − A_pe]Y     Y'[I_n − (1/n)J_n − X_c(X'_cX_c)⁻¹X'_c − A_pe]Y
Pure error                     n−k    Y'A_peY                          Y'A_peY
Total                          n      Y'Y                              Y'Y
Corollary 3.1.2(a), the sum of squares lack of fit Y'[I_n − X(X'X)⁻¹X' − A_pe]Y ~ σ²χ²_{k−p}(λ_lof) and the sum of squares pure error Y'A_peY ~ σ²χ²_{n−k}(λ_pe) where
and Furthermore, by Theorem 3.2.1, the lack of fit sum of squares and the pure error sum of squares are independent since
Therefore, the statistic

F* = {Y'[I_n − X(X'X)⁻¹X' − A_pe]Y/(k − p)} / {Y'A_peY/(n − k)} ~ F_{k−p, n−k}(λ_lof).
Note that if E(Y) = Xβ, then λ_lof = 0.
If E(Y) ≠ Xβ then λ_lof > 0. Therefore, the hypothesis H₀ : λ_lof = 0 versus H₁ : λ_lof > 0 is equivalent to H₀ : E(Y) = Xβ versus H₁ : E(Y) ≠ Xβ. The statement E(Y) = Xβ implies that the model being used in the estimation provides a good fit and therefore may be appropriate. Thus, a γ level rejection region for the hypothesis H₀ versus H₁ is as follows: reject H₀ if F* > F^γ_{k−p, n−k}.
For the Table 5.1.1 data, the ANOVA table with lack of fit and pure error terms is:

Source         df    SS        MS        F*
Overall mean   1     96.796    96.796
Regression     2     24.070    12.035
Lack of fit    1     0.014     0.014     0.21
Pure error     6     0.405     0.0675
Total          10
The lack of fit test statistic F* = 0.21 < F^{0.05}_{1,6} = 5.99, indicating that the equation Ŷ = 1.556 + 0.013848x₁ + 0.01601x₂ provides a good fit of the data.
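The critical value quoted above can be checked directly; the short sketch below only reproduces that tabled F(1, 6) upper 5% point and the test decision.

```python
from scipy.stats import f

f_star, dfn, dfd = 0.21, 1, 6            # lack of fit df = k - p, pure error df = n - k
f_crit = f.ppf(0.95, dfn, dfd)           # approximately 5.99
print(f_crit, f_star > f_crit)           # False, so no evidence of lack of fit
```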
5.6
PARTITIONING THE SUM OF SQUARES REGRESSION
In Table 5.3.1 the sum of squares regression was expressed with p − 1 degrees of freedom. This sum of squares represented the total influence of the variables x₁, ..., x_{p−1} in the ordinary least-squares regression. It is often of interest to check the contribution of a particular variable (or variables) given that other variables are already in the model. Such contributions can be calculated by partitioning the n × p matrix X as

X = [X₁ | X₂ | ⋯ | X_m]
where X_j is an n × p_j matrix for j = 1, ..., m, p = Σ_{j=1}^m p_j, and X₁ = 1_n. If R₁ = X₁, R₂ = (X₁|X₂), ..., R_{m−1} = (X₁|X₂| ⋯ |X_{m−1}), and R_m = X, then the sum of squares due to the p_j variables in X_j, given that X₁, X₂, ..., X_{j−1} are already in the model, is given by

Y'[R_j(R'_jR_j)⁻¹R'_j − R_{j−1}(R'_{j−1}R_{j−1})⁻¹R'_{j−1}]Y.
Such conditional sums of squares are often called Type I sums of squares. The entire ANOVA table with the Type I sums of squares is presented in Table 5.6.1.
Table 5.6.1 Type I Sum of Squares ANOVA Table

Source                      df       Type I SS
Overall mean X₁             p₁ = 1   Y'R₁(R'₁R₁)⁻¹R'₁Y = Y'(1/n)J_nY
X₂|X₁                       p₂       Y'[R₂(R'₂R₂)⁻¹R'₂ − R₁(R'₁R₁)⁻¹R'₁]Y
X₃|X₁, X₂                   p₃       Y'[R₃(R'₃R₃)⁻¹R'₃ − R₂(R'₂R₂)⁻¹R'₂]Y
⋮
X_m|X₁, X₂, ..., X_{m−1}    p_m      Y'[X(X'X)⁻¹X' − R_{m−1}(R'_{m−1}R_{m−1})⁻¹R'_{m−1}]Y
Residual                    n−p      Y'[I_n − X(X'X)⁻¹X']Y
Total                       n        Y'Y
Note that the sum of squares due to all sources of variations still add up to the total sum of squares Y'Y. The Type I sums of squares for the fuel, speed, grade data set are provided below with the output provided in Appendix 1. Example 5.6.1 Using the example data set from Table 5.1.1, the Type I sums of squares are provided for the overall mean, for the speed variable x1 given the overall mean, and for the speed x grade variable x2 given the overall mean and x1.
Source                      df    SS
Overall mean                1     96.721
x₁ | overall mean           1     10.609
x₂ | overall mean, x₁       1     13.461
Residual                    7     0.419
Total                       10    121.210
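The sketch below recomputes these Type I sums of squares by sequential projections; the speeds, speed × grade values, and responses are the Table 5.1.1 values as described in the text, so the printed values should agree with the table above up to rounding.

```python
import numpy as np

y = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
speed = np.array([20, 20, 20, 20, 20, 50, 50, 50, 50, 50])
speed_grade = np.array([0, 0, 0, 0, 120, 0, 0, 300, 300, 300])
n = len(y)

def proj(M):
    """Projection matrix M(M'M)^{-1}M'."""
    return M @ np.linalg.solve(M.T @ M, M.T)

R1 = np.ones((n, 1))                                   # overall mean
R2 = np.column_stack([R1, speed])                      # add x1
R3 = np.column_stack([R2, speed_grade])                # add x2

ss_mean = y @ proj(R1) @ y                             # 96.721
ss_x1   = y @ (proj(R2) - proj(R1)) @ y                # 10.609
ss_x2   = y @ (proj(R3) - proj(R2)) @ y                # 13.461
ss_res  = y @ (np.eye(n) - proj(R3)) @ y               # 0.419
print(ss_mean, ss_x1, ss_x2, ss_res)
```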
Some useful relationships are developed next. Note R'jRj(R'jRj)-1 !R^ = Ry for any j = 1 , . . . , m. That is,
Therefore, for any 1 < i < j < m
and
It is left to the reader to show that the above relationships imply that the matrices R_j(R'_jR_j)⁻¹R'_j − R_{j−1}(R'_{j−1}R_{j−1})⁻¹R'_{j−1} are idempotent of rank p_j for any j = 2, ..., m, and to show that any two matrices R_i(R'_iR_i)⁻¹R'_i − R_{i−1}(R'_{i−1}R_{i−1})⁻¹R'_{i−1} and R_j(R'_jR_j)⁻¹R'_j − R_{j−1}(R'_{j−1}R_{j−1})⁻¹R'_{j−1} are orthogonal for any i ≠ j. In certain problems the n × p_j matrix X_j is orthogonal to the n × p_i matrix X_i for all i ≠ j, i, j = 1, ..., m. In such cases the Type I sums of squares due to X_j | X₁, ..., X_{j−1} reduce to much simpler forms for all j = 1, ..., m. That is, if X'_iX_j = 0_{p_i×p_j} for all i ≠ j, then
Therefore, the Type I sum of squares due to X_j | X₁, ..., X_{j−1} for any j = 1, ..., m is given by Y'X_j(X'_jX_j)⁻¹X'_jY.
In the previous sections, the model Y = XB + E was introduced within the context of a regression analysis. However, Y = XB + E is a very general model form that can be used to describe a very broad class of experiments, including complete, balanced factorial experiments. In the next section the general linear model, Y = XB+E, is adapted to complete, balanced factorial experiments where the n x n covariance matrix, cov(E) = E, may be a complicated function of many
variance components. In Chapters 6 and 7 more general applications of the model Y = X/3 + E are introduced.
5.7
THE MODEL Y = Xβ + E IN COMPLETE, BALANCED FACTORIALS
The experiment presented in Section 4.1 has b random blocks, t fixed treatments, and r random replicates nested in each block treatment combination. The btr × 1 random vector Y = (Y₁₁₁, ..., Y₁₁ᵣ, ..., Y_{bt1}, ..., Y_{btr})' ~ N_{btr}(μ, Σ) where the btr × 1 mean vector and the btr × btr covariance matrix are given by
and
This experiment can be characterized by the general linear model Y = Xβ + E. First, cov(E) equals the btr × btr covariance matrix Σ. Next, the btr × 1 vector μ must be reconciled with the btr × 1 mean vector E(Y) = Xβ from the general linear model. Note that the btr × 1 mean vector μ is a function of the t unknown parameters μ₁, ..., μ_t. Therefore, the general linear model mean vector Xβ must also be written as a function of μ₁, ..., μ_t. One simple approach is to let the t × 1 vector β = (μ₁, ..., μ_t)' and let the btr × t matrix X = 1_b ⊗ I_t ⊗ 1_r. Then the btr × 1 mean vector of the general linear model is Xβ = (1_b ⊗ I_t ⊗ 1_r)β = μ.
The preceding example suggests a general approach for writing the mean vector μ as Xβ for complete, balanced factorial experiments. First, if μ is a function of p unknown parameters, then let β be a p × 1 vector whose elements are the p unknown parameters in μ. In general these elements will be subscripted, such as μ_ijk. The elements of β should be ordered so the last subscript changes first, the second to last subscript changes next, etc. The corresponding X matrix can then be constructed using a simple algorithm. The previous experiment is used to develop the algorithm rules. Rule X1 Construct column headings where the first column heading designates the main factor letters and the second heading designates the number of levels of each factor, l. Place Kronecker product symbols ⊗ between the factor columns as described in Example 5.7.1.
Rule X2 Place 1_l in the Kronecker product under the random factor columns. Rule X3 Place I_l elsewhere. Example 5.7.1
Rules X1, X2, and X3 for the example model:

Factor    B      T      R
Levels    b      t      r
          1_b ⊗  I_t ⊗  1_r
where Rule X2 is designated by 1_l and Rule X3 is designated by I_l. This formulation of the X matrix and its associated β vector is not unique. Another X matrix and β vector can be generated for the same experiment. This second formulation of X and β is motivated by the sum of squares matrices A_m from Section 4.2. In the example experiment, the sum of squares matrices for the mean, blocks, treatments, block by treatment interaction, and the nested replicates are given by
respectively. Matrices A1 through A5 can be rewritten as A1 = X1X1', A2 = ZiZ1', A3 = X2X'2, A4 = Z2Z'2, and A5 = Z3Z'3 where
where the (l − 1) × l matrix P'_l is the lower portion of an l-dimensional Helmert matrix. Note that X₁ and X₂ are associated with the fixed factor matrices and
Z₁, Z₂, and Z₃ are associated with the random factor matrices. In this form X'₁X₁ = 1, Z'₁Z₁ = I_{b−1}, X'₂X₂ = I_{t−1}, Z'₂Z₂ = I_{(b−1)(t−1)}, and Z'₃Z₃ = I_{bt(r−1)}. Now let the btr × t matrix X = (X₁|X₂) where X₁ and X₂ are the btr × 1 and btr × (t − 1) matrices defined earlier. Note that X'₁X₂ = 0_{1×(t−1)}. Then define the t × 1 vector β such that Xβ = μ. Premultiplying this expression by (X'X)⁻¹X' we obtain
or
A third formulation of the matrix X can be constructed by writing A₁ = X₁(X'₁X₁)⁻¹X'₁, A₂ = Z₁(Z'₁Z₁)⁻¹Z'₁, A₃ = X₂(X'₂X₂)⁻¹X'₂, A₄ = Z₂(Z'₂Z₂)⁻¹Z'₂, and A₅ = Z₃(Z'₃Z₃)⁻¹Z'₃ where
and where the t x (t — 1) matrix Ql is given by
Note that the columns of Q_l equal √(j(j + 1)) times the columns of P_l, where j is the column number for j = 1, ..., t − 1. Now let the btr × t matrix X = (X₁|X₂) where X₁ and X₂ are the btr × 1 and btr × (t − 1) matrices defined above. Note X'₁X₂ = 0_{1×(t−1)}.
It is apparent that a number of different forms of the matrices X, X₁, X₂, ..., Z₁, Z₂, ... can be constructed in complete, balanced factorial designs. Furthermore, in any particular problem, one form of the matrix X can be defined while another form of the X₁, X₂, ..., Z₁, Z₂, ... matrices can be used to construct the sum of squares matrices. For example, in the previous experiment, the btr × t matrix X can be defined as X = 1_b ⊗ I_t ⊗ 1_r. Then with X₁ = 1_b ⊗ 1_t ⊗ 1_r, Z₁ = Q_b ⊗ 1_t ⊗ 1_r, X₂ = 1_b ⊗ Q_t ⊗ 1_r, Z₂ = Q_b ⊗ Q_t ⊗ 1_r, and Z₃ = 1_b ⊗ I_t ⊗ Q_r, the sum of squares matrices can be constructed as A₁ = X₁(X'₁X₁)⁻¹X'₁, A₂ = Z₁(Z'₁Z₁)⁻¹Z'₁, A₃ = X₂(X'₂X₂)⁻¹X'₂, A₄ = Z₂(Z'₂Z₂)⁻¹Z'₂, and A₅ = Z₃(Z'₃Z₃)⁻¹Z'₃. In general, any acceptable form of the X matrix can be used with any acceptable form of the matrices X₁, X₂, ..., Z₁, Z₂, ..., where the latter set of matrices is used to construct the sums of squares matrices.
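A small sketch of this construction is given below. The factor sizes b, t, r are chosen arbitrarily, and helmert_Q is a helper (defined here, not taken from the text) that builds the unnormalized Helmert contrast columns playing the role of Q_l; the check confirms that the five projection matrices are idempotent and sum to the btr × btr identity.

```python
import numpy as np

b, t, r = 3, 2, 2

def helmert_Q(l):
    """Columns (1, ..., 1, -j, 0, ..., 0)' orthogonal to 1_l, one concrete Q_l."""
    Q = np.zeros((l, l - 1))
    for j in range(1, l):
        Q[:j, j - 1] = 1.0
        Q[j, j - 1] = -j
    return Q

one = lambda l: np.ones((l, 1))
proj = lambda M: M @ np.linalg.solve(M.T @ M, M.T)     # M(M'M)^{-1}M'

X  = np.kron(one(b), np.kron(np.eye(t), one(r)))       # X = 1_b ⊗ I_t ⊗ 1_r
X1 = np.kron(one(b), np.kron(one(t), one(r)))
Z1 = np.kron(helmert_Q(b), np.kron(one(t), one(r)))
X2 = np.kron(one(b), np.kron(helmert_Q(t), one(r)))
Z2 = np.kron(helmert_Q(b), np.kron(helmert_Q(t), one(r)))
Z3 = np.kron(one(b), np.kron(np.eye(t), helmert_Q(r)))

A = [proj(M) for M in (X1, Z1, X2, Z2, Z3)]            # A_1, ..., A_5
assert all(np.allclose(Ai @ Ai, Ai) for Ai in A)       # idempotent
assert np.allclose(sum(A), np.eye(b * t * r))          # A_1 + ... + A_5 = I_btr
```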
EXERCISES

1. From Table 5.3.1, let B₁, B₂, and B₃ represent the matrices for the sums of squares due to the overall mean, regression, and residual, respectively. Prove B²_r = B_r for r = 1, 2, 3 and B_rB_s = 0 for r ≠ s.

2. Let Y = Xβ + E where X is an n × p matrix and E ~ N_n(0, σ²V) for any n × n symmetric, positive definite matrix V. (a) Is β̂ = (X'X)⁻¹X'Y an unbiased estimator of β? (b) Prove that if there exists a p × p nonsingular matrix F such that VX = XF then β̂ = β̂_w where β̂_w is the weighted least-squares estimator of β.

3. Let Y = Xβ + E where X is an n × p matrix and E ~ N_n(0,
R'j - Rj-1
6. Assume the model Y = Xp + E where X is defined as in Section 5.6 and E ~ Nn(0,
(b) Find the variance of the BLUE of b\ —b^. 10. Let YI — \JL\ + E\, ¥2 = H2 + £2, and ¥3 = /JL\ + ^2 + EI where E(Et) = 0 and E(£(2) = 2, E(£,£;) = 1 for / ^ j = 1, 2, 3. (a) Find the BLUEs of u1 and ^2(b) Find the covariance between the BLUEs of JJL\ and ^2. (c) Find the BLUE and the variance of the BLUE of I\JL\ + 3/^211. Let y, = Xjft + U, for i = I,... ,n and 0 < x\ < x2 < • • • < xn where Ut = E\ + £2 + • • • + E{ and the £/'s are uncorrelated with £(£,) = 0, var(E/) = a2(jc, — jc,_i) for / > 1 and var(£i) = a2x\. (a) Find the BLUE of P and show it depends only on (Yn, xn). (b) Find the variance of the BLUE of p. (Hint: Transform the y, values into y* values such that Y* = Yt -Yt^ for / > 1 and Y* = y,.)
12. Consider the following design layout:
Let Y - ^ 0 l a < 8 > l n + X ^ + E where Y = (!-„,..., y l n , . . . , y a l , . . . , y f l n y , 0o is an unknown scalar parameter, ft = ( 0 i , . . . , 0^)' is a p x 1 vector of unknown parameters, X = X* ln with X* an a x /? matrix of known values, / ? < « — !, l^X* = Oi x/7 , and the an x I vector E ~ Nan(0, S) with E = I fl 0(a 1 2 I n +a 2 2 J n )]. (a) Let Y'AiY, Y'A2Y, Y'A3Y, and Y'A4Y be the sum of squares due to 0o, 0, lack of fit, and pure error, respectively. Find AI, A2, AS, and A4. (b) Find the distributions of Y'AiY, Y'A2Y, Y'A3Y, and Y'A4Y. (c) Are Y'AiY, Y'A2Y, Y'A3Y, and Y'A4Y mutually independent? (d) Assume no significant lack of fit in the model. Construct a test for the hypothesis HO : ft = 0 versus HI : /3 ^ 0. 13. Consider this factorial layout:
Let the 8 × 1 vector Y = (Y₁₁₁, Y₁₁₂, Y₁₂₁, Y₁₂₂, Y₂₁₁, Y₂₁₂, Y₂₂₁, Y₂₂₂)'. Assume the model Y = Xβ + E where X = [X₁|X₂|X₃|X₄], X₁ = 1₂ ⊗ 1₂ ⊗ 1₂, X₂ = Q₂ ⊗ 1₂ ⊗ 1₂, X₃ = 1₂ ⊗ Q₂ ⊗ 1₂, X₄ = Q₂ ⊗ Q₂ ⊗ 1₂, Q₂ = (1, −1)', β = (β₁, β₂, β₃, β₄)', and E ~ N₈(0, σ²I₈). Note that β₁ corresponds to the overall mean, β₂ to the fixed factor A, β₃ to the fixed factor B, and β₄ to the interaction of A and B. (a) Define the sums of squares due to β₁, β₂, β₃, and β₄. (b) Find β̂, the least-squares estimator of β. (c) Calculate the standard error of β̂.
14. Let Y_ij = μ + B_i + R(B)_{(i)j} where μ is an unknown constant, and the B_i's and R(B)_{(i)j}'s are uncorrelated random variables with 0 means and variances σ²_B and σ²_{R(B)}, respectively, for i = 1, ..., b and j = 1, ..., r. (a) Write the model as Y = Xβ + E where E(E) = 0 and cov(E) = Σ. Identify all terms explicitly. (b) Construct the matrices for the sums of squares for the usual ANOVA table for this model and derive the corresponding expected mean squares. (c) Assume the Y_ij's are normally distributed. Find the distributions of the sums of squares defined in part (b).
6
Maximum Likelihood Estimation and Related Topics
This chapter deals with maximum likelihood estimation of the parameters of the general linear model Y = Xβ + E when E ~ N_n(0, Σ). The maximum likelihood estimators of β and Σ are the parameter values that maximize the likelihood function of the random vector Y. In the first section of the chapter, the discussion is confined to the cases where Σ = σ²I_n and Σ = σ²V when V is known. In the second section, the concepts of invariance, completeness, sufficiency, and minimum variance unbiased estimation are discussed. In the third section, maximum likelihood estimation is developed for more general forms of Σ. Finally, the likelihood ratio test and related confidence bands on linear combinations of the p × 1 vector β are examined.
6.1
MAXIMUM LIKELIHOOD ESTIMATORS OF β AND σ²
For the present, assume the model Y = Xβ + E where E ~ N_n(0, σ²I_n). Therefore, the likelihood function is given by
The logarithm of the likelihood function is
The objective is to find the values of β and σ² that maximize the function log[ℓ(β, σ², Y)]. Take derivatives of log[ℓ(β, σ², Y)] with respect to the p × 1 vector β and σ², set the resulting expressions equal to zero, and solve for β and σ². That is,
and
Solving the first equation for β produces X'Xβ = X'Y. If the n × p matrix X has full rank (i.e., if X'X is nonsingular), then the maximum likelihood estimator (MLE) of β is given by β̂ = (X'X)⁻¹X'Y. Solve the second equation for σ² with β replaced by β̂. The resulting maximum likelihood estimator of σ² is σ̂² = Y'[I_n − X(X'X)⁻¹X']Y/n.
The maximum likelihood estimator of β is a set of p linear transformations of the random vector Y. Since Y ~ N_n(Xβ, σ²I_n), it follows that β̂ = (X'X)⁻¹X'Y ~ N_p(β, σ²(X'X)⁻¹).
Therefore, β̂ is an unbiased estimator of β. Furthermore, [I_n − X(X'X)⁻¹X'](σ²I_n) = σ²[I_n − X(X'X)⁻¹X'] is a multiple of an idempotent matrix of rank n − p. Therefore, by Corollary 3.1.2(a), nσ̂² = Y'[I_n − X(X'X)⁻¹X']Y ~ σ²χ²_{n−p}(λ) where
The MLE σ̂² is not an unbiased estimator of σ² since E(σ̂²) = E[σ²χ²_{n−p}(0)/n] = (n − p)σ²/n. Finally, by Theorem 3.2.2, β̂ and σ̂² are independent since
In Chapter 5, β̂ and σ̂² denoted the ordinary least-squares estimators (OLSEs) of β and σ², respectively. Note that the OLSE of β equals the MLE of β, and the OLSE of σ² is a multiple of the MLE of σ² for the model Y = Xβ + E when E ~ N_n(0, σ²I_n) [i.e., the OLSE of σ² equals n/(n − p) times the MLE]. Furthermore, the OLSE of σ² is an unbiased estimator of σ² for this model. Therefore, the OLSE is often used to estimate σ² instead of the MLE. In the following example the MLEs of β and σ² are derived for a one-way balanced factorial experiment. Example 6.1.1 Consider the one-way classification described in Examples 1.2.10 and 2.1.4. Rewrite the model Y_ij = μ_i + R(T)_{(i)j} as Y = Xβ + E where Y = (Y₁₁, ..., Y₁ᵣ, ..., Y_{t1}, ..., Y_{tr})', X = I_t ⊗ 1_r, β = (β₁, ..., β_t)' = (μ₁, ..., μ_t)', and cov(E) = σ²I_t ⊗ I_r. Therefore, the MLE of β is given by
where YL = Y?j=i Yij/r- That is, the MLEs of fii,..., /3t (or IJL\ , . . . , \jut) are the observed treatment means. Furthermore, the MLE of a2 is
Thus, the MLE of a2 equals the sum of squares due to the nested replicates divided by tr. We conclude this section by briefly examining likelihood estimation for the model Y = X/3 + E when E ~ Nn(0, a 2 V) and V is an n x n positive definite matrix of known constants. Since V is an n x n positive definite matrix there exists an n x n nonsingular matrix T such that V = XT'. Premultiplying the original model by T"1 we obtain
Linear Models
108
where Y* = T^Y, X* = T^X, and E* = T~1E with E* ~ Nn(0, a 2 I n ). Therefore, the maximum likelihood estimator of (3 is
Likewise, the MLE of a2 is given by
It is left to the reader to find the distributions of J3 and a2 when E ~ Nw(0, a 2 V).
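The following sketch (made-up data) illustrates the relationship discussed in this section: the MLE of β coincides with the ordinary least-squares estimator, while the MLE of σ² is (n − p)/n times the unbiased residual mean square.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # OLSE, which equals the MLE of beta
resid = y - X @ beta_hat
sigma2_mle = resid @ resid / n                     # MLE of sigma^2
sigma2_ols = resid @ resid / (n - p)               # unbiased residual mean square
assert np.isclose(sigma2_mle, (n - p) / n * sigma2_ols)
print(beta_hat, sigma2_mle, sigma2_ols)
```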
6.2
INVARIANCE PROPERTY, SUFFICIENCY, AND COMPLETENESS
In this section the invariance property, sufficiency, completeness, and minimum variance unbiased estimators are discussed. Definition 6.2.1 Invariance Property: Let the k x 1 vector 9 = (6\,..., 0*)' be the MLE of the k x 1 vector 0. If g(0) is a function of 9 then g(Q) is the MLE of 8(0). Example 6.2.1 Consider the one-way classification in Example 6.1.1. By the invariance property, the MLE of g((3, a2) = Y?i=i A/0" *s
Sufficiency involves the reduction of data to a concise set of statistics without loss of information about the unknown parameters of the distribution. Thus, if the parameters of the distribution are of interest, attention can be focused on the joint distribution of the "reduced" set of statistics. In this sense, the reduced set of statistics provides sufficient information about the unknown parameters. The topic of sufficiency is addressed in the next theorem. Theorem 6.2.1 Factorization Theorem: Let the n x 1 random vector Y = ( F i , . . . , Yny have joint probability distribution function f\(Y\,..., Yn, 0} where 9 is a k x 1 vector of unknown parameters. Let S = ( S i , . . . , Sr)' be a set of r statistics for r > k. The statistics S\,..., Sr are jointly sufficient for 0 if and
only if
where g(S, 0) does not depend on Y\,..., Yn except through S and h (Y\,..., Yn) does not involve 0. Example 6.2.2 Letthen x 1 random vector Y = ( F i , . . . , Yn)' ~ N n (al n , a 2 I n ). The statistics Si = l^Y and £2 = Y'Y are jointly sufficient for 0 = (a, a2)' since
The next theorem and example link the ideas of sufficiency and maximum likelihood estimation. Theorem 6.2.2 IfS = (Si,..., Sr)' are jointly sufficient for the vector 9 and if 6 is a unique MLE of 6, then 6 is a function ofS. Proof:
By the factorization theorem
which means that the value of 6 that maximizes /Y(-) depends on S. If the MLE is unique, the MLE of 0 must be a function of S. • Example 6.2.3 Consider the problem from Example 6.2.2. Rewrite the model as Y = Xa + E where the n x 1 matrix X = !„ and the n x 1 random vector E ~ Nn(0, o2ln}. Therefore, the MLE of a is given by
and the MLE of a2 is
The MLEs & = S\/n and a2 — [(82 — S 2 /«)/n] and jointly sufficient for a and a2 where Si = l'n Y and S2 = Y'Y.
This section concludes with a discussion of completeness and its relation to minimum variance unbiased estimators. Definition 6.2.2 Completeness: A family of probability distribution functions {/T(f, 0), 9 6 0} is called complete if E[w(T)] = 0 for all 6 <= @ implies w(T) = 0 with probability 1 for all 0 e 0. Completeness is a characterization of the joint probability distribution of the statistics T. However, the term complete is often linked to the statistics themselves. Therefore, a sufficient statistic whose probability distribution is complete is referred to as a complete sufficient statistic. Completeness implies that two different functions of the statistics T cannot have the same expectation. To understand this interpretation, let MI (T) and w 2 (T) be two different functions of T such that E[«j(T)] = r(0) and E[w2(T)] = r(0). Therefore, E[«i(T) - u2(T)] = 0. If the distribution of T is complete then Mi(T) - w2(T) = 0 or «i(T) = n 2 (T) with probability 1. That is, an unbiased estimator of any function of 9 is unique if the distribution of T is complete. Suppose it is of interest to develop an unbiased estimator of r(0). Let T be sufficient for 9. Therefore, when searching for an unbiased estimator of r(0), we confine ourselves to functions of T. If T is a set of complete sufficient statistics, then there is at most one unbiased estimator of r(0) based on T. Since this estimator is the only unbiased estimator of r(0), trivially, it must be the unbiased estimator with the smallest variance. From the preceding discussion the class of unbiased estimators of T (0) based on complete sufficient statistics has at most one member. Therefore, to call this estimator "best" in its class is in some sense misleading, since there are no competing estimators in the class. With no competition these best estimators could perform very poorly. Surprisingly, minimum variance unbiased estimators based on complete sufficient statistics do in many cases have relatively small variances and, in that sense, do turn out to be good estimators. The next theorem identifies the complete sufficient statistics of/3 and a2 when Y = XB + E and E ~ Nn(0, a2In). Theorem 6.2.3 Let Y = XB + E where Y is an n x 1 random vector, X is an n x p matrix of constants, (3 is a p x 1 vector of unknown parameters, and then x I random vector E ~ Nn(0, a2ln). The MLEs J3 = (X'X^X'Y and d2 = ¥'[!„ — X(X'X)~1X']Y/n are complete sufficient statistics for B and a2. Furthermore, any two linearly independent combinations of B and a2 are also complete sufficient statistics for (3 and a2. Example 6.2.4 Consider the problem from Example 6.2.3. By Theorem 6.2.3, Y. and Y'[In — £ Jn]Y are complete sufficient statistics for a and a2. Furthermore,
Y. and ¥'[!„ — £ Jn]Y are independent random variables where F. ~ NI (a, o2/n) and Y'[In - ±J B ]Y ~ ff2xL\(^ = 0). Therefore, E(f.) = a and E{(« - 3) [¥'(!„ — ^Jn)Y]~1} = a~2. Thus, the minimum variance unbiased estimator of a/a2 is given by (n - 3)(y.)[Y'(I,, - l-Jnm~l.
6.3
ANOVA METHODS FOR FINDING MAXIMUM LIKELIHOOD ESTIMATORS
In some models the ordinary least-squares estimators and certain ANOVA sums of squares provide direct MLE solutions, even when the covariance matrix cov(E) = E is a function of multiple unknown variance parameters. The topic is introduced with an example and then followed by a general theorem. Example 6.3.1 Consider a two-way balanced factorial experiment with b random blocks and t fixed treatment levels. Let the bt observations be represented by random variables y,7 for i = 1,..., b and j = 1 , . . . , ? . The model for this experiment is
where //.y is a constant representing the mean effect of the 7* treatment level, BI is a random variable representing the effect of the Ith random block, and BTij is a random variable representing the interaction of the i* block and the 7th treatment. Assume fi, ~ iid Ni(0, P't)(BTu,..., BTbt) ~ Nj,(f-i)(0, crj r lfc<8)l r _i). Furthermore,assume(B\,..., Bb)and[BTu,..., BTbt] are mutually independent. Rewrite the model as Y = X/3+E where the bt x 1 random vector Y = ( F n , . . . , FU, . . . , F M , . . . , Ybty, the bt x t matrix X = lb ® I,, the f x 1 vector /3 = ( f i i , . . . , /?,)' = (//, 1 } ..., /z,)', and the fo x 1 error vector E = (En,..., £ l r , . . . , £ M , . . . , Ebt)' ~ Nfor(0, Z) with
The covariance matrix Z is derived using the covariance matrix algorithm in Chapter 4. The problem is to derive the maximum likelihood estimators of Pi,...,pt,ag and a\T. First, let the bt x bt matrix
where PJ is the (t — 1) x t lower portion of the f-dimensional Helmert matrix.
Recall P,P; = It - }j,, P;P, = I,_i, and l'tPt = OI X ( ,_D. By Theorem 2.1.2,
where
The distribution of the b x 1 vector (Ib, 1^)Y is a function of the two unknown parameters 1J/3 and aj. The distribution of the b(t — 1) x 1 vector (lb <8> jP/)Y is a function of the t unknown parameters P't(3 and a|r. Furthermore, by Theorem 2.1.4, the two vectors are independent. Therefore, the MLEs of l't/3 and crj can be calculated separately from the MLEs of PJ/3 and a\T. First, model the b x 1 vector (lb 1^)Y as
where the b x 1 vector YI = (I*,
The MLE of t2a^ is given by
Therefore, the MLE of a\ is
Note that the MLE of a\ equals the sum of squares for blocks divided by bt. Now model the b(t - 1) x 1 vector (lb <8> P^Y as
where the b(t - 1) x 1 vector Y2 = (lb <S> P,)Y, the b(t - 1) x (t - 1) matrix K2 = 1& <8> lt-\, the (t — 1) x 1 vector of unknown parameters 62 = P[^» and the b(f — 1) x 1 random vector E2 ~ N/,(,_i)(0, o\T\b <8> I,_i). Therefore, the MLE of the (t — 1) x 1 vector #2 is
The MLE of a|r is given by
Note that the MLE of a%T is the sum of squares for the block by treatment interaction divided by b(t — 1). Now the maximum likelihood estimators of (0\, 02) and the invariance property are used to derive the MLEs of the original parameters /3 = (0i,...,/3,)'- Note that
Premultiplying by the t x t matrix (yLJP,) we obtain
or
Therefore, by the invariance property, the MLE of ft is given by
The model from Example 6.3.1 belongs to a class of linear models where the MLE of ft equals the ordinary least-squares estimator of ft and the MLEs of the variance parameters in E are linear combinations of the ANOVA mean squares for the random effects. The next theorem provides formulas for the MLEs of a broad class of models, including the model from Example 6.2.1. Theorem 6.3.1 Let Y = X/3 + E where Y is an n x 1 random vector, X is an n x p matrix of rank p, ft is a p x 1 vector of unknown paramters, and the n x I random vector E ~ Nn (0, £). For i = 1 , . . . , m, let Y'B, Y and Y'C, Y be sums of squares corresponding to the various fixed and random effects, respectively, such that ln = YZLi(Bi + c<) where rank(Ei) = p( > 0, rank (C,-) = ri > 0, p = Y^T=i Pi ana n = Y^iLi (Pi + r i)- If there exist unique constants af > 0 such that X = ^lai(Bi+Ci)then i)
the maximum likelihood estimator of (3 is given by J3 = (X'X)-1X'Y if and only ifY%=\ B*x = X' ana
ii)
under the conditions that induce i), the maximum likelihood estimator ofaf is given by at = Y'QY/fo- + r,).
Proof: By Theorem 1.1.7, B, and C; are idempotent matrices for i = 1 , . . . , m; BiCj = O n x n for any i, j = 1 , . . . , m; Bi-Bj = O nxn for any i ^ j; and therefore, Bi + C, is an idempotent matrix of rank pi + ri. These conditions imply S"1 =
Eii^CBi + Q).
i) Assume £Hi B,X = X. Then B,C; = O wxw for all i, j implies C;X = 517=i C,B,X = O nxp for all ;. Now let the n x p matrix Q = [Qi |Q2| • • • |Qm] where Q, is an n x pt matrix of rank pf such that B, = Q, Q^ for each i = 1,..., m. Thus,QQ' = 52?=i B, and£7=1 airlBi = QA"1^ where Aisapxpnonsingular block diagonal matrix with a, lpi on the diagonal for i = 1,..., m. Therefore, X = 527=i B,X = QQ'X, which implies Q = X(Q'X)-1. The maximum likelihood estimator of /3 is given by
The "only if" portion of the proof of i) is omitted, but can be found in Moser and McCann (1995). ii) Before deriving the MLE of a,-, we need to derive a particular relationship between the matrices X and B,-. From the proof of i), Q = XCQ'X)"1. Therefore, X(X'X)-1XQ, = QJ; for / = 1,..., m. Premultiplying by Q't produces X(X'X)-1XB/ = B,-. Now, to find the MLE of a/, write the likelihood function with (3 replaced by ft and XT1 = 527=i "f^8' + Q)- Note tnat <*i aie the eigenvalues of E with multiplicity /?, + r, for / = 1 , . . . , m. Therefore,
and
Take derivatives of log f(ai,...,am,(3,Y) with respect to a/ for j = 1 , . . . , m, set the derivatives equal to zero, and solve for a7:
and
Therefore,
In Theorem 6.3.1 the lower limit on /?, is zero. Therefore, the theorem allows B, to equal Qnxn and thus admits situations where the number of sums of squares for fixed effects differs from the number of random effects sums of squares. Furthermore, note that the lower limit on r, is strictly positive, implying that the number of matrices C, and the number of unknown variance parameters both equal m. This restriction is imposed so that the MLE estimators are defined, rather than being nonestimable. Example 6.3.1 is now reworked using Theorem 6.3.1. Example 6.3.2 Consider the two-way balanced factorial experiment given in Example 6.3.1. The model can be written as Y = X/3+E where X = \b <8> lt, ft — ( f t . . . , A)'and
Let the sums of squares due to the overall mean, the random blocks, the fixed treatments, and the random interaction of blocks and treatments be represented by Y'BiY, Y'CiY, Y'B2Y, and Y'C2Y, respectively, where Pl = 1, n = b-1, p2 = t-\,r2 = (b- l ) ( f - l)and
Note that lb <8> I, = BI + Ci + B2 + C2. Furthermore, BiE = fajBi, CiE = fcrJCi,B 2 E = or|rB2, and C2E = aJrC2. Therefore, ^?=i«'(B' + c/) = tol[lb <8> }jf] + a|r[I/, ® (I, - j j t ) ] = E where a\ = to\ anda 2 = a|r. Furthermore,
Therefore, by Theorem 6.3.1, the MLE of /3 is given by
and the MLEs of a\ and «2 are
and
Therefore, the MLEs of crj and a\T are
and
These are the same MLEs derived in Example 6.3.1. Although this chapter deals mainly with maximum likelihood estimators of multivariate normal models, Theorem 6.3.1 also motivates a further generalization of the Gauss-Markov theorem. The Gauss-Markov theorem was introduced in Section 5.2 for the model Y = X/3 + E when the n x 1 error vector E had a distribution with E(E) = 0 and cov(E) = o2In. In Section 5.4 the Gauss-Markov theorem was extended to include the model Y = X/3 + E with E(E) = 0 and cov(E) = a2V where V is an n x n positive definite matrix of known constants. In the next theorem, the Gauss-Markov theorem is again extended to include an even broader class of covariance matrices. Theorem 6.3.2 Let Y = X/3 + E where Y is an n x 1 random vector, X is an n x p matrix of rank p, /3 is a p x 1 vector of unknown parameters, and E is an n x 1 random vector withE(E) = Qandcov(E) = E. Fori = 1,..., m, let Y'B/Y and Y'Q Y be the sums of squares corresponding to the various fixed and random effects, respectively, such that !„ = X]™_i(B, + C,-) where ranfc(Bi) = pi > 0, rank (C,-) = n > 0, p = Y^T=i Pi and n = J2?=\(Pi + r;)« If there exist unique constants a, > 0 such that E = YXLi a'@*i + C,-) then the BLUE oft'ft is given by f (X'X^X'Y if and only if£XLi B«X = X. Proof:
(Sufficiency) Assume Y^Li B<x = x- T*16 BLUE of t'/3 is given by
(Necessity) The necessity proof is given in Moser (1995). Theorem 6.3.2 is applied in the next example. Example 6.3.3 Consider the model in Example 6.3.2. Since Σ = Σ_{i=1}^m αᵢ(Bᵢ + Cᵢ) and Σ_{i=1}^m BᵢX = X, by Theorem 6.3.2, the BLUE of s'β is
where the t × 1 vector s = (s₁, ..., s_t)'.
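The sketch below illustrates Theorem 6.3.1 on the two-way design of Examples 6.3.1 through 6.3.3 (b random blocks, t fixed treatments, one observation per cell). The data are simulated, the factor sizes and variance values are arbitrary, and helmert_Q is a helper defined here for the Q_l contrast columns; simulating the interaction effects as iid normal is a simplification used only to exercise the estimator formulas.

```python
import numpy as np

b, t = 4, 3
sigma_B, sigma_BT = 1.0, 0.5
rng = np.random.default_rng(2)

def helmert_Q(l):
    """Columns (1, ..., 1, -j, 0, ..., 0)' orthogonal to 1_l."""
    Q = np.zeros((l, l - 1))
    for j in range(1, l):
        Q[:j, j - 1] = 1.0
        Q[j, j - 1] = -j
    return Q

one = lambda l: np.ones((l, 1))
proj = lambda M: M @ np.linalg.solve(M.T @ M, M.T)

X  = np.kron(one(b), np.eye(t))                      # X = 1_b ⊗ I_t
C1 = proj(np.kron(helmert_Q(b), one(t)))             # random blocks
C2 = proj(np.kron(helmert_Q(b), helmert_Q(t)))       # random block x treatment

mu = np.array([10.0, 12.0, 11.0])                    # hypothetical treatment means
block_effects = rng.normal(0, sigma_B, size=b)
y = (np.kron(np.ones(b), mu) + np.repeat(block_effects, t)
     + rng.normal(0, sigma_BT, size=b * t))

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # MLE of beta (treatment means)
alpha1 = y @ C1 @ y / b                              # Y'C_1Y/(p_1 + r_1), estimates t*sigma_B^2
alpha2 = y @ C2 @ y / (b * (t - 1))                  # Y'C_2Y/(p_2 + r_2), estimates sigma_BT^2
print(beta_hat, alpha1 / t, alpha2)                  # MLEs of sigma_B^2 and sigma_BT^2
```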
In the next section the likelihood ratio test is derived for hypotheses of the form Hβ = h where H is a q × p matrix of constants and h is a q × 1 vector of constants.
6.4
THE LIKELIHOOD RATIO TEST FOR Hβ = h
Tests of the hypothesis Hβ = h are developed using the likelihood ratio statistic. For the model Y = Xβ + E where E ~ N_n(0, σ²I_n) the likelihood function is given by
The likelihood ratio statistic is a function of two values: 1. the maximum value of £(/3, cr 2 , Y) maximized over all possible values of /3 and a2, that is, over all 0 < a2 < oo and —oo < $ < oo for /3 = (Pi,..., PPY where $ is a scalar and 2. the maximum value of 1(0, cr2, Y) maximized over the parameter space defined by H/3 = h. The likelihood ratio statistic, L is the ratio of these two values.
The denominator of L is maximized when the maximum likelihood estimators of P and a2 are used in t(/3, a2, Y). That is,
where /3D = (X'X^X'Y and og = Y'(L. - X(X'X)-1X')Y/n. For the numerator of L, the likelihood function is maximized with respect to (3 and a2 under the restriction H/3 = h. To this end, let G be a (p — q) x p matrix of rank p — q chosen such that the p x p matrix ["] has full rank with HG' = 0 9X (p_ 9 ). Let ["] = [R|S] where R is a p x q matrix and S is a P x (P ~ & matrix. Therefore, HR = I^,GS = Lj_ g ,HS = Qqx(p-q), and GR = 0(p_ 9) x 9 . Note that
so R = H'(HH')-1 and S = G'(GG')-1. Also,
or
Rewrite Y = X/3 + E under H/3 = h as
Since XRh is known, rewrite the above model as
where the n x 1 random vector Z = Y — XRh, the n x (p — q) matrix K = XS, the (p — q) x 1 vector 0 = G/3, and the n x 1 random vector E ~ N M (0, <72In). Therefore, the likelihood in the numerator is maximized under H/3 = h when the maximum likelihood estimators of 0 and a2 for the model Z = K0 + E are used in the likelihood function. That is,
where 0N = (K'K)-1K'Z and a2 = Z'(ln - K(K'K)-lK')Z/n. likelihood ratio statistic is given by
Therefore, the
Instead of using L as the test statistic, use the following monotonic function of L:
A second form of V can be generated by noting that
or
Therefore, V can be rewritten in a second form as
Another form of the likelihood ratio statistic, V, can be generated using Lagrange multipliers. This third form of V equals
where /3 = (X'X) ^'Y. The denominator of the third form of L is the same as the other two forms. That is, the denominator equals
where a2^ was defined earlier. Lagrange multipliers are used to derive the numerator of the third form of L. By the Lagrange multiplier technique, the numerator of L is given by
where A is a q x 1 vector. To maximize the right side of this expression, take derivatives with respect to /3, a2, and A, set the resulting expression equal to zero, solve for /3, a2, and A., and substitute these solutions back into the original expression. Let€*(/3, a2, A, Y) = €(/3, a2, Y)-A'(H/3-h). Then the derivatives are
Set equations (1), (2), and (3) equal to zero and solve for (3, a2, and A.. Let the solutions of (3, a2, and X, be designated as /3N, &£, and XN. From equations (1), (2), and (3):
From (4):
Premultiplying each side of (7) by H and noting from (6) that H/3N = h, we obtain
Solving the right-hand equality in (8) for X^, we have
Substituting A^ into (4) and solving for /3N, we obtain
or
Substituting (10) into (5) and rearranging terms, we have
Since
and
equation (11) reduces to
Therefore,
The likelihood ratio statistic L therefore equals
Instead of using L as the test statistic, use a monotonic function of L:
The MLE J3D will subsequently be referred to as the least-squares estimator ft since J3D = /3. The derivations of all three forms of V essentially follow the derivations presented by Graybill (1976). The distribution of V is needed before the statistic V can be used to test the hypothesis H/3 = h. Note that H/3 = H(X'X)~1X'Y; therefore, by Theorem 2.1.2, H/3 ~ N^H/3, ^HtX'X^H'). Under H0 : H/3 = h, H/3 - h ~ N^(0, a^X'X)-1H'). By Corollary 3.1.2(b), (H/3-hX^X'X)-1^]-1 (H/3h) ~ 0 if H/3 ^ h. Furthermore, by Theorem 3.2.1, (H^)/[H(X'X)-1H']-1H^ is independent of ¥'[!„ X(X'X)~1X']Y since
and
Since H/3 — h has the same covariance matrix as H/3 and both vectors are normally distributed, (H/3 - ^'[HCX'X)-^']-1^ - h) is also independent of ¥'[!„ X(X'X)-1X']Y. Therefore, by Definition 3.3.3, V ~ F^n-p(X). Note that all
three forms of the statistic V are equal and therefore they all have the same F distribution. The various forms of the likelihood ratio statistic are now applied to a number of example problems. Example 6.4.1 Consider the model Y = X/3 + E where Y is an n x 1 random vector, X = (Xi 1X2) is an n x p matrix where X1 is an n x (p—q) matrix and X2 is an n x q matrix, ft = ( f a , . . . , fip-q-\ \fip-q,..., Pp-iY is a p x 1 vector and the nxl error vector E ~ Nn(0, cr2In). The objective is to construct a likelihood ratio test for the hypothesis H0 : ftp-q = • • • = ftp^ = 0 versus HI : not all fa = 0 for i = p — q,..., p— 1. The hypothesis HO is equivalent to the hypothesis H/3 = h where the q x p matrix H = [O^x^-^ILJ and the q x 1 vector h = 0. Since h = 0, Z = Y - XHCHHT1!! = Y. Furthermore, let the (p - q} x p matrix G = [lp-q\Q(p-q)xq] so that [|?] has rank p and HG' = Qqxp- Therefore,
and the first form of the likelihood ratio statistic V equals
Label Y = X/3 + E as the full model with p unknown parameters in ft. Label Y = Xi/3(1) + EI as the reduced model with p — q unknown parameters in /3(1) = (/3 0 ,..., pp_q_\y. The first form of the likelihood ratio statistic equals
where SSE_R is the sum of squares residual for the reduced model and SSE_F is the sum of squares residual for the full model. A γ level test of H₀ versus H₁ is to reject H₀ if V > F^γ_{q, n−p}. Note that Y'(I_n − (1/n)J_n)Y = SSReg_R + SSE_R = SSReg_F + SSE_F, which implies SSE_R − SSE_F = SSReg_F − SSReg_R, where SSReg_F and SSReg_R are the sums of squares regression for the full and reduced models, respectively. Therefore, the likelihood ratio statistic for this problem also takes the form

V = [(SSReg_F − SSReg_R)/q] / [SSE_F/(n − p)].
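A short sketch of this full-versus-reduced-model form of V follows; the data are simulated with H₀ true, and the reduced and full design matrices are hypothetical.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
n, q = 20, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # reduced model, p - q columns
X2 = rng.normal(size=(n, q))                              # the q columns being tested
X = np.column_stack([X1, X2])                             # full model
p = X.shape[1]
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)        # generated with H0 true

def sse(M):
    resid = y - M @ np.linalg.lstsq(M, y, rcond=None)[0]
    return resid @ resid

V = ((sse(X1) - sse(X)) / q) / (sse(X) / (n - p))
print(V, V > f.ppf(0.95, q, n - p))                       # gamma = 0.05 test of H0
```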
Example 6.4.2 Consider a special case of the problem posed in Example 6.4.1. From Example 6.4.1 let the n x p matrix X = (Xi 1X2) where Xt equals the n x 1 vector \n and X2 equals the n x (p — 1) matrix Xc such that l'nXc = Oi X ( p -i), and partition the p x 1 vector ft as ( f i o l f i i , . . . , Pp-i)'. The objective is to test
the hypothesis HO : fi\ = • • • = fip-\ = 0 versus HI : not all fr = 0 for i = 1 , . . . , p — 1. From Example 6.4.1, the likelihood ratio statistic equals
Therefore, V equals the mean square regression due to £1,..., fip-\ divided by the mean square residual, as depicted in Table 5.3.1. Example 6.4.3 Consider the one-way classification from Example 2.1.4. The experiment can be modeled as Y = X/3 + E where the tr x 1 vector Y = ( F n , . . . , Y 1 r , . . . ,Y t 1 , . . . ,Ytr)', the tr x t matrix X = [lt <8> lr], the t x 1 vector ft — ( f a , . . . , fa)' — (AII, ..., At/)' where AH, • • •, t^t are defined in Example 2.1.4, and the n x 1 random vector E ~ N fr (0, cr|(r)I, <8> Ir). The objective is to construct the likelihood ratio test for the hypothesis HO : fi\ = • • • = fit versus HI : not all /?,- equal. The hypothesis HO is equivalent to the hypothesis H/3 = h with the (t - 1) x t matrix H = PJ and the (p - 1) x 1 vector h = 0 where P't is the (/ - 1) x t lower portion of a t-dimensional Helmert matrix with P'tP, = I,_i, P,P; = I, - Ij,, and i;P, = OI X( ,_D. Note
Therefore, the third form of the likelihood ratio statistic is given by
Therefore, V equals the mean square for the treatments divided by the mean square for the nested replicates, which is the usual ANOVA test for equality of the treatment means.
6.5
CONFIDENCE BANDS ON LINEAR COMBINATIONS OF β
The likelihood ratio statistic is now used to construct confidence bands on individual linear combinations of β. Assume the model Y = Xβ + E where E ~ N_n(0, σ²I_n). Let the q × p matrix H and the q × 1 vector h from Section 6.4 equal a 1 × p row vector g' and a scalar g₀, respectively. The hypothesis H₀ : Hβ = h versus H₁ : Hβ ≠ h becomes H₀ : g'β = g₀ versus H₁ : g'β ≠ g₀. The third form of the likelihood ratio statistic equals
Therefore,
where t^{γ/2}_{n−p} is the 100(1 − γ/2)th percentile point of a central t distribution with n − p degrees of freedom. Substitute for V and solve for g₀ = g'β in the preceding probability statement. The resulting 100(1 − γ)% confidence band on g'β is

g'β̂ ± t^{γ/2}_{n−p} [g'(X'X)⁻¹g · Y'(I_n − X(X'X)⁻¹X')Y/(n − p)]^{1/2}.
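The sketch below evaluates this band for one linear combination; the data, the coefficients, and the choice of g are hypothetical, and the band is written with the unbiased residual mean square, which equals nσ̂²/(n − p).

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(4)
n, p, gamma = 15, 3, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
g = np.array([0.0, 1.0, -1.0])                       # a linear combination of beta

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = (y - X @ beta_hat) @ (y - X @ beta_hat) / (n - p)
half_width = t_dist.ppf(1 - gamma / 2, n - p) * np.sqrt(s2 * g @ XtX_inv @ g)
print(g @ beta_hat - half_width, g @ beta_hat + half_width)
```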
In the next example, confidence bands are placed on certain linear combinations of the treatment means in a one-way ANOVA problem.
Example 6.5.1 Consider the one-way ANOVA problem from Example 6.4.3. The objective is to construct individual confidence bands on 0i — fa, and on P. = £|=1 Pilt = l'tp/t. For the one-way ANOVA problem, n = tr,p = t,
and
For 0i - 02, the t x 1 vector g = (1, -1,0,..., 0)', g'/3 = YL - Y2. and
For 0. = (1/01J/3, the t x 1 vector g = (1/01*, g'$ = Y.. and
Therefore, 100(1 — y}% confidence bands on 0i — 02 are
and 100(1 — y)% confidence bands on 0. are
EXERCISES 1. LetF, = 0o+0i^,-l-£, for/ = 1 , . . . , n where£, ~ n'f/Ni(0, a2). Construe individual confidence bands on 00 and 0i and write the answer in terms o1 £*. E^M E^ 2 > E^2. ^F,, and n.
2. Consider the one-way classification from Example 6.4.3. Find the minimum variance unbiased estimator of ^'i=i ^i/ a ^(T). 3. Let YI , ¥2,..., Yn be a random sample of size n from a NI (/x, a2) distribution where — oo < />i < oo and a2 > 0. Consider the y level test HO : u = HQ versus HI : //. ^ IJLQ where /ZQ is given. Show that the likelihood ratio test is equivalent to the two-sided test, reject HO if \T\ > t^ where T = (Y - MO)/[*/>], Y = E"=i Yt and s2 = EJ-iM ~ Y}2/(n - 1). 4. Let Y = X/3 + E where E_~ NB(0, o2ln)^ A 100(1 - y)% confidence interval on h'0 is h'/3±/^ [std(h'/3)] where std(h'/3) is the estimated standard deviation of h'0. (a) Find the distribution of L2 where L is the length of the interval. (b) If p = 3, n = 20, ft = (0!, 02, 0s)', Y'Y = 418, X'X = I3, and X'Y = (5, 10, 15)', construct individual 95% confidence bands on 0i, 0i + 02, and 0!-0 3 .
5. LetY! = fii+Si,\2 = 1^1+^2+^1+82,Y3 = ^i+<5 3 ,andY 4 = /zi+/u, 2 + 63 + ($4 where /zi and 7^2 are unknown parameters and S = (S\, 82, ^3, 84)' ~ N4(0, as2l4). (a) Write the model as Y = X(3 + E where E = (Ei, E2, E3, £4)'. Define each term and distribution explicitly. (b) Find the MLE of n = (m, At 2 )'. (c) Find the distribution of the MLE of u. 6. Let YIJ = ui + E{J where the E//s are distributed as independent NI (0, ja2} random variables. Let Y = (Y\\, YU, Y2\, Y^i)'. Find the likelihood ratio statistic for testing the hypothesis HO : IJL\ = n-2 versus HI : ^\ ^ ^21. Consider the paired Mest problem of Exercise 5 in Chapter 4. (a) Calculate the MLEs of /zi, 1^2, crj, and cr|r. (b) Construct the likelihood ratio statistic for testing the hypothesis H0 = /^i = M2 versus HI : n\ ^ ^28. Consider the experiment from Exercise 6 in Chapter 4. Calculate the MLEs O f / i n , ... ,At53,O r p(F)'° r C(FP)'
Cr
Pr(F)' a n ^ c r cr(FP)-
9. Consider the experiment from Exercise 7 in Chapter 4. (a) Calculate the MLEs of A M n , . . . , ^222, ^(s), apM(S)>
anc
* apo(MS)-
(b) Write the model as Y = X/3 + E. Define the vector p in Kronecker product form such that p'/3 = Ai.. — #2.. = (^111 +^112 + ^121 +^122 — M211 — M212 — M221 — /^222) /4.
(c) Find the MLE of p'/3 from part b. (d) Find the standard error of the MLE of p'(3. 10. Consider the experiment from Example 4.5.4. (a) Find E"1 in terms of erg and o\v. (b) Write the model as Y = X/3 + E and find the MLEs of /3, crj, and
a|v.
11. Under the conditions of Theorem 6.3.1, prove that the eigenvalues of £ are 01,..., am with multiplicities p\ + r\,..., pm + rm, respectively.
7
Unbalanced Designs and Missing Data
In complete, balanced factorial experiments, the same number of replicate values is observed within each combination of the factors. Kronecker products may be used in such experiments to construct covariance and sum of squares matrices. However, in other types of experiments, the number of replicates per combination of the factors varies, or certain factor combinations may have no observations at all. Kronecker products are often not useful for constructing covariance and sums of squares matrices in such unbalanced and missing data experiments. To accommodate these unbalanced and missing data situations, replication and pattern matrices are introduced.
7.1
REPLICATION MATRICES
This section begins with a reexamination of the data in Table 5.1.1. This data set contains four distinct combinations of speed and grade: {(speed, grade)} = {(20, 0), (20, 6), (50, 0), (50, 6)}. Four replicates (Y₁, ..., Y₄) were observed in the (20, 0) speed × grade combination, one replicate (Y₅) in the (20, 6) combination, two replicates (Y₆, Y₇) in the (50, 0) combination, and three replicates (Y₈, Y₉, Y₁₀) in
the (50, 6) combination. Thus, the observations Y_i appear in k = 4 distinct groups with r₁ = 4, r₂ = 1, r₃ = 2, and r₄ = 3 observations per group. Assume the model Y = Xβ + E where the 10 × 1 random vector Y = (Y₁, ..., Y₁₀)', the 3 × 1 vector β = (β₀, β₁, β₂)', the 10 × 1 error vector E ~ N₁₀(0, σ²I₁₀), and the 10 × 3 matrix X is given by
where the three columns of X correspond to an intercept, a speed variable, and a speed × grade variable. Note that the rows of X appear in four distinct groups with r_j rows per group for j = 1, ..., 4. Therefore, the 10 × 3 matrix X can be rewritten as X = RX_d where the 10 × 4 replication matrix R is given by
and the 4 x 3 matrix Xd defines the distinct speed and speed x grade combinations where
In general, the model Y = Xβ + E can always be rewritten as Y = RX_dβ + E where r_j ≥ 1 is the number of replicate observations for the jth distinct set of x₁, ..., x_{p−1} values for j = 1, ..., k ≥ p; the replication matrix R is an n × k block diagonal matrix with 1_{r_j} on the diagonal; X_d is a k × p matrix of distinct x₁, ..., x_{p−1} values with 1_k in the first column; and n = Σ_{j=1}^k r_j. The ordinary least-squares estimate of β is

β̂ = (X'X)⁻¹X'Y = (X'_dDX_d)⁻¹X'_dDȲ
where the k × k diagonal matrix D = R'R = diag(r₁, ..., r_k)
and the k x 1 vector Y = D-1R'Y = (Y1,..., Yk)'. The random variable Yj is the average of the observed Y values in the jth group identified by distinct x1,..., xp-1 values. The least-squares estimate of a2 is
The least-squares estimator of a2 is the mean square residual and this estimator is unbiased for a2 provided E(Y) = X/3 = RXd/3. In Table 5.5.1 the sum of squares for the mean, regression, lack of fit, and pure error were presented for the model Y = X(3 + E. These sums of squares are now reconstructed for the equivalent model Y = RXd/3 -f E. First, substitute RXd for X in the regression sum of squares Y'fXCX'X)"^' — £ Jn]Y and into the residual sum of squares Y'[In - X(X'X)-1X']Y to obtain
Observe that the pure error sum of squares can be rewritten as
134
Linear Models Table 7.1.1 ANOVA Table with Pure Error and Lack of Fit Source
df
SS
Overall mean
1
Y'll J Y * n «Y
Regression (fi\ , . . . , Pp-\)
p-l
Lack of fit
k-p
Y'[RXd(X^DXd)-'X^R' - ;J«]Y Y'{R[D-> - Xd(X^DXd)-1 X^]R'}Y
Pure error
n-k
Y'fln-RD-'R'JY
Total
n
Y'Y
where Ape is the pure error sum of squares matrix defined in Section 5.5. Therefore, by subtraction the lack of fit sum of squares is SSLack of Fit = SSRes - SSPure Error
Table 7.1.1 gives the ANOVA table for the model Y = RXd/3 + E. In the next example problem, a replication matrix is used in a one-way classification problem when the number of observations in each level is unequal. Example 7.1.1 Consider a one-way classification problem with three fixed treatment levels and r\ = 3, r2 = 4, and r3 = 2 observations in each level, respectively. Let YIJ represent the y* numbered observation in the /* fixed treatment level for i = 1,2,3 and j = 1,..., r,. The 9x1 vector of observations is Y = (Fn, Yn, FIS, Y2i, ^22, F23, *24, ^31, Y32)'. Assume the model Y = RXd/3 + E where the 9 x 1 random error vector E ~ N9(0, a^T)lg), the 9x3 replication matrix
the 3 x 3 matrix Xd = I3, and the 3 x 1 vector ft = (B1,B2,B3)'. For this model, k = p = 3 and n = 9. It is not possible to compute a lack of fit sum of squares since k — p = 0. Furthermore, $ represents the mean effect of the Ith treatment level for i = 1, 2, 3. The value k = 3 indicates that there are three distinct treatment levels and each treatment level is defined by a row of the Xd matrix. A 1 in the first row of the Xa matrix indicates treatment level one, a 1 in
the second row indicates treatment level two, and a 1 in the third row indicates treatment level three. For the example, the sum of squares due to the mean is Y'(1/9)J₉Y with 1 degree of freedom. The sum of squares for regression is equivalent to the sum of squares for treatments with p − 1 = 3 − 1 = 2 degrees of freedom. The matrix for the sum of squares for lack of fit has zero rank. That is, R[D⁻¹ − X_d(X'_dDX_d)⁻¹X'_d]R' = 0 or RD⁻¹R' = RX_d(X'_dDX_d)⁻¹X'_dR'. Therefore, the sum of squares for treatments equals
Finally the sum of squares for pure error equals
Replication matrices can also be used when it is of interest to partition the regression sum of squares into Type I sums of squares. Assume that there are k distinct groups of x values with r,- replicate observations in each group for j = I,... ,k. The n x p matrix X can be written as
where R is the n × k block diagonal replication matrix with 1_{r_j} on the diagonal for j = 1, ..., k and X_d = [X_{1d}|X_{2d}| ⋯ |X_{md}] is a k × p matrix. Each k × p_s matrix X_{sd} contains the k distinct values of p_s x variables. Furthermore, p = Σ_{s=1}^m p_s. Let X_{1d} = 1_k, S₁ = RX_{1d}, S₂ = R[X_{1d}|X_{2d}], ..., S_{m−1} = R[X_{1d}|X_{2d}| ⋯ |X_{m−1,d}], and S_m = RX_d. The Type I sums of squares using R, S₁, ..., S_m, and Y are given in Table 7.1.2. In the next example, a replication matrix is used to generate Type I sums of squares when the model has a covariance matrix with several variance components.
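A sketch of the decomposition X = RX_d for the fuel data is given below, using the group sizes and the distinct speed and speed × grade rows described in this section and the Table 5.1.1 responses quoted in Section 5.5; it checks that the grouped least-squares formula agrees with the ordinary one.

```python
import numpy as np
from scipy.linalg import block_diag

y = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
r = [4, 1, 2, 3]
R = block_diag(*[np.ones((rj, 1)) for rj in r])              # 10 x 4 replication matrix
Xd = np.array([[1, 20,   0],
               [1, 20, 120],
               [1, 50,   0],
               [1, 50, 300]], dtype=float)                   # 4 x 3 distinct rows
X = R @ Xd                                                   # full 10 x 3 design matrix
D = R.T @ R                                                  # diag(r_1, ..., r_k)
ybar = np.linalg.solve(D, R.T @ y)                           # group means D^{-1}R'Y

beta_hat = np.linalg.solve(Xd.T @ D @ Xd, Xd.T @ D @ ybar)   # (X_d'DX_d)^{-1}X_d'D ybar
assert np.allclose(beta_hat, np.linalg.solve(X.T @ X, X.T @ y))

n = X.shape[0]
ss_pure_error = y @ (np.eye(n) - R @ np.linalg.solve(D, R.T)) @ y   # Y'[I_n - RD^{-1}R']Y
print(beta_hat, ss_pure_error)
```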
Table 7.1.2 Type I Sums of Squares with Replication Source
df
Type I SS
Overall mean X]
P\ = l
V I ^' JI »Y i
X2|X,
P2
Y'[S2(S^S2)--'S' 2 -Ij n ]Y
X 3 |X,,X2
P3
Y'[S3(S'3S3)--1S^-S2(SiS2)-1S'2]Y
X m |Xi, . . . , X m _i
Pm
Y'tSm(S^Sm
r^-s^-i^-iS-ir's^iY
1
Lack of fit
k-p
Y'fRtD- - Xd^DXar'X^R'JY
Pure error
n-k
¥'[!„ - RD- 'R']Y
Total
n
Y'Y.
(Xid|X2d) = (U <8> If life <8> Pf) where P^ is the (t — 1) x t lower portion of a t-dimensional Helmert matrix; the t x 1 vector ft = (fiilfa, • • • •> &)'; and the n x 1 random vector E = ( E m , . . . , EH,,,, • ••, Ebt\, • • • , Ebtn,,}' ~ NB(0, E). To construct the n x n covariance matrix E, redefine the random variable Eijv as
for v = 1,..., rtj where the random variables 5, represent the random block effect such that Bi ~ iid NI (0, or|); the random variables BTIJ represent the random block treatment interaction such that the b(t - 1) x 1 vector (Ib <8> PJ) (BTn,..., BTbt)' ~ Nfc(,_i)(0, a| r lfci8)lf_i); and the random variables R(BT}(ij)v represent the random nested replicates such that R(BT\ij)V ~ iidNi(Q, cf^BT)). Furthermore, assume that #/, (BTn, • • • , BTbt)f, and R(BT)(ij)V are uncorrelated. Next, construct the covariance matrix when there is one replicate observation per block treatment combination. If r(J = 1 for all /, j then the bt x bt covariance matrix is given by
where the subscript d on Ed denotes a covariance matrix for one replicate observation in each of the bt distinct block treatment combinations. Note that the variance of R(BT\ij)V is nonestimable when r,; = 1 for all /, j. Therefore, a R(BT) d°es not appear in Ed. Now expand the covariance structure Ed to include all n = X!f=i Zlj=i rtj observations by premultiplying Ed by R, postmultiplying Ed and R', and adding a variance component that will account for the estimable variance of the R(BT)(ij)k variables when r,y > 1. Therefore, the n x n covariance
Table 7.1.3 Type I Sums of Squares for Example 7.1.2 Source
df
Type I SS
Overall mean n
1
Y'Si(S',Si)-1S',Y == Y'Ij n Y
Block (B)\n
b-\
Treatment (T)\n, B
t- 1
Y'[Ti(T'1Tir1T'1 -- -n^ Y'[S2(S^S2)-1S^ - TI(T;T I )- 'T',]Y
BT\fi,B,T Rep (BT)
(b - \)(f - 1)
Y'raC^r'Tj -- S2(S2S2r 'S2]Y
n-bt
Y'tln-RD-'R'JY
Total
n
Y'Y
matrix E is given by
The ANOVA table with Type I sums of squares can also be constructed for this example. First, consider what the sums of squares would be if there was one replicate observation per block treatment combination. If r,; = 1 for all /, j, the matrices for the sums of squares due to the overall mean, blocks, treatments, and the block by treatment interaction are given by
respectively, where Xw = lb <8> lt, Zw = PJ, <8> 1,, X2d = 1* ® Pr, Z2d = P^ <8>P,, and P^ is the (t — 1) x t lower portion of an ^-dimensional Helmert matrix. Let Si = RXld = l B ,Ti = R(X ld |Z ld ),S 2 = R(Xid|Zld|X2d), T2 = RCXidlZidlXw^), and D = R'R. Matrices Si, TI, S2, T2, and R are used to construct Type I sums of squares in Table 7.1.3. In the next section pattern matrices are used in data structures with crossed factors and missing data.
Linear Models
138
Figure 7.2.1
7.2
Missing Data Example.
PATTERN MATRICES AND MISSING DATA
In some data sets, certain data points accidentally end up missing. In other data sets, data points are intentionally not observed in certain cells. For example, in fractional factorials or incomplete block designs, data are not observed in certain cells. In either case, the overall structure of such experiments follows the complete, balanced factorial form except that the actual observed data set contains "holes" where no data are observed. These holes are located in patterns in fractional factorial experiments and incomplete block designs. However, in other experiments, the holes appear irregularly. Such experiments with missing data can be examined using pattern matrices. The following example introduces the topic. Consider the two-way cross classification described in Figure 7.2.1. The experiment contains three random blocks and three fixed treatment levels. However, the observed data set contains no observations in the (1, 1), (2, 2), and (3, 3) block treatment combinations and one observation in each of the other six block treatment combinations. This may have arisen from a balanced incomplete block design. We begin our discussion by first examining the experiment when one observation is present in all nine block treatment combinations. In this complete, balanced design, let the 9 x 1 random vector Y* = (Yn, Y{2, Y^, Y2l, *22, ^23, ^31, Y32, F33)'. Write the model Y* = X*/3+E* where the 9 x 3 matrix X* = I 3 <8>l 3 ,the3xl vector ft = (Pi, p2, 0 3 )',the9x 1 error vector E* = (En, E{2, £13, £21, £22, £23, E 3 i,E 3 2,£33)'~N 9 (0, E*),and
The 9x9 covariance matrix E* is built by setting Etj = B{ + (BT)ij and applying the covariance matrix algorithm from Chapter 4. For this complete, balanced data set, the sums of squares matrices for the mean, blocks, treatments, and the block
7
Unbalanced Designs and Missing Data
139
by treatment interaction are given by
respectively, whereXi = la^la, Zi = Q3013, X2 = 13<S>Q3, Z2 = Q3<S>Q3,and
The actual data set only contains six observations since Fy, ¥22, and ^33 are missing. Let the 6 x 1 vector Y = (Yu, Y\3, 72i, ^23, ^31, ^32)' depict the actual observed data set. Note that Y = MY* where the 6 x 9 pattern matrix M is given by
Furthermore, note MM' = l^ and the 9 x 9 matrix
The vector of actual observations Y contains the second, third, fourth, sixth, seventh, and eighth elements of the complete data vector Y*. Therefore, the second, third, fourth, sixth, seventh, and eighth diagonal elements of M'M are
140
Linear Models
ones and all other elements of M'M are zero. Furthermore, M is a 6 x 9 matrix of zeros and ones, with a one placed in the second, third, fourth, sixth, seventh, and eighth columns of rows one through six, respectively. Since the 9 x 1 complete vector Y* ~ N9(X*/3, E*), the 6 x 1 vector of actual observations Y = MY* ~ N6(X/3, E) where
and
The Type I sums of squares for this problem are presented in Table 7.2.1 using matrices Si = MXi,Ti = M[Xi|Zi], and S2 = M[Xi|Zi|X 2 ]. The sum of squares matrices AI, ..., A4 is Table 7.2.1 were calculated numerically using PROCIML in SAS. The PROCIML output for this section is presented in Section A2.1 of Appendix 2. The resulting idempotent matrices are as follows:
Depending on the pattern of the missing data, some Type I sum of squares matrices may have zero rank. In the example data set, the Type I sums of squares
142
Linear Models
four sums of squares Y'Ai Y , . . . , Y'A4Y and mutually independent. Therefore,
where X3 = 0 under the hypothesis H0 : fi\ — fa = fa. A y level rejection region for the hypothesis HO : fi\ = fa = fa versus HI : not all /Ts equal is to reject HO if F* > F%i where F^\ is the 100(1 — y) percentile point of a central F distribution with 2 and 1 degrees of freedom. Note that HO is equivalent to hypothesis that there is no treatment effect. The Type I sums of squares can also be used to provide unbiased estimators of the variance components aj and o\T. The mean square for BT|/z, B, T provides an unbiased estimator of aJr since A.4 = OandE(Y'A4Y/l) = E(ag r x 2 (0)) = a\T. Constructing an unbiased estimator for crj involves a little more work. In complete balanced designs, the sum of squares for blocks can be used to find an unbiased estimator for o\. However, in this balanced, incomplete design problem, the block effect is confounded with the treatment effect. Therefore, the Type I sum of squares for Block(fi)|/^ has a noncentrality parameter A.2 > 0 and cannot be used directly with Y'A4Y to form an unbiased estimator of crj. One solution to this problem is to calculate the sum of squares due to blocks after the overall mean and the treatment effect have been removed. After doing so, the block effect does not contain any treatment effects. As a result, the Type I sum of squares due to Block (B)\fj,, T has a zero noncentrality parameter and can be used with Y'A4Y to construct an unbiased estimator of crj. The Type I sum of squares due to Block (B)\fjL, T is given by
where A^ = T^TfT*)-1^' - S^'Sp"^' with S£ = M[X,|X 2 J and Tf = M [X11X21Z i ]. Note that the matrices Sj-J and T* now order the overall mean matrix Xi first, the treatment matrix X2 second, and the block matrix Z\ third. From the PROC IML output, the 6 x 6 matrix Aj for the example data set equals
Furthermore, ^(A^E) = (3cr| + cr|r) and
7
Unbalanced Designs and Missing Data
143
Therefore, an unbiased estimator of crj is provided by the quadratic form | Y'[A*, — A4]Y since
The procedure just described is now generalized. Let the n* x 1 vector Y* represent the observations from a complete, balanced factorial experiment with model Y* = X*/3 + E* where X* is an n* x p matrix of constants, (3 is a p x 1 vector of unknown parameters, and the n* x 1 random vector E* ~ Nn(0, E*) where E* can be expressed as a function of one or more unknown parameters. Suppose the n x 1 vector Y represents the actual observed data with n < n* where n* is the number of observations in the complete data set, n is the number of actual observations, and n* — n is the number of missing observations. The n x 1 random vector Y = MY* ~ Nn (X/3, E) where M is an n x n* pattern matrix of zeros and ones, X = MX*, and E = ME*M'. Each of the n rows of M has a single value of one and (n* — 1) zeros. The //'* element of M is a 1 when the t>th element in the actual data vector Y matches the jth element in the complete data vector Y* for i = !,...,« and j = !,...,«*. Furthermore, then x n matrix MM' = !„ and the n* x n* matrix M'M is an idempotent, diagonal matrix of rank n with n ones and n* — n zeros on the diagonal. The ones on the diagonal of M'M correspond to the ordered location of the actual data points in the complete data vector Y* and the zeros on the diagonal of M'M correspond to the ordered location of the missing data in the complete data vector Y*. Finally, let XXX^X,)-^ and Zs(Z'sZsrl^'s be the sum of squares matrices for the fixed and random effects in the complete data set for s = 1,..., m where rank(X^) = ps > 0, rank(Z5) = qs > 0, and Xi = 1*. Let S5 = M[X,|Zi|X 2 | • • • \Zs-i\Xs] and T, = M[Xi|Zi|X 2 | • • • |X,|Z,] for s = 1,..., m. The Type I sum of squares for the mean is Y'SiCSiSO^SjY = Y'±JnY. The Type I sums of squares for the intermediate fixed effects take the form
The Type I sum of squares for the intermediate random effects take the form
for s = 2 , . . . , < m. However, the missing data may cause some of these Type I sum of squares matrices to have zero rank. Furthermore, it may be necessary to calculate Type I sums of squares in various orders to obtain unbiased estimators of the variance components. The estimation of variance components with Type I sums of squares is discussed in detail in Chapter 10.
Linear Models
144
7.3
USING REPLICATION AND PATTERN MATRICES TOGETHER
Replication and pattern matrices can be used together in factorial experiments where certain combinations of the factors are missing and other combinations of the factors contain an unequal number of replicate observations. For example, consider the two-way cross classification described in Figure 7.3.1. The experiment contains three random blocks and three fixed treatment levels. The data set contains no observations in the (1, 1), (2, 2), and (3, 3) block treatment combinations and either one or two observations in the other six combinations. As in Section 7.2, begin by examining an experiment with exactly one observation in each of the nine block treatment combinations. In this complete, balanced design the 9 x 1 random vector Y* = (Y111, Y121, Y131, Y211, Y221, Y231, Y311, Y321, y33i)'. Use the model Y* = X*/3 + E where E ~ N9(0, E*). The matrices X*, /3, E, S*, Xi, Zi, X2, Z2, and M are defined as in Section 7.2. For the data set in Figure 7.3.1, let the 9 x 6 replication matrix R identify the nine replicate observations in the six block treatment combinations that contain data. The replication matrix R is given by
Finally, let the 9 x 1 random vector of actual observations Y = (Yu\, KBI, Fm, Y2n, Y212, y^, y311, y321, r322)'. Therefore, Y ~ N9(X/3, E) where
Figure 7.3.1
Missing Data Example with Unequal Replication.
7
145
Unbalanced Designs and Missing Data Table 7.3.1 Type I Sums of Squares for the Missing Data Example with Unequal Replication df
Type I SS
Overall mean ft
1
Y'Si (S'^iT'S', Y
Block (B)\n
2
Y'[Ti(T' 1 Tir li r, -S ] (S' 1 S 1 )- 1 S / 1 ]Y = Y / A 2 Y
Treatment (T)\n, B
2
Y'[S 2 (S^S2)" 1 S^-T,(T' 1 Ti)- 1 T / 1 ]Y = Y / A3Y
BT\[i,B,T
1
Y / [RD- 1 R / -S2(S^S 2 )- | S:,]Y
= Y'A4Y
Pure error
3
Y'^-RD-'R'JY
=Y'A 5 Y
Total
9
Y'Y
Source
=Y'AiY
and
The Type I sums of squares for this problem are presented in Table 7.3.1 using matrices R, D, Si =RMX 1 ; Ti = RM[X 1 |Zi],andS 2 = RM[Xi|Zi|X 2 ]. The sums of squares matrices AI, ..., A5 in Table 7.3.1, the matrices AI S A i , . . . , AsEAs, Aj, AjjEA^, and the noncentrality parameters X\,..., AS, ^3 were calculated numerically using PROCIML in SAS. The PROCIML output for this section is presented in Section A2.2 of Appendix 2. From the PROC IML output note that
146
Linear Models
Therefore, by Corollary 3.1.2(a),
where
The quadratic forms |[Y'A4Y-(Y'A5Y/3)] and Y'A5Y/3 are unbiased estimators of <jgT and O-|(ST), respectively, since
7
Unbalanced Designs and Missing Data
147
and
Furthermore, A5£A, = 06x6 for all s ^ t, s, t = 1 , . . . , 5. By Theorem 3.2.1, the five sums of squares Y ' A j Y , . . . , Y'AsY, are mutually independent. Therefore,
where A.3 = 0 under the hypothesis H0 : fti = fa = fo. A y level rejection region for the hypothesis HO : Pi = fa = Pi versus HI : not all j8's equal is to reject HO if F* > F%i where F%\ is the 100(1 — y) percentile point of a central F distribution with 2 and 1 degrees of freedom. The Type I sum of squares due to Block (B)\jjL, T is given by
where A£ = TKTf'Tf)-1!'*', -S^SfSp-^, with S£ = RM[Xi|X2] and Tf = RM[Xi|X 2 |Zi]. Furthermore, /3'X'A*3X/3 = 0, fr(A^E) = 4a| + |crjr + 2a|(B7), and EfY'AijY] = 4crJ + f ^Jr + 2cr|(Br). Therefore, an unbiased estimator of a\ is provided by ^{[Y'A^Y] - [Y'A4Y + (Y'A5Y/3)]} since
EXERCISES 1. If B i, 82, 83, and 84 are the n x n sum of squares matrices in Table 7.1.1, prove E2r = Br for r = 1 , . . . , 4 and BrB5 = O nxw for r ^ s. 2. In Example 7.1.1 prove that the sums of squares due to the mean, regression, and pure error are distributed as multiples of chi-square random variables. Find the three noncentrality parameters in terms of fii, /J2, and #3. 3. In Example 7.1.2 let b = t = 3, r\\ = r22 = ^33 = 2, ri 2 , ^23 = r^\ = 1, r\3 = r%i = r32 = 3, and thus n = 18. Let BI, 82, 83, 84, 85 be the Type I
148
Linear Models
sum of squares matrices for the overall mean ju,, B\n, T\n,B, BT\n,B,T, and R(BT)\n, B,T,BT, respectively. Construct and BI, B2, B3, B4, B5. 4. From Exercise 3, construct the n x n covariance matrix E. 5. From Exercises 3 and 4, calculate fr(B r E) and /3'X'dR'ErRKd(3 for r = 1,...,5. 6. From Exercise 3, find the distributions of Y'BrY for r = 1,..., 5. Are these five variables mutually independent?
8
Balanced Incomplete Block Designs
The analysis of any balanced incomplete block design (BIBD) is developed in this chapter.
8.1
GENERAL BALANCED INCOMPLETE BLOCK DESIGN
In Section 7.2 a special case of a balanced incomplete block design was discussed. As shown in Figure 7.2.1, the special case has three random blocks, three fixed treatments, two observations in every block, and two replicate observations per treatment, with six of the nine block treatment combinations containing data. We adopt Yates's/Kempthorne's notation to characterize the general class of BIBDs. Let b = the number of blocks t = the number of treatments
149
Linear Models
150
Figure 8.1.1
Balanced Incomplete Block Example.
k = the number of observations per block r = the number of replicate observations per treatment. The general BIBD has b random blocks, t fixed treatments, k observations per block, and r replicate observations per treatment, with bk (or tr) of the bt block treatment combinations containing data. Furthermore, the number of times any two treatments occur together in a block is A = r(k — !)/(? — 1). In the Figure 7.2.1 example, b = 3, t = 3, k = 2, and r = 2. The total number of block treatment combinations containing data is bk or tr, establishing the relationship bk = tr. To obtain a design where each block contains k treatments, the number of blocks equals all combinations of treatments taken k at a time or b = t\/[k\(t - Jk)!]. A second example of a BIBD is depicted in Figure 8.1.1. This design has b = 6 random blocks, t = 4 fixed treatments, k = 2 observations in every block, and r = 3 replicate observations per treatment, with bk = 12 of the bt = 24 block treatment combinations containing data. Next we seek a model for the general balanced incomplete block design. To this end, begin with a model for the complete balanced design with b blocks, t treatments, and one observation per block/treatment combination. The model is
where the bt x 1 vector Y* = ( Y n , . . . , YI,, . . . , Y M ,. ••, Y^)', theb? x? matrix X* = \b <8) It, the t x 1 vector of unknown treatments means r = (r\,..., rt)' and the bt x 1 random vector E* = (En,..., E\t, . . . , £ M , . . . , Ebt)' ~ N fcr (0, S*) where
Let the bk x 1 vector Y represent the actual observed data for the balanced incomplete block design. Note that Y can be represented by Y = MY* where M
8
Balanced Incomplete Block Designs
151
is a bk x bt pattern matrix M. The matrix M takes the form
where M,; is a k x t matrix for i = 1 , . . . , b. Furthermore, MM' = lb®Ik and
Each of the k rows of M, has t — 1 zeros and a single value of one where the one indicates the treatment level of the y* observation in the i * block for j = 1,..., k. Therefore, the model used for the balanced incomplete block design is Y = Xr+E where the bk x t matrix X = MX*, E ~ Nj*(0, £) and
Rearranging terms, E can also be written as
or
This last form of E can be used to find E"1. Note [lb <8> £ J*] and [lb <8> (I* -1J*)] are idempotent matrices where [lb <8> Qj*)] [h ® (l/t — |J*)] = 0. Therefore,
Linear Models
152
Table 8.2.1 First ANOVA Table with Type I Sum of Squares for BIBD
8.2
Source
df
Type I SS
Overall mean /x
1
Y'A1Y = Y'ij i ,ij^Y
Block (B)|fi
b-1
Y'A2Y = Y'(Ife - Iji) ® ijtY
Treatment (r)|/i, B
t- 1
Y'A3Y
BT\fi,B,T
bk-b-t + l
Y'A4Y
Total
bk
Y'Y
ANALYSIS OF THE GENERAL CASE
The treatment differences can be examined and the variance parameters a\ and a\T can be estimated by calculating the Type I sums of squares from two ANOVA tables. Some notation is necessary to describe these calculations. Let the bt x 1 matrix Xi = lb ® lt, the bt x (b — 1) matrix Zi = Qb lt, the bt x (t - 1) matrix X2 = lb <8> Qf and the bt x (b - l)(r - 1) matrix Z2 = Qfe <8> Qr where Qf, and Q, are 6 x (b — 1) and f x (t — 1) matrices defined as in Section 5.7. Furthermore, let S1 = MXi, TI = M[X1|Z1],S2 = M[X1|Z1|X2], and T2 = M[X1 |Z1 |X2|Z2]. The first ANOVA table is presented in Table 8.2.1. The Type I sums of squares due to the overall mean [i, due to Blocks (fi)|/z, Treatments (7) |/x, B,andBT\n, fi, Tare represented by Y'AiY, Y'A2Y, Y'A3Y, and Y'A4Y, respectively, where
Note Ai + A2 = Ib ® ij fc , ^=1 AM = lb ® lk, and ^=1 rank(Att) = bk. By Theorem 1.1.7, A^ = AM for u = 1 , . . . , 4 and A.UA.V = 0 for u / v. Also A u (Ai + A2) = Au (Ib ® ij fc ) = 0 or A w (I t J fc ) = 0 for u = 3,4. Therefore,
for M = 3, 4. Furthermore, AME = (£crj + ^cr|r)Au for M = 1, 2. Therefore, by Corollary 3.1.2(a), Y'A1Y ~ flixfC^i), Y'A2Y ~ fl2X*2_i(*2), Y'A3Y ~
8
153
Balanced Incomplete Block Designs Table 8.2.2 Second ANOVA Table with Type I Sum of Squares for BIBD Source
df
Type I SS
Overall mean n.
1
Y'AiY = Y'£j*®£j t Y
Treatments (r)|/z
t- I
Y'A*Y
Block (B)\n,T
b-l
Y'A;Y
BT\fj,,B,T
bk-b-t + l
Y'A4Y
Total
bk
Y'Y
«3X,2-i(^-3), and Y'A4Y ~ a^xlk-t-b+\(^} where ai = a2 = (ko\ + ^ajj,), a3 = a4 = a|r, Xu = (Xr)'A tt (Xr)/(2a M ) for u — 1, 2, 3 and X4 = 0. Finally, by Theorem 3.2.1, Y'Ai Y, Y'A2Y, Y'A3Y, and Y'A4Y are mutually independent. A test on the treatment means can be constructed by noting that T\ = • • • = r, implies ^3 = 0. Therefore, a y level rejection region for the hypothesis HO : r\ = • • • = rt versus HI : not all T/S equal is to reject H0 if F* > F^_{ blc_t_b+l where
Unbiased estimators of a\ and cr|r can also be constructed. An unbiased estimator of ajr is provided by
since Y'A4Y ~ G^T"X.bk-t-b+\(^ = 0)- ^n unbiased estimator of cr| can be developed using a second ANOVA table. Let the Type I sums of squares due to the overall mean IJL and due to BT|/x, B, T be represented by Y'Ai Y and Y'A4Y, as before. Let the Type I sums of squares due to Treatments (T) \JJL and due to Blocks (B) I//, T be represented by Y'A^ Y and Y'Ag Y, respectively. The matrices AI and A4 are defined in Table 8.2.1. Matrices A| and A^ can be constructed by setting S| = M[Xj |X2] and Tf = M[Xj |X2|Z!]. Then
The second ANOVA table is presented in Table 8.2.2. The expected mean square for Blocks (B)\n, T is now derived and then used to generate an unbiased estimator of a\. First, let G be the t x bk matrix such that GY = (Y,i,..., Y_ty. The t x t covariance matrix GEG' has diagonal elements equal to (cr|+^-tfjj-) Jr and off-diagonals equal to (k— 1) (&% — J&BT) /[r(t—1)1-
154
Linear Models
The sum of squares due to Treatments (T}\n can be reexpressed as rY'G'(l, — }J,)GY. Therefore, A£ = rG'(l, - }j,)G and
But A2 + A3 = A£ + A?|, therefore
Furthermore,
since r'X'AgXr = 0. Therefore, an unbiased estimator of crj is provided by
because
8
Balanced Incomplete Block Designs
155
Treatment comparisons are of particular interest in balanced incomplete block designs. Treatment comparisons can be described as h'r where h is a t x 1 vector of constants. By the Gauss-Markov theorem, the best linear unbiased estimator of h'r is given by h'tX'E^Xr'X'E^Y where E"1 is given in Section 8.1. Therefore, the BLUE of h'r is a function of the unknown parameters aj and a\T. However, an estimated covariance matrix, E, can be constructed by using E with cr| and alT replaced by a| and ojr, respectively. An estimator of h'r is provided byh'IX'E^XJ^X'E^Y.
8.3
MATRIX DERIVATIONS OF KEMPTHORNE'S INTERBLOCK AND INTRABLOCK TREATMENT DIFFERENCE ESTIMATORS
In Section 26.4 of Kempthorne's (1952) Design and Analysis of Experiments text, he develops two types of treatment comparison estimators for balanced incomplete block designs. The first estimator is derived from intrablock information within the blocks. The second estimator is derived from interblock information between the blocks. The purpose of this section is to develop Kempthorne's estimators in matrix form and to relate these estimators to the discussion presented in Sections 8.1 and 8.2 of this text. Within this section we adopt Kempthorne's notation, namely: Vj = the total of all observations in treatment j TJ = the total of all observations in blocks containing treatment j
Qj = Vj - Tj/k. Kempthorne indicates that differences between treatments j and / (i.e., TJ — TJ>) can be estimated by
0! = k(t - l)[Qj - Qr]/[tr(k - 1)]
(1)
02 = (t - l)[Tj - Tr]/[r(t - *)].
(2)
The statistics 0\ and #2 are unbiased estimators of TJ — TJ> where 9\ is derived from intrablock information and #2 is derived from interblock information. The following summary provides matrix procedures for calculating estimators (1) and (2).
Estimator 1 Let AI and A3 be the bk x bk idempotent matrices constructed in Section 8.2 where rank(AO = 1 and rank(A3) = t - 1. Let AI = RiR( and A3 = R3R3
156
Linear Models
where RI = (l/\/M)lfe <S> Ik and RS is the bk x (t — 1) matrix whose columns are the eigenvectors of AS that correspond to the t — 1 eigenvalues equal to 1. The estimator 9\ is given by
where the bk x t matrix X and the bk x 1 vector Y are defined in Section 8.1, the bk x t matrix R = [Ri |Ra], and g is a t x 1 vector with a one in row j, a minus one in row / and zeros elsewhere. A second form of the estimator 6\ can be developed by a different procedure. First, construct a t x b matrix N. The i/h element of N equals 1 if the ijlh block treatment combination in the BIBD factorial contains an observation and the /7 th element of N equals 0 if the ijih block treatment combination is empty for i = ! , . . . , & and j = 1 , . . . , t. For example, the 3 x 3 matrix N corresponding to the data in Figure 7.2.1 is given by
and the 4 x 6 matrix N corresponding to the data in Figure 8.1.1 is given by
The matrix N is sometimes called the incidence matrix. The estimator 9\ is given by
where N is the t x b matrix constructed above and N O(lk — ^Jk) is a t x bk BIB product matrix. For example, the 3 x 6 matrix [N D (lk — £ J fc )] corresponding to the data in Figure 7.2.1 is given by
and the 4 x 12 matrix [N n(I* — ^I*)] corresponding to the data in Figure 8.1.1
8
Balanced Incomplete Block Designs
157
is given by
The following relationships are used to derive the variance of 6\.
Therefore,
The matrix [N n (I* — £ Xt)] is used to create the estimator 9\. This estimator is constructed from the treatment effect after the effects due to the overall mean and blocks have been removed. Similarly, Y'AsY is the sum of squares due to treatments after the overall mean and blocks have been removed. Therefore, [N D (lk — ijfc)] and A3 are related and can be shown to satisfy the relationship
158
Linear Models
There is a one-to-one relationship between the t x b incidence matrix N and the bk x bt pattern matrix M. In Appendix 3 a SAS computer program generates the bk x bt pattern matrix M for a balanced incomplete design, when the dimensions b,t,k, and the / x b incidence matrix N are supplied as inputs. A second SAS program generates the incidence matrix N, when the dimensions b,t,k, and the pattern matrix M are supplied as inputs. Estimator 2 The estimator 62 is given by
where g, N, and Y are defined as in estimator 1. The following relationships are used to derive the variance of §2:
Therefore,
Finally, Kempthorne suggests constructing the best linear unbiased estimator of the treatment differences by combining 9\ and $2, weighting inversely as their variances. That is, the BLUE of TJ — TJ> is given by
Note that #3 is a function of oj and a\T since var(#i) and var(^) are functions of
8
Balanced Incomplete Block Designs
159
where var(#i) and var(02) equal var(00 and var(#2), respectively, witha\ andcr|r replaced by aj and a\T. It should be noted that the estimator $3 given above equals the estimator h'[X'±-lX]-lX't-lY given in Section 8.2 when h = g.
EXERCISES Use the example design given in Figure 8.1.1 to answer Exercises 1-11. 1. Define the bk x bt pattern matrix M, identifying the t x t matrices MI , . . . , Mb explicitly. 2. Construct the matrices AI, A2, AS, A4, A|, and Aj. 3. Verify that Ar£ = (ka^ + ^
12. Calculate the Type I sum of squares for Tables 8.2.1 and 8.2.2. 13. Compute unbiased estimates of er| and a\T.
160
Linear Models
14. Test the hypothesis HO : T\ = ti = TI versus HI : not all T/S are equal. Use X = 0.05. 15. Compute Kempthorne's estimates of 9\ and 02 for the differences r\ —12, r\ — 13, and T2 — TI. 16. Compute Kempthorne's estimates of #3 for the differences i\ — TZ, r\ — TT,, and T2 — T3 and then verify that these three estimates equal h' (X' E ~! X) ~! X' I) ~! Y when h = (1, -1,0)', (1, 0, -1)' and (0, 1, -1)', respectively.
9
Less Than Full Rank Models
In Chapter 7 models of the form Y = RX/3 + E were discussed when the k x p matrix Xrf had full column rank p. In this chapter, models of the same form are developed when the matrix X^ does not have full column rank.
9.1
MODEL ASSUMPTIONS AND EXAMPLES
Consider the model
where Y is an n x I random vector of observations, R is an n x k replication matrix, X
767
Linear Models
162
normal equations
where the nonsingular k x k matrix D = R'R. Since the p x p matrix X^DXd has rank k < p, X^DX^ is singular and the usual least-squares solution /3 = (X^DXrf^X^R'Y does not exist. Therefore, the analysis approach described in Section 7.1 is not appropriate and an alternative solution must be found. In Section 9.2 the mean model solution is developed. Before we proceed with the mean model, a few examples of less than full rank models are presented. Example 9.1.1 Searle (1971, p. 165) discussed an experiment introduced by Federer (1955). In the experiment, a fixed treatment A with three levels has n = 3, r2 = 2, and r^ = 1 replicate observations per level. Let 7/y represent the y |
where a is the overall mean, or, is the effect of the / treatment level, and E/y is a random error term particular to observation F,y. The model can be rewritten in matrix form as
where the 6 x 3 replication matrix R, the 3 x 4 matrix Xj, the 4 x 1 vector ft and the 6x1 error vector E are given by
and E = (E\\, E\2, £13, £21, £22, £31)'- The 3x4 matrix Xd has rank 3 and therefore X^DXd is singular. In this case n = 6, p = 4, and k = 3.
Figure 9.1.1
Searle's (1971) Less than Full Rank Example.
9
Less Than Full Rank Models
163
The main difficulty with the model in Example 9.1.1 is that p = 4 fixed parameters (a, «i, «2, and 0(3) are being used to depict k = 3 distinct fixed treatment levels. Therefore k < p and the less than full rank model overparameterizes the problem. Less than full rank models can also originate in experiments with missing data. In Chapter 7 pattern matrices proved very useful when analyzing missing data experiments. However, as shown in the next example, pattern matrices do not in general solve the difficulties associated with the less than full rank model.
Example 9.1.2 Consider the two-way layout described in Figure 9.1.2. Fixed factor A has three levels, fixed factor B has two levels, and there are r\\ = ryi = 2 and r\2 = ri\ = r31 = 1 replicate observations per A, B combination. Note there are no observations in the (/, ;') = (2, 2) A, B combination. As in Section 7.3, we develop the model for this experiment using a pattern matrix. Let Y* = X*/3 + E* describe an experiment with one observation in each of the six distinct combinations of factors A and B where the 6x1 random vector Y* = (7m, Km, ^211, ^221, ^311, ^321)'* the 6 x 6 matrix X* = [Xi |X2|X3 |X4], the 6 x 1 vector /3 = ( f t , . . . , &)' and the 6 x 1 error vector E* = (Em, Em, £ 2 n, ^221, £311, £321)' with Xi = 13 ® 12, X2 = Q3 ® 12, X3 = 13 ® Q2, X4 = Q3 ® Q2, Q2 = (1, -1)', and
Let the 7x1 random vector of actual observations Y = (Fin, ¥112, ^"121» ^211* ^3iii ^32i> ^322)'- Therefore, the model for the actual data set is
where the 7 x 5 replication matrix R, the 5 x 6 pattern matrix M, and the 7 x 1
Figure 9.1.2
Less than Full Rank Example Using a Pattern Matrix.
164
Linear Models
vector E are given by
andE = (Em, £112, £121* £211* £311» £321, £322)'- The preceding model can be rewritten as
where the 5 x 6 matrix Xd = MX*. In this problem, n = 1, p = 6, k = 5, with X^DXd is a 6 x 6 singular matrix of rank 5. The main difficulty with Example 9.1.2 is that p = 6 fixed parameters are used to depict the k = 5 distinct fixed treatment combinations that contain data. Note that the use of a pattern matrix did not solve the overparameterization problem. In the next section the mean model is introduced to solve this overparameterization problem.
9.2
THE MEAN MODEL SOLUTION
In less than full rank models, the number of fixed parameters is greater than the number of distinct fixed treatment combinations that contain data. As a consequence, the least-squares estimator of /3 does not exist and the analysis cannot be carried out as before. One solution to the problem is to use a mean model where the number of fixed parameters equals the number of distinct fixed treatment combinations that contain data. Examples 9.1.1 and 9.1.2 are now reintroduced to illustrate how the mean model is formulated. Example 9.2.1 Reconsider the experiment described in Example 9.1.1. Let E(Yij) = Hi represent the expected value of the y'th observation in the Ith fixed treatment level. Use the mean model
In matrix form the mean model is given by
where the 3 x 1 vector /z = (ni, 1*2, faY and where Y, R, and E are defined as in Example 9.1.1. Note the 6 x 3 replication matrix R has full column rank k = 3.
9
Less Than Full Rank Models
165
Example 9.2.2 Reconsider the experiment described in Example 9.1.2. Let E(Yijk) = n>ij represent the expected value of the kth observation in the ijlh combination of fixed factors A and B. Use the mean model
where the 5 x 1 vector p, = (nn, ^12, M2i> ^31, ^32)' and where Y, R, and E are defined as in Example 9.1.2. Note the 7 x 5 replication matrix R has full column rank k = 5. In general, the less than full rank model is given by
where the k x p matrix X<j had rank k < p. The equivalent mean model is
where the n x k replication matrix R has full column rank k and the elements of the & x 1 mean vector \L are the expected values of the observations in the k fixed factor combinations that contain data. Since the two models are equivalent Rp, = RXd/3. Premultiplying each side of this relationship by (R'R^R' produces
This equation defines the relationship between the vector /x from the mean model and the vector /3 from the overparameterized model.
9.3
MEAN MODEL ANALYSIS WHEN COV(E) = <72L,
The analysis of the mean model follows the same analysis sequence provided in Chapter 5. Since the n x k replication matrix R has full column rank, the ordinary least-squares estimator of the k x 1 vector /z is given by
where Y = D *R'Y is the k x 1 vector whose elements are the averages of the observations in the k distinct fixed factor combinations that contain data. The least-squares estimator fi is an unbiased estimator of \JL since
Linear Models
166
Table 9.3.1 Mean Model ANOVA Table Source
df
SS
Overall mean
1
Y'Ij n Y
Treatment combinations
k- 1
Y'[RD-'R' - ij fl ]Y = Y'A2Y
=Y'A,Y
Residual
n-k
Y'Ha-RD-'R'JY
Total
n
Y'Y
=Y'ApeY
The k x k covariance matrix of /* is given by and the least-squares estimator of a2 is
where Ape is the n x n pure error sum of squares matrix originally defined in Section 5.5. The quadratic form a2 provides an unbiased estimator of a2 since
where tr(Ape) = n — k and ApgR = O nx )t- Furthermore, by Theorem 5.2.1, the least-squares estimator t'ju = t'Y is the BLUE of t'/i for any k x 1 nonzero vector t. An ANOVA table that partitions the total sum of squares for the mean model is presented in Table 9.3.1. The expected mean squares are calculated below using Theorem 1.3.2 with the k x 1 mean vector fj, = (IJL\, Hi,..., HkYEMS (overall mean) = E [v'i^Y 1 L * J
9
167
Less Than Full Rank Models EMS (treatment combinations) = E Y' f HO"1!*' - - JB ) Y /(k - 1)
L V
» / J
and the EMS (residual) = E[Y'ApeY]/(« — k)] =
and
since J^RD^R' = J«. Therefore, by Corollary 3.1.2(a), Y'A2Y ~
a2Xk-\(^2)
168
Linear Models
where
By Theorem 3.2.1, Y'A2Y and Y'ApeY are independent and therefore
Furthermore, if the k elements of /z are equal then A.2 = 0. Therefore, a y level test for H0 : /i = ct\k versus HI : /x ^ al* is to reject HO if F* > F/_jn_k where a is the overall mean. Confidence bands can be constructed on the linear combinations t'n where t is a k x 1 nonzero vector of constants. Under the normality assumption t'/i ~ NiCt'ji.crh'D^t) since
and
By Theorem 3.2.2, t'fr and Y'ApeY are independent since
Therefore,
A 100(1 — y)% confidence band on t'/Li is given by
9.4
ESTIMABLE FUNCTIONS
In Section 9.1 the less than full rank model Y = RXd/3 + E was introduced. The mean model Y = R/x + E was developed in Sections 9.2 and 9.3 to solve the difficulties caused by the less than full rank model. Arguably, there is no need to develop the less than full rank model since the mean model solved the overparameterization problem. However, less than full rank models are used (in
9
Less Than Full Rank Models
169
SAS, for example), so it seems worthwhile to explore them and their relationship to the mean model. For the less than full rank model, the least-squares estimator ft satisfies the system of normal equations
However, since X^DXd is singular, no unique solution for ft exists. In fact, there are an infinite number of vectors ft that satisfy the normal equations. All of these solutions are linear combinations of the vector Y, but none of them is an unbiased estimator of ft. As Graybill (1961, p. 227) points out, no linear combination of the vector Y exists that produces an unbiased estimator of ft. So how should we think of the term /3? As Searle (1971) states, for a less than full rank model, ft provides "a solution" to the normal equations "and nothing more." Therefore, ft should be thought of as a nonunique solution to the system of p normal equations, rather than as an estimator of ft. Although no linear combination of the vector Y produces an unbiased estimator of/3, unbiased estimators of g'/3 do exist for certain p x 1 nonzero vectors g. Unfortunately, unbiased estimators of g'/3 do not exist for all g. For example, let the p x 1 vector g = ( 1 , 0 , 0 . . . , 0)'. The parameter g'/3 is not estimable in this case since ft is not estimable and therefore no element of ft is estimable. The term "estimable" has been introduced. Before continuing, we formally define estimability. Definition 9.4.1 Estimable: A parameter (or function of parameters) is estimable if there exists an unbiased estimate of the parameter (or function of parameters). Definition 9.4.2 Linearly Estimable: A parameter (or function of parameters) is linearly estimable if there exists a linear combination of the observations whose expected value is equal to the parameter (or function of parameters). For the remainder of this chapter we confine our attention to linearly estimable functions. Therefore, the term estimable will subsequently imply linearly estimable. The next example demonstrates that all linear combinations of the vector // from the mean model are estimable. Example 9.4.1
For the mean model Y = R/u + E
where t is any k x 1 nonzero vector. For the less than full rank model the question still remains: When is g'/3 estimable? The following theorem addresses this question. The answer lies in the relationship that links \i and /3, namely, p, = Xd/3.
Linear Models
170
Theorem 9.4.1 The linear combination g'/3 is estimable if and only if there exists a k x 1 vector t such that g = Xdt. Proof: By definition g'/3 is estimable if and only if there exists an n x 1 vector b such that E[b'Y] = g'/3. First assume that g'/3 is estimable. Therefore, there exists an n x 1 vector b such that g'/3 = E[b'Y] = b'RXd/3 for all j3, which implies g' = b'RXd or g = XdR'b. Let t = R'b and there exists a k x 1 vector t such that g = Xdt. Now assume there exists a k x 1 vector t such that g = Xdt. Then E[t'A] = E[f D-'R'Y] = t'n = t'Xd/3 = g'/3. Therefore, g'/3 is estimable. As mentioned earlier, the normal equations have an infinite number of solutions. If /30 represents any one of the solutions then the next theorem shows that g'/30 is invariant to the choice of /30 when g'/3 is estimable. Theorem 9.4.2 Ifg'fl is estimable then g'/30 = t'/i provides a unique, unbiased estimate ofg'/3 where /30 is any solution to the normal equations and t is defined in Theorem 9.4.1. Proof:
Solve the normal equations for Xd/30. Therefore,
or
Since g'/3 is estimable, g'/30 = t'Xd/3o = t'/i where t'/i = t'Y is a unique estimate. Furthermore, g'/30 is an unbiased estimate of g'ft since E[g'/^] = E[t'/i] = t'^t = f Xd/J = g'/3. The Gauss-Markov theorem is applied to find the BLUE of g'/3. Theorem 9.4.3 Ifg'fi is estimable then the BLUE ofg'fi is g'/30 = t'/x. Proof: By Theorem 9.4.2, t'A = g'A)-BY Theorem 9.4.1, t'/i = t'Xd/3 = g'/3. By the Gauss-Markov theorem, t'/i is the BLUE of t'// = g'/3. The heart of the three previous proofs lies in the relationship // = Xd/3. Since t'/z is always estimable by t'/i, t'Xd/3 = t'/x is also estimable by t'/i. Therefore, g'/3 is estimable provided g' can be written as t'Xd for some k x 1 nonzero vector t. One could argue that the whole topic of estimable functions is viable only in so far as /3 is related to n through the relationship p, = Xd/3. Stated more strongly, estimable functions have little meaning without the mean model and, because of the mean model, estimable functions are at best redundant.
9
Less Than Full Rank Models
171
This section concludes with a SAS PROC GLM program that analyzes Federer's (1955) data from Example 9.1.1. Both the mean model and Searle's less than full rank model are run. The two models are then used to generate the same estimable functions. Example 9.4.2 The data set is given in Figure 9.4.1. The SAS program and output are presented in Appendix 4. The SAS output provides the following parameter estimates for the model Yfj = a + a, + £// (or equivalently for the model Y = RXd/3 + E).
Note that although the notation a, a\, #2, #3 is used, P0 should not be viewed as an estimate of (3 = (a, a\, a.^, o^)'. Rather /30 is one of the normal equation solutions for /3. The SAS output also provides the following estimates for the mean model Yjj = [LI + Etj (or equivalently for the model Y = R/z + E).
Suppose it is of interest to estimate ct\ — 0.2 = g'/3 for g = (0, 1, — 1,0)'. By Theorem 9.4.1, g'fl is estimable since there exists a 3 x 1 vector t = (1, — 1,0)' such that
Therefore, by Theorems 9.4.2 and 9.4.3, the unique BLUE of g'/3 = a\ — cti is provided by g'/^ = t'fi where
Figure 9.4.1
Federer's (1955) Data Set.
172
9.5
Linear Models MEAN MODEL ANALYSIS WHEN COV(E) =
Previous sections of Chapter 9 have been limited to models where cov(E) = a2ln. Such models occur when all the main factors in the design are fixed with random replicates nested in the combinations of the fixed factors. In these cases, the replication matrix R identifies the replicate observations in the k combinations of the fixed factors. Therefore, R is an n x k matrix, // is a k x 1 vector and Y = RH + E is a viable model. We are now interested in extending our attention to models where cov(E) = a2 V and V is an n x n positive definite matrix. Such models can occur when some of the factors in the design are fixed and some are random, with random replicates nested in the combinations of the fixed and random factors. In these cases, the replication matrix identifies the replicate observations nested in the k combinations of the fixed and random factors. Therefore, R is an n x k matrix. However, p, is a v x 1 vector where v < k is the number of fixed treatment combinations that contain data. In this case, R/J, is not a viable structure since the number of columns of R(= k) is greater than the number of rows of /i(= v). A solution to this problem is to use the more general mean model
where C is a k x v matrix of zeros and ones. The matrix C identifies which elements of the v x 1 vector /x correspond to the k combinations of the fixed and random factors. The next example illustrates how to construct the matrix C. Example 9.5.1 Consider the unbalanced experimental layout given in Figure 9.5.1 where the three blocks are random, the two treatments are fixed, and the nested replicates are random. In this experiment the number of observations is n = 8, the number of combinations of the fixed and random factors that contain data is k = 5, the number of fixed treatment combinations is v = 2, and the number of replicates in the ijth combination of blocks and treatments is r,; where
Figure 9.5.1
Unbalanced Experimental Layout for Example 9.5.1.
9
Less Than Full Rank Models
173
r\\ = r\2 = f"3i = 1, ?i\ = 2, and ryi = 3. The 8x5 replication matrix R, the 5x2 matrix C, and the 2 x 1 mean vector /x are given by
Note that the k x v matrix C can also be constructed using the relationship C = MX* where matrices M and X* are defined according to the methods in Section 7.2. For Example 9.5.1, the 5 x 6 pattern matrix M and the 6 x 2 matrix X* are given by
and the 5 x 2 matrix C = MX*. The general mean model Y = RC//+E also applies to the examples in Sections 9.1 through 9.4 where all the main factors were fixed and the nested replicates were random. In such cases, v = k and the k x v (i.e., the k x k} matrix C = I*. The analysis approach presented in Section 9.3 is now summarized for the general mean model Y = RC^t + E when E ~ Nw(0, a 2 V) and V is a known n x n positive definite matrix. The weighted least squares estimator of p, is given by The weighted estimator £w is unbiased for n since E(/iw) = E[(C / R'V~ 1 RC)~ 1 C'R'V'Y] = (C'R'V^RQ-'C'R'V-'RC/z = /x. The covariance matrix of Aw is given by
and the weighted least squares estimator of a2 is
The ANOVA table for the weighted analysis is provided in Table 9.5.1. In the weighted case, the 100(1 — y)% confidence bands on t'/x are given by
Linear Models
174
Table 9.5.1 Mean Model Weighted ANOVA Table Source
df
SS
Overall mean
1
Y'V-'l^V-'lJ-'l^V-'Y
Treatment combinations
k-1
Y' V- ' [RC(C'R'V- ' RC)~ l C'R'
-M^v-^rXiv-'Y Residual
n-k
Y'fV- 1 - V-'RCfC'R'V-'RCr'C'R'V-'jY
Total
n
Y/Y-IY
where Aaw is the residual sum of squares matrix from Table 9.5.1. A y level test of H0 : n = c\k versus HI : ^ ^ elk is to reject HO if F* > F^_ln_k where
and A2w is the treatment combination sum of squares matrix from Table 9.5.1. The derivations of the confidence band and the test statistic are left to the reader. Models of the form Y = RC/i + E with E ~ Nn (0, <72V) are also encountered when V is an n x n positive definite matrix whose elements are functions of m unknown variance parameters. The analysis for this type of model is discussed in Chapter 10.
EXERCISES Use the data in Table 5.1.1 to solve Exercises 1-5. 1. A researcher wants to fit the model
where / = 1 , . . . , 10, Jt/i is the speed of the /* vehicle, jc/2 is the speed x grade of the Ith vehicle, XH is the speed x speed of the /th vehicle, and jc/4 is the speed x speed x grade of the ith vehicle. (a) Write the above model in matrix form Y = RXd/3+E where the 5 x 1 vector (3 = ($), ft, 02> 03> ^4)'- Define all terms and distributions explicitly. (b) Is (3 estimable? Explain. (c) Write the mean model Y = R/z -I- E for this problem. Define all terms and distributions explicitly.
9
Less Than Full Rank Models
175
(d) What is the rank and dimension of Xd? 2. Find the ordinary least-squares estimates of /z, a2, and cov(/i) for the mean model Y = Rp, + E. 3. Use the mean model to do the following: (a) Construct the ANOVA table. (b) At the Y = 0.05 level, test the hypothesis HO : n = a\^ versus HI : /x =/ a 1.4 where \JL = (\L\, /^, Ms, AM)'(c) Place 99% confidence bands on XLt=i M*4. Show that ^2 + 50/?4 is estimable by defining vectors g and t such that g = Xdt. The parameters fa, fa, and & are defined in Exercise 1. 5. For the mean model, assume cov(E) = cr2V where
(a) Find the weighted least-squares estimates of jz, a2, and cov(/z)(b) Construct the weighted ANOVA table. (c) At the Y = 0.05 level, test the hypothesis HQ : VL\ = A4 2 = Ms = M4 versus HI : at least one of the M* 's is not equal to the others where n = GU 1,^2,^3,^4)'. (d) Place 99% confidence bands on £)t=i M*. 6. Consider the experiment described in Figure 7.3.1. (a) Write the general mean model Y = RCp, + E defining all terms and distributions explicitly. (b) In Section 7.3, the model for the experiment was written
where E ~ N9(0, E). Define relationships between the terms R, C, /LI, and E from the mean model in part a and the terms R, M, X*, /3, and E from the model in Section 7.3. In particular, define the matrix Xd such that /Lt = Xd/3. What is the rank and dimension of Xd? (c) Is the model Y = RMX*/3 + E from Section 7.3 a less than full rank model? (d) If ft and /3 are the ordinary least-squares estimators of p and (3, respectively, find the relationship between ft and /3.
176
Linear Models
7. Consider the balanced incomplete block experiment described in Figure 8.1.1. (a) Write the general mean model Y = RCp, -I- E defining all terms and distributions explicitly. (b) In Section 8.1, the model for the design was Y = MX* (3 + E where E ~ N^t(0, E). Relate the terms R, C, /x, and E from the general mean model in part a to the terms M, X*, /3, and E from the model in Section 8.1. (c) Is the model Y = MX*(3 + E from Section 8.1 a less than full rank model?
10
The General Mixed Model
Any model that includes some fixed factors and some random factors is called a mixed model. Numerous examples of mixed models have been presented throughout the text. For example, the model for the experiment described in Figure 4.1.1 is an example of a balanced mixed model. The general class of mixed models applies to both balanced and unbalanced data structures. Furthermore, the general mixed model covariance structure contains a broad class of matrix patterns including those discussed in Chapter 4. In this chapter the analysis of the general mixed model is presented. Balanced mixed model examples from previous chapters are reviewed and new unbalanced examples are presented to illustrate the general mixed model approach.
10.1
THE MIXED MODEL STRUCTURE AND ASSUMPTIONS
The mixed model is applicable whenever an experiment contains fixed and random factors. Consider the experiment presented in Table 4.1.1. The experiment has three factors where B and R are random factors and T is a fixed factor. The
777
178
Linear Models
model is
for * = 1 , . . . , b, j = 1 , . . . , t, and s = 1 , . . . , r where IJLJ represents the fixed portion of the model and B, + BTfj + R(BT)(ij)S represents the random portion. Assume a finite model error structure where Bt ~ ndNi(0, Og),(I fc <8> P^) (BTn, • • •, BTbt) ~ Nfc (/ _,)(0, aj r l*<8>l,-i)and/?(fir) m ~ i/dN^O, aJ(Bn). Furthermore, assume the three sets of random variables are mutually independent. In matrix form the mixed model is
where the btr x 1 vector Y = (Y\\\,..., Y\\r,..., Ybt\,..., Ybtr)1'. The fixed portion of the model is given by RC/i where the btr x bt replication matrix R = lb I, lr, the bt x t matrix C = Ij, <8> I,, and the f x 1 mean vector ^t = ( j L t i , . . . , fjLty. The random portion of the model is UiEi-|-1)282+11333 where the b x 1 random vector EI ~ N/,(0, cr|lfc), the btr x b matrix Ui = I& ® 1* ® lr, the fe(f — l)x 1 random vector 82 ~ Nfc(,_i)(0, ajy-Ife^Ir-O^he^r xfc(/ —1) matrix U2 = lb <8> Pf <S> lr, the (t — 1) x t matrix P^ is the lower portion of a t -dimensional Helmert matrix, the btr x 1 random vector 33 ~ N&tr(0, crl^T-)!^ ® I, <8> Ir), and the btr x btr matrix Us = I& ® Ir <8) lr. Furthermore, vectors ai, 32, and 83 are mutually independent. Therefore, the btr x 1 random vector Y ~ Nfc,r(RC/x, E) where
Note that E matches the finite model covariance structure given in Section 4.2. If an infinite model is assumed, the same mixed model can be used with U2 replaced bylfr®!, <8>l r . The mixed model can be generalized as
where Y is an n x 1 random vector of observations. The fixed portion of the model is given by RC/Lt where R is an n x A: replication matrix of rank k, C is a k x v matrix of rank v, and p, is a v x 1 mean vector. The random portion of the model is X^/=i U/ a / where U/ is an n x #/ matrix of rank g/, a/ is a / x 1 random vector distributed N?/(0,
10
General Mixed Model
179
general mixed model can also be written in the form
where the n x 1 random vector E ~ Nw(0, S) and E = £/=i o/U/U'y. The n x v matrix RC has full column rank v. Therefore, the elements of /x are the expected values of the random, variables in the v distinct fixed factor combinations that contain data. Consequently, this form of the mixed model is a full rank model and could logically be called the mixed mean model. Similarly, the random portion of the model X)/=i U/a/ is sometimes called the variance components portion of the model since the distributions of the m random vectors a i , . . . , am are functions of the m variance components of o f , . . . , cr^. Although the general mixed model was motivated using a complete, balanced design with three factors, the model applies to a broad class of balanced and unbalanced designs with a wide variety of covariance structures. In fact, every experimental layout and covariance structure covered in this text can be modeled in the general mixed model format. However, because the general mixed model format applies to such a broad class of experiments, no one analysis approach is "best" for all cases. A number of different analysis approaches are available. Two approaches for analyzing the random portion of the model are presented in Sections 10.2 and 10.3. A numerical example of the random portion analysis is provided in Section 10.4. An analysis of the fixed portion of the model is presented in Section 10.5.
10.2
RANDOM PORTION ANALYSIS: TYPE I SUM OF SQUARES METHOD
In some mixed models, estimates of the variance components can be derived by calculating the Type I sums of squares for the fixed portion first and then calculating m Type I sums of squares for the subsequent random factors. The expectations of the Type I sums of squares for the subsequent m random factors do not involve p, since they are calculated after the fixed portion has been removed. Setting these m expectations equal to the corresponding Type I sums of squares produces m equations and the m unknowns o f , . . . , a^. Estimates of the m unknown variance parameters can then be derived provided the m equations are linearly independent. Balanced and unbalanced examples of the method are given next. Example 10.2.1 Consider the complete, balanced design described in Section 10.1 where B and R are random factors and T is a fixed factor. Since the design is complete and balanced, the Type I sums of squares equal the sums of squares defined by the algorithm in Section 4.3. Therefore, the Type I sums of squares due
180
Linear Models
to the three random effects are given by
The expectations of these three Type I sums of squares are
Setting the three Type I sums of squares equal to their expectations and solving for crj, a|r, and ^(BT) produces the estimators
In general, if any of the variance estimates is less than zero, set it equal to zero. Example 10.2.2 Consider the unbalanced experimental layout given in Figure 9.5.1. Let the 8 x 1 random vector Y = (Y nl , Ym, Y2n, Y2n, F 3 n, Y32\, Y322, 7323)'. The mean model for this experiment is
where the 2x1 mean vector p, = (ju-i, ^2)' and the 8x1 random error vector E = (Em, £121, £211, £212, £311, £321, £322, £323)' ~ N8(0, S). The 8 x 5 replication matrix R and the 5 x 2 matrix C are given in Example 9.5.1. Let Y'A3Y, Y'A4Y, and Y'A5Y equal the SS (B\ overall mean, T\ SS (BT\ overall mean, T, B) and SS (R(BT)\ overall mean, T, B, BT), respectively. The matrices AS, A4, AS, and Z were generated using the SAS PROC IML program listed in Section A5.1 of Appendix 5. The program output and derivations of the various
10
General Mixed Model
181
results are also given in Section A5.1. From Appendix 5 the expected values of the three Type I sums of squares are
Setting these three Type I sums of squares equal to their expectations and solving for erj, or|r, and o^(BT) produces the estimators
We now demonstrate that the covariance structure for the unbalanced experiment in Example 10.2.2 follows the mixed model covariance format. Following the methods described in Section 7.3, the covariance matrix of E from Example 10.2.2 is given by
where the 8 x 5 replication matrix R is provided in Example 9.5.1. The matrix E* and the 5 x 6 pattern matrix M are given by
and
Therefore,
6x6
182
Linear Models
where of = crj, a\ = a\T, of = (r^(BT), the 8 x 3 matrix Ui = RM(I3 12), the 8 x 3 matrix U2 = RM(I3 ® P2), the 8 x 8 matrix U3 = I8, and the 1 x 2 matrix P2 is the lower portion of a two-dimensional Helmert matrix with P2P2 = 12 — 2 J2-
10.3
RANDOM PORTION ANALYSIS: RESTRICTED MAXIMUM LIKELIHOOD METHOD
The maximum likelihood estimators (MLEs) of the variance components are the values of o f , . . . , cr^ that maximize the function
where the maximization is performed simultaneously with respect to the k + m terms ^t, o f , . . . , a%. By Theorem 6.3.1, for complete, balanced designs, the MLE of n equals the ordinary least-squares estimator (C'DC^'C'R'Y and the MLE of a 1 for / = 1,..., m equals a linear combination of the sums of squares for the m random effects and interactions. However, in general, for unbalanced designs, derivation of the MLEs of fi, e r f , . . . , a% can be tedious and usually involves numerical techniques such as the Newton-Raphson method. Russell and Bradley (1958), Anderson and Bancroft (1952), W. A. Thompson (1962), and later Patterson and R. Thompson (1971,1974) suggested what is called a restricted maximum likelihood (REML) approach. The REML estimators of o f , . . . , a^ are derived by first expressing the likelihood function in two parts, one involving the fixed parameters, //, and the second free of these fixed parameters. To construct the REMLs, let G be any n x (n —k) matrix of rank n—k such that G'RC = 0(n-t)x*. For example, G can be defined such that GG' = !„ - RC(C'DC)"1C/R' and G'G = !„_* where D = R'R. Next, transform the n x 1 random vector Y by the n x 1 nonsingular matrix [RC, G]' where
The distribution of G'Y is free of the fixed parameters /it. The REML estimators of the variance components are the values o f o f , . . . , o^ that maximize the marginal likelihood of G'Y:
As with the maximum likelihood approach, numerical techniques are often necessary to determine the REML estimates of o f , . . . , a^. Furthermore, the maximization is performed under the restriction that the estimators of o f , . . . , a^ are all positive.
10
General Mixed Model
183
The S AS PROC VARCOMP routine has a Type I sum of squares and a restricted maximum likelihood option. In the next section a numerical example is analyzed using PROC VARCOMP.
10.4
RANDOM PORTION ANALYSIS: A NUMERICAL EXAMPLE
Suppose that the observations in Table 9.5.1 take the values Y = (Yni, Y\2i,Y2n, Y2n, Ym, Ym, y322, ^323)' = (237,178, 249, 268, 186, 183, 182, 165)'. All of the numerical calculations are performed using the PROC VARCOMP output listed in Section A5.2 of Appendix 5. The Type I sums of squares option of the PROC VARCOMP program provides the SS (B\ overall mean, T) = Y'A3Y = 2770.8, SS (BT\ overall mean, T, B) = Y'A4Y = 740.03 and SS (R(BT}\ overall mean, 7\ B, BT) = Y'AsY = 385.17. These values are now substituted into the variance parameter relationships derived in Example 10.2.2. Therefore, the finite model estimates of crj, or|r, and cr^(BT) are
The PROC VARCOMP program will actually provide variance parameter estimates automatically for both the Type I sum of squares and the REML options. However, when calculating the variance parameter estimates, PROC VARCOMP assumes an infinite model for both options. Conversion between the infinite and finite model estimates can be accomplished quite easily, however, since one model is simply a reparameterization of the other. For example, let the variance parameters for the finite model remain Og, a|r, and o^(BT). Let a|», o-Jr., and a R(BTr be me corresponding variance parameters for the infinite model. Then crj = crj, + \O\T*, a\T = &BT*' and aR(BT) =ffR(BT)*wnere * is me number of fixed treatment levels. Therefore, using the preceding relationships, infinite model variance parameter estimates provided by PROC VARCOMP can be converted directly to finite model estimates. From the PROC VARCOMP output in Section A5.2, the Type I sum of squares estimates of the infinite model variance parameters are oj, = 271.71, a|r, = 509.70, and a|(flr)» = 128.39. Therefore, the corresponding finite model variance parameter estimates are
These finite model estimates agree exactly with the estimates derived earlier using the Example 10.2.2 variance parameter relationships. The expected mean squares (EMSs) for the infinite model are also automatically given by the Type I sum of squares option of the PROC VARCOMP program. The three EMSs are
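from the Expected Mean Square column of the Section A5.2 output (with Var(B) = σ_B*², Var(T*B) = σ_BT*², and Var(Error) = σ_R(BT)*²):
E(MS_B) = σ_R(BT)*² + 1.4σ_BT*² + 2σ_B*²
E(MS_BT) = σ_R(BT)*² + 1.2σ_BT*²
E(MS_R(BT)) = σ_R(BT)*².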
Using the relationships that link the finite and infinite model parameters, the expected mean squares for the finite model are
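Substituting σ_B*² = σ_B² − (1/2)σ_BT², σ_BT*² = σ_BT², and σ_R(BT)*² = σ_R(BT)² into the three EMSs above gives (a sketch of the algebra)
E(MS_B) = σ_R(BT)² + 0.4σ_BT² + 2σ_B²
E(MS_BT) = σ_R(BT)² + 1.2σ_BT²
E(MS_R(BT)) = σ_R(BT)².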
These finite model EMSs are equivalent to the expectations derived in Example 10.2.2. The REML option of PROC VARCOMP was also run on the data set. Assuming the infinite model, the program provides the REML variance parameter estimates σ̂_B*² = 88.60, σ̂_BT*² = 726.62, and σ̂_R(BT)*² = 129.15. The corresponding REML variance parameter estimates for the finite model are therefore
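obtained, using the same finite-infinite conversion with t = 2, as approximately σ̂_B² = 88.60 + (1/2)(726.62) = 451.91, σ̂_BT² = 726.62, and σ̂_R(BT)² = 129.15.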
10.5
FIXED PORTION ANALYSIS
In Section 10.4 REML and Type I sum of squares estimates of the variance parameters σ₁², ..., σₘ² were derived for both the finite and infinite models. Therefore,
the covariance matrix Σ can be estimated by
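Σ̂ = σ̂₁²U₁U₁' + ⋯ + σ̂ₘ²UₘUₘ', the form assembled as COVHAT in the Section A5.3 program,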
where σ̂₁², ..., σ̂ₘ² are a set of variance parameter estimates. The weighted least-squares estimator of μ can therefore be calculated as
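μ̂_w = (C'R'Σ̂⁻¹RC)⁻¹C'R'Σ̂⁻¹Y, the quantity computed as MUHAT in the Section A5.3 program.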
The BLUE of t'μ can be estimated by t'μ̂_w, where t is any nonzero v × 1 vector of constants. The covariance matrix of the weighted least-squares estimator of μ can be estimated by
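(C'R'Σ̂⁻¹RC)⁻¹, computed as COVMUHAT in the Section A5.3 program,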
and the variance of the BLUE of t'μ is estimated by
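t'(C'R'Σ̂⁻¹RC)⁻¹t, computed as VARTMU in the Section A5.3 program.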
For large samples, (t'μ̂_w − t'μ)/√var̂(t'μ̂_w) is approximately distributed N₁(0, 1). Therefore, a 100(1 − γ)% confidence band on t'μ is
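t'μ̂_w ± z_(γ/2) √var̂(t'μ̂_w); for γ = 0.05 the program in Section A5.3 uses z₀.₀₂₅ = 1.96.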
Hypothesis tests on various fixed effects can be constructed using a Satterthwaite approximation. Suppose the expected mean square of a certain fixed effect is
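E(MS_F) = k₁σ₁² + k₂σ₂² + ⋯ + kₘσₘ² + φ(F),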
where φ(F) is the fixed portion of the expected mean square and k₁, ..., kₘ are constants. An unbiased estimator of the first m terms of the expected mean square can be constructed using a linear combination of random effect mean squares. That is,
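W = c₁MS₁ + c₂MS₂ + ⋯ + cₘMSₘ, say, for constants c₁, ..., cₘ chosen so that the estimator is unbiased,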
where E(W) = k₁σ₁² + k₂σ₂² + ⋯ + kₘσₘ² and where MSᵢ is the mean square of a particular random effect for i = 1, ..., m. If MS_F is the mean square for a specific fixed effect with ν_F degrees of freedom, then the statistic MS_F/W is approximately distributed as an F random variable with ν_F and ν̂ degrees of freedom, where
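ν̂ = W² / [ (c₁MS₁)²/ν₁ + (c₂MS₂)²/ν₂ + ⋯ + (cₘMSₘ)²/νₘ ] (the Satterthwaite approximation used in the Section A5.3 program),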
and ν₁, ..., νₘ are the degrees of freedom associated with MS₁, ..., MSₘ. Therefore, a γ level test of the equality of the fixed treatment means is to reject if MS_F/W exceeds the upper γ critical value of the F distribution with ν_F and ν̂ degrees of freedom.
All of these fixed portion calculations are performed for an example problem in the next section.
10.6
FIXED PORTION ANALYSIS: A NUMERICAL EXAMPLE
An estimate of μ, a confidence band on t'μ, and a hypothesis test on the fixed treatment effect are now calculated using the data in Section 10.4. The numerical calculations are performed using the SAS PROC IML procedure detailed in Section A5.3 of Appendix 5. All calculations are made assuming a finite model with Type I sums of squares estimates for σ_B², σ_BT², and σ_R(BT)². From Section A5.3 the estimate of μ = (μ₁, μ₂)' is given by
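μ̂_w = (227.94, 182.62)', the vector labeled MUHAT in the Section A5.3 output.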
The estimated covariance matrix of the weighted least-squares estimator of μ is
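approximately
[ 295.78   88.33 ]
[  88.33  418.14 ],
the matrix labeled COVMUHAT in the Section A5.3 output.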
An estimate of the BLUE of μ₁ − μ₂ = t'μ for t' = (1, −1) is given by
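t'μ̂_w = 227.94 − 182.62 = 45.32, the value labeled TMU in the Section A5.3 output.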
The variance of the BLUE of μ₁ − μ₂ is estimated by
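var̂(μ̂₁ − μ̂₂) = 295.78 + 418.14 − 2(88.33) = 537.26, the value labeled VARTMU in the Section A5.3 output.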
Therefore, 95% confidence bands on μ₁ − μ₂ are
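45.32 ± 1.96√537.26, that is, (−0.11, 90.75), the limits labeled LOWERLIM and UPPERLIM in the Section A5.3 output.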
A Satterthwaite test on the treatment means can be constructed. Note that
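E(Y'A₂Y) = φ(T) + σ_B² + 1.5σ_BT² + σ_R(BT)² (a sketch consistent with the constants c₁, c₂, and c₃ given below),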
where Y'A₂Y is the Type I sum of squares due to treatments given the overall mean and φ(T) is the fixed portion of the treatment EMS. Furthermore,
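W = c₁(Y'A₃Y/2) + c₂(Y'A₄Y/1) + c₃(Y'A₅Y/3),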
where c₁ = 1/2, c₂ = 1.3/1.2, and c₃ = −0.7/1.2. Therefore, the Satterthwaite
statistic to test a significant treatment effect is
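calculated from the Section A5.2 mean squares as W = (0.5)(1385.40) + (1.3/1.2)(740.03) − (0.7/1.2)(128.39) = 1419.51 and MS_F/W = Y'A₂Y/W = 6728.00/1419.51 = 4.74, the quantities labeled W and TEST in the Section A5.3 output.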
The degrees of freedom for the denominator of the Satterthwaite statistic are
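ν̂ = W² / [ ((0.5)(1385.40))²/2 + ((1.3/1.2)(740.03))²/1 + ((0.7/1.2)(128.39))²/3 ] = 2.28, the value labeled F in the Section A5.3 output.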
Since MS_F/W = 4.74 < 18.5, the 0.05 critical value of an F distribution with 1 and 2 degrees of freedom, do not reject the hypothesis that there is no treatment effect.
EXERCISES

For Exercises 1-6, assume that E has a multivariate normal distribution with mean vector 0 and covariance matrix Σ = Σⱼ σⱼ²UⱼUⱼ'.

1. Write the general mixed model Y = RCμ + E for the two-way cross classification described in Example 4.5.2. Define all terms and distributions explicitly.
2. Write the general mixed model Y = RCμ + E for the split plot design described in Example 4.5.3. Define all terms and distributions explicitly.
3. Write the general mixed model Y = RCμ + E for the experiment described in Example 4.5.4. Define all terms and distributions explicitly.
4. Write the general mixed model Y = RCμ + E for the experiment described in Figure 7.3.1. Define all terms and distributions explicitly.
5. Write the general mixed model Y = RCμ + E for the experiment described in Exercise 12 in Chapter 5. Define all terms and distributions explicitly.
6. Write the general mixed model Y = RCμ + E for the experiment described in Exercise 7 in Chapter 4. Define all terms and distributions explicitly.
7. Let the 8 × 1 vector of observations Y = (Y₁₁₁, Y₁₁₂, Y₁₂₁, Y₁₂₂, Y₂₁₁, Y₂₁₂, Y₂₂₁, Y₂₂₂)' = (2, 5, 10, 15, 17, 14, 39, 41)' represent the data for the experiment described in Example 10.2.1 with b = t = r = 2.
Figure 10.6.1 Split Plot Data Set for Exercise 9.
(a) Find the Type I sum of squares estimates of σ_B², σ_BT², and σ_R(BT)² assuming a finite model.
(b) Find the REML estimates of σ_B², σ_BT², and σ_R(BT)² assuming a finite model. Are the Type I sum of squares and REML estimates of the three variance parameters equal?
8. The observations in Table E10.1 represent the data from the split plot experiment described in Example 4.5.3 with r = s = 3, t = 2, and with some of the observations missing.
(a) Write the general mixed model Y = RCμ + E for this experiment where E ~ N₁₃(0, Σ) and Σ = Σⱼ σⱼ²UⱼUⱼ'. Define all terms and distributions explicitly.
(b) Find the Type I sum of squares and REML estimates of the variance parameters defined in part a assuming a finite model.
(c) Find the Type I sum of squares and REML estimates of the variance parameters defined in part a assuming an infinite model.
(d) Calculate the estimates μ̂_w and the estimated cov(μ̂_w).
(e) Construct a 95% confidence band on the difference between the two whole plot treatment means.
(f) Construct a Satterthwaite test for the hypothesis that there is no significant whole plot treatment effect.
(g) Construct a Satterthwaite test for the hypothesis that there is no significant split plot treatment effect.
SIG2HAT = (1/7)#T(Y)*(I(10) - X*INV(T(X)*X)*T(X))*Y;
A1 = X1*INV(T(X1)*X1)*T(X1);
A2 = XC*INV(T(XC)*XC)*T(XC);
A3 = I(10) - A1 - A2;
SSMEAN = T(Y)*A1*Y;
SSREG = T(Y)*A2*Y;
SSRES = T(Y)*A3*Y;
SSTOT = T(Y)*Y;
V = BLOCK(1,2)@I(5);
WBETAHAT = INV(T(X)*INV(V)*X)*T(X)*INV(V)*Y;
WSIG2HAT = (1/7)#T(Y)*(INV(V) - INV(V)*X*INV(T(X)*INV(V)*X)*T(X)*INV(V))*Y;
AV1 = INV(V)*X1*INV(T(X1)*INV(V)*X1)*T(X1)*INV(V);
AV2 = INV(V)*X*INV(T(X)*INV(V)*X)*T(X)*INV(V) - AV1;
AV3 = INV(V) - AV1 - AV2;
SSWMEAN = T(Y)*AV1*Y;
SSWREG = T(Y)*AV2*Y;
SSWRES = T(Y)*AV3*Y;
SSWTOT = T(Y)*INV(V)*Y;
B1 = I(4) - J(4,4,1/4);
B2 = 0;
B3 = I(2) - J(2,2,1/2);
B4 = I(3) - J(3,3,1/3);
APE = BLOCK(B1,B2,B3,B4);
ALOF = A3 - APE;
SSLOF = T(Y)*ALOF*Y;
SSPE = T(Y)*APE*Y;
R1 = X(|1:10,1|);
R2 = X(|1:10,1:2|);
T1 = R1*INV(T(R1)*R1)*T(R1);
T2 = R2*INV(T(R2)*R2)*T(R2) - T1;
T3 = X*INV(T(X)*X)*T(X) - R2*INV(T(R2)*R2)*T(R2);
SSX1 = T(Y)*T2*Y;
SSX2 = T(Y)*T3*Y;
PRINT BHAT SIG2HAT SSMEAN SSREG SSRES SSTOT WBETAHAT WSIG2HAT SSWMEAN SSWREG SSWRES SSWTOT SSLOF SSPE SSX1 SSX2;
QUIT;
RUN;

BHAT        SIG2HAT
3.11        0.0598812
0.0134819
0.0106124

SSMEAN      SSREG       SSRES       SSTOT
96.721      24.069831   0.4191687   121.21

WBETAHAT    WSIG2HAT    SSWMEAN     SSWREG      SSWRES      SSWTOT
3.11        0.0379176   57.408333   14.581244   0.2654231   72.255
0.013
0.0107051

SSLOF       SSPE        SSX1        SSX2
0.0141687   0.405       10.609      13.460831
APPENDIX 2: COMPUTER OUTPUT FOR CHAPTER 7
A2.1
COMPUTER OUTPUT FOR SECTION 7.2
Table A2.1 provides a summary of the notation used in Section 7.2 of the text and the corresponding names used in the SAS PROC IML computer program.

Table A2.1 Section 7.2 Notation and Corresponding PROC IML Program Names

Section 7.2 Notation                                              PROC IML Program Name
X*                                                                XSTAR
I₃ ⊗ J₃                                                           SIG1
I₃ ⊗ (I₃ − (1/3)J₃)                                               SIG2
X₁, Q₃, Z₁, X₂, Z₂                                                X1, Q3, Z1, X2, Z2
M, X = MX*                                                        M, X
M[I₃ ⊗ J₃]M'                                                      SIGM1
M[I₃ ⊗ (I₃ − (1/3)J₃)]M'                                          SIGM2
S₁, T₁                                                            S1, T1
A₁, ..., A₄                                                       A1, ..., A4
A₁M[I₃ ⊗ J₃]M', ..., A₄M[I₃ ⊗ J₃]M'                               A1SIGM1, ..., A4SIGM1
A₁M[I₃ ⊗ (I₃ − (1/3)J₃)]M', ..., A₄M[I₃ ⊗ (I₃ − (1/3)J₃)]M'       A1SIGM2, ..., A4SIGM2
X'A₁X, ..., X'A₄X                                                 XA1X, ..., XA4X
S₂*, T₁*                                                          S2STAR, T1STAR
A₂*, A₃*                                                          A2STAR, A3STAR
X'A₃*X                                                            XA3STARX
tr[A₃*M(I₃ ⊗ J₃)M']                                               TRA3ST1
tr{A₃*M[I₃ ⊗ (I₃ − (1/3)J₃)]M'}                                   TRA3ST2
The SAS PROC IML program for Section 7.2 and the output follow:

PROC IML;
XSTAR=J(3,1,1)@I(3);
SIG1=I(3)@J(3,3,1);
SIG2=I(3)@(I(3)-(1/3)#J(3,3,1));
X1=J(9,1,1);
Q3={ 1  1,
    -1  1,
     0 -2};
Z1=Q3@J(3,1,1);
X2=J(3,1,1)@Q3;
Z2=Q3@Q3;
M={0 1 0 0 0 0 0 0 0,
   0 0 1 0 0 0 0 0 0,
   0 0 0 1 0 0 0 0 0,
   0 0 0 0 0 1 0 0 0,
   0 0 0 0 0 0 1 0 0,
   0 0 0 0 0 0 0 1 0};
X=M*XSTAR;
SIGM1=M*SIG1*T(M);
SIGM2=M*SIG2*T(M);
S1=M*X1;
T1=M*(X1||Z1);
S2=M*(X1||Z1||X2);
A1=S1*INV(T(S1)*S1)*T(S1);
A2=T1*INV(T(T1)*T1)*T(T1)-A1;
A3=S2*INV(T(S2)*S2)*T(S2)-A1-A2;
A4=I(3)@I(2)-A1-A2-A3;
A1SIGM1=A1*SIGM1;
A1SIGM2=A1*SIGM2;
A2SIGM1=A2*SIGM1;
A2SIGM2=A2*SIGM2;
A3SIGM1=A3*SIGM1;
A3SIGM2=A3*SIGM2;
A4SIGM1=A4*SIGM1;
A4SIGM2=A4*SIGM2;
XA1X=T(X)*A1*X;
XA2X=T(X)*A2*X;
XA3X=T(X)*A3*X;
XA4X=T(X)*A4*X;
S2STAR=M*(X1||X2);
T1STAR=M*(X1||X2||Z1);
A2STAR=S2STAR*INV(T(S2STAR)*S2STAR)*T(S2STAR)-A1;
A3STAR=T1STAR*INV(T(T1STAR)*T1STAR)*T(T1STAR)-A2STAR-A1;
XA3STARX=T(X)*A3STAR*X;
TRA3ST1=TRACE(A3STAR*SIGM1);
TRA3ST2=TRACE(A3STAR*SIGM2);
PRINT SIG1 SIG2 SIGM1 SIGM2 A1 A2 A3 A4 A1SIGM1 A1SIGM2 A2SIGM1 A2SIGM2 A3SIGM1 A3SIGM2 A4SIGM1 A4SIGM2 XA1X XA2X XA3X XA4X A3STAR XA3STARX TRA3ST1 TRA3ST2;
QUIT;
RUN;

[The printed values of SIG1, SIG2, SIGM1, SIGM2, A1-A4, A1SIGM1-A4SIGM2, XA1X-XA4X, A3STAR, XA3STARX, TRA3ST1, and TRA3ST2 are not reproduced here.]
In the following discussion, the computer output is translated into the results that appear in Section 7.2.
[The displayed equations translating this output into the Section 7.2 results are not reproduced here.]
A2.2
COMPUTER OUTPUT FOR SECTION 7.3
Table A2.2 provides a summary of the notation used in Section 7.3 of the text and the corresponding names used in the SAS PROC IML computer program. The program names XSTAR, SIG1, SIG2, X1, Q3, Z1, X2, Z2, and M are defined as in Section A2.1 and therefore are not redefined in Table A2.2.
R=BLOCK(1,J(2,1,1),J(2,1,1),1,1,J(2,1,1));
D=T(R)*R;
X=R*M*XSTAR;
SIGRM1=R*M*SIG1*T(M)*T(R);
SIGRM2=R*M*SIG2*T(M)*T(R);
SIGRM3=I(9);
S1=R*M*X1;
T1=R*M*(X1||Z1);
S2=R*M*(X1||Z1||X2);
A1=S1*INV(T(S1)*S1)*T(S1);
A2=T1*INV(T(T1)*T1)*T(T1)-A1;
A3=S2*INV(T(S2)*S2)*T(S2)-A1-A2;
A4=R*INV(D)*T(R)-A1-A2-A3;
A5=I(9)-R*INV(D)*T(R);
A1SIGRM1=A1*SIGRM1*A1;
A1SIGRM2=A1*SIGRM2*A1;
A1SIGRM3=A1*SIGRM3*A1;
A2SIGRM1=A2*SIGRM1*A2;
A2SIGRM2=A2*SIGRM2*A2;
A2SIGRM3=A2*SIGRM3*A2;
A3SIGRM1=A3*SIGRM1*A3;
A3SIGRM2=A3*SIGRM2*A3;
A3SIGRM3=A3*SIGRM3*A3;
A4SIGRM1=A4*SIGRM1*A4;
A4SIGRM2=A4*SIGRM2*A4;
A4SIGRM3=A4*SIGRM3*A4;
A5SIGRM1=A5*SIGRM1*A5;
A5SIGRM2=A5*SIGRM2*A5;
A5SIGRM3=A5*SIGRM3*A5;
XA1X=T(X)*A1*X;
XA2X=T(X)*A2*X;
XA3X=T(X)*A3*X;
XA4X=T(X)*A4*X;
XA5X=T(X)*A5*X;
S2STAR=R*M*(X1||X2);
T1STAR=R*M*(X1||X2||Z1);
A2STAR=S2STAR*INV(T(S2STAR)*S2STAR)*T(S2STAR)-A1;
A3STAR=T1STAR*INV(T(T1STAR)*T1STAR)*T(T1STAR)-A2STAR-A1;
A2STARM1=A2STAR*SIGRM1*A2STAR;
A2STARM2=A2STAR*SIGRM2*A2STAR;
A2STARM3=A2STAR*SIGRM3*A2STAR;
A3STARM1=A3STAR*SIGRM1*A3STAR;
A3STARM2=A3STAR*SIGRM2*A3STAR;
A3STARM3=A3STAR*SIGRM3*A3STAR;
XA2STARX=T(X)*A2STAR*X;
XA3STARX=T(X)*A3STAR*X;
TRA3ST1=TRACE(A3STAR*SIGRM1);
TRA3ST2=TRACE(A3STAR*SIGRM2);
TRA3ST3=TRACE(A3STAR*SIGRM3);
PRINT A1 A2 A3 A4 A5 A1SIGRM1 A1SIGRM2 A1SIGRM3 A2SIGRM1 A2SIGRM2 A2SIGRM3 A3SIGRM1 A3SIGRM2 A3SIGRM3 A4SIGRM1 A4SIGRM2 A4SIGRM3 A5SIGRM1 A5SIGRM2 A5SIGRM3 XA1X XA2X XA3X XA4X XA5X XA3STARX TRA3ST1 TRA3ST2 TRA3ST3;
QUIT;
RUN;

Since the program output is lengthy, it is omitted. In the following discussion, the computer output is translated into the results that appear in Section 7.3.
[The displayed equations translating the output into the Section 7.3 results are not reproduced here.]
APPENDIX 3: COMPUTER OUTPUT FOR CHAPTER 8
Table A3.1 provides a summary of the notation used in Section 8.3 of the text and the corresponding names used in the SAS PROC IML computer programs and outputs.

Table A3.1 Section 8.3 Notation and Corresponding PROC IML Program Names

Section 8.3 Notation      PROC IML Program Names
b, t, k                   B, T, K
N, M                      N, M
The SAS PROC IML programs and the outputs for Section 8.3 follow:

PROC IML;
/* THIS PROGRAM USES THE DIMENSIONS B, T, AND K AND THE T x B
   INCIDENCE MATRIX N TO CREATE THE BK x BT PATTERN MATRIX M FOR A
   BALANCED INCOMPLETE BLOCK DESIGN. THE PROGRAM IS RUN FOR THE
   EXAMPLE IN FIGURE 7.2.1 WITH B=T=3, K=R=2, LAMBDA=1. THE PROGRAM
   CAN BE GENERALIZED TO PRODUCE THE BK x BT PATTERN MATRIX M FOR
   ANY BALANCED INCOMPLETE BLOCK DESIGN BY SIMPLY INPUTTING THE
   APPROPRIATE DIMENSIONS B, T, K AND THE APPROPRIATE INCIDENCE
   MATRIX N. */
B=3;
T=3;
K=2;
N={0 1 1,
   1 0 1,
   1 1 0};
BK=B#K;
BT=B#T;
M=J(BK,BT,0);
ROW=1; COL=0;
DO I=1 TO B;
  DO J=1 TO T;
    COL=COL+1;
    IF N(|J,I|)=1 THEN M(|ROW,COL|)=1;
    IF N(|J,I|)=1 THEN ROW=ROW+1;
  END;
END;
PRINT M;
M
0 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0

QUIT;
RUN;
PROC IML;
/* THIS PROGRAM USES THE DIMENSIONS B, T, AND K AND THE BK x BT
   PATTERN MATRIX M TO CREATE THE T x B INCIDENCE MATRIX N FOR A
   BALANCED INCOMPLETE BLOCK DESIGN. THE PROGRAM IS RUN FOR THE
   EXAMPLE IN FIGURE 7.2.1 WITH B=T=3, K=R=2, LAMBDA=1. THE PROGRAM
   CAN BE GENERALIZED TO PRODUCE THE T x B INCIDENCE MATRIX N FOR
   ANY BALANCED INCOMPLETE BLOCK DESIGN BY SIMPLY INPUTTING THE
   APPROPRIATE DIMENSIONS B, T, K AND THE APPROPRIATE PATTERN
   MATRIX M. */
B=3;
T=3;
K=2;
M={0 1 0 0 0 0 0 0 0,
   0 0 1 0 0 0 0 0 0,
   0 0 0 1 0 0 0 0 0,
   0 0 0 0 0 1 0 0 0,
   0 0 0 0 0 0 1 0 0,
   0 0 0 0 0 0 0 1 0};
BK=B#K;
N=J(T,B,0);
ROW=1; COL=0;
DO I=1 TO B;
  DO J=1 TO T;
    COL=COL+1;
    IF M(|ROW,COL|)=1 THEN N(|J,I|)=1;
    IF M(|ROW,COL|)=1 & ROW < BK THEN ROW=ROW+1;
  END;
END;
PRINT N;
N
0 1 1
1 0 1
1 1 0

QUIT;
RUN;
APPENDIX 4: COMPUTER OUTPUT FOR CHAPTER 9
Table A4.1 provides a summary of the notation used in Section 9.4 of the text and the corresponding names used in the SAS PROC GLM computer program and output.

Table A4.1 Section 9.4 Notation and Corresponding PROC GLM Program Names

Section 9.4 Notation      PROC GLM Program Names
Levels of factor A        A
Replication number        R
Observed Y                Y
The SAS PROC GLM program and the output for Section 9.4 follow:

DATA A;
INPUT A R Y;
CARDS;
1 1 101
1 2 105
1 3 94
2 1 84
2 2 88
3 1 32
;
PROC GLM DATA=A;
CLASSES A;
MODEL Y=A/P SOLUTION;
PROC GLM DATA=A;
CLASSES A;
MODEL Y=A/P NOINT SOLUTION;
QUIT;
RUN;
General Linear Models
Sums of Squares and Estimates for the Less Than Full Rank Model

Dependent Variable: Y

Source        DF    Sum of Squares    Mean Square    F Value    Pr > F
Model          2          3480.00        1740.00       74.57    0.0028
Error          3            70.00          23.33
Cor. Total     5          3550.00

                             T for H0:                Std Error of
Parameter       Estimate    Parameter=0    Pr > |T|   Estimate
INTERCEPT       32.00 B         6.62        0.0070    4.83045892
A          1    68.00 B        12.19        0.0012    5.57773351
           2    54.00 B         9.13        0.0028    5.91607978
           3     0.00 B

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

General Linear Models
Estimates for the Full Rank Mean Model

                             T for H0:                Std Error of
Parameter       Estimate    Parameter=0    Pr > |T|   Estimate
A          1     100.00        35.86        0.0001    2.78886676
           2      86.00        25.18        0.0001    3.41565026
           3      32.00         6.62        0.0070    4.83045892
In the following discussion, the preceding computer output is translated into the results that appear in Section 9.4. The less than full rank parameter estimates in the output are titled INTERCEPT, A 1, 2, 3. These four estimates are SAS's solution β₀ = (32, 68, 54, 0)'. The mean model estimates in the output are titled A 1, 2, 3. These three estimates equal μ̂ = (100, 86, 32)'. Note that the ANOVA table given in the SAS output is generated by the PROC GLM statement for the less than full rank model. However, the sums of squares presented are equivalent for the mean model and the less than full rank model. Therefore, the sum of squares labeled MODEL in the output is the sum of squares due to the treatment combinations for the data in Figure 9.4.1. Likewise, the sums of squares labeled ERROR and COR. TOTAL in the output are the sums of squares due
to the residual and the sum of squares total minus the sum of squares due to the overall mean for the data in Figure 9.4.1, respectively. As a final comment, for a problem with more than one fixed factor, the CLASSES statement should list the names of all fixed factors and the MODEL statement should list the highest order fixed factor interaction between the '=' sign and the '/'. For example, if there are three fixed factors A, B, and C, then the CLASSES statement is CLASSES A B C; and the MODEL statement is MODEL Y=A*B*C/P SOLUTION; (for the less than full rank model) or MODEL Y=A*B*C/P NOINT SOLUTION; (for the mean model).
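Assembled as a sketch with three hypothetical fixed factors A, B, and C (the data set name A is assumed, as in the Section 9.4 program), the two runs would look like:

PROC GLM DATA=A;
CLASSES A B C;
MODEL Y=A*B*C/P SOLUTION;
PROC GLM DATA=A;
CLASSES A B C;
MODEL Y=A*B*C/P NOINT SOLUTION;
QUIT;
RUN;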
APPENDIX 5: COMPUTER OUTPUT FOR CHAPTER 10
A5.1
COMPUTER OUTPUT FOR SECTION 10.2
Table A5.1 provides a summary of the notation used in Section 10.2 of the text and the corresponding names used in the SAS PROC IML computer program and output.
Table A5.1 Section 10.2 Notation and Corresponding PROC IML Program Names

Section 10.2 Notation                                                        PROC IML Program Names
I₃ ⊗ J₂, I₃ ⊗ (I₂ − (1/2)J₂), X₁, Q₂, Q₃                                     SIG1, SIG2, X1, Q2, Q3
X₂, Z₁, Z₂, M, R, D, C                                                       X2, Z1, Z2, M, R, D, C
RM(I₃ ⊗ J₂)M'R', RM[I₃ ⊗ (I₂ − (1/2)J₂)]M'R', I₈                             SIGRM1, SIGRM2, SIGRM3
T₁, A₁, A₂, A₃, A₄, A₅                                                       T1, A1, A2, A3, A4, A5
tr[A₁RM(I₃ ⊗ J₂)M'R'], ..., tr[A₅RM(I₃ ⊗ J₂)M'R']                            TRA1SIG1, ..., TRA5SIG1
tr{A₁RM[I₃ ⊗ (I₂ − (1/2)J₂)]M'R'}, ..., tr{A₅RM[I₃ ⊗ (I₂ − (1/2)J₂)]M'R'}    TRA1SIG2, ..., TRA5SIG2
tr(A₁I₈), ..., tr(A₅I₈)                                                      TRA1SIG3, ..., TRA5SIG3
C'R'A₁RC, ..., C'R'A₅RC                                                      RCA1RC, ..., RCA5RC
The SAS PROC IML program and output for Section 10.2 follow:
PROC IML;
SIG1=I(3)@J(2,2,1);
SIG2=I(3)@(I(2)-(1/2)#J(2,2,1));
X1=J(6,1,1);
Q2={1, -1};
Q3={ 1  1,
    -1  1,
     0 -2};
X2=J(3,1,1)@Q2;
Z1=Q3@J(2,1,1);
Z2=Q3@Q2;
M={1 0 0 0 0 0,
   0 1 0 0 0 0,
   0 0 1 0 0 0,
   0 0 0 0 1 0,
   0 0 0 0 0 1};
R=BLOCK(1,1,J(2,1,1),1,J(3,1,1));
D=T(R)*R;
C={1 0,
   0 1,
   1 0,
   1 0,
   0 1};
SIGRM1=R*M*SIG1*T(M)*T(R);
SIGRM2=R*M*SIG2*T(M)*T(R);
SIGRM3=I(8);
S1=R*M*X1;
S2=R*M*(X1||X2);
T1=R*M*(X1||X2||Z1);
A1=S1*INV(T(S1)*S1)*T(S1);
A2=S2*INV(T(S2)*S2)*T(S2)-A1;
A3=T1*INV(T(T1)*T1)*T(T1)-A1-A2;
A4=R*INV(D)*T(R)-A1-A2-A3;
A5=I(8)-R*INV(D)*T(R);
TRA1SIG1=TRACE(A1*SIGRM1);
TRA1SIG2=TRACE(A1*SIGRM2);
TRA1SIG3=TRACE(A1*SIGRM3);
TRA2SIG1=TRACE(A2*SIGRM1);
TRA2SIG2=TRACE(A2*SIGRM2);
TRA2SIG3=TRACE(A2*SIGRM3);
TRA3SIG1=TRACE(A3*SIGRM1);
TRA3SIG2=TRACE(A3*SIGRM2);
TRA3SIG3=TRACE(A3*SIGRM3);
TRA4SIG1=TRACE(A4*SIGRM1);
TRA4SIG2=TRACE(A4*SIGRM2);
TRA4SIG3=TRACE(A4*SIGRM3);
TRA5SIG1=TRACE(A5*SIGRM1);
TRA5SIG2=TRACE(A5*SIGRM2);
TRA5SIG3=TRACE(A5*SIGRM3);
RCA1RC=T(C)*T(R)*A1*R*C;
RCA2RC=T(C)*T(R)*A2*R*C;
RCA3RC=T(C)*T(R)*A3*R*C;
RCA4RC=T(C)*T(R)*A4*R*C;
RCA5RC=T(C)*T(R)*A5*R*C;
PRINT TRA3SIG1 TRA3SIG2 TRA3SIG3 TRA4SIG1 TRA4SIG2 TRA4SIG3 TRA5SIG1 TRA5SIG2 TRA5SIG3 RCA1RC RCA2RC RCA3RC RCA4RC RCA5RC;
A5.2
COMPUTER OUTPUT FOR SECTION 10.4
Table A5.2 provides a summary of the notation used in Section 10.4 of the text and the corresponding names used in the SAS PROC VARCOMP computer program and output.

Table A5.2 Section 10.4 Notation and Corresponding PROC VARCOMP Program Names

Section 10.4 Notation                      PROC VARCOMP Program/Output Names
B, T, R, Y, R(BT)                          B, T, R, Y, Error
σ_B*², σ_BT*², σ_R(BT)*², φ(T)             Var(B), Var(T*B), Var(Error), Q(T)
The SAS program and output for Section 10.4 follow:

DATA A;
INPUT B T R Y;
CARDS;
1 1 1 237
1 2 1 178
2 1 1 249
2 1 2 268
3 1 1 186
3 2 1 183
3 2 2 182
3 2 3 165
;
PROC VARCOMP METHOD=TYPE1;
CLASS T B;
MODEL Y=T|B/FIXED=1;
RUN;
PROC VARCOMP METHOD=REML;
CLASS T B;
MODEL Y=T|B/FIXED=1;
QUIT;
RUN;

Variance Components Estimation Procedure
Class Level Information

Class    Levels    Values
T             2    1 2
B             3    1 2 3

Number of observations in data set = 8

Dependent Variable: Y
TYPE 1 SS Variance Component Estimation Procedure

Source            DF    Type I SS     Type I MS    Expected Mean Square
T                  1     6728.0000    6728.0000    Var(Error) + 2 Var(T*B) + Var(B) + Q(T)
B                  2     2770.8000    1385.4000    Var(Error) + 1.4 Var(T*B) + 2 Var(B)
T*B                1      740.0333     740.0333    Var(Error) + 1.2 Var(T*B)
Error              3      385.1667     128.3889    Var(Error)
Corrected Total    7    10624.0000
Source    Variance Component    Estimate
T
B         Var(B)                271.7129
T*B       Var(T*B)              509.7037
Error     Var(Error)            128.3889

RESTRICTED MAXIMUM LIKELIHOOD Variance Components Estimation Procedure
Iteration    Objective     Var(B)    Var(T*B)    Var(Error)
        0    35.946297     30.899     616.604       158.454
        1    35.875323     51.780     663.951       144.162
        2    35.852268     66.005     693.601       136.830
        3    35.845827     81.878     701.484       133.406
        4    35.843705     86.686     711.185       131.422
        5    35.843069     87.877     718.188       130.334
        6    35.842890     88.250     722.237       129.757
        7    35.842842     88.421     724.385       129.458
        8    35.842830     88.509     725.494       129.305
        9    35.842827     88.553     726.060       129.227
       10    35.842826     88.576     726.348       129.187
       11    35.842825     88.587     726.493       129.167
       12    35.842825     88.593     726.567       129.157
       13    35.842825     88.596     726.605       129.152
       14    35.842825     88.598     726.624       129.149
Convergence criteria met

In the following discussion, the computer output is translated into the results that appear in Section 10.4.
A5.3
COMPUTER OUTPUT FOR SECTION 10.6
Table A5.3 provides a summary of the notation used in Section 10.6 of the text and the corresponding names used in the SAS PROC IML computer program and output.

Table A5.3 Section 10.6 Notation and Corresponding PROC IML Program Names

Section 10.6 Notation                                 PROC IML Program Names
I₃ ⊗ J₂, I₃ ⊗ (I₂ − (1/2)J₂), X₁, Q₂, Q₃              SIG1, SIG2, X1, Q2, Q3
X₂, Z₁, Z₂, M, R, D, C                                X2, Z1, Z2, M, R, D, C
RM(I₃ ⊗ J₂)M'R', RM[I₃ ⊗ (I₂ − (1/2)J₂)]M'R', I₈      SIGRM1, SIGRM2, SIGRM3
Σ̂, μ̂_w, estimated cov(μ̂_w)                            COVHAT, MUHAT, COVMUHAT
t, t'μ̂_w, estimated var(t'μ̂_w)                        T, TMU, VARTMU
t'μ̂_w ± z₀.₀₂₅ √var̂(t'μ̂_w)                            LOWERLIM, UPPERLIM
S₁, S₂, T₁, A₁, A₂, A₃, A₄, A₅                        S1, S2, T1, A1, A2, A3, A4, A5
W, ν̂, Y'A₂Y/W                                         W, F, TEST
The SAS PROC IML program and output for Section 10.6 follow:

PROC IML;
Y={237, 178, 249, 268, 186, 183, 182, 165};
SIG1=I(3)@J(2,2,1);
SIG2=I(3)@(I(2)-(1/2)#J(2,2,1));
X1=J(6,1,1);
Q2={1, -1};
Q3={ 1  1,
    -1  1,
     0 -2};
X2=J(3,1,1)@Q2;
Z1=Q3@J(2,1,1);
Z2=Q3@Q2;
M={1 0 0 0 0 0,
   0 1 0 0 0 0,
   0 0 1 0 0 0,
   0 0 0 0 1 0,
   0 0 0 0 0 1};
R=BLOCK(1,1,J(2,1,1),1,J(3,1,1));
D=T(R)*R;
C={1 0,
   0 1,
   1 0,
   1 0,
   0 1};
SIGRM1=R*M*SIG1*T(M)*T(R);
SIGRM2=R*M*SIG2*T(M)*T(R);
SIGRM3=I(8);
COVHAT=526.56#SIGRM1 + 509.70#SIGRM2 + 128.39#SIGRM3;
MUHAT=INV(T(C)*T(R)*INV(COVHAT)*R*C)*T(C)*T(R)*INV(COVHAT)*Y;
COVMUHAT=INV(T(C)*T(R)*INV(COVHAT)*R*C);
T={1, -1};
TMU=T(T)*MUHAT;
VARTMU=T(T)*COVMUHAT*T;
LOWERLIM=TMU-1.96#(VARTMU**.5);
UPPERLIM=TMU+1.96#(VARTMU**.5);
S1=R*M*X1;
S2=R*M*(X1||X2);
T1=R*M*(X1||X2||Z1);
A1=S1*INV(T(S1)*S1)*T(S1);
A2=S2*INV(T(S2)*S2)*T(S2)-A1;
A3=T1*INV(T(T1)*T1)*T(T1)-A1-A2;
A4=R*INV(D)*T(R)-A1-A2-A3;
A5=I(8)-R*INV(D)*T(R);
W=.25#T(Y)*A3*Y + (1.3/1.2)#T(Y)*A4*Y - (0.7/3.6)#T(Y)*A5*Y;
F=W**2/((((.25#T(Y)*A3*Y)**2)/2) + (((1.3/1.2)#T(Y)*A4*Y)**2) + (((0.7#T(Y)*A5*Y/3.6)**2)/3));
TEST=T(Y)*A2*Y/W;
PRINT MUHAT COVMUHAT TMU VARTMU LOWERLIM UPPERLIM W F TEST;
QUIT;
RUN;

MUHAT         COVMUHAT
227.94        295.7818    88.33466
182.62153      88.33466  418.14449

TMU           VARTMU
45.318467     537.25697

LOWERLIM      UPPERLIM
-0.111990     90.748923

W             F             TEST
1419.5086     2.2780974     4.739663
References and Related Literature
Anderson, R. L., and Bancroft, T. A. (1952), Statistical Theory in Research, McGraw-Hill Book Co., New York.
Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, New York.
Anderson, V. L., and McLean, R. A. (1974), Design of Experiments, A Realistic Approach, Marcel Dekker, New York.
Arnold, S. F. (1981), The Theory of Linear Models and Multivariate Analysis, John Wiley & Sons, New York.
Bhat, B. R. (1962), "On the distribution of certain quadratic forms in normal variates," Journal of the Royal Statistical Society, Ser. B 24, 148-151.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, John Wiley & Sons, New York.
Chew, V. (1970), "Covariance matrix estimation in linear models," Journal of the American Statistical Association 65, 173-181.
Cochran, W. G., and Cox, G. M. (1957), Experimental Designs, John Wiley & Sons, New York.
Davenport, J. M. (1971), "A comparison of some approximate F-tests," Technometrics 15, 779-790.
Draper, N. R., and Smith, H. (1966), Applied Regression Analysis, John Wiley & Sons, New York.
Federer, W. T. (1955), Experimental Design, Macmillan, New York.
Graybill, F. A. (1954), "On quadratic estimates of variance components," Annals of Mathematical Statistics 25, No. 2, 367-372.
Graybill, F. A. (1961), An Introduction to Linear Statistical Models, Volume 1, McGraw-Hill Book Co., New York.
Graybill, F. A. (1969), An Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Co., Belmont, CA.
Graybill, F. A. (1976), Theory and Application of the Linear Model, Wadsworth & Brooks/Cole, Pacific Grove, CA.
Guttman, I. (1982), The Analysis of Linear Models, John Wiley & Sons, New York.
Harville, D. A. (1977), "Maximum likelihood approaches to variance component estimation and to related problems," Journal of the American Statistical Association 72, No. 358, 320-338.
Hocking, R. R. (1985), The Analysis of Linear Models, Brooks/Cole Publishing Co., Belmont, CA.
Hogg, R. V., and Craig, A. T. (1958), "On the decomposition of certain χ² variables," Annals of Mathematical Statistics 29, 608-610.
John, P. W. M. (1971), Statistical Design and Analysis of Experiments, Macmillan, New York.
Kempthorne, O. (1952, 1967), The Design and Analysis of Experiments, John Wiley & Sons, New York.
Kshirsagar, A. M. (1983), A Course in Linear Models, Marcel Dekker, New York.
Milliken, G. A., and Johnson, D. E. (1992), Analysis of Messy Data, Volume 1: Designed Experiments, Van Nostrand Reinhold, New York.
Morrison, D. (1967), Multivariate Statistical Methods, McGraw-Hill Book Co., New York.
Moser, B. K. (1987), "Generalized F variates in the general linear model," Communications in Statistics: Theory and Methods 16, 1867-1884.
Moser, B. K., and McCann, M. H. (1996), "Maximum likelihood and restricted maximum likelihood estimators as functions of ordinary least squares and analysis of variance estimators," Communications in Statistics: Theory and Methods 25, No. 3.
Moser, B. K., and Lin, Y. (1992), "Equivalence of the corrected F test and the weighted least squares procedure," The American Statistician 46, No. 2, 122-124.
Moser, B. K., and Marco, V. R. (1988), "Bayesian outlier testing using the predictive distribution for a linear model of constant intraclass form," Communications in Statistics: Theory and Methods 17, 849-860.
Moser, B. K., Stevens, G. R., and Watts, C. L. (1989), "The two sample T test versus Satterthwaite's approximate F test," Communications in Statistics: Theory and Methods 18, 3963-3976.
Myers, R. H., and Milton, J. S. (1991), A First Course in the Theory of Linear Statistical Models, PWS-Kent Publishing Co., Boston.
Nel, D. G., van der Merwe, C. A., and Moser, B. K. (1990), "The exact distributions of the univariate and multivariate Behrens-Fisher statistic with a comparison of several solutions in the univariate case," Communications in Statistics: Theory and Methods 19, 279-298.
Neter, J., Wasserman, W., and Kutner, M. H. (1983), Applied Linear Regression Models, Richard D. Irwin, Homewood, IL.
Patterson, H. D., and Thompson, R. (1971), "Recovery of interblock information when block sizes are unequal," Biometrika 58, 545-554.
Patterson, H. D., and Thompson, R. (1974), "Maximum likelihood estimation of components of variance," Proceedings of the 8th International Biometric Conference, pp. 197-207.
Pavur, R. J., and Lewis, T. O. (1983), "Unbiased F tests for factorial experiments for correlated data," Communications in Statistics: Theory and Methods 13, 3155-3172.
Puntanen, S., and Styan, G. P. (1989), "The equality of the ordinary least squares estimator and the best linear unbiased estimator," The American Statistician 43, No. 3, 153-161.
Rao, C. R. (1965), Linear Statistical Inference and Its Applications, John Wiley & Sons, New York.
Russell, T. S., and Bradley, R. A. (1958), "One-way variances in a two-way classification," Biometrika 45, 111-129.
Satterthwaite, F. E. (1946), "An approximate distribution of estimates of variance components," Biometrics Bulletin 2, 110-114.
Scariano, S. M., and Davenport, J. M. (1984), "Corrected F-tests in the general linear model," Communications in Statistics: Theory and Methods 13, 3155-3172.
Scariano, S. M., Neill, J. W., and Davenport, J. M. (1984), "Testing regression function adequacy with correlation and without replication," Communications in Statistics: Theory and Methods 13, 1227-1237.
Scheffé, H. (1959), The Analysis of Variance, John Wiley & Sons, New York.
Searle, S. R. (1971), Linear Models, John Wiley & Sons, New York.
Searle, S. R. (1982), Matrix Algebra Useful for Statistics, John Wiley & Sons, New York.
Searle, S. R. (1987), Linear Models for Unbalanced Data, John Wiley & Sons, New York.
Seber, G. A. F. (1977), Linear Regression Analysis, John Wiley & Sons, New York.
Smith, J. H., and Lewis, T. O. (1980), "Determining the effects of intraclass correlation on factorial experiments," Communications in Statistics: Theory and Methods 9, 1353-1364.
Smith, J. H., and Lewis, T. O. (1982), "Effects of intraclass correlation on covariance analysis," Communications in Statistics: Theory and Methods 11, 71-80.
Snedecor, G. W., and Cochran, W. G. (1978), Statistical Methods, The Iowa State University Press, Ames, IA.
Steel, R. G., and Torrie, J. H. (1980), Principles and Procedures of Statistics, McGraw-Hill Book Co., New York.
Thompson, W. A. (1962), "The problem of negative estimates of variance components," Annals of Mathematical Statistics 33, 273-289.
Weeks, D. L., and Trail, S. M. (1973), "Extended complete block designs generated by BIBD," Biometrics 29, No. 3, 565-578.
Weeks, D. L., and Urquhart, N. S. (1978), "Linear models in messy data: Some problems and alternatives," Biometrics 34, No. 4, 696-705.
Weeks, D. L., and Williams, D. R. (1964), "A note on the determination of connectedness in an N-way cross classification," Technometrics 6, 319-324.
Zyskind, G. (1967), "On canonical forms, non-negative covariance matrices and best and simple least squares linear estimators in linear models," Annals of Mathematical Statistics 38, 1092-1109.
Zyskind, G., and Martin, F. B. (1969), "On best linear estimation and a general Gauss-Markov theorem in linear models with arbitrary non-negative covariance structure," SIAM Journal on Applied Mathematics 17, 1190-1202.
Subject Index
a

ANOVA, 10, 13-15, 34-36, 49-50, 57, 77-80, 87-88, 90, 103, 114, 127, 134, 137, 152-153, 166, 172-174, 210
Assumptions, 54-58, 91, 161-162

b

Balanced incomplete block design, 138, 149-159
BIB product, see Matrix
BLUE, 86-87, 89, 100-101, 118, 155, 158, 166, 170-171, 185-186

c

Chi-square distribution, 19-20, 33, 35, 38, 41-50
Complete, balanced factorial, 11, 13, 15, 29, 53-77, 91, 97-100
Completeness, 108, 110
Confidence bands, 126-127, 168, 173-175, 185-186, 188

d

Determinant of matrix, see Matrix

e

Eigenvalues and vectors, 6-11, 13, 21, 36, 41-42, 44, 115, 156
Estimable function, 168-171
Expectation
  conditional, 29-32, 39
  of a quadratic form, 17
  of a vector, 17
Expected mean square, 57, 64-66, 68-70, 72-73, 76-77, 88, 103, 153, 166, 184-185
f

F distribution, 47-49, 77-80, 142, 147
Finite model, see Model

g

Gauss-Markov, 86-87, 89, 118, 155, 170
General linear model, 96-97, see also Model
  ANOVA for, 87
  interval estimation, 126-127
  point estimation, 86-87, 105-108
  replication, 131-137, 144-147
  tests for, 119-126

h

Helmert, see Matrix
Hypothesis tests
  for regression model, 91, 93-94, 101, 125
  for factorial experiments, 48, 78, 80, 102, 125-126, 142, 147, 153, 160, 174

i

Idempotent matrix, see Matrix
Identity, see Matrix
Incidence, see Matrix
Incomplete block design, see Balanced incomplete block design
Independence
  of random vectors, 26
  of quadratic forms, 45
  of random vector and quadratic form, 46
Infinite model, see Model
Interval estimation, see Confidence bands
Invariance property, 108, 113-114
Inverse of a matrix, see Matrix

k

Kronecker product, 12-16, 52, 59, 61-63, 98, 131

l

Lack of fit, see Least squares regression
Lagrange multipliers, 121
Least squares regression
  lack of fit, 91-94, 102, 133-134
  partitioned SS, 87, 91, 94-97, 135
  pure error, 91-94, 102, 133-134, 147, 166
  ordinary, 85-86, 94, 107, 111, 114, 132-133, 161
  weighted, 89-91, 173
Likelihood function, 105, 115, 119-120, 182
Likelihood ratio test, 119-126
m

Marginal distribution, 25, 30, 39
Matrix
  BIB product, 15-16, 156-157
  covariance, see also Random vector, 11, 17, 24, 53, 58-77, 111, 118, 123, 135-136, 138, 148, 153-155
  determinant of, 3, 6-7
  diagonal, 2, 7, 9, 16, 21, 132
  Helmert, 5, 10-11, 25, 33, 35, 98, 111, 125, 136-137, 178, 182
  idempotent, 9-11, 13, 21, 35-36, 41-50, 67, 92, 96, 100, 106, 114, 140, 143, 151, 155
  identity, 2, 9
  incidence, 156
  inverse, 2, 12, 21
  multiplicity, 7-8, 10, 13
  of ones, 2
  orthogonal, 4, 6-7, 9, 11, 44, 51
  partitioning, 8, 25, 94, 124, 135
  pattern, 138-147, 150-152, 158, 163-164, 181
  positive definite, 8-9
  positive semidefinite, 8, 10
  rank of, 3, 9-11, 13, 16
  replication, 131-137, 144-147, 161-165, 178, 180-181
  singularity, 2, 12, 21
  symmetric, 3-4, 7, 9-11, 13, 17
  trace of, 2, 7, 10
  transpose of, 1
Maximum likelihood estimator, 105-109, 111-119, 182
Minimum variance unbiased estimate, see UMVUE
Mean model, see Model
Mixed model, see Model
Model
  finite, 53-56, 58-60, 64, 178, 183-184, 186
  general linear, 97-100, 105
  infinite, 56-58, 60, 64, 178, 183-184
  less than full rank, 161-164
  mean, 28, 164-174, 180
  mixed, 177-187
Moment generating function, 18-20, 24, 26, 42
Multivariate normal distribution
  conditional, 29-32
  independence, 26-28
  of Y, 24
  of transformation BY + b, 24
  marginal distribution, 25
  MGF of, 24

o

One-way classification, 13, 27, 47-48, 107-108, 125, 134
Overparameterization, 163-165, 168

p

Pattern, see Matrix
Point estimation, see also BLUE and UMVUE
  for balanced incomplete block design, 138-143, 152-159
  for general linear model, 105-107
  least squares, 84-85
  variance parameters, 85, 106, 142-143, 152-153, 179-182
Probability distribution, see also Random vector
  chi-square, 19-20, 33-35, 41-50
  conditional, 29-32
  multivariate normal, 23-36
  noncentral corrected F, 48
  noncentral F, 47
  noncentral t, 47
PROC GLM, 171, 209
PROC IML, 85, 140-141, 145, 180, 186, 189, 193, 201-202, 207, 213, 218
PROC VARCOMP, 183-184, 216
Pure error, see Least squares regression

q

Quadratic forms, 3, 9, 11, 13, 17, 32-36, 41-50, 63, 85, 143, 146, 161, 166
  independence of, 45-46

r

Random vector
  covariance matrix of, 17, 21, 28-29, 58-74, 96, 111, 135, 138, 140, 145, 150-151, 155, 181, 185-186
  expectation of, 17
  linear transformation of, 17, 24
  MGF of, 18-20, 24-26, 42
  probability distribution, 17-18
Rank of matrix, see Matrix
Regression, see Least squares regression
Rejection region, 49, 94, 124, 128, 142, 147, 153, 168, 174, 185, 187
Reparameterization, 60, 66, 183
Replication, see also Matrix, 47-48, 53-58, 60-61, 63, 66-67, 70, 77, 91, 97-98, 107, 126, 150
REML, 182-184, 188
s
Satterthwaite test, 185-188
Sufficiency, 108-111
Sum of squares, 4, 14, 20, 33-35, 47, 49, 60-69, 71, 74, 77-80, 87, 91-96, 98, 100-103, 107, 112, 124, 131, 133-135, 140-143, 145, 147-148, 166, 174
  type I, 94-96, 135, 137, 140-143, 145, 147, 152-153, 159, 179-182, 186

t

t distribution, 47-48, 126, 168
Tests
  F test, 49, 57, 94, 124, 142, 153, 168, 174, 185
  for lack of fit, 91-94
  for noncentrality parameters, 48
  for regression, 124
  for variance parameter, 49
  likelihood ratio, 124
  Satterthwaite, 185
Trace of matrix, see Matrix
Transformation, 17, 24
Transpose, see Matrix
Two-way cross classification, 28, 34, 45, 49-50, 53, 66, 69, 135
Type I SS, see Sum of squares

u

UMVUE, 110-111, 128

v

Variance components, 135-136, 142-143, 179
Vector, see Random vector
PROBABILITY AND MATHEMATICAL STATISTICS
Thomas Ferguson, Mathematical Statistics: A Decision Theoretic Approach
Howard Tucker, A Graduate Course in Probability
K. R. Parthasarathy, Probability Measures on Metric Spaces
P. Revesz, The Laws of Large Numbers
H. P. McKean, Jr., Stochastic Integrals
B. V. Gnedenko, Yu. K. Belyayev, and A. D. Solovyev, Mathematical Methods of Reliability Theory
Demetrios A. Kappos, Probability Algebras and Stochastic Spaces
Ivan N. Pesin, Classical and Modern Integration Theories
S. Vajda, Probabilistic Programming
Robert B. Ash, Real Analysis and Probability
V. V. Fedorov, Theory of Optimal Experiments
K. V. Mardia, Statistics of Directional Data
H. Dym and H. P. McKean, Fourier Series and Integrals
Tatsuo Kawata, Fourier Analysis in Probability Theory
Fritz Oberhettinger, Fourier Transforms of Distributions and Their Inverses: A Collection of Tables
Paul Erdos and Joel Spencer, Probabilistic Methods in Combinatorics
K. Sarkadi and I. Vincze, Mathematical Methods of Statistical Quality Control
Michael R. Anderberg, Cluster Analysis for Applications
W. Hengartner and R. Theodorescu, Concentration Functions
Kai Lai Chung, A Course in Probability Theory, Second Edition
L. H. Koopmans, The Spectral Analysis of Time Series
L. E. Maistrov, Probability Theory: A Historical Sketch
William F. Stout, Almost Sure Convergence
E. J. McShane, Stochastic Calculus and Stochastic Models
Robert B. Ash and Melvin F. Gardner, Topics in Stochastic Processes
Avner Friedman, Stochastic Differential Equations and Applications, Volume 1, Volume 2
Roger Cuppens, Decomposition of Multivariate Probabilities
Eugene Lukacs, Stochastic Convergence, Second Edition
H. Dym and H. P. McKean, Gaussian Processes, Function Theory, and the Inverse Spectral Problem
N. C. Giri, Multivariate Statistical Inference
Lloyd Fisher and John McDonald, Fixed Effects Analysis of Variance
Sidney C. Port and Charles J. Stone, Brownian Motion and Classical Potential Theory
Konrad Jacobs, Measure and Integral
K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis
Sri Gopal Mohanty, Lattice Path Counting and Applications
Y. L. Tong, Probability Inequalities in Multivariate Distributions
Michel Metivier and J. Pellaumail, Stochastic Integration
M. B. Priestley, Spectral Analysis and Time Series
Ishwar V. Basawa and B. L. S. Prakasa Rao, Statistical Inference for Stochastic Processes
M. Csorgo and P. Revesz, Strong Approximations in Probability and Statistics
Sheldon Ross, Introduction to Probability Models, Second Edition
P. Hall and C. C. Heyde, Martingale Limit Theory and Its Applications
Imre Csiszar and Janos Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems
A. Hald, Statistical Theory of Sampling Inspection by Attributes
H. Bauer, Probability Theory and Elements of Measure Theory
M. M. Rao, Foundations of Stochastic Analysis
Jean-Rene Barra, Mathematical Basis of Statistics
Harald Bergstrom, Weak Convergence of Measures
Sheldon Ross, Introduction to Stochastic Dynamic Programming
B. L. S. Prakasa Rao, Nonparametric Functional Estimation
M. M. Rao, Probability Theory with Applications
A. T. Bharucha-Reid and M. Sambandham, Random Polynomials
Sudhakar Dharmadhikari and Kumar Joag-dev, Unimodality, Convexity, and Applications
Stanley P. Gudder, Quantum Probability
B. Ramachandran and Ka-Sing Lau, Functional Equations in Probability Theory
B. L. S. Prakasa Rao, Identifiability in Stochastic Models: Characterization of Probability Distributions
Moshe Shaked and J. George Shanthikumar, Stochastic Orders and Their Applications
Barry K. Moser, Linear Models: A Mean Model Approach