Lecture Notes in Statistics Edited by P. Bickel, P. Diggle, S. Fienberg , K. Krickeberg, 1. Olkin, N. Wermuth , and S. Zeger
166
Springer New York Berlin Heidelberg Barcelona HongKong London Milan Paris Singapore Tokyo
Hira L. Koul
Weighted Empirical Processes in Dynamic Nonlinear Models Second Edition
,
Springer
Hira L. Koul Department of Statistics and Probability Mich igan State University East Lansing, MI 48824 USA
Library of Co ngress Cataloging-in-Publ ication Data Koul , H.L. (Hira L.) Weighted empiric al processes in dynamic nonlinear model s / Hira L. Koul.-2nd ed. p. cm. - (Lecture notes in statistics ; 166) Rev. ed. of: Weighted empirica ls and linear models. cl 992. ISB N 0-387-95476-7 (softcover : alk. paper) I. Sampling (Statistics) 2. Linear models (Statistics) 3. Regression analysis. 4. Autoregressio n (Statistics) I. Kou l, H.L. (Hira L.). Weighted empiricals and linear models. II. Title. III. Lecture notes in statistics (Springer-Verlag) ; v. 166. QA276.6 .K68 2002 5 19.5-dc21 2002020944 ISBN 0-387 -95476-7
Printed on acid-free paper.
First edition © 1992 Institute of Mathem atical Statistics, Ohio.
© 200 2 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publi sher (Springer-Verlag New York, Inc., 175 Fifth Aven ue, New York , NY 100 I0, USA), except for brie f excerpts in connection with reviews or scholarly anal ysis. Use in connection with any form of information storage and retrieval, electron ic adaptation, computer software, or by similar or dissim ilar meth odology now known or hereafter developed is forbidden . The use in this publication of trade names, trademarks, service mark s, and similar terms, even if they are not identified as such , is not to be taken as an expres sion of opinion as to whether or not they are subjec t to proprietary rights. Printed in the United State s of America.
9 8 7 6 5 4 3 2 I
SPIN 10874053
www .springer-ny.com Sprin ger-Verlag New York Berlin Heidelberg A member ofBertelsmannSpringer Science+Business Media GmbH
Preface to the Second Edition The role of the weak convergence technique via weighted empirical processes has proved to be very useful in advancing the development of the asymptotic theory of the so called robust inference procedures corresponding to non-smooth score functions from linear models to nonlinear dynamic models in the 1990's. This monograph is an expanded version of the monograph Weighted Empiricals and Linear Models , IMS Lecture Notes-Monograph, 21 published in 1992, that includes some aspects of this development. The new inclusions are as follows. Theorems 2.2.4 and 2.2.5 give an extension of the Theorem 2.2.3 (old Theorem 2.2b .1) to the unbounded random weights case. These results are found useful in Chapters 7 and 8 when dealing with homoscedastic and conditionally heteroscedastic autoregressive models, actively researched family of dynamic models in time series analysis in the 1990's. The weak convergence results pertaining to the partial sum process given in Theorems 2.2.6 .and 2.2.7 are found useful in fitting a parametric autoregressive model as is expounded in Section 7.7 in some detail. Section 6.6 discusses the related problem of fitting a regression model, using a certain partial sum process. In both sections a certain transform of the underlying process is shown to provide asymptotically distribution free tests. Other important changes are as follows. Theorem 7.3.1 gives the asymptotic uniform linearity of linear rank statistics in linear autoregressive (LAR) models for any nondecreasing bounded score function ip, compared to its older version Theorem 7.3b.1 that assumed ip to be differentiable with uniformly continuous derivative. The new Section 7.5 is devoted to autoregression quantiles and rank scores. Its
VI
Preface to the Second Edition
contents provide an important extension of the regression quantiles of Koenker-Bassett to LAR models. The author gratefully acknowledges the help of Kanchan Mukherjee with Section 8.3, Vince Melfi's help with some tex problems and the NSF DMS 0071619 grant support. East Lans ing, Michigan March 18, 2002
Hira L. Koul
Preface to the First Edition An empirical process that assigns possibly different non-random (random) weights to different observations is called a weighted (randomly weighted empirical process. These processes are as basic to linear regression and autoregression models as the ordinary empirical process is to one sample models. However their usefulness in studying linear regression and autoregression models has not been fully exploited. This monograph addresses this question to a large extent. There is a vast literature in non parametric inference that discusses inferential procedures based on empirical processes in k-sample location models. However, their analogs in autoregression and linear regression models are not readily accessible. This monograph makes an attempt to fill this void. The statistical methodologies studied here extend to these models many of the known results in k-sample location models, thereby giving a unified theory. By viewing linear regression models via certain weighted empirical processes one is naturally led to new and interesting inferential procedures. Examples include minimum distance estimators of regression parameters and goodness - of - fit tests pertaining to the errors in linear models. Similarly, by viewing autoregression models via certain randomly weighted empirical processes one is naturally led to classes of minimum distance estimators of autoregression parameters and goodness .. of - fit tests pertaining to the error distribution. The introductory Chapter 1 gives an overview of the usefulness of weighted and randomly weighted empirical processes in linear models. Chapter 2 gives general sufficient conditions for the weak convergence of suitably standardized versions of these processes to continuous Gaussian processes. This chapter also contains the proof of the asymptotic uniform linearity of weighted empirical processes based
Vlll
Preface to the First Edition
on the residuals when errors are heteroscedastic and independent. Chapter 3 discusses the asymptotic uniform linearity of linear rank and signed rank statistics when errors are heteroscedastic and independent. It also includes some results about the weak convergence of weighted empirical processes of ranks and signed ranks. Chapter 4 is devoted to the study of the asymptotic behavior of M- and Restimators of regression parameters under heteroscedastic and independent errors, via weighted empirical processes. A brief discussion about bootstrap approximations to the distribution of a class of Mestimators appears in Section 4.2.2. This chapter also contains a proof of the consistency of a class of robust estimators for certain scale parameters under heteroscedastic errors. In carrying out the analysis of variance of linear regression models based on ranks, one often needs an estimator of the functional f jd
Preface to the First Edition
IX
shown to be consequences of the asymptotic continuity of certain basic weighted and randomly weighted empirical processes. Chapters 2-4 are interdependent. Chapter 5 is mostly self- contained and can be read after reading the Introduction. Chapter 6 uses results from Chapters 2 and 5. Chapter 7 is almost self-contained. The basic result needed for this chapter appears in Section 2.2b. The first version of this monograph was prepared while I was visiting Department of Statistics, Poona University, India, on sabbatical leave from Michigan State University, during the academic year 1982-83. Several lecture on some parts of this monograph were given at the Indian Statistical institute, New Delhi, and Universities of La Trobe, Australia, and Wisconsin, Madison. I wish to thank Professors S. R. Adke, Richard Johnson, S. K. Mitra, M. S. Prasad and B. 1. S. Prakasa Rao for having some discussions pertaining to the monograph. My special thanks go to James Hannan for encouraging me to finish the project and for proof reading parts of the manuscript, to Soumendra Lahiri for helping me with sections on bootstrapping, and to Bob Serfling for taking keen interest in the monograph and for many comments that helped to improve the initial draft. Ms. Achala Sabne and Ms. Lora Kemler had the pedestrian task of typing the manuscript . Their patient endeavors are gratefully acknowledged. Ms. Kemler's keen eye for details has been an indispensable help. During the preparation of the monograph the author was partly supported by the University Grants Commission of India and the National Science Foundation, grant numbers NSF 82-01291, DMS9102041. East Lansing, Michigan May 28, 1992
Hira L. Koul
Contents Preface to the Second Edition
v
Preface to the First Edition
vii
Notation and Conventions
xv
1 Introduction 1.1 Weighted Empirical Processes . . . . . . . 1.2 M-, R- and Scale Estimators . 1.3 M.D. Estimators & Goodness-of-fit Tests . 1.3.1 Minimum distance estimators 1.3.2 Goodness-of-fit testing . 1.3.3 Regression model fitting . . . . . 1.4 R.W.E . Processes and Dynamic Models
1 1 2 5 5 8 8 10
2 Asymptotic Properties of W.E.P.'s 2.1 Introduction... . . . 2.2 Weak Convergence . . 2.2.1 Wd - processes 2.2.2 Vh - processes . 2.2.3 C¥n- processes . 2.3 AUL of Residual W.E.P.'s 2.4 Some Additional Results for W.E.P.'S . 2.4.1 Laws of the iterated logarithm 2.4.2 Weak convergence ofW.E.P.'s in 1)[0, 1JP , in Pq - metric and an embedding result. 2.4.3 A martingale property . . . . . . . . . . . . ..
15 15 16 16 27 43 50 64 64 65 67
CONTENTS
xii
3
Linear Rank and Signed Rank Statistics 3.1 Introducti on.. . . . . . . . . . . . .. 3.2 AUL of Lin ear Rank Statistics .. .. 3.3 AUL of Linear Signed Rank Statistics 3.4 Weak Convergence of Rank and Signed Rank W.E.P.'s. . . . . . . . . . . . . . . . . . .
69 69 70 83
M , R and Some Scale Estimators 4.1 Introducti on...... .. . . . . 4.2 M-Estimators . . . .. . . . . . . 4.2.1 First order approximations: Asymptotic normality . . . 4.2.2 Bootstrap approximations. 4.3 Distributions of Some Scale Estimators. 4.4 R-Estimators . . . 4.5 Est ima tion of QU) . . . . . . .
99 99 101 101 108 112 121 128
5
Minimum Distance Estimators 5.1 Introducti on .. . . . . . .. .. 5.2 Definitions of M.D. Estimators 5.3 Finite Sample Properties. . . . 5.4 A General M. D. Estimator . . 5.5 Asymptotic Uniform Quadraticity 5.6 Distributions, Efficien cy & Robustness 5.6.1 Asymptotic distributions and efficiency 5.6.2 Robustness . . . . . . . . . . . . . . . . 5.6.3 Locally asymptotically minimax property
138 138 140 145 159 173 201 201 209 218
6
Goodness-of-fit Tests in Regression 6.1 Introducti on . . . . . . . . . . . . . 6.2 The Supremum Distance Tests . . 6.2.1 Asymptotic null distributions 6.2.2 Bootstrap distributions 6.3 L2-Distance Tests. . . . . . . . . . . 6.4 Testing with Unknown Scale. . . . . 6.5 Testing for the Symmetry of the Errors 6.6 Regression Model Fitting . . . . . . . .
229 229 231 231 241 247 256 258 265
4
89
CONTENTS
xiii
6.6.1 .Introduction . 6.6.2 Transform Tn of Sn,t/J' . 6.6.3 Computation of TnSn,t/J 6.6.4 Proofs of some results of Section 6.6.2
· · · ·
265 269 278 281
7 Autoregression 7.1 Introduction . . . . . . . . 7.2 AUL of W h and Fn . . . . 7.3 GM- and GR- Estimators 7.3.1 GM-estimators.. 7.3.2 GR-estimators .. 7.3.3 Estimation of QU) := f jd
294 . 294 . 297 . 311 . 311 . 312 . 319 . 321 . . . . . . . . .
333 335 338 340 341 341 344 351 354
8 Nonlinear Autoregression 8.1 Introduction. . . . . . . . . . . . . 8.2 AR Models . . . . . . . . . . . . . 8.2.1 Main Results in AR models 8.2.2 Examples of AR models . . 8.3 ARCH Models 8.3.1 ARCH Models and Some Definitions 8.3.2 Main Results in ARCH models 8.3.3 Examples of ARCH models . . . . .
. . . . . . . .
358 358 359 360 376 380 381 383 399
9 Appendix 9.1 Appendix
.408
10 Bibliography
414
408
Notation and Conventions
AUL C-S D.C .T. dJ. ('s) Fubini
...-
IIglioo
...-
i.i.d. L-F CLT
..-
N(o,C)
.-
0(1)(op(l))
.-
O(l)(Op(l))
"-
r.v .(,s)
.-
ft.~.~.J>.('s)
..-
7 a2
up(l)
.-
~.E.J>.('s)
..-
w.r.t.
Asymptotic uniform linearity. the Cauchy-Schwarz inequality. the Dominated Convergence Theorem. distribution function(s) . the Fubini Theorem. the supremum norm over the domain of a real valued function g. independent identically distributed. the Lindeberg-Feller Central Limit Theorem. either a LV. whose distribution is normal with mean vector 0 and covariance matrix C or the corresponding distribution. a sequence of numbers (LV.'S) converging to zero (in probability) . a sequence of numbers (r.v.'s) that is bounded (in probability). random variable(s). randomly weighted empirical process(es). L:?=I a;i' for an arbitrary real vector (anI, ' " ,ann)" a sequence of stochastic processes converging to zero uniformly over the time domain, in probability. weighted empirical process(es) . with respect to.
xvi
Notation and Conventions
The p-dimension Euclidean space is denoted by JRP , P ~ 1; JR = JRl ; BP := the (J- algebra of Borel sets in JRP, B = B\ A := Lebesgue measure on (JR, B) . The symbol" := " stands for "by definition" . For any set A C JR, V(A) denotes the class of real valued functions on A that are right continuous and have left limits while VI(A) denotes the subclass in V(A) whose members are nondecreasing. C[O,I] := the class of real valued bounded continuous functions on [0,1]. A vector or a matrix will be designated by a bold letter.
t E JRP is a p x 1 vector, t' its transpose, Iltl/ := L:~=l max{ltjl, 1 :::; j :::; pl. For any p-square matrix C , 2
t;, It I
A :=
IICI/co = sup{IIt'C1I ; lit II :::; I}. For an n x p matrix D , d~i denotes its i t h row, 1 :::; i :::; n, and D, the n x p matrix D - -0, whose i t h row consists of (d n i - an)', with an := L:~l dndn , 1 :::; i :::; n. Oft en in a discussion or in a proof the subscript n on the triangular arrays and various other quantities will not be exhibited. The index i in L:i and maxi will vary from 1 to n, unless specified otherwise. All limits, unless specified otherwise, are taken as n --t 00. For a sequence of LV .'S {X, X n , n ~ I}, X n --td X means that the distribution of X n converges weakly to that of X. For two r.v .'s X , Y, X 4 Y means that the distribution of X is the same as that ofY. For a sequence of stochastic processes {Y,Yn , n ~ I}, Yn ===} Y means that Yn converges weakly to Y in a given topology. Yn - 7 J.d. Y means that all finite dimensional distributions of Yn converge weakly to that of Y. For convenient reference we list here some of the most often used assumptions in the manuscript. For an arbitrary d.f, F on JR, assumptions (Fl), (F2) and (F3) are as follows:
(Fl)
F has uniformly continuous density f w.r.t. A.
(F2)
f > 0, a.e . A.
(F3)
sUPXE IR Ixf(x)1
::; k < 00,
sUPXEIR Ixf(x(I
+ u)) -
xf(x)1
--t
0,
as
u
--t
0.
Notation and Conventions
XVll
These conditions are introduced for the first time just before Corollary 3.2.1 and are used frequently subsequently. For an n x P design matrix matrix X , the conditions (NX) , (NXl) and (NX c) are as follows:
(NX)
(X'X)-1 exists, V
n ~ Pi
max X~i(X'X)-1x~i = 0(1).
1~~~n
(NXl)
(X~Xc) -1 exists , V n ~ P;
The condition (NX) is the most often used from Theorem 2.3.3 onwards. The letter N in these conditions stands for Noether, who was the first person to use (NX) , in the case P = 1, to obtain the asymptotic normality of weighted sums of r.v .'s ; see Noether (1949).
1 Introduction 1.1
Weighted Empirical Processes
A weight ed empirical process (W .E.P.) corresponding to the random variables (r.v.ls] X n 1 , · · · , X nn and the non-random real weights dn1, ..., dnn is defined to be n
Ud( X)
:=
L dniI(Xni :::; x) ,
x E JR, n 2:: 1.
i= l
The weights {dni} need not be nonnegative. The classical example of a W.E .P. is the ordinary empirical process that corresponds to dni == n- 1 . Another example is given by the two sample empirical process obtained as follows: Let m be an integer, 1 :::; m :::; n ,r := n - m ;dni = -rln ,l :::; i :::; m ;d ni = min, m + 1 :::; i :::; n . Then the corresponding Ud - process becomes
Ud(X) " (mrjn) {r-
1i~~' u x.; S x) -
m-
1
~ tix; S X)} ,
precisely the process that arises in two-sample models. More generally, W.E.P's arise naturally in linear regression models where, for each n 2:: 1 and some (3 E W , the data {(x~i ' Ynd , 1 :::; i :::; n} are related to the error variables {eni , 1 :::; i :::; n} by the linear relation (1.1.1)
1 :::; i :::; n .
H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
1. Introduction
2
Here enl, '" , enn are independent r .v.'s with respective continuous d.f.'s Fnl, .. . ,Fnn, x~i = (Xnib ... ,Xnip) is the i t h row of the known n x p design matrix X and f3 is the parameter vector of interest. Consider the vector of W .E.P.'s V := (VI , ..., Vp ) ' where n
(1.1.2)
Vj(y, t) :=
L xnijI(Yni ~ Y + x~it) , i=1
for y E~, t E JRP , 1 ~ j ~ p. Clearly, Vj(" t) is an example of the W .E.P. Ud(') with dni == Xnij and x.; == Yni - x~it, 1 ~ i ~ n , 1 ~ j ~ p. Observe that the data {(x~i' Yni) , 1 ~ i ~ n} in the model (1.1.1) are readily summarized by the vector of W.E.P.'s {V(y, 0) , y E ~} in the sense that the given data can be recovered from the sample paths of this vector up to a permutation. This in turn suffices for the purpose of inference about f3 in (1.1.1). In this sense the vector of W .E.P 's {V(y, 0), y E ~} is at least as important to linear regression model (1.1.1) as is the ordinary empirical process to one-sample location models. One of the purposes of this monograph is to discuss the role of V - processes in inference and in proving limit theorems in models (1.1.1) in a unified fashion .
1.2
M-, R- and Scale Estimators
Many inferential procedures involving (1.1.1) can be viewed as functions of V. For example the least squares estimator, or more generally, the class of M- estimators corresponding to the score function 'lj;, (Huber: 1981), is defined as a solution t of the equation
J
'lj;(y)V(dy, t) =
a known constant.
Similarly, rank (R) estimators of f3 corresponding to the score function
J
(1.2.1)
Hn(y, t)
:=
n- I
L I(Yni ~ y + x~it), i=1
Y E~, t E W.
1.2. M-, R- and scale estimators
3
A significant portion of non parametric inference in models (1.1.1) deals with M- and R- estimators of f3 (Adichie; 1967. Huber; 1973) and linear rank tests of hypotheses about f3 (Hajek - Sidak; 1967) . By viewing these procedures as functions of {V(y ,t) ,y E IR,t E W}, it is possible to give a unified treatment of their asymptotic distribution theory, as is done in Chapters 3 and 4 below . There is a vast literature in nonparametric inference that discusses inferential procedures based on functionals of empirical processes in the k-sample location model such as the books by Puri and Sen (1969), Serfli.ng (1980) and Huber (1981) . Yet their appropriate extensions to the linear regression model are not readily accessible. This monograph seeks to fill this void . The methodology and inference procedures studied here extend many known results in the k-sample location model to the model (1.1.1), thereby giving a unified treatment . An important result needed for study of the asymptotic behavior of R-estimators of f3 is the asymptotic uniform linearity of the linear rank statistics of (1.2.1) in the regression parameter vector. Jureckova (1969, 1971) obtained this result under (1.1.1) with i.i.d . errors. A similar result was proved in Koul (1969, 1971) and van Eeden (1972) for linear signed rank statistics under i.i.d. symmetric errors . Its extension to the case of non-identically distributed errors is not readily available. Theorems 3.2.4 and 3.3.3 prove the asymptotic uniform linearity of linear rank and linear signed rank statistics with bounded scores under the general independent errors model (1.1.1). In the case of i.i.d. errors, the conditions in these theorems on the error d .f. are more general than requiring finite Fisher information for location. The results are proved uniformly over all bounded score functions and are consequences solely of the asymptotic sample continuity of V - processes and some smoothness of {Fnd . The uniformity with respect to the score functions is useful when constructing adaptive rank tests that are asymptotically efficient against Pitman alternatives for a large class of error distributions. Chapter 3 also contains a proof of the asymptotic normality of linear rank and linear signed rank statistics under independent alter-
1. Introduction
4
natives and for indicator score functions. This proof proceeds via the weak convergence of certain basic W.E.P. 's and complements some of the results in Dupac and Hajek (1969). Section 4.2.1 discusses the asymptotic distribution of M- estimators under heteroscedastic errors using the asymptotic continuity of V- processes. Section 4.2.2 presents some second order results on bootstrap approximations to the distribution of a class of Mestimators. In order to make M-estimators scale invariant one often needs an appropriate robust scale estimator. One such scale estimator, as recommended by Huber (1981) and others, is 81
=
m ed{lYni - x~iJI, 1:S i :S n} ,
where i3 is an estimator of {3. The asymptotic distribution of 8 1 under heteroscedastic errors is given in Section 4.3. If the errors ar e i.i.d. , this asymptotic distribution does not dep end on i3 provided the common error distribution is symmetric around O. This observation naturally leads one to construct a scale estimator based on the symmetrized residuals , thereby giving another scale estimator 82
:=
m ed{lYni - x~ii3 - Yni + x~ii3l ; 1 :S i, j :S n} .
As expected , the asymptotic distribution of 82 is shown to be free from the estimator i3 in the case of i.i.d . errors , not necessarily symmetric. It also appears in Section 4.3. Section 4.4 discusses the asymptotic distribution of a class of R-estimators under heteroscedastic errors using the asymptot ic uniform linearity results of Chapter 3. The R-estimators considered ar e asymptotically equivalent to Jaeckel's estimators. The complete rank analysis of the linear regression model (1.1.1) requires an estimate of the scale parameter Q(J) :=
!
fdrp(F)
where f is density of the unknown common error d .f. F and sp is a nondecreasing function on (0,1) . This estimate is used to standardize the test statistic and estimate the standard error of the R-estimator
1.3. M.D . ESTIMATORS & GOODNESS-OF-FIT TESTS
5
corresponding to the score function cpo This parameter also appears in the efficiency comparisons of rank procedures and it is of interest to estimate it , after the fact, in an analysis. Lehmann (1963) , Sen (1966) , Koul (1971), among others, provide estimators of QU) in the one - and two - sample location models and in the linear regression model. These estimators are given in terms of the lengths of Lebesgue measures of certain confidence intervals or regions. They are usually not easy to compute when the dimension p of f3 is larger than 1. In Section 4.5, estimators of QU) , based on kernel type density estimators of f and the empirical d.f H n , are defined and their consistency under (1.1.1) with i.i.d. errors is proved. An estimator whose window width is based on the data and is of the order of square root n, is also considered. The consistency proof presented is a sole consequence of the asymptotic continuity of certain W.E .P.'s and some smoothness of the error d.f. 'so
1.3 1.3.1
M.D. Estimators & Goodness-of-fit tests Minimum distance estimators
The practice of obtaining estimators of parameters by minimizing a certain distance between some functions of observations and parameters has been present in statistics since its beginning. The classical examples of this method are the Least Square and the minimum Chi Square estimators. The minimum distance (m.d.) estimation method, where one obtains an estimator of a parameter by minimizing some distance between the empirical d.f. and the modeled d.f, was elevated to a general method of estimation by Wolfowitz (1953, 1954, 1957). In these papers he demonstrated that, compared to the maximum likelihood estimation method, the m.d. estimation method yielded consistent estimators rather cheaply in several problems of varied levels of difficulty. This methodology saw increasing research activity from the mid 1970's when many authors demonstrated various robustness proper-
1. Introduction
6
ties of certain m.d. estimators. See, e.g., Beran (1977, 1978), Parr and Schucany (1979), Millar (1981, 1982, 1984), Donoho and Liu (1988 a, b) , among others. All of these authors restrict their attention to the one sample setup or to the two sample location model. See Parr (1981) for additional bibliography on m.d. estimation till 1980. In spite of many advances made in the m.d. estimation methodology in one sample models, little was known till late 1970's as to how to extend this methodology to one of the most applied models, v.i.z., the multiple linear regression model (1.1.1). A significant advantage of viewing the model (1.1.1) through V is that one is naturally led to interesting m.d. estimators of f3 that are natural extensions of their one- and two- sample location model counterparts. To illustrate this, consider the m.d. estimator {j of the one sample location parameter 0, when errors are i.i.d. symmetric around 0, defined by the relation
{j:= argmin {Tn(t); t E JR}, with
Tn(t) = !{n-l/2tI(Yni::; y+t)-I(-Yni < y-t)]}2dG(y), t E JR, i=l
where G E 'DI(JR). Since (1.1.1) is an extension of the one sample location model, it is only natural to seek an extension of {j in this model. Assuming that {eni} are symmetrically distributed around 0, the first thing one is tempted to consider as an extension of (j is defined by the relation
e;
f3t := argmin{Kt(t) ; t
E
JRP},
with
Kt(t)
.-
! {n-
1
/
2t
[I(Yni ::; y +
x~it)
i=l
- I (- Yni < y -
x~i t)]
r
dG (y), t E JRP.
However, any extension of {j to the linear regression model should have the property that it reduce to {j when the model is reduced to
1.3.1. Minimum distance estimators
7
the one sa mple locati on model and , in addit ion, t hat it redu ce to an appropriate extens ion of {; to the k-sample locati on model when the model (1.1.1) is reduced to it. In t his sense f3t does not provide t he right exte nsion but f3 j( does, where
f3 j(: = arg min{K i(t) ; t E JRP },
(1.3.1)
wit h
Ki (t) y+'
.-
J
.-
(V1+ , ... , V/ ),
Y +'( y, t )(X'X)-ly+ (y, t )dG(y) , t E IRP,
n
Vj+(y, t ) .- Vj( y,t) - L Xnij
+ Vj( -y ,t ),
i= l
for 1 ::; j ::; p , y E ~, t E W. In the case errors are not symmet ric bu t i.i.d . according to a known d .f. F , so that EVj( y, (3 ) == L i xnij F (y), a suitable class of m.d. estimators of f3 is defined by t he relati on (1.3.2)
{3x
K x (t)
' -
.-
ar gmin {K x(t); t E JRP },
J
2
II W (y, t )11 dG(y ),
W (y, t ) .- (X' X) - 1/2{y(y , t ) - X'IF (y )}, I' .- (1" " , 1h xn .
y E ~, tE JRP ,
Chapter 5 discusses t he existe nce, the asymptot ic distribution, t he robustness and the asymptotic optimality of f3j( and {3x under (1.1.1) with het eroscedastic errors. For exa mple, if p = 1 in (1.1.1) and the design vari abl e is nonnegative then th e asy mptotic variance of f3 j( is smaller th an t hat of f3t for a large class of symmetric error d.I. 's F and integrating measures G. A similar resu lt holds abo ut {3x and for p ~ 1. Chapter 5 also discusses severa l ot her m.d. estimators of f3 and t heir asymptotic th eory under (1.1.1) wit h heteroscedastic errors . T hese include analogues of {3x when t he common error d.f. is unk nown and some m.d. est imators correspo nding to certain supremum distances based on Y.
1. Introduction
8
1.3.2
Goodness-of-fit testing
Closely related to the problem of m.d. estimation is the problem of testing the goodness-of-fit hypothesis H o : Fn i == Fo, Fo a known d.f.. One test statistic for this problem is
fh := supln 1 / 2{Hn (y,,8) - Fo(y)}l,
(1.3.3)
y
where ,8 is an estimator of f3 . This test statistic is suggested by looking at the estimated residuals and mimicking the one sample location model technique. In general, its large sample distribution depends on the design matrix. In addition, it does not reduce to the Kiefer (1959) tests of goodness-of-fit in the k-sample location problem when (1.1.1) is reduced to this model. Test statistics that overcome these deficiencies are
b,
:=
sup IWo(y, ,8)1,
fh
:=
y
sup IIWo(y,,8)11 , y
where WO is equal to the W of (1.3.2) with F = Fo. Another natural class of tests is based on K~,.(,8x), where K& is equals to the Kx of (1.3.2) with W replaced by WO in there. All of the above and several other goodness-of-fit tests are discussed at some length in Chapter 6. Section 6.2.1 discusses the asymptotic null distributions of the supremum distance statistics Dj ,j = 1,2 ,3. Also discussed in this section are asymptotically distribution free analogues of these tests, in a sense similar to that discussed by Durbin (1973, 1976) and Rao (1972) for the one-sample location model. Section 6.2.2 discusses smooth bootstrap approximations to the null distributions of tests based on W .E.Po's . Tests based on L 2 - distances are discussed in Section 6.3. Some modifications of goodness-of-fit tests when F o has a scale parameter appear in Section 6.4 while tests of the symmetry of the errors are discussed in Section 6.5.
1.3.3
Regression model fitting
Another problem of interest is that of fitting a regression model. More precisely, let X, Y be r .v .ts with X being p-dimensional. In
1.5. Randomly weighted etapixicels
9
much of the existing literature the regression function is defined to be the conditional mean function /-l(x) := E(YIX = x), xE JRP , assuming it exists, and then one proceeds to fit a parametric model to this function, i.e., one assumes the existence of a parametric family
M = {m(x, 0) : x E W , 0 E 8} of functions and then proceeds to tests the hypothesis
/-l(x) = m(x, ( 0 ) ,
for some 00 E 8 and V x EX,
based on n i.i.d. observations {(Xi ,Yi); 1 ::; i ::; n} on (X, Y), where X is a compact subset of JRP . We consider the problem of fitting the model M to the 'ljJ- regression function defined as follows. Let 'ljJ be a nondecreasing real valued function such that EI'ljJ(Y - r)1 < 00, for each rElit Define the 'ljJ-regression function m'lj; by the requirement that
E['ljJ(Y - m'lj;(X)) IX]
= 0, a.s.
Observe that, if 'ljJ(x) == 'ljJo.(x) := I(x > O)-(l-a) , for an 0 < a < 1, then m'lj;(x) == mo.(x), the a th quantile of the conditional distribution of Y, given X = x, and if 'ljJ(x) == x, then m'lj; = /-l. Tests for fitting a model to m'lj;, i.e., for testing the hypothesis
H o : m'lj;(x) = m(x, (0),
for some 00 E 8 and "Ix EX,
will be based on the weighted empirical process n
Sn,'Ij;(x,O)
1 2
:= n- /
L: 'ljJ(Yi - m(Xi, 0)) I(£(Xd ::; x), i=l
where x E [-00, (0), l is a known function from ~p to ~, and 0 is a n 1 / 2 _ consistent estimator of 0 0 . This process is an appropriate extension of the usual partial sum process useful in testing for the mean in the one sample problem to the current regression setup. Section 6.6 discusses the weak convergence of this process and gives a transform of this process whose asymptotic null distribution is known, so that tests based on this transformed process are asymptotically distribution free.
1. Introduction
10
1.4
R.W.E. Processes and Dynamic Models
A randomly weighted empirical process (RW.E .P.) corresponding to the r.v. 's (nl, ' " (nn , the random noise bnl , ' " , bnn and the random real weights h nl, .. . ,hnn is defined to be n
(1.4.1)
Vh(X)
:= n-
l
I: hniI((ni :s; x + bnd ,
x E JR, n ::::: 1.
i =l
Examples of RW.E.P.'s are provided by the W.E .P.'s {Vj; 1 :s; j :s; p} of (1.1.2) in the case the design variables are random. More importantly, RW .E .P.'s arise naturally in autoregressive models. To illustrate this let Yo = (X o, ' " , X l - p)1 be an observable random vector. In the pth order linear autoregressive (LAR(p)) model one observes {Xi} obeying the relation (1.4.2) for some P = (PI , '" , pp)' E JRP , where {Ei ,i::::: I} are i.i .d , r.v .'s, independent of Yo . Processes that play a fundamental role in the robust estimation of P in this model are randomly weighted residual empirical processes T = (T I , ' " ,Tp )' , where, for x E JR, t E W , n
(1.4.3)
Tj(x , t)
:=
n-
I
I: gj(Yi -dI(X
i
:s;
X
+ t'Y i - l ) ,
i =I
with Y~-l (Xi-I, ..., Xi-p) , i ::::: 1, and where the components gj , 1 :s; j :s; p , of the vector g' = (gl, ' " ,gp) , are p measurable functions, each from JRP to JR. Clearly, for each 1 :s; j :s; p, Tj (x, P + n- l / 2t ) is an example of Vh(X) with (ni == Ei , bni == n- l / 2 t'Y i _ l and hni == 9j(Yi-d · It is customary to expect that a method that works for linear regression models should have an analogue that will also work in autoregressive models. Indeed the above inferential procedures based on W.E.P.'s in linear regression have perfect analogues in LAR(p) models in terms ofT. The generalized M-estimators of P as proposed by Denby and Martin (1979) corresponding to the weight function 9
1.5. Randomly weigbted empiricals
11
and the score function 'ljJ are given as a solution t of the p equations
J
'ljJ(x)T(dx, t)
= 0,
assuming that E'ljJ(c) = O. Clearly, the classical least square estimator is obtained upon taking g(y) == y , 'ljJ(x) == x in these equations. A generalized R-estimator PR corresponding to a score function sp is defined by the relation
PR := argmin {IIS(t)ll ;t
(1.4.4)
E
W},
where
S(t)
J
.-
cp(Fn(x, t))T(dx, t), n
Fn(x, t)
n-
:=
1
L I(X
i ::; X
+ t'Yi -d ,
x E JR, t E W.
i=l
An analogue of an R-estimator of (1.2.1) is obtained by taking g(x) == x in (1.4.4). The m.d. estimators pi that are analogues of'si of (1.3.1) are defined as minimizers, w.r.t . t E W , of
L J[n- L: X i_ j { I (X i ::; X+t'Yi-l) n
p
K+(t)
:=
1 2 /
j=l
i =l
- I( <X, <
X -
t'Yi -
1 ) } ] 2 dG(x) .
Observe that K+ involves T corresponding to 9j(Y i - 1 ) == Xi-j . Chapter 7 discusses these and some other procedures in detail. Section 7.2 contains a result that establishes the asymptotic uniform linearity in t of the R.W.E.P.'s {T(x, p+n- 1 / 2 t ), x E JR, Iltll ::; b} and the residual empirical processes {Fn(x , p + n- 1 / 2 t ), x E JR, litH::; b}, for every 0 < b < 00 . These results are used to investigate the asymptotic behavior of G-M- and G-R- estimators in Sections 7.3.1 and 7.3.2 respectively. In order to carry out the rank analysis in LAR(p) models, one needs a consistent estimator of Q(J) where now f is the error density of {ci}. A class of such estimators is given
12
1. Introduction
in Section 7.3.3. A large class of m.d. estimators and their asymptoties appears in Section 7.4. Section 7.5 discusses an extension of an important methodology of the regression quantiles of KoenkerBassett (1978) and the related rank scores to linear autoregression. Section 7.6 briefly discusses some tests of goodness-of-fit hypotheses pertaining to the error d.f, in linear AR models. The problem of fitting a given parametric autoregressive model of order 1 to a real valued stationary ergodic Markovian time series Xi, i = 0, ±1, ±2, '" is discussed in Section 7.7. Much of the development here is parallel to that of Section 6.6 above . Define the 'l/J-autoregressive function m1/J by the requirement that
E['l/J(X1
-
m1/J(Xo))IXo]
= 0, a.s.
Tests of goodness-of-fit for fitting the model M to m'/J are to be based on the process ~
n -1 '2 '"""'
M n, 1/J(x, lJ) := n i L 'l/J(Xi
-
m(Xi -
1,
~ lJ)) I(Xi -
1 ~
x),
i=l
for x E [-00,00] . Section 7.7 discusses a martingale type transform of this process whose asymptotic null distribution is known . The decade of 1990's has seen an exponential growth in the applications of nonlinear AR models to economics, finance and other sciences. Tong (1990) illustrates the usefulness of homoscedastic dynamic AR models in a large class of applied examples from physical sciences while Gourieroux (1997) contains several examples from economics and finance where the ARCH (autoregressive conditional heteroscedastic) model of Engle (1982) and its various generalizations are found useful. Most of the existing literature has focused on developing classical inference procedures in these models. The theoretical development of the analogues of the estimators discussed above that are known to be robust against outliers in the innovations in linear AR models has relatively lagged behind. Chapter 8 discusses the asymptotic distributions of the analogues of M-, R-, and m .d.- estimators and sequential empirical processes of residuals for a class of nonlinear AR and ARCH time series models. The main
1.5. Randomly weighted empiricals
13
point of the chapter is to exhibit the significance of the weak convergence approach in providing a unified methodology for establishing these results for non-smooth underlying score functions 1/J and .p, The contents of Chapter 2 are basic to those of Chapters 3, 4, 7, 8 and parts of Chapter 6. Sections 2.2.1 , 2.2.2 and 2.2.3, contain, respectively, proofs of the weak convergence of suitably standardized W .E.P.'s, R.W.E.P.'s and the partial sum processes an to continuous Gaussian processes. Even though W.E.Po's are a special case of R.W.E.Po's, it is beneficial to investigate their weak convergence separately. For example, the weak convergence of Ud is obtained under a fairly general independence setup and minimal conditions on {dni} whereas that of Vh is obtained under some hierarchal dependence structure on {ryni' hni , 8ni} and the boundedness and/or the square integrability of the weights {hni}. The process an is different from the Vh process because in these processes the weights are the innovations, whereas in the weighted empiricals process of innovations, the jumps occur at the innovations. In Section 2.3, the asymptotic continuity of certain standardized W.E.P.'s is used to prove the asymptotic uniform linearity of V(., t) in t , for t in certain shrinking neighborhoods of 13, under fairly general independent , non-identically distributed errors. This result is found useful in Chapter 4 when discussing M- estimators and in Chapter 6 when discussing supremum distance test statistics for goodness-of-fit hypotheses. The asymptotic continuity is also found useful in Chapter 3 to prove various results about rank and signed rank statistics under heteroscedastic errors. The asymptotic continuity of Vh - processes is found useful in Chapters 7 and 8 when discussing the AR and ARCH models . Chapter 2 concludes with results on functional and bounded laws of the iterated logarithm pertaining to certain W.E.Po's. It also includes an inequality due to Marcus and Zinn (1984) that gives an exponential bound on the tail probabilities of W .E.Po's of independent r.v .ts. This inequality is an extension of the well celebrated Dvoretzky, Kiefer and Wolfowitz (1956) inequality for the ordinary empirical process. A result about the weak convergence of W .E.Po's when r .v.ts are p-dimensional is also stated. These results are in-
14
1. Introduction
eluded for completeness , without proofs. They are not used in the subsequent sections. A martingale property of a properly centered Ud process is proved in Section 2.4.
2 Asymptotic Properties of W.E.P. 's 2.1
Introduction
Let, for each n ~ 1,17nl , ' " -Tlnn. be independent LV .'S taking values in [0, 1] with respective d.f.'s G n1, ... , G nn and dn1, .. . , dnn be real numbers. Define n
(2.1.1)
Wd(t) =
L dni{I(17ni :::; t) -
Gni(t)} , 0:::; t
< 1.
i =l
Observe that Wd belongs to V[O, 1] for each n and any triangular array {dni, 1 :::; i :::; n}, while Vh of (1.4.1) belongs to V(IR) for each n and any triangular array {h ni, 1 :::; i :::; n}. In this chapter we first prove certain weak convergence results about suitably standardized Wd, Vh and an processes. This is done in Sections 2.2.1, 2.2.2, and 2.2.3 , respectively. Section 2.3.1 uses the asymptotic continuity of a certain W d - process to obtain the asymptotic uniform linearity result about V(- , u) of (1.1.2) in u. Analogous result for T(·, u) of (1.4.3) uses the asymptotic continuity of a certain Vh - process and is proved in Section 7.2. The weak convergence results about an is used in Section 7.6 to derive the limiting distribution of some tests for fitting an autoregressive model to the given time series . H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
2. Asymptotic Properties of W .E.P. 's
16
A proof of an exponential inequality for a stopped martingale with bounded differences due to Johnson, Schechtman and Zinn (1985) and Levental (1989) is included in Section 2.2.2. This inequality is of a general interest . It is used to carry out a chaining argument pertaining to the weak convergence of Vh with bounded h. Section 2.4 treats laws of the iterated logarithm pertaining to W d , the weak convergence of Wd when hd are in [O,lJP, the weak convergence of Wd w.r.t . some other metrics when {1Jd are in [0,1], an embedding result for Wd when {1Jd are i.i.d . uniform [0, 1J r.v.'s and a proof of its martingale property. It also includes an exponential inequality for the tail probabilities of W.E.P.'s of independent r. v.'s. This inequality is an extension of the well celebrated Dvoretzky, Kiefer and Wolfowitz (1956) inequality for the ordinary empirical process. These results are stated for the sake of completeness, without proofs. They are not used in the subsequent sections.
2.2 2.2.1
Weak Convergence Wd
-
processes
In this section we give two proofs of the weak convergence of suitably standardized {W d } to a limit in e[O, 1J. Accordingly, let n
(2.2.1) Gd(t)
:=
I: d;iGni(t) , i=l n
Cd(S, t)
:=
I: d;i[Gni(S
1\
t) - Gni(S)Gni(t)], 0:::; s, t :::; l.
i=l
Let p denote the supremum metric . Theorem 2.2.1 Let {1Jnd , {dnd and {Gnd be as in Section 2.1 . In
addition assume that the following hold;
17
2.2.1. Weak convergence ofWd
Then, [or every (i)
E
> 0,
lim lim sup P (sup 0-+0
It-sl
n
(ii) Moreover, Wd
I Wd(t)
- Wd(S)
I> E)
= O.
some W on (1)[0,1], p) ij and only ij [or every 0 ::; s, t ::; 1, Cd(s,t) converges to some covariance junction C(s, t). In this case W is necessarily a continuous Gaussian process with covariance junction C and W(O) = 0 = W(l). =}
Remark 2.2.1 Perhaps a remark about the labeling of the conditions is in order. The letter N in (N1) and (N2) stands for Noether who was the first person to use these conditions to obtain the asymptotic normality of certain weighted sums ofr.v .'s . See Noether (1949) . The letter C in the condition (C) stands for the specified conti-
nuity of the sequence {G d}. Observe that the d.f.'s {Gd need not be continuous for each i and n; only {G d } needs to be equicontinuous in the sense of (C). Of course if {7Jd are i.i.d. G then, because of (N1), (C) is equivalent to the continuity of G. 0 The proof of the theorem will be given after the following two lemmas.
Lemma 2.2.1 For any 0 ::; s ::; t ::; u ::; 1 and each n
~
1
E!Wd(t) - Wd(s)12IWd(U) - W d(t)!2
(2.2.2)
< 3 [Gd(U) - Gd(t)][Gd(t) - Gd(S)]
(2.2.3)
::; 3 [Gd(U) - Gd(s)f
Proof. Fix 0 ::; s ::; t ::; u ::; 1 and let
Pi = Gi(t) - Gi(S) , qi = Gi(u) - Gi(t) , ai = I(s < 7Ji ::; t) - Pi, f3i = I(t < 7Ji ::; u) - % 1 ::; i ::; n. Observe that Ea, = 0 = Ef3j for all 1 ::; i, j ::; n, {ad are independent as are {f3i} and that o; is independent of f3j for i =/= j. Moreover,
Wd(t) - Wd(S) =
L diai, i
Wd(U) - Wd(t) =
L dif3i · i
2. Asymptotic Properties of W .E.P. 's
18
Now expand and multiply the quadratics and use the above facts to obtain
(2.2.4) EIWd(t) - Wd(s)12IWd(U) - Wd(t)\2
"D d~2 ECt~f3? 2 2
+" " d~d2J ECt2d~ECt~ Ef32J DD J 2
2
2
iii
+ 2 I:I: dTdjE( Ctif3d E(Ctjf3j) . i#j
But
ECtT
=
ECtTf31
=
Pi(1 - pd , Ef3] = qj(1 - qj) , 2piq; (1 - pd + (1 - qd 2qiPT + Piqi(1 - qi - Pi)
< {(I - Pi) + (1 - qi) + (1 - qi - Pi)}Piqi < 3piqi, -(1 - pi)Piqi - (1 - qdqiPi + Piqi(1 - qi - pi)
E(Ctif3i)
l:;i:;n,l:;j:;n .
Piqi, Therefore,
LH8(2.2.4)
< 3{
3
:~;>ttPiqi + ~:L d; d]Piqj }
[~dlPi] [yd]qj] .
This completes the proof of (2.2.2) , in view of the definition of {pi, qj} . That of (2.2.3) follows from (2.2.2) and the monotonicity of the Gi , 1 :; i :; n . 0
Lemma 2.2.2 For every
(2.2 .5)
E
> 0 and s :;
U,
P[ sup IWd(t) - Wd(s)1 2: E] s9'5:u :; f,;E -
4
[Gd(U) - Gd(s)f
where r: does not depend on
E,
+ P[IWd(U)
- Wd(s)1 2: E/2]
n or on any underlying quantity.
2.2.1. Weak convergence Of Wd Proof. Let 8 = u - s, m (2.2.6)
~j
~
1 be an integer , and for 1 :::; j :::; m , let
Wd((j /m)8
=
19
+ s) -
Wd(((j - 1)/m )8 + s),
k
2::= ~j, u.; =
s,
j=l
max
l ::;k ::;m
ISkl·
The right cont inuity of W d implies that for each n and each sample pa th , M m -+ sup{ IWd(t ) - Wd(s)l; s :::; t :::; u} as m -+ 00, w.p.L In view of Lemma 2.2.1, Lemma 9.1.1 in the Appendix is applicable to t he ab ove LV . ' S { ~j } with 2, a 1 and
,=
=
Hen ce (2.2.5) follows from t hat lemma and the right cont inuity of Wd. 0 Proof of Theorem 2.2.1. For a 8 > 0, let r = [8- 1 ], th e grea tes t integer less than or equa l to 1/8. Define tj = j8, 1 :::; j :::; rand to = 0. Let f j = Wd(tj ) - Wd(tj-d , 1 :::; j :::; r. Then
p( It-sl< sup IWd(s) o
Wd(s)1
~
E)
r
:::; 2::= p( j=l
su p
r
2 :::; KE- 2::= [Gd(t j ) - Gd(t j -dF j= l :::; KE -
r
+ 2::= P[ifj \ ~ E/ 6] j=l r
2
sup [Gd(t
+ 8) -
09 ::;1- 0
(2.2.7)
~ E/3)
IWd(S) - Wd(tj - d l
t j _ l ::;S::;t j
= I n( 8)
+ II (8) ,
Gd{t)]
+ 2::= P[lfjl j= l
~ E/ 6]
(say).
n
In t he above, the first inequ ality follows from Lemm a 9.1.2 of the Appendix, the second inequ ali ty follows from Lemma 2.2.2 above and t he last inequ ality follows becau se, by (N l) , r
(2.2.8)
2::= lGd(t j ) - Gd(tj - !lJ :::; Gd{l) j=l
= 1.
2. Asymptotic Prop erties of W.E.P.'s
20 Next, observe that a;
.- Var(fj)
L d;{G(t j) - Gi(tj _t}}{l - Gj(tj)
+ Gi(tj-l)} ,
and , by (2.2.8), th at T
(2.2.9)
L aj :s;
[Gd(t + 0) - Gd(t) ], all rand n .
sup O::; t:9 - <5
j=l
Furthermore, (Nl) and (N2) enable one to apply the Lindeberg Feller Central limit Theorem (L-F CLT) to conclude that atfj -+d Z , Z a N( o, 1) L V . Therefore, for every 0 > 0 (or r < (0) , T
II (0) - L P (IZ I ~ (E / 6)aj l)
(2.2.10)
-+ 0 as n -+
00 .
j=l
n
By the Markov inequality applied to the summa nds in the second term of (2.2.10) and by (2.2.9) , lim sup n
IIn (0)
T
< 3 lim sup L (6aj / E)4 (E Z 4 = 3) j=l
n
<
IiE-
4
lim sup n
sup [Gd(t + 0) - Gd(t )]. O
The resul t (i) now follows from t his, (2.2.7) and t he assumption (C ).
Proof of (ii) . Suppose Cd -+ C. Let m be a positive integer , o :s; iv . : . . .t-« :s; 1 and aI , . . . ,am be arbitrary but fixed numbers . Consid er m
t;
(2.2.11)
:=
L ajWd(tj ) = LdWi j=l
where
n
i= l
m
Vi := L aj{I(17i :s; tj ) - Gi(t j)} , j=l
1
:s; i :s;
n.
21
2.2.1. Weak convergence OfWd Note that m
IVil ::; L Jajl < 00,
(2.2.12)
1 ::; i ::; n.
j==l
Also, Var(Tn ) ---+ (]"2:= 'LJ==I'LJ==lajarC(tj,t r) . In view of (Nl) and (N2) , the L-F CLT yields that Tn ---+d N(O,l). Hence all finite dimensional distributions of Wd converge weakly to those of a Gaussian process W with the covariance function C and W(O) = 0 = W(l) . In view of (i) , this implies that Wd =?- W in (1)[0,1], d) with W denoting a continuous Gaussian process tied down at 0 and 1. Conversely, suppose Wd ==?- W. By (i), W is in C[0,1]. In particular the sequence Tn of (2.2.11) converges in distribution to T := 'LJ==1 aj W(tj) . Moreover, (2.2.12) and (Nl) imply that , for all n ~ 1,
E(f diVi)
4
t==l
< 3
=
f d;Ev; + f f; 3
t==l
t==l J :;<:t ,l
(~Iajr
Therefore {T;, n ~ I} is uniformly integrable and hence m
ET;
=L
m
m
m
L ajakCd(tj , tk) ---+ L L ajakCov[W(tj), W(tk)]
j==lk==l
j==lk==l
for any set of numbers 0 ::; {tj} ::; 1 and any finite real numbers al ,·· · , am' Hence Cd(S , t) ---+ Cov[W(s), W(t)] = C(s , t) for all 0 ::; s, t ::; 1.
Now repeat the above argument of the "only if" part to conclude that W must be a tied down Gaussian process C[O, 1]. 0 Another set of sufficient conditions for the weak convergence of {Wd} is given in the following
2. Asymptotic Prop erties of W .E.? 's
22
Theorem 2.2.2 Under the notation of Th eorem 2.2.1, suppose that
(N'l ) holds. In additio n, assume that the follo wing hold:
(B)
n max d; i = 0(1) . l
(D)
1
n- L
Gni(t) - t is nonincreasing in t , 0 ~ t ~ 1, n 2:: 1.
i =l
Th en also (i) and (ii) of Th eorem 2.2. 1 hold. Remark 2.2.2 Clearly (B) implies (N2). Moreover
[Gd(t
+ <5)
- Gd(t)] < n max dT[n- 1 L {Gi (t I
+ <5)
- Gi(t)}]
+ <5)
- (t
. I
nmrxd7[n-l L{Gi(t
+ <5)}
i
_n- 1 L {Gi (t ) - t}
+ <5]
i
< n max dT <5, I
0 ~ t ~ 1 - <5, by (D) .
Thus (B) and (D) together imply (N2) and (C) . Hence Theorem 2.2.2 follows from Theorem 2.2.1. However , we can also give a different pr oof of Theorem 2.2.2 which is direct and qu it e interest ing (see (2.2.15) below) . This pro of will be bas ed on the following two lemmas. Lemma 2.2 .3 Under (D) , for all n 2:: 1, 0
~
s, t
~
1,
(2.2.13)
Proof. Suppose 0 ~ s ~ t ~ 1. Let ai and Pi be as in the pr oof of Lemma 2.2.1. Usin g the indep endence of {ad and th e fact th at Eo; = 0 for all 1 ~ i ~ n, one obtains
E(Ldia i f
o~ d
4 Ea 1
I
I
+ 30 ~ ~ d~d~Ea2 Ea~ 0 J J
"2;- d; {Ea; I
I
I
ih
2( 3E aT)}
+ 3 ( L dT E aT) 2 I
2.2.1. Weak convergence
23
OfWd
+3 [
~ d;Pi(1- Pi)]
2
z
k~{ n-
< But s
:s: t
2
LPi
+3
z
(n-1~Pi) 2}. z
and (D) imply
a:s: n- 1 LPi =
n- 1
L[Gi(t) - Gi(S)J
i
:s: (t -
s).
i
Hence the
L .H.S.(2.2.13) :s: k~{n-l(t - s) + 3(t - sf} ,
o:s: s :s: t :s:
1.
The proof is completed by interchanging the role of sand t in the above argument in the case t :s: s. 0 Next , define, for (i - l)jn :s: t :s: ijn, 1:S: i :s: n,
(2.2.14) Zd(t)
=
Wd((i - l)jn)
+ {nt -
(i - l)}[Wd(ijn) - Wd((i - l)jn)].
Lemma 2.2.4 The assumption (D) implies that V a:s: s , t
:s: 1, n
~
1, (2.2.15) If, in addition, (N!) and (B) hold, then
(2.2.16)
sup IWd(t) - Zd(t)1 = op(l). t
Proof. Let n ~ 1 and a :s: s, t integers 1 :s: i , j :s: n such that
(i - l)jn
(2.2.17)
:s:
:s: s :s: ijn
1 be arbitrary but fixed. Choose
and (j - l)jn
:s: t:s:
jjn.
For the sake of convenience, let Ok,m
.-
4k~[(m - k)jnf ,
bk,m ~u,v
IZd(mjn) - Zd(kjn)1
.-
IZd(U) - Zd(V)! ,
= IWd(mjn) - Wd(kjn)l,
m , k integers ;
2. Asymptotic Properties of W .E.? 's
24 From (2.2.13), (2.2.18)
E8tm
< k~{3(m - k)2 jn 2 + n- 2lm - kl} < 4k~[(m - k)jn]2 =: bk,m'
The proof of (2.2.15) will be completed by considering the following three cases . Case 1. i < j - 1. Then because of (2.2.14) and (2.2.17), D.s,t ::; max{ 8i ,j -
l , 8i ,j , 8i - 1 ,j - l , 8i - 1 ,j }
which entails that (2.2.19)
< E{8 4 . 1 + 84 . + 84 1 ,J.- 1 + 84 1 ,J. } < bi,j-l + bi,j + bi-1 ,j-l + bi-l ,j < 4bi-l ,j = 16k~[(j - (i - l)) jnf ~,J -
~ ,J
~-
~-
where the second inequality follows from (2.2.18) and the last from o ::; j - i - I < j - i < j - (i - 1). Note that (2.2.17), i < j -1 and i ,j integers imply that
3(t - s) 2: [j - (i - l)] jn . From this and (2.2.19) one obtains (2.2.20) Case 2. i = j . In this case (i - l)jn ::; s , t ::; ijn. From (2.2.14) one has D.s,t =
nit - sI8i -
1 ,i
so that from (2.2.18) (2.2.21) The last inequality follows because n( t - s) ::; 1. Case 3. i = j - 1. By the triangle inequality
2.2.1. Weak convergence Of Wd
25
Thus by Case 2, applied once with sand ifn and once with i fn and t , one obtains
E,6.~,t
< 24 { E~~,i/n ,t } < 26 k~ {(i fn - s)2 + (t - ifn)2}
:s 27k~(t - sf .
In view of this, (2.2.21) and (2.2.20) , t he proof of (2.2.15) is comp lete . To prove (2.2.16), let di+ = max(O, dd , di.: = max( O, - di ) . Then one has d; = di+ - di _ . Decompose Wd and Zd accordingly. Note = dT, I :s i :s n. This and (Nl) t hat max (dT+ , dL ) = dT+ + imply that Td+ :s 1, Td- :s 1. It also implies that if (N2 ) is sat isfied by the {dd then it is also sa t isfied by {di+ , di - } . By the triangle inequality,
dL
Moreover , di+ 1\ di.: 2: 0, for all i . Therefore, it is enough to prove (2.2 .16) for d i 2: 0, I :s i :s n. Accordingly suppos e t hat is the case . T hen
(2.2.22) where m!tx
sup
\Wd (t ) - Wd((i - 1)/n)J,
m ax
sup
IWd(t ) - Wd(i f n )1
l ::;z::;n (i- l )/ n::; t::;i/n l ::;z::;n (i- 1)/n::;t::;i/ n
:s t :s ifn, and a;
For (i - l)fn
2: 0,1
:s i :s n ,
IWd(t) - Wd(if n)1
< I
n
n
j=l
j=l
I: dj I (t < rlj :s i f n )1+ I: dj i
i - I
n
[Gj (i f n ) - Gj (t )] i
i - I
< I Wd(~) - Wd(---;-) I + 2 I:dj [Gj ( ~ ) - Gj (---;- )] j=l
< Ji -
1i
,
+ 2 1<max d ', '
by (D)
2. Asymptotic Properties of W .E.P.'s
26
This inequality, (2.2.18) and the Markov inequality imply that for every to > a and for n sufficiently large such that 2 maxj dj < E, the existence of which is guaranteed by (B) ,
P(U2 2:: E)
2m~xdd
< P(max6i- 1,i 2:: El
l
n
< (E-2maxdd-4"I:-E6f-l ,i l
< (E -
i==1
2maxdi)-4.4k~n-l -+ O. z
Exactly similar calculations show that U1 = op(1) .
D
Proof of Theorem 2.2.2 . Observe that Zd(O) = a = Zd(1) and that Zd E C[0,1] for every n 2:: 1 and each sequ ence {dd · Hen ce by (2.2.15) and Theorem 9.1.2 of the Appendix, {Zd} is tight in C[O , 1]. Thus claim (i) follows from (2.2.16). To prove (ii) just argue as in 0 th e proof of (ii) of Theorem 2.2.1 above. The following corollary will be useful later on. To state it we need some mor e notation. Let Fn1, ..., Fnn denote d.f.'s 011 JR and Xn i be a LV . with dJ. F ni , 1 < i ~ n. Define, for x E JR, 1 ~ i ~ n, a ~ t S; 1,
(2.2.23) H( x)
n- 1
.-
n
"I:- Fni(x) ;
H- 1(t):= inf{ x ; H( x) 2:: t} ;
i== 1 n
Lni(t)
'-
Fni(H-
1(t)) ;
Ld(t) :=
"I:- d~iLni(t) , i==1
n
W;(t)
"I:- «; {I(Xni ~ H- 1(t)) -
:=
Lni(t)} .
i== 1
Corollary 2.2 .1 As sum e that (2.2.24)
X n1, . . . , X n n are independent r.v. 's with respective d.f. 's Fn1, . . . , Fnn on JR.
In addition, suppose that {dnd , {Fnd satisfy (N1) , (N2) and (C*)
lim lim sup n
o-tO
sup [Ld(t + 6) - Ld(t)] = O.
0::;1::;1-0
Then, for every E > 0, (2.2.25)
lim limsupP( sup IW;(t) - W;(s)1 2:: E) = O. o-tO
n
It-si d
27
2.2.2. Vh - Processes
Proof. Follows from Theor em 2.2.1 (i) applied to
1]i
=H (Xd , G = i
L i ,1 :'S i:'S n.
0
Remark 2.2.3 Not e that if H is continuous then n
n-
1
2:: Lni(t ) = t. i= l
Therefore, sup [L d(t +8) -Ld(t)] :'S nm~xd;i8.
O
~
Thus , if we strengthen (N2) to require (B) then (C*) is a priori sa tisfied. That is, the conditions of Theorem 2.2.2(i) are satisfied. If Fni = F, where F is a cont inuous d.f. , th en Lni(t) = t. Therefore, in view of (N l) , (C*) is a priori sa tisfied . Moreover C;[( s, t ) := Cov (Wd'(s) , W d' (t )) = s (1 - t) , a :'S s :'S t :'S 1. Therefore we obtain
Corollary 2.2.2 Suppose that X n 1 , . . . , X nn are i.i. d. F , F a contin-
uous d.f. . Suppose that {dnd satisfy (Nl) and (N2 ). Then W d' in ("P[O, 1], p) with B a Brownian bridge in C[a, 1].
~
B
Ob serve th at dni = n - 1 / 2 satisfy (N l) and (N 2) . In ot her word s t he above corollary includes t he well celebrated result , viz., the weak convergence of t he sequen ce of t he ordinary empirical processes. Note. A vari an t of Theorem 2.2.1 was first proved in Koul (1970). The above formul ation and proof is based on thi s work and that of Withers
(1975). Theorem 2.2.2 is motiv at ed by the work of Shorack (1973) which deals only with the weak converge nce of the W1-process, t he pr ocess Wd with d-« == n- 1!2. The sufficiency of condit ion (D) for (2.2.13) was observed by Eyst er (1977).
2.2.2
Vh
-
0
processes
In t his subsect ion we shall investi gat e the weak convergence of the R.W.E.P.'s {Vh(x ), x E IR} of (1.4.1). To state t he genera l result we need some more st ructure on t he un derlyin g r. v.'s.
2. Asymptotic Properties of W.E.P. 's
28
Accordingly, let (n, A , P ) be a pr obability space and G be a d.f on 1Ft For each integer n ~ 1, let (( ni , h ni , 8nd, 1 :::; i :::; n, be an array of trivariate LV. 'S defined on (n ,A) su ch that {(ni, 1 :::; i :::; n} ar e i.i.d. G LV . 'S and (ni is ind ependent of (hni' and for each 1 :::; i :::; n. Furthermore, let {And be an array of sub o-fields such that A ni c A n,i+l ,l :::; i:::; n,n ~ 1; (hnl,an d is A nI - measurable; the L V. ' S {( nl ,'" ,(n,j - l; (h ni, and , 1 :::; i :::; j } are A nj - measurabl e, 2 :::; j :::; n; and (nj is indep endent of A nj , 1 :::; j :::; n . Define, n
I: hniI (( ni :::; x + ani),
l
(2.2.26) Vh(X) := n -
i= l
n
Jh( X)
:= n-
l
I: E [hniI(( ni :::; x + 8ni )IA ni] i=l n
= n-
I: hniG(x + ani),
l
i= l
n
Vh*(x)
:= n -
l
I: hniI (( ni :::; x),
n
Jh (x )
i= l
Uh(X) := n l / 2 [Vh (x ) - Jh(X)], Uh(x ) := n l / 2 [Vh*(x ) - Jh(x)],
:= n -
l
I: hniG(x) , i= l
x E lIt
We shall give three results here. The first one st ates and pr oves t he weak converge nce of t hese processes for t he bounded weights while t he second states the same resul t under weaker assumptions for only square integrable weights. The proofs of t hese resul ts are similar in sp irit yet different. An addit iona l similar theorem along with it s proof per taining to ana logues of t he Vh process suitable for heteroscedastic autoregressive models is also given in this sect ion. To begin with we state th e following Theorem 2.2.3 In addition to the above, assume that the f ollowing
conditions hold: (2.2.27) (2.2.28)
sup max Ihnil :::; c, a.s. f or som e constant c
n>l
max I
I
lanil =
Op(l).
< 00 .
2.2.2. Vh
-
29
Processes
Then, for every x E lR at which G is continuous,
(2.2.29) In addition, if
(2.2.30) (2.2.31)
G has a uniformly continuous a. e. positive Lebesgue density g,
then
(2.2.32) If, in addition,
(2.2.33)
Ani C A n+l,i, 1 ~ i ~ n; n 2:: 1.
(2.2.34)
(n
-1","""
2 )1 /2
Z:: h ni
=
0:
+ op(l),
0:
a positive r.v.,
Uk
0:'
B(G)
i
then
(2.2.35)
Ui, =?
0: '
B(G) ,
=?
where B is a Brownian Bridge in e[O, 1], independent of 0:.
The proof of (2.2.29) is straightforward using Chebbychev's inequality, while that of (2.2.32) uses a restricted chaining argument and an exponential inequality for martingales with bounded differences. It will be a consequence of the following two lemmas.
Lemma 2.2.5 Under {2.2 .27} - {2.2.31}, \:IE>
li~n P (sx~; n-
1 2 /
t,
Ihnil'IG(y + Oni) - G(x
°
and for r = 1,2,
+ onill ,,; 2,'
where the supremum is taken over the set
{x ,y E lR;n 1/ 2 IG(x ) - G(y)1 ~
E} .
c) ~
1,
2. Asymptotic Properties of W .E.P. 's
30
Proof. Let E> 0, q(u) := g(G- 1 (u)) , 0 :::; u :::; 1; I n := maxi Ibil , sup{lq(u) - q(v) l; lu - vi :::; m - 1/ 2 }
.-
Wn
sup{ lg(x) - g(y) l; IG(x ) - G(y) 1:::; m- 1/ 2 } , sup{lg( x) - g(y)l ; Iy - z ] :::; I n }.
L\n .-
By (2.2.31), q is uniform ly continuous on [0,1] . Hence , by (2.2.28) , (2.2.36)
L\n = op(l ),
Wn
= 0(1 ).
But n
supn- 1 / 2 x ,y
L
Ihi n G(y + bi) - G(x
+ bd l
i= l
n
<
sup n- 1 / 2 x, y
L
n
Ihi II'IG(y) - G(x)1
+ n-
i= l
1
L
Ihin bil[w n + 2L\n]
i= l
< c1' E + Op(l ).op(l ),
by (2.2.30) and (2.2.36).
o
This completes the pr oof of the Lemma.
Lemma 2.2 .6 Let {Fi , i 2: O} be an in creasing sequence of a - fields , m be a positive integer,
T :::;
m be a stoppi ng tim e relative to
{Fi} and
{(i , 1 :::; i :::; m} be a sequen ce of real valued martingale differen ces ui.r.i . {Fd. I n add it ion , suppose that for so m e con stan t M , L
I(il
(2.2.37)
<M
< 00 ,
1 :::; i :::; m ,
T
L E((TIFi-d :::; L .
(2.2.38)
i= l
Th en, for every a
(2.2.39)
p
> 0,
(It \;1a) >
:::;
aT
2 exp { - (a /2M) arcsi nh(Ma/2L)} .
Proof. Write = E((fIFi-d , i 2: 1. F irst , consider th e case T = m : Recall t he following elementary facts : For all x E JR ,
< 00 ,
31
2.2.2. Vh - Processes exp (x ) - x - I::; 2(cosh x -1 ) ::; x sinhx, sinh(x)jx is increasin g in [z], x ::; exp (x - 1).
(a) (b) (c)
Because E(~i IFi- d 1 ::; i ::; m , E{[exp(o~d
-
== 0 and by (2.2.37 ), for a
°>
0 and for all
1J1Fi-d < E{o6 sinh(06)I F i- l } , < a; o sinh(oM)jM,
(2.2.40)
by (a), by (b).
Use a condit ioning ar gum ent to obtain
}
E ex+t,~i
~ E[exp (S~
<
E[exp (S~
<
E[exp (S~
<
E[exp (;~ <}xp {(L - ~ a; ).(S/M ) Sinh(SM)}] ,
where th e second inequ ali ty follows from (2.2.40) while the first follows from (c). Obs erve that L - ~{:; is F j - 2 measurabl e, for all j ~ 2. Hence, it erating the above argument m - 1 times yields
a;
E exp { S
t,
cxp {L.(S/M) . sinh(SM)} .
Now, by the Markov inequ ality, Va > 0,
< eXP { o[L j M . sinh(oM )- aJ} .
2. Asymptotic Properties of W .E.? 's
32 T he choice of equa lity p
a=
(l/M) arcsinh(Ma/2L) in this leads to the in-
(~ ~i ~ a)
"
exp{( - a/2M) arcs inh(Ma/2L)} .
An application of this inequality to { -~i} will yield the same bo und for P(Z= ~l ~i ::; - a), thereby completing the proof of (2.2.39) in the case T = m. Now consider the general case T ::; m : Let xs = ~j I(j ::; T). Becau se the event [j ::; T] E Fj -l , it follows t hat {Xj , Fj} satisfy th e conditions of t he pr evious case. Hence
P(l t~il ~ a)
p ( l f Xil ~ a)
i= l
i= l
< exp{( -a/2M)arcsinh(Ma/2L)} .
0
Proof of Theorem 2.2.3. For th e clarity of th e pr esent proof it is important to emphasize the dep end ence of various underlying processes on n . Accord ingly, we shall write Vn , U'; et c. for Vh, Uh etc. in the proof. On IR define th e matric d(x, y) := IG(x ) - G(y) 11 / 2 . T his metric makes IR totally bounded. T hus, to prove th e th eorem, it suffices to prov e
IUn(Y) - U~( y)1 = op(l ),
(a) (b)
VE (i)
> 0:3 a > a lim sup P( n
(ii)
3
IUn(Y) - Un(x) 1 > E) < E,
sup d(x ,y )):S.c5
I U~ ( y )
lim sup P ( sup n
V yE IR,
- U~(x) 1
> E) < E.
d(x ,y) :S.c5
Proof of (a). The fact that Un -U~ is a sum of condit ionally centered bounded r.v.'s yields that n
Var(Un(y) - U~(y)) ::; En - 1
L
h~IG (y
+ ad -
i= l
by (2.2.27), (2.2.28), (2.2.31) and t he D.C.T .
G(y)1 = 0(1),
33
2.2.2. Vh - Processes
Proof of (b)(i) . The following proof of (b)(i) uses a restricted chaining argument as discussed in Pollard (1984, p. 160-162), and the exponential inequality of Lemma 2.2.6 above . Fix an E > O. Let an := [n l / 2 IE], the greatest integer less than or equal to n 1/2 IE, and define the grid 1i n := {Yj ;G(Yj) = jm- l / 2 ,
1::; j
::; an},
n 2:: 1.
Also let
= hi+ -
Write hi
hi-
Uh( X)
= max(O, hd , so that
n- l / 2
n
L hi+Zi( x) -
n- l / 2
i= l
n
L hi-Zi(X) i= l
UJ~(x) - Ui:(x) , say .
Thus to prove (b)(i) , by the triangle inequality, it suffices to prove it processes. The details of the proof shall be given for the u;t for process only ; those for the U;; being similar. Next , let , for k 2:: 1, and x , Y E JR,
o;
k
Ddx ,y)
:=
L hr+E {[Zi( X) -
Zi(y)]2IAi} ,
i=l
and define th e sequence of stopping times
+ ._ { . Dk(x , y)) < 3EC2 n } . . - n 1\ max k 2:: 1, max d2 ( x, yE 1i n x ,Y
Tn
Observe that T;t ::; n. To adapt the present situation to that of the Pollard , we first prove that P(T;t < n) --7 0 (see (2.2.41) below). This allows one to work with n- l / 2(T;t)1 /2U\ -instead of u;t . By r« Lemma 2.2.6 and the fact that arcsinh(x) is increasing and concave in x, one obtains that if X,Y in 1i n are such that d2 (x , y ) 2:: tEn - 1/ 2 then P(n-l/2(T:)1/2IU~(x) - U~(Y)I 2:: t) 2
::; 2 exp {- 2Cd 2t(2 , y) EarCSinh(1 /(6E c))}, X
for all t > O.
2. Asymptotic Properties of W .E.? 's
34
This enables on e to carry out the chaining arg ume nt as in P ollard. What rem ains to b e done is t o connec t between the points in IR and a point in 7-l n whi ch will b e done in (2.2.42) b elow. We shall now prove
P (r;t < n) -+ O.
(2.2.4 1)
Proof of (2.2.41) . For Yj , Yk in 7-l n with Yj < Yk , d2 (Yj , Yk) (k - j) E n- 1/ 2 . Hence, us ing the fact t hat (hi+)2 :::; hI ,
>
n
h;[G(Yk + 6d
< L
- G(Yj + 6d]{ (k -
12 j )E }-l n /
i= l
n
< {(k - j)E} - ln 1/ 2 L i= l
k- l
h; L[G(Yr+l
+ 6i )
-
G(Yr + 6dJ
r=j
n
Now apply Lemma 2.2.5 with r = 2 t o obtain P ( max
ll eA~ ,J -::: a n
2 } Dd X,Y) { d 2 (x. , Y ) < 3EC n ----* l.
This completes t he proof of (2.2.41). Next , for each x E IR, let Yjx deno te t he p oin t in 7-l n t hat is the closest to x in d-met ric from t he po ints in 7-l n t hat satisfy Yjx :::; x . We sha ll now prove: V E > 0, P (su p lu;t(x ) - u;t (Yjx)1 > SCE) -+ O.
(2.2.42)
x
Proof of (2.2.42) . Now write Vn+, J;t for Vn , I n wh en {hi } in these qu anti ti es is repl aced by { hi+} ' The defin it ion of Yjx' G increasing, and t he fact tha t lu.; :::; Ihil for all i, imply t hat sup In 1 / 2 [J;t (x ) - J;t (Yjx)J! x
35
2.2.2. Vh - Processes An application of Lemma 2.2.5 with r = 1 now yields that P(sup In 1/ 2[J: (x) - J:(Yj x)]1
(2.2.43)
x
> 4CE) ----+ O.
But hi+ 2:: 0, 1 ~ i ~ n , implies that Vn+ is nondecreasing in x . Therefore, using the definition of Yjx'
n - 1/ 2[U: (Yjx_d - U:(Yjx)]
+ J:(Yjx-d -
J:(Yj x)
Vn+(Yjx-d - Vn+(Yjx)
< Vn+(x) - Vn+(Yjx) < Vn+ (Yjx+1) - Vn+(Yjx) =
n - 1/ 2 [U: (Yjx+l ) - U:(Yjx)]
+ J:(Yjx+d
- J:(Yjx)'
Hence, (2.2.44)
sup In 1/ 2[Vn+(x ) - Vn+(Yjx] I x
< 2
max
l :'SJ :San
1U:(Yj+d - U:(Yj)1
+2 max In 1/ 2[Jlt (Yj +l ) - J:(Yj)]1 l :SJ :Sa n
Thus, (2.2.42) will follow from (2.2.43) , (2.2.44) and
P( m ax 1U:(Yj+l) - U:(Yj)1 > CE) -+ O.
(2.2.45)
l :'SJ :'San
In view of (2.2.41), to prove (2.2.45) , it suffices to show that (2.2.46) P( max n- 1/ 2(T: )1/21U++(Yj+d - U\(Yj)1 > CE) -+ O. t«
l:S J:San
r«
But the L.H.S. of (2.2 .46) is bounded above by an
T;;
L p(1 L hi+[Zi(Yj+l j=l
Zi(Yj)] I > etn
1 2 / ) .
i= l
Now apply Lemma 2.2.6 with ~i == hi+[Zi(Yj+d - Zi(Yj)], Fi-l == A i, T == T;; , M = c, a = etn 1 / 2 , m = n . By the definition of T;;, L = 3c2E2n1 / 2 . Hen ce, by Lemma 2.2.6,
T;;
p(1 L
lu.; [z, (Yj+d - Zi(Yj)]! > cm
i= l
n 1/ 2E
< 2exp[--2-arcsinh(1/6E)] .
1 2 / )
2. Asymptotic Properties of W.E.P. 's
36
Since this bound does not depend on j , it follows that L.R.S . (2.2.46)
n 1/ 2 E
~ 2E- 1n1 / 2 exp[--2-arcsinh(1/6E)] ~ 0.
This completes the proof of (2.2.42) for u;t. As mentioned earlier the proof of (2.2.42) for U;; is exactly similar, thereby completing the proof of (b)(i). Adapt the above proof of (b)(i) with Oi == to conclude (b)(ii) . Note that (b )(ii) holds solely under (2.2.27) and the assumption that G is continuous and strictly increasing, the other assumptions are not required here. The proof of (2.2.32) is now complete. The claim (2.2.35) follows from (b)(ii) above, Lemma 9.1.3 of the Appendix and the Cramer - Wold device . 0 As noted in the proof of the above theorem, the weak convergence of Uh holds only under (2.2.27), (2.2.34) and the assumption that G is continuous and strictly increasing. For an easy reference later on we state this result as
°
Corollary 2.2.3 Let the setup of Theorem 2.2.3 hold. Assume that G is continuous and strictly increasing and that {2.2.27}, {2.2.34} hold. Then, Uh '* a · B (G), where B is a Brownian bridge in C[a, 1], independent of a. Consider the process Uh(t) := Uh(G- 1(t)),0 ~ t ~ 1. Now work with the metric It - 81 1/ 2 on [0,1]. Upon repeating the arguments in the proof of the above theorem, modified appropriately, one can readily conclude the following
Corollary 2.2.4 Let the setup of Theorem 2.2.3 hold. Assume that G is continuous and that {2.2.27}, {2.2.34} hold. Then {Uh} ==> a.B, where B is a Brownian Bridge in C[a, 1], independent of a . Remark 2.2.4 Suppose that in Theorem 2.2.2 the r.v.'s 1]nl,'" Tln» are i.i.d . Uniform [0,1] . Then, upon choosing h n i == n 1/ 2 dn i , (ni == G- 1 (1]n d , one sees that Ui, == Wd(G), provided G is continuous. Moreover the condition (D) is a priori satisfied , (B) is equivalent to (2.2.27) and (Nl) implies (2.2.34) trivially. Consequently, for this
37
2.2.2. Vh - Processes
sp ecial setup , Theor em 2.2.2 is a special cas e of Corollary 2.2.4 . But in general t hese two res ults serve differe nt purposes. T heore m 2.2.1 is the most general for the indep endent setup given there and ca n not b e deduced from Theorem 2.2 .3. The next t heo rem gives an analog of t he above theorem when t he weights are not necessarily bounded r. v. 's nor do they need to be square integrable.
Theorem 2.2.4 Suppos e th e setup in (2.2.26), and the assump tions
(2.2.28), {2.2. 30) , (2.2.31), and (2.2.34) hold. Th en (2.2.32) continues to hold. If, in addition, (2.2.33) holds, and
(2.2.47) For each n 2: 1, {h ni ; 1::; i ::; n} are square in tegrable, th en (2.2.35) cont in ues to hold. A proof of this theorem appears in Koul and Ossiander (1994) . This proof uses the ideas of bracket ing and chaining and is a bit involved , hence not included her e. We are rather interested in its numerous applications to time ser ies models as will be seen la ter in the Chapters 7 and 8 of this monograph. The next t heorem gives an analogu e of the above two res ults useful in the presence of het eroscedasticity. According ly, let Tni, 1 ::; i ::; n b e another array of r. v.'s indep en dent of (ni , for each 1 ::; i ::; n . Let {And b e an array of sub a -fields such that Ani C A n,i+I , 1 ::; i ::; n , n 2: 1, (h nl, Snl , Tnl) is AnI - meas urable; the r.v. 's {(n l ,· · · ,(n,j - I;(hni,Sni, Tni), 1 ::; i ::; j} are A nj - measurable, 2 ::; j ::; n ; and ( nj is indep endent of A nj , 1 ::; j ::; n . Define, n
(2.2.48)
Vh(x)
I 2 := n- /
L
hniI((ni ::; x
+ XTni + Sni) ,
i= l
n
Jh( x)
I 2
:= n- /
L
hniG( x
+ XTni + Sni ),
i= l
x ER
2. Asymptotic Properties of W.E.P. 's
38
Next , we introduce the addit ional needed assumptions. (2.2.49)
G has Lebesgue den sity 9 satisfying the following:
> 0 on the set { x : 0 < G(x) < I} , g(G- 1( u))
9
is uniformly cont inuous in 0 :::; u :::; 1, (2.2.50)
sup(1
+ Ixl)g(x) < 00 ,
x EIR
(2.2.51)
lim sup Ixg(x(1
u-.OxEIR
E
(2.2.52)
(n- t 1
+ u))
- xg(x )1=
o.
h;i) = 0(1) .
1=1
(2.2.53)
max ITnil = op(I). l 'S I'Sn
(2.2.54)
n'/' E[n-' t{h~;(IOn;1 + ITn;I)}]' ~ 0(1).
(2.2.55)
n-
n
1 2 /
L
IhniTnil
= Op(I) .
i= l
Remark 2.2.5 Not e that (2.2.50) impli es that for some const ant
0
00 ,
(2.2.56)
D( s) := sup IG(x
+ xs) -
G(x)1 :::; C [s],
vs E R
xEIR
Clearly, D( s) :::; 1 :::; 21 sl , for all lsi 2: 1/2 . Now consider -1 /2 O. In this case -log(1 + s ) :::; -2s = 21 sl , and D( s) =
l
x
X(!+ S)
1 yg(y)-dy :::; sup Iylg(y){ -log(1 y yE IR
+ s)}
<s<
:::; Cl sl·
A similar and somewhat simpler argument proves the claim in the case 0 < s < 1/2, using the fact log(1
+ s ) :::; S .
D
Theorem 2.2.5 Suppo se the setup in (2.2.26), and the assu mption s (2.2.28) , (2.2.30) and (2.2.4g)-{2.2.55) hold. Th en, (2.2.57)
sup IUh(X) - U':( x)1 xE IR
=
op(I).
39
2.2.2. Vh - Pro cesses
Suppose T n i == O. Then (2.2.57) is equivalent t o the conclus ion (2.2.32) of Theor em 2.2.4. But the conditio ns of Theor em 2.2.5 are clearly mu ch st ro nge r t han those needed by Theor em 2.2.4. Theor em 2.2.4 gives t he b est kn own result of t his ty pe useful in homosced ast ic autoreg ress ive (AR) t ime series models. Theorem 2.2.5 gives on the ot her hand a sim ila r result useful in autoreg ressive condit ionally het erosced astic (ARCH) time series mod els. As will be seen in the Chapters 7 and 8 of this mon ograph, these results are used t o ob tain t he asymptotic uniform lin earity of the analogues of M and R scores, and of the empirica l process of the residuals in standardized underlying paramet ers in a large class of AR and ARCH models . The latter result about the emp irical processes in turn is useful in variety of inference problem s in these models, including the investigation of the consiste ncy of t he er ror density estimates . Theorems 2.2.4 and 2.2.5 are also useful in obtaining the asympto t ic uniform qu adraticity result of a certain class of minimum dist ance scores in t hese mod els.
Proof of Theorem 2.2.5 . Ass u me withou t loss of generality t hat all b-« are non-n egati ve. Nex t , wri te Uh(X) = U:(x) + U;:(x) , wher e U:(x), U;:(x) correspond to t hat part of t he sum in Uh(X) whi ch has T n i 2: 0, Tn i < 0, respectively. Decompose Uh(x) sim ila rly. It t hus suffices to show tha t (2.2.58)
sup IU:(x ) - U~ +(x) 1
op( l) ,
x EIR
(2.2.59)
sup IU;:(x) - U~- (x ) 1
op( l ).
xE IR
Details will be given only for (2.2.58) , they b ein g similar for (2.2.59) . Now, fix a o > oa nd let - 00 = Xo < Xl < . .. < Xrn - l < Xrn = 00 I (j8/ n l / 2 ) , 0 ~ j ~ r - 1 and be a partition of ~ wh ere Xj = n l 2 T ri := [n / /8J+ 1. Note that
a-
(2.2.60) The dep enden ce of X j 's on n is supp ressed for the sake of convenience . Also , in t he proof be low C is a generic constant , possib ly differen t in different conte xt, but never dep ending on n or t he {x j }.
40
2. Asymptotic Prop erties of W .E.P. 's
Using the monotonicity of th e indicator fun cti on and the d.f. G, we obt ain that for Xj- I < x ~ Xj,
(2.2.61) IU: (x) - U~ + (x)1
< IU: (x j ) - U~+(xj- Il l + IU:(xj- Il - U~ +(xj)1 -+ *+ + 2 1J-+ h (Xj) - J h (xj- I)I + 2 1Jh (Xj ) Bnj,1 + Bnj ,2 + 2Bnj,3 + 2Bnj,4, say .
Jh*+ (xj- Ill
Note th at by (2.2.52), n
I~~~n Bn j,4 ~ n _J_
I
L Ih i= 1
n il8
= Op(8).
Next , conside r Bnj,l. For th e sake of br evity, let tni = Then , one can rewrite
Bnj ,1 =
n- I / 2
T
ni
+
1.
n
L hni { I ((ni ~ Xjtni + 8nil - I ((ni ~ .Tj - Il i=1
which is a sum of mar t ingale differences . We sha ll apply the inequality (9.1.4) given in t he Append ix below, with p = 4, V i = A ni' and
Di == n- I/ 2 h ni { uc; ~ Xjtni + 8nil
- I( ( ni ~ xj - Il
- G(Xjtni + 8nil + G(x j-Il} . Use
IDil ~ n - I / 2 hni , \:j i,
to obtain, for an
E
and th e fact
> 0,
P[jBnj,l ! > E] n
< CE- 4
(n- L Eh~i 2
i= 1
+E [n- I
t i= 1
h~i lG(Xjtni + 8ni) -
G(xj -Il!f ).
41
2.2.2. Vh - Processes
The first term in the above inequ ality is free from j . To deal with the second term, use the boundedness of g, (2.2.56), and a teles coping argument to obtain that for all 1 ::; j ::; r«. 1 ::; i ::; n,
G(Xjtni + ond - G(Xj-l) G(Xj) - G(x j-d
+ G(Xjtni + Oni) -
G(Xjtni)
+G( Xjtnd - G(Xj)
< C [0 n- 1!2 + IOnil +
ITnil].
Therefore,
Hence, using r n p (
= O(n 1!2),
~ax IBnj,l I > E)
l SJSrn
< C (n- 1! 2 (1 + 02) n-1
n
L Eh; i i=l
+ n 1!2E[n - 1 t h; i(IOnil + ITnil)f) , i=l which in turn, together with (2.2.52) and (2.2.54) , implies that maXlSj Srn IBnj,l I = op(l) . A similar state ment holds for Bnj,2. Next , consider n
1
Bnj ,3 = n - !2
L hni [G(Xjtni + ond -
G(Xj-ltni + Oni)]
i=l n
<
n-
1
n
L
hniO+ n - !2 1
i=l
L hniTni[Xjg(Xj) -
Xj- l g(xj- d ]
i=l n
+n- ! 2 L hni[G(x jtni + Oni) - G(Xjt nd - Onig(Xjtni)] 1
i=l n
+n- !2 L hni[G(Xj tnd - G(Xj) - TniXjg(Xj)] 1
i=l
2. A symptotic Prop erties of W .E.P. 's
42 n
1 2 - n- /
L hni[G( Xj-ltnd -
G(xj -d
i=l n
1 2 _n- /
L hni[G( Xj-ltni + Oni) -
G( Xj -ltni)
i=l n
+ n-
1 2 /
L hni oni [g(Xj tnd -
g(Xj- ltni )J.
i=l Now, let m.; := maxl ::;i::;n IOnil , /-Ln := maxl ::;i::;n ITnil· Not e that , uniformly in 1 :S j :S Tn , the sum of the absolute values of the t hird and sixt h term in th e above bound is bounded above by Cn - 1/ 2
n
L
Ihni ond
i=l
sup Ig(x) - g(y)1 = op ( l), I:r - yl::;mn
by (2.2.30) and the uniform continuity of g , implied by the the boundedness of 9 and th e uniform cont inuity of g(G- 1 ) on [0, 1J . Next , we bound the fourth term; the fifth term can be handled similarly. Clearly, the absolute valu e of the fourth term is bounded above by 1 2
n- /
n
~ IhniTniXjl
S; n -1/2
~ IkniTnil
[1
io
Ig(Xj
+ t XjTni)
l' ~~~{IxlIg(x
- g(Xj ) ldt
+ txli) -
g(x)11 dt = op(I) ,
by (2.2.51) and (2.2.55) . Finally, consider t he seventh term; the second term can be dealt with sim ilarl y. To begin with observe that by (2.2.56), _ max IG(xjtnd - G( xj)1 :S C m ax ITnil· l ::;t::;n,l ::;J::;r n l ::;t::;n Hen ce by the uniform cont inuity of g( G- 1 ) assure d by (2.2.49) , and by (2.2.53) , max Ig(x -t -) - g(x -)1 l ::;i::;n,l ::; j ::;rn J nz J =
_ max Ig(G-1(G(Xjtnd)) - g(G-1(G(xj)))1 = op (l ). l ::;t::;n,l ::;J::;r n
43
2.2.3. an - processes
Upon combining all these bounds and using E(n- 1 L:~l hnd = 0(1) , we obtains l~'~m IBn j ,31::; <5 Op(l) _ J_
+ op(l).
AIl of the above facts together with the arbitrariness of <5 thus imply 0 (2.2.57) , thereby completing the proof of Theorem 2.2.5.
Not e: The inequality (2.2.39) and its proof has its origins in th e papers of Johnson , Schechtman and Zinn (1985) and Levental (1989) . The proof of Theorem 2.2.3 has its roots in Levental and Koul (1989) and Koul (1991) . Theorem 2.2.5 was first proved in Koul and Mukherjee (2001). 0
2.2.3
(Xn-
processes
In this sub-section we sh all prove the weak convergence of a sequ ence of partial sum pro cesses useful in autoregressive mod el fitting. Accordingly, let X i , i = 0, ±1 ,··· be a strictly stationary ergodic Markov process with st ationary d.f. G, and let F, = 0"(Xi ,Xi - 1, ... ) be the o-field generated by the observations obtain ed up to time i. Furthermore, let , for each n 2: 1, {Zni , 1 ::; i ::; n } be an array of LV. ' S adapted to {Fd such that (Zni , X i) is strictly stationary in i , for each n 2: 1, and satisfying (2.2.62)
1 ::; i ::; n .
Our goal here is to establish the weak convergence of the pro cess n
(2.2.63)
a n( x) = n -
1 2 /
2: Z niI(Xi - 1 ::; x ),
x E R
i == l
The process an is also a weighted empirical process. It is also called a marked empirical process of the X i-1'S, the marks being given by the martingale difference array {Znil. An example of this process is the partial sum process of the innovations where Zni = Zi = (Xi - E(Xi/Xi-t}) , assuming the expectation exists. Further examples appear in Section 7.7 below. The process an takes its value in the Skorokhod sp ace D( - 00 , 00). Extend it continuously to ± oo by putting n
an(- oo) =0 and an (+oo) =n-1 /22:Zni . i == l
2. Asymptotic Properties of W.E.P. 's
44
Then an becomes a process in 1)[-00,00] . We now formulate the assumptions that guarantee the weak convergence of an to a continuous limit in 1)[-00,00] . To this end, let
Ly(x) := P(X I
-
ep(Xo) S; xlXo = y),
x, Y E
~,
where ip is a real valued measurable function . Because of the Markovian assumption, Lxo is the d.f. of Xl - ep(Xo), given Fo. For example, if {Xd is integrable, we may take ep(x) := IE[Xi+1 I Xi = x], so that epi+l := X i+! - ep(Xd are just the innovations generating the process {Xi} ' In the context of time series analysis the innovations are often i.i.d . in which case L y does not depend on y. However , for our general result of this section, we may let L y depend on y . This section contains two Theorems. Theorem 2.2.6 deals with the general weights Zni satisfying some moment conditions and Theorem 2.2.7 with the bounded Zni . The following assumptions will be needed in the first theorem: For some 7J > 0, 6 > 0, K < 00 and for all n sufficiently large: (2.2.64) (2.2.65) (2.2.66) There exists a function ip from ~ to ~ such that the corresponding family of functions {L y, y E ~} admit Lebesgue densities ly which are uniformly bounded:
suply(x) S; K < 00.
(2.2.67)
x ,y
There exists a continuous nondecreasing function [0, (0) such that \:j x E [-00,00], (2.2.68)
l n-
n
L E[Z;iIFi-l] I(X
i-
7
2
on ~ to
2(X) + op(l) . 1 S; x) = 7
i=l The assumptions (2.2.64) - (2.2.67) are needed to guarantee the continuity of the weak limit and the tightness of the process an , while (2.2.68) is needed to identify the weak limit . Here, B denotes the Brownian motion on [0, (0) .
45
2.2.3 . an - processes
Theorem 2.2.6 Under (2.2.64) - (2.2.68}, an ~ B
(2.2.69) where B time
0 7
2
0 7
2
in the space D[-oo, 00],
is a continuous Brownian motion on JR with respect to
2 7 .
Proof. For convenience, we shall now not exhibit the dependence of Zni on n . Apply the CLT for martingales given in Lemma 9.1.3 of the Appendix, to show that all finite dimensional distributions converge weakly to the right limit, under (2.2.64) and (2.2.68). As to tightness, fix -00 ~ tl < t2 < t3 ~ 00 and assume, without loss of generality, that the moment bounds of (2.2.64)- (2.2.66) hold for all n ~ 1 with K ~ 1. Then we have
[an (t3) - an (t2)J2[an (t2) -- an (td J2 n
2
n-2[l:ZiI(t2 < i =l
n-
2
X i-l
n
~ t3)] [l:ZiI(tl < Xi-l ~ t2)]
2
i=l
l: ~i~j(k(l, i,j ,k,l
with ~i = ZiI(t2 < X i-l ~ t3), ( i = ZiI(tl < X i-l ~ t2)' Now, if the largest index among i , j, k , l is not matched by any other, then JE{~i~j(k(l} = O. Also, since the two intervals (t2,t3J and (tl,t2J ar e disjoint , ~i(i == O. We thus obtain (2.2.70)
E{ n-
=
l: ~i~j(k(l} n - l: E{(i(j~n + n- 2 l: E{~i~j(n· 2
i,j,k,l 2
i,j< k
i,j
The mom ent assumption (2.2.64) guarantees that th e above expectations exist . In this proof, the constant K is a generic constant, which may vary from expression to expression but never dep ends on n or the chosen t 's . We shall only bound the first sum in (2.2.70), the second being dealt with similarly. To this end , fix a 2 ~ k ~ n for the time being
2. Asymptotic Prop erties of W .E.P. 's
46 and write
(2.2.71)
L
E{ (i(j~n
i,j
k- l
i=1
i=1
E{ (L (if~k} = E{ (L (iflE(~k IFk-d} k-2 < 2E{ (L (if E(dIFk-d}
+ 2E{ (LI E(~k IFk-d } .
i= 1
The first expectation equa ls
k- 2 E{ (L (ifI(t2 < X k- 1
(2.2.72)
::;
i=1
t3)IE(Z~IFk-d}.
Write for an appropriate fun cti on r . Not e that du e to stat ionar ity r is the sa me for each k . Condit ion on Fk- 2 and use t he Mark ov property and Fubini 's Theorem to show t hat (2.2.72) is th e sa me as
E{(?= Vi ) Jr( x , X k- 2, . . .) lX k- 2
t3
2
1= 1
k_ 2 ( X
t2
k-2
JIE{ (?= Vi ) t3
2 r( x , X k- 2, . ..
t2
te
<
- 1fJ(X k- 2))dx}
) lXk _ 2 ( X
-
IfJ (X k- 2))} dx
1=1
k- 2 I {E(?= (i)4V
J
t2
1= 1
2
1
{E( r( x , X k- 2,' " )lXk_2(X - IfJ (Xk - 2))) } 2 dx ,
X
where the last inequ ality follows from the Cauchy-Schwarz inequ ality. Since the (i's form a centered martingale difference array, Burkholder 's inequ ality (Chow and Teicher : 1978, p. 384) and the moment inequality yield
k-2
E( L
(ir : ; KE(L (If::; K
i=1
k-2
i=1
(k - 2)2 E(t.
47
2.2.3. an - processes
We also have
where L 1 (t) = EZt I(Xo :::; t) ,
- 00 :::;
t :::;
00 .
Let , for - 00 :::; t :::; 00,
! t
L 2(t ) =
{E(r( X,Xk_2, · · ·) lXk_2(X-
- 00
Not e that due to stationarity, L 2 is the same for each k . It thus follows that (2.2.72) is bounded from the above by (2.2.73) The assumption (2.2.64) implies th at L 1 is a cont inuous nondecreas ing bounded fun cti on on the real line. Clearly, L 2 is also nondecreasing and cont inuous. We shall now show that L2(00) is finit e. For this , let h be a st rictly posi tive continuous Lebesgue density on the real line such that h(x) '" lxi -I -I) as x -t ± oo, where fJ is as in (2.2.65) . By Holder 's inequ ality,
!
00
L 2(00) :::; [
E(r (x ,Xk_2, . .. ) lx k_2(x -
]!.
- 00
Use th e assumpt ion (2.2.67) to bound one power of lXk_2 from the above so that the last integral in turn is less than or equal to
The finit eness of th e last exp ect ation follows, however , from assumption (2.2.65) . We now bound th e second expectat ion in (2.2.70) . Since Zk - l is measurable w.r.t. Fk - l , t here exists some fun ct ion s such that
2. Asymptotic Properties of W .E.P. 's
48
Put u = r s with r as before. Then we have, with the 0 as in (2.2.66) ,
E{ d-I E(~~ IFk -t} } =
E{ J (t2 < X k- l ::; t3)J(tl < X k- 2 ::; t2) U( X k- l , X k- 2,··· )} t3
J
E{ tu, < X k- 2 ::; t2) u( x , Xk- 2, · · . )
t2
6
< [G(t2) - G(tt}] I H ts
X
J 1~6 E
{
ul+ 8(x , X k- 2, · . . ) lk~~2 (x -
t2
Put , for - 00 ::; t ::; 00, t
£ 3(t ) =
J 1~6 E
{ u l +8 (x , X k- 2, · · . )lk~~2 (x -
- 00
Arguing as above, now let q be a positive continuous Leb esgue density on the real line such th at q(x) rv !xl - l - } as x -+ ± oo. By Holder 's inequ ali ty, £ 3(00)
[J
00
<
1
E{ ul+8(x ,Xk_ 2, ... )
Ik~~2(X -
- 00 1
< K {E(Ul+ 8(Xk_ 1 , .. . ) q-8(Xk_1))}I H 1
< K {E(Z~(1 +8 )Z~(1 +8 )q - 8 (Xt})}I H , where the last inequ ali ty follows from Hold er's inequality applied to cond it iona l expec tat ions. The las t expect a tio n is, however , finite by ass umption (2.2.66). Thus £ 3 is also a nond ecreasing continuous bounded fun cti on on t he real line a nd we obtain
49
2.2.3. an - processes
Upon combining this with (2.2.70) and (2.2.73) and summing over k = 2 to k = n , we obtain
n- 2
L
E{ (i (j ~n
i,j
< K {[L1 (t2) -
+ [G (t2) -
Ll (td] ~
[L2(t3) - L 2(t2]
G(tl )]1~" IL3(t3) - L 3(t 2}]}.
One has a similar bound for the second sum in (2.2.70). Thus summarizing we see that the sums in (2.2.70) satisfies Chents ov's crite rion for ti ghtness. For relevant details see Billingsley (1968: Theorem 15.6) . This completes the pr oof of Theorem 2.2.6. 0 The next theorem covers t he case of uniformly bounded {Zni }· In this case we can avoid th e moment condit ions (2.2.65) and (2.2.66) and repl ace the conditio n (2.2.67) by a weaker cond itio n. Theorem 2.2.7 Suppose the r.v. 's {Znd are un iformly bounded and {2.2.68} holds. In addit ion, suppose there exists a m easurabl e function sp from IR to IR such that the corresponding f am ily of f unctions { L y , y E 1R} has Lebesgue densities {ly , y E 1R} satisfying
(2.2.74)
J[E
f or some 0
li~8 (x -
1
> O. Th en also the con clusion {2.2.69} holds.
Proof. Proceed as in the proof of the previous theorem up to (2.2.71) . Now use the boundedness of {Zd and argue as for (2.2.73) to conclude that (2.2.71) is bo unded from the above by
K
f E{ (~ (i)' lx._, f (EI~ ({" )'t.{Ell~'
<
( x - 'I' (Xk-2) ) } dx
K
s
(x - '1'( X o )) }
< K (k - 2) [G (t2) - G(tl )] 1+" [f(t3) - r(t2)],
.~.
dx
2. Asymptotic Properties of W .E.P. 's
50
with 8 as in (2.2.74) . Here t
J{E ll~8(x
f(t) =
-
-00 :::;
t :::;
00 ,
- 00
Note that (2.2.74) implies that I' is strictly incr easing, continuous and bounded on R Similarly
E{ eLl E(~~IFk-d} <
KE{ eLl I(t2 < Xk-l :::; ts) } E{I(tl < X k- 2:::; t 2)E[ZL I I(t2 < X k- l:::; ts)!Fk -2]}
<
KE{ tu, < X k- 2 :::; t2)
[1:
3
lXk_2(X -
r
E{ tu, < X k- 2 :::; t2) lXk_2(X -
lt2
Upon combining the above bounds we obtain that (2.2.70) is bounded from the above by
Summation from k = 2 to k = n thus yields
n- 2
L
lE{eiej~n
: :; K [G(t2) -
G(tl)]1~<5 [f(ts) - r( t2)]'
i ,j
The rest of the details are as in the proof of Theorem 2.2.4. 0 Note : Theorems 2.2.6 and 2.2.7 first appeared in Koul and Stute (1999).
2.3
AUL of Residual W.E.P.'s
In this section we shall obtain the AUL (asymptotic uniform linearity) of residual W.E .P.'s. It will be observed that the asymptotic continuity property of the type specified in Theorem 2.2.1(i) is the
51
2.3. AUL of residual W .E.P. 's
basic tool to obtain this result . Accordingly let {Xnil , {Fnil , {H} and {L nil be as in (2.2.2 3) and define
n
L dniI(Xni ~ H -
.-
(2.3.1) Sd(t , u)
1(t)
+ C~iU) ,
i=l n
J-ld(t,U) .-
i=l Sd(t , u) - J-ld(t , u) ,
.-
Yd(t, u)
L dniFni(H - l (t ) + C~iU) , o~
t ~ 1, uEW ,
where {Cni' 1 ~ i ~ n} are p x 1 vectors of real numbers . We also need n
(2.3.2)
S~( x , u)
:=
L dnJ(Xni ~ x + C~iU) , i =l n
J-l~( x , u)
:=
L dniF~!i(X + C~iU) , i= l
YJ( x , u)
:=
S~(x , u) - J-l~(x , u) ,
-
00
~ x ~
00,
u E W.
Clearly, if H is strictl y increasin g then S~( x , u) = Sd(H( x) , u) . Similar rem ark applies to other fun ctions. Throughout the text, any W.E.P. with weights dni n - 1 / 2 will be indicated by the subscript 1. Thus, e.g., \j - 00 ~ x ~ 00 , u E W ,
=
n
1 2 n- /
(2.3.3) S~(x , u)
L tix;
~x
+ C~iU) ,
i=l n
1 2 n- /
L {I (X ni ~ x + C~iU)
-
Fni(x
+ c~iU)}.
i= l
Theorem 2.3.1 In addition to (2.2.24), (N1) , (N2), and (C*) as-
sume that d.f. 's {Fni , 1 ~ i ~ n } have densities {fni, 1 w.r.t. A such that the following hold: (2.3.4) (2.3.5)
sup Ifni(X) - fni (y )1= 0,
lim lim sup max
o-t O m ax 1,n
n
lS;iS;n Ix- ylS; o
Ilfnilioo
~ k
~ i ~
<
00 .
n}
52
2. Asymptotic Properties of W .E.P. 's
In addition, assume that (2.3.6)
0(1),
lJ!fln II Cni II = n IIdnicnill =
L
(2.3.7)
0(1) .
i==l
Then , for every 0 < b <
00 ,
n
(2.3.8)
sup ISd(t, u) - Sd(t, 0) - u'
L dnicniqni(t) I = Op(l) , i==l
where qni := fn iH-1, 1 ::; i ::; n, and the supremum is taken over o ::; t ::; 1, lIull::; b. Consequently , if H is strictly increasing on JR, then n
(2.3.9)
sup IS~(x, u) - S~(x, 0) - u'
L dnicndni(x)1 = op(l) . i==l
where the supremum is taken over
-00 ::;
x ::;
00 ,
lI uli ::; b.
Theorem 2.3.1 is a consequence of the following four lemmas. In these lemmas the setup is as in the theorem. In what follows, SUPt ,u stands for the supremum over 0 ::; t ::; 1 and lIuli ::; b, unless mentioned otherwise. Let , for 0 ::; t ::; 1, u E W , n
(2.3.10)
Vd(t)
:=
L dnicniqni(t) i==l
Rd(t, u) := Sd(t, u) - Sd(t , 0) - U'Vd(t).
Lemma 2.3.1 Under (2.3.4) - (2.3.7) , sup IfLd(t , u) - fLd(t, 0) - u' v d(t )1 = 0(1).
(2.3.11)
t ,u
Proof. Let On = bmaxi [c.] . By (2.3.4) , {Fd are uniformly differentiable for sufficiently large n, uniformly in 1 ::; i ::; n . Hence , the L.H.S. of (2.3.11) is bounded from the above by n
L i ==l
IIdi ci li max z
sup
lli( x) -li(y)1 = 0(1),
Ix-yl :SJn
by (2.3.4) , (2.3.6) and (2.3.7).
o
2.3. AUL of residual W.E.P. 's
53
Lemma 2.3.2 Under (Nl) , (N2 ) , {C*}, {2.3.4} - {2.3.7}, Vllull ::;
b, (2.3.12)
Proof. F ix a llu ll ::; b. The lemma will follow if we show
(i)
Yd(t , u ) - Yd(t , 0 ) = op(l) for each 0 ::; t ::; 1,
(ii)
V E > 0, and for a = u or a = 0 , lim lim sup P ( sup IYd(t , a ) - Yd(s, a ) 12:
6--+0
It- sid
n
E) = O.
Since Yd(-, O) = W;(-) of (2.2.23), for a = 0, (ii) follows from (2.2.25) of Coro llary 2.2.1. To verify (ii) for a = u , take 'T]i = H (X i e~u ) , 1 ::; i ::; n , in (2.2.1) . Then Yd(- , u) == Wd(-) of (2.2.1) and G i( ·) = Fi(H- 1 (-) + e~u ) , 1 ::; i ::; n. Moreover , by (2.3.4)-(2 .3.6), n
(2.3.13)
sup
L d;[Fi (H -
1
(t + 8) + e~ u ) - Fi(H- 1 + e~u) ]
09 :::1 - 6 i= 1
::; 2bk max lIeill
+
[Ld(t
sup
+ 8) -
Ld(t)],
0
l
0(1 ), as n ----+ 00, and th en e ----+ 0, by (C *). Hen ce, (C) is sa tisfied by the above {Gd . The other condit ions being (N l ) and (N 2) which ar e also ass um ed here, it follows t hat th e above {'T]i} and {Wd } satisfy the conditions of Theorem 2.2.1(i) . Thus (ii) for a = u follows from this theor em . To obtain (i) , not e that again by (2.3.4)-(2 .3.6) , n
Var [Yd(t , u) - Yd(t , 0)] <
L d;IF (H i
1
(t)
+ e~u)
-
Fi (H - 1 (t))l
i=1
<
bkm~xll eill = 0(1). l
This and th e Chebych ev inequ ality imply (i) and hence (2.3.12). 0
2. Asymptotic Properties of W .E.P. 's
54
To state and pro ve t he next lemma we need some more not ation. Let lini = Ilcnil\' 1 ::; i ::; n , and define for 0 ::; t ::; 1, u E W , a E JR, n
(2.3.14) Sd(t , u , a)
L dniI(X ni ::; H - 1(t ) + C~iU + alind , i= l
J-Ld(t , u , a) =
ESd(t , u , a),
Yd*(t , u ,a)
Sd(t , u , a) - J-Ld(t , u , a).
Lemma 2.3.3 Und er (Nl) , (N2 ) , (C *) , (2.3.4) - (2.3. 7), V 0, 0 < b <
00 ,
lal < 00
and
n
>
Il ull ::; b,
(2.3.15) lim lim su pP ( sup IYd*(t , u , a) - Y;(s , u , a)1 8~ O
E
It- sid
~ E) = O.
Proof. In T heor em 2.2.1(i) , take 17i = H (X i - c~ u - alid , 1 ::; i ::; n. T hen W d(·) = Yd*(- , u ,a) and G i(-) = Fi( H - 1(-) + ci , u + alid, l ::; i ::; n . Again simi lar to (2.3.13), sup
[Gd(t
+ J) -
Gd(t)] < 2k(b + a) max IIcili ~
O::;t::;1- 8
+
sup [Ld( t + J) - Ld(t )] O
0(1),
by (2.3.6) and (C *) .
o
Hence (2.3.15) follows fro m Theorem 2.2.1(i).
Lemma 2.3.4 Und er (Nl) , (N2 ) , (C*), (2.3.4}-(2.3. 7), V E > 0
< b < 00 , IIvll ::;
th ere is a J
>0
(2.3. 16)
lim supP ( sup IRd(t , u ) - Rd(t , v )1 n t,llu- vll::;8
suc h that for eve ry 0
b,
~ E) = 0,
where Rd is defined at (2.3. 10) .
Proof. Assu me, wit hout loss of generality, that d; ~ 0, 1 ::; i ::; n . For, ot herwise write d; = di+ - di:. , 1 ::; i ::; n, where {d i+ , di -} ar e as in the proof of Lemma 2.2.4. T hen Sd = Td+Sd+ - Td-Sd- , Rd = Td+Rd+ - Td- Rd- , where TJ+ = ~i(di+)2 , T1- = ~i(di_)2 . In view of (N l ), Td+ ::; 1,Td- ::; 1. Moreover , if {dd satisfy (N 2) and (2.3.7)
2.3. AUL of residual W.E.P.'s
55
above , so do {di+ ' di - } because dy+ V dL = dy+ + dL = dy, 1 ::; i ::; n. Hence the triangle inequality will yield (2.3.16), if proved for R d + and Rd- . But note that a., 1\ a.: ~ 0 for all i . Now Ilu - vII ::; 8 implies
~
Therefore, because di
0 for all i ,
S;t(t, v , -8) ::; Sd(t, u) ::; S;t(t, v , 8) for all t , yielding (2.3.18)
L 1(t ,u,v)
.-
S;t(t, v , -8) - Sd(t ,v - (u - V)'Vd(t)
< Rd(t, u) - Rd(t ,v) < S;t(t,v , 8) - Sd(t,v) - (u - V)'Vd(t) -'
L 2 (t, u , v) .
We shall show that there is a 8 > 0 such that for every (2.3.19)
p (
sup \Lj(t , u , v)\ t,llu-vll:So
~ E)
= 0(1),
Ilvll ::; b,
j = 1,2.
We shall prove (2.3.19) for L 2 only as it is similar for L 1 . Obs erve that
(2.3.20)
L 2 (t , u , v) ::; IYd*(t , v , 8) - Yd'(t,v,O)1
+1J-l:J(t , v , 8) - J-l:J(t , v , 0)1
+ I(u -
The Mean Value Theorem, (2.3.4) , (2.3.5) and
V)'Vd(t)\
Ilu - vII ::; 8 imply n
(2.3.21)
sup 1J-l:J(t, v , 8) - J-l:J(t , v , 0)1
< 8k
L
IldiCill,
i == l
n
sup t
I(u -
v)'vd(t)1
< k8
L i == l
Iidicill-
2. Asymp tot ic Properties of W .E.P. 's
56
Let M(t) den ot e t he first t erm on the R. H.S. of (2.3.20) . I.e .,
M(t) = Yd*(t , v , 0) - Yd*(t , v , 0) ,
0 ~ t ~ 1.
We shall first prove that
(2.3.22)
sup IM(t) 1 = op( l) . t
To begin wit h , n
Va r(M(t))
<
L d;[Fi(H -
<
s» max Iii ,
by (2.3.4), (2.3.5) ,
0(1),
by (2.3.7) .
1
(t ) + c~v + <5kd - Fi(H - 1 (t ) + c~v)]
i=l 2
Hence
(2.3.23)
M(t) = op(l) ,
VO ~t ~1.
> 0,
Next, note t hat , for a , sup IM (t ) - M( s)1 It-s l<,
<
sup IYd*(t , v , 0) - Yd*( s, v , <5)1 It- sl< , + sup IYd*(t , v , 0) - Yd*( s, v , 0) 1· It-sl <,
Apply Lemma 2.3.3 twic e, on ce with a = <5 and on ce with a = 0, to obtain that V E > 0, lim lim sup P ( sup IM (t ) - M( s)1 n It-s l<,
, -t o
~ E) = O.
T his and (2.3.23) prove (2.3 .22) . Now cho ose <5 > 0 so t hat n
(2.3.24)
lim sup <5k n
L
II di cdl ~ E/3.
(Use (2.3.7) here).
i =l
From t his , (2.3.20) , (2.3.21) and (2.3.22) one readily obtains limsup P ( n
sup t,lIu- vll:<:;J
IL 2 (t , u , v )1 ~ E )
~ limnsup P (S~PIM(t) 1 ~ E/3)
= O.
2.3. AUL of residual W .E.P. 's
57
This prove (2.3.19) for L 2 . A similar argument proves (2.3.19) for L 1 with the same 8 as in (2.3.24), thereby completing the proof of the Lemma. 0 Proof of Theorem 2.3.1. Fix an E > 0 and choose a 8 > 0 satisfying (2.3.24) . By the compactness of the ball N(b) := {u E W ; Ilull :::; b} there exist points VI ,' " vr in N(b) such that for any u E N(b) , lIu - vjll :::; 8 for some j = 1,2"" , r . Thus
lim sup P(sup IRd(t , u)1 2:: E) n t ,U r
:::; L
limsupP( n
j=l
sup IRd(t , u - Rd(t , vj)1 2:: E/2) t,II U - V jll ::;J
r
+ LlimsupP(suPIRd(t ,vj)l2:: j =l
n
E/2) = 0
t
o
by Lemmas 2.3.2 and 2.3.4.
Remark 2.3.1 A reexamination of the above proof reveals that
Theorem 2.3.1 is a sole consequence of the continuity of certain W .E. P.'s and the smoothness of {Fnd . It do es not use the full force 0 of the weak converg ence of these W .E.P.'so Remark 2.3.2 By the relationship
and by Lemma 2.3.1, (2.3.8) of Theorem 2.3.1 is equivalent to sup IYd(t, u) - Yd(t, 0)1 = op(l). o::;t::;l,llull::;b
(2.3.25)
This will be useful when dealing with W .E .P.'s based on ranks in Chapter 3.
0
The above theorem needs to be ext ended and reformulated when dealing with a linear regression model with an unknown scale parameter or with M - estimators in the presence of a preliminary scale estimator. To that end , define , for x , S E~, 0:::; t :::; 1, u E ~p , n
Sd( S, t,
u)
:=
L dniI(Xni < (1 + sni=l
1j 2
)H - 1 (t ) + C~iU) ,
2. Asymptotic Properties of W .E.? 's
58 n
S~(s ,x , u)
L dniI (X ni :::; (1 + sn -1 /2 ) x +C~iU ) ,
:=
i= l
and define Yd(s , t, u ), f-Ld( S, t, u ) simi larly . We are now ready to prove
Theorem 2.3.2 I n addition to th e assumptions of Th eorem 2.3.1, assume that (2.3.26)
max sup Ixfni(x) 1:::; k l ,n
x
< 00 .
T hen (2.3.27)
su p ISd(S, i, u ) - Sd(O, t , 0) n
I
- L dnd sn - 1/ 2H - 1(t ) + C~iU} qni (t) = op(1). i= l
where the supremum is tak en over
lsi :::; b, Ilull :::; b, 0 :::; t
:::; 1.
Consequently, if H is stricilsi increasing for all n 2: 1, th en (2.3.28) sup
IS~(s , x , u ) - S~(O, x , 0) n
- L dnd sn- 1 /2x +c~iu } fni (x) 1 = op(1 ). i =l
where th e supremum is taken over
lsi:::; b, lI uli :::; b and
x E IR.
Sketch of Proof. T he argument is qu ite similar to t hat of T heorem 2.3. 1. We briefly indica te the modifica t ions of t he prev ious proof. An analogue of Lemma 2.3. 1 will now assert sup If-Ld( S, t, u ) - f-Ld(1, i, 0)
I
- {(n - 1/ 2"L,diqi(t )H- 1(t)) s + U' Vd(t )} = 0(1). T his uses (2.3.4), (2.3.5), (2.3 .6) , (2.3.7) , (2.3 .26) and (N l) . An analogue of Lemma 2.3.2 is obtained by applying T heor em 2.2.1 (i) to 7] := H (X i - C~u) O"; l ) , 1 :::; i :::; n , O"n := (1 + sn- 1/ 2). T his states t hat for every lsi :::; b and every Il ull :::; b, sup IYd (s, t , u ) - Yd(s , t, 0)1= op(1).
O
59
2.3. AUL of residual W.E.P. 's In verifying (C) for these {TJi} , one has an analogue of (2.3.13):
+ 0) -
sup [Gd(t 099- 8
Gd(t)]
:S 2k{B max Ilcill + bn- 1/ 2 } + sup [Ld(t + 0) - Ld(t)]. 099-8
z
Note that here Gd(t) = L:~=l d; Fi((]"nH-1(t) + c~u) . One similarly has an analogue of Lemma 2.3.3. Consequently, from Theorem 2.3.1 one can conclude that for each fixed s E [-b ,b], (2.3 .29)
sup IRd(S , t , uj] o::;t::;l,llull ::;b
= op(l) ,
where Rd(S , t , u) equals the L.R .S. of (2.3 .27) without the supremum. To complet e the proof, once again exploit the compactness of [-b ,b] and the monotonic structure that is present in 3d and fJd. Details 0 are left for int erest ed read ers. Consider now the specialization of Theor ems 2.3.1 and 2.3.2 to the case when Fni F , F a d .f. Not e that in this case (N1) implies that Ld(t) = t so that (C *) is a priori satisfied. To state these spe cializations we need the following assumptions:
=
(F1)
F has uniformly con ti n uous density f w.r.t. A.
(F2)
f > 0 , a .e.
(F3)
sUPx EIR IXf( x)1
A.
Note that (F1) implies that F is strictly increasing.
:S k < 00.
f is bounded and that (F2) impli es that
Corollary 2.3.1 LetXn1,' " , X nn be i. i.d. F. In addition, suppose that (N1) , (N2) , {2.3.6}, {2.3.1} and (1'1) hold. Th en {2.3.8} holds with qni = f(F- 1) . If, in addition, (1'2) holds, th en {2.3.9} holds with f ni
=f .
0
Corollary 2.3.2 Let X n1,' " , X nn be i. i.d. F . In addition, suppose that (N1), (N2) , {2.3.6}, {2.3.1}, (1'1) and (1'3) hold. Th en
{2.3.21} holds with H
=F and qni =f(F -
1
).
If, in addition, (1'2) holds, th en {2.3.28} holds with fn i
= f.
0
60
2. Asymptotic Prop erties of W .E.P. 's
We sha ll now ap ply the above results to the mod el (1.1.1) and the {Vj} - pro cesses of (1.1.2). The resul ts thus ob tained are useful in studying the asymptoti c distributions of certain goodness-of-fit t ests and a class of M-estimators of {3 of (1.1.1) when t here is an unknown scale par am et er also. We need the followin g ass umpt ion abo ut the design matrix X.
This is Noether 's cond it ion for t he des ign matrix X . Now, let (2.3.30) A
.-
(X ' X) - 1/2,
D := XA ,
q' (t ) .-
(qn 1(t), · · · ,qnn(t )),
A (t ): = diag(q (t )) ,
r 1 (t) .r 2 (t ) .-
AX' A (t )XA , n - 1/ 2 H - 1(t )D'q (t ),
0 ~ t ~ 1.
Wri te D = ((d i j ) ) , 1 ~ i ~ n , 1 ~ j ~ p , and let d U) denote t he j -th column of D . Note t hat D'D = I p x p . This in t urn implies t hat (2.3.31)
(N l) is sa tisfied by d (j) for all 1
~
j
~ p.
Mor eover , with a U) denoting the j -t h column of A , (2.3.32)
max dI2) . i p
m?-x(x' a U))2 ~ max I)x~aU ) ) 2 I
I
mfxx;
(t
j=l
a (j)a(j))
Xi
) =1
m?-x x;( X ' X)- l X i = 0(1),
by (N X) .
I
Let n
(2.3.33)
Lj (t ) :=
L dTj Fi( H - 1(t)) , i= l
We are now read y to state
0 ~ t ~ 1, 1 ~ j ~ p.
61
2.3. AUL of residual W.E.P. 's
Theorem 2.3.3 Let {(x~i,Yni),1 SiS n} ,,B ,{Fni,1 SiS n} be as in the model (1.1 .1) . In addition, assume that {Fnd satisfy (C*) is satisfied by each L j of (2.3.33) ,
(2.3.4), (2.3.5) and that 1
S
j
S p.
Then , [or every 0 (2.3.34)
< b < 00 ,
sup IIA{V(H- 1 (t) ,,B + Au) =
op(I).
where the supremum is over 0 S t S 1,
Ilull S
b.
If, in addition, H is strictly increasing for all n every 0 (2.3.35)
fdt)ull
V(H- 1(t),,Bn -
< b<
~
1, then, for
00 ,
sup IIA{V(x ,,B + Au) -
where the supremum is over
V(x,,Bn - f 1 (H(x))ull = op(I).
-00 :::;
Theorem 2.3.4 Suppose that obey the model
x S
00 ,
Ilull S
{(x~i ' Y ni) , 1
b.
SiS n} and,B E
ffi.P
(2.3.36) with {Eni} independent r.v. 's having d.f. 's {Fnd . Assume that (NX) holds. In addition, assume that (2.3.4), (2.3.5), (2.3.26) are satisfied by {Fnd and that (C*) is satisfied by each L j of (2.3.33}, 1 S j S p . Then for every 0
< b < 00 ,
(2.3.37)
A [V(aH- 1(t) ,,B + Au-r) - V(rH- 1(t) ,,B)] =
T 1 ( t) u - I' 2 ( t) v
where v := n 1 / 2(a - ,),-1, a
+ up ( 1) ,
> o.
If, in addition, H is strictly increasing for every n (2.3.38)
A [V(ax ,,B =
+ Au-y] -
~
1, then
V(rx , ,B)]
f 1 (H(x)) u - f 2 (H (x )) v
+ u p (I ).
In (2.3.37) and (2.3.38}, u p (l ) are a sequences of stochastic processes in (t , x , u , v) tending to zero uniformly in probability over 0 S t S
1,
-00
Sx S
00,
Ilull S band Ivl S
b.
2. Asymptotic Properties of W.E.P. 's
62
Proof of Theorem 2.3.3. Apply Theorem 2.3.1 to Xi = Yi x~{3 , c, = x~A , 1 ::; i ::; n . Then F; is the d.f. of Xi and the j -th components of AV(H- 1(t) ,{3 + Au) and AV(H- 1(t) ,{3) are Sd(t, u) , Sd(t, 0) of (2.3.1), respectively, with d i = d i j of (2.3.30) , 1 ::; i ::; n, 1 ::; j ::; p . Therefore (2.3.34) will follows by p applications of (2.3.8), one for each dUl' provided the assumptions of Theorem 2.3.1 are satisfied. But in view of (2.3.31) and (2.3.32), the assumption (NX) implies (Nl), (N2) for dUl' 1 ::; j ::; p. Also, (2.3.6) for the specified {cd is equivalent to (NX). Finally, the C-S inequality and (2.3.31) verifies (2.3.7) in the present case. This makes Theorem 2.3.1 applicable and hence (2.3.34) follows. 0 Proof of Theorem 2.3.4. Follows from Theorem 2.3.2 when applied to Xi = (Yi -x~{3h -1, c~ = x~A, 1 ::; i ::; n , in a fashion similar 0 to the proof of Theorem 2.3.3 above. The following corollaries follow from Corollaries 2.3.1 and 2.3.2 in the same way as the above Theorems 2.3.3 and 2.3.4 follow from Theorems 2.3.1 and 2.3.2. These are stated for an easy reference later on . Let C(b) := {s E W , IIA -l(S - {3)II ::; b} . Corollary 2.3.3 Suppose that the model (1.1.1) with F n i == F holds. Assume that the design matrix X and the d.f. F satisfying (NX) and (Fl) . Then , V 0
(2.3.39)
< b < 00,
A [V(F- 1(t) ,s) - V(F- 1(t),{3)] =
f(F- 1(t))A -l(s - {3) + up(1) ,
where u p(1) is a sequ enc e of stochastic processes in (t , s) tending to zero uniformly in probability over 0 ::; t ::; 1; sEC (b) . If, in addition, F satisfies (F2), then
(2.3.40)
IIA{V(x,s) - V(x ,{3)} - f(x)A -l(s - {3)II = up(1) ,
where u p(1) is a sequence of stochastic processes in (x, s) tending to zero uniformly in probability over
-00 ::;
x ::; 00 ; s E C(b).
0
Corollary 2.3.4 Suppose that the model (2.3. 36) with F n i F holds and that the design matrix X and th e d.f. F satisfy (NX),
2.4. Some additional results for W .E.P. 's
63
(Fl) and (F3). Then {2.3.37} holds with f(F- 1(t))Ip x p ,
(2.3.41) f 1 (t)
o:::; t :::; 1.
F- 1 (t)f(F- 1 (t) )AX'l,
f 2 (t )
If, in addition, F satisfies (F2), then {2.3.38} holds with rj(H)
= rj(F) ,j = 1,2,
i.e.,
A [V(ax ,,8 + Au-y) - V(--yx ,,8)] = f(x)u - xf(x)v
+ u p (l ), o
where v is as in {2.3.37} and u p (l ) is as in {2.3.38}.
We end this section by stating an AUL result about the ordinary residual empirical processes H n of (1.2.1) for an easy reference later on. Corollary 2.3.5 Suppose that the model {1 .1.1} with Fn i
=F holds .
Assume that the design matrix X and the d.f. F satisfying (NX) and
(Fl). Th en, V 0 < b <
00,
(2.3.42) n
=
1
f(F - (t ))
· n- 1 / 2
LX~iA . A -1(8
-
,8)1
+ u p (l ),
i=1
where u p (l ) is as in Corollary 2.3.3. If, in addition, F satisfies (F2), then, V 0
< b < 00 ,
n
=
f(x)· n- 1 / 2 LX~iA . A -1(8 -,8)
+ up(l),
i=1
where u p (l ) is as in {2.3 .40}.
Proof. The proof follows from Theorem 2.3.1 by specializing it to the case where d n i n -1 /2 and the rest of the entities as in the proof of Theorem 2.3.3. 0
=
Note: Ghosh and Sen (1971) and Koul and Zhu (1991) prove an almost sure version of (2.3.40) in the case p = 1 and p
> 1, respectively.
0
2. Asymptotic Properties of W.E.P. 's
64
2.4
Some Additional Results for W.E.P.'S.
For t he sake of general interest , here we state some further results abo ut W .E.P. 's. To begin with, we have
2.4.1
Laws of the iterated logarithm
In this subsec tio n, we ass ume that (2.4.1)
Define n n _ L d2 L di{ I (7]i ::::; t ) - Gi(t)} , (In2.'i, i=l i=l .- u; (t )/ {2(J; £n£n(J;} 1/2, » 1, t ::::; 1.
.-
(2.4.2) Un(t ) ~n (t)
>
°:: ;
°:: ;
Let r (s, t) := s /\ t - st , s, t ::::; 1, and H( r) be the reproducing kern el Hilb ert space gene rate d by the kernel r with II . li T denoting t he associated norm on H (r). Let
K = {f
E H( r) ; IlfilT:: :; I} .
Let p denote the uniform metric. Theorem 2.4.1 If7]1 ,7]2, '" are i.i .d. uniform on [0,1] r.v.'s an d d 1, d 2, . . . are any real numbers sa tisfying 2 · 1Im(Jn = n
.
00 ,
2
hm( max d i n
1:S1:Sn
)
£n£n(J; 2 (In
= 0,
th en P(P (~n , K) ~
°
an d the se t of lim it points of {~n } is K) = 1.0
This theorem was proved by Vanderzanden (1980 , 1984) usin g some of the results of Ku elbs (1976) and certain martingale properties of ~n'
Theorem 2.4.2 Let 7]1, fJ2, . . . be in dependen t no n n egat ive ran dom variables. Let {d i } be any real numbers. T hen lim sup sUP (J~ l IUn (L ) 1 n
t >O
< 00 a.s ..
o
2.4. Some additional results for W.E.P. 's
65
A proof of this appears in Marcus and Zinn (1984). Actually they prove some other interesting results about w.e.p.'s with weights which are r .v.'s and functions oft. Most of their results, however, are concerned with the bounded law of the iterated logarithm. They also proved the following inequality that is similar to, yet a generalization of, the classical Dvoretzky - Kiefer - Wolfowitz exponential inequality for the ordinary empirical process. Their proof is valid for triangular arrays and real r.v .'s. Exponential inequality. Let X n1, X n2 , ' " ,Xnn be independent r.v. 's with respective d.f. 's Fn1, ... , Fnn and {dnd be any real numbers satisfying (Nl) . Then, V .\ > 0, V n ~ 1, p (sup xE IR
t
i=1
dni{I(Xni ::; x) - Fni(X)}
~.\)
< [1 + (81f)1 /2.\] exp( _.\2 /8) .
o
The above two theorems immediately suggest some interesting probabilistic questions. For example, is Vanderzanden's result valid for nonidentical r. v.'s {'l7d? Or can one remove the assumption of nonnegative {'l7d in Theorem 2.4.1?
2.4.2
Weak convergence of W.E.P.'s in 1)[0, I]", metric and an embedding result.
III
pq -
Next, we state a weak convergence result for multivariate r.v.'s. For this we revert back to triangular arrays. Now suppose that 'l7ni E [0 , 1]P , 1 ::; i ::; n , are independent r.v.'s of dimension p. Define n
(2.4.3)
Wd(t)
:=
L dni{I('l7ni ::; t) -
Gni(t)}, t E [0, 1]P .
i=1 Let c.; be the j-th marginal of Gni , 1 ::; i ::; n, 1 ::; j ::; p. Theorem 2.4.3 Let {'l7ni, 1 ::; i ::; n} be independent P: variate r.v. 's and {dnd satisfy (Nl) and (N2) . Moreover suppose that for each 1 ::; j ::; p , n
L
lim lim sup sup d;i{Gnij(t 0-+0 n 09::;1-0 i=1
+ 8) -
Gnij(t)} = 0.
66
2. Asymptotic Properties of W.E.P.'s
Th en, f or every to >
°
2. Moreover, W d ==;> som e W on (V[O , l]P,p) if, and only if,
fo r each s, t E [O ,l]P, Co v( Wd( S), W d(t )) -+ Co v( W( s) , W (t) ) =: C(s , t ). In this case W is necess arily a Gaussian process, P (W E C[O, I]" ) = 1, W (O ) = = W(l ).
°
o
Theor em 2.4.3 is essent ially proved in Vanderzanden (1980) , using results of Bickel a nd Wi chura (1971) . Mehra and Rao (1975) , Withers (1975), and Koul (1977), among ot hers, obtain t he weak convergence resul ts for { Wd } - processes when {17nd are weakl y dep endent. See Dehling and Taqqu (1989) and Koul and Mukherj ee (1992) for similar resul ts when {1Jnd are lon g range dep endent . Shor ack (1971) pr oved the wea k convergence of W d/q - process in the p-met ric, wher e q E Q, with
Q :=
{q ,q a cont inuous fun ction on [0, l],q q(t) t , C 1/ 2q(t ) .j,. for 0
~ O,q(t)
= q(l -
< t < 1/2, fal q-2(t )dt
<
t) ,
oo}.
Theorem 2.4.4 Suppo se that 1Jnl , . .. , 1Jnn are in dependent random variables in [0,1] with respective d.f. 's G n1, ' " G nn such that n
n-
1
L Gni (t ) = t ,
0 ~ t ~ 1.
i= l
In addition, suppose that {dnd sati sf y (Nl ) and (B) of Theor em 2.2 .2. Th en, (i) \/ to > 0, \/ q E Q , lim lim sup P ( sup 8--+ 0
n
It- sid
IWqd(t()t)
I
- W d((s)) > to) = O. q s
(ii) q- l W d ==;> «:' W , W a continuous Gaussian process with 0 covariance function C if, and only if Cd -+ C.
2.4. Some additional results for W .E.P. 's
67
Shorack (1991) and Einmahl and Mason (1991) proved t he following embedding result.
Theorem 2.4.5. Suppose that rJnl , . . . Tln n. are i. i. d. Uniform [0 , 1] r.v. 'so In addition, suppose that {dni} satisf y (Nl) and that n
n
Ldni = 0, i=l
n
L d; i = 0(1) . i=l
Th en on a rich enough probability space there exist a sequence of versions Wd of the processes W d and a fixed Brownian bridge B on [0, 1] such that sup n V IWd(t) - B(t)1 l/n:::;t:::;l -l /n {t(l - t) P /2-v
= Op(l) ,
for all 0 ::; v < 1.
Th e closed interval 1/ n ::; t ::; 1 - 1/ n may be replaced by the open 0 interval min{rJnj ; 1 ::; j ::; n } < t < max{rJnj ; 1 < j ::; n }.
2.4.3
A martingale property
In this subsection we shall prov e a mar tingale pr op erty of W. E.P.'s. Let X n1, X n2, · · · , X nn be indep end ent real r.v .'s with res pect ive d .f.'s Fn1,· · · , Fnn ; dn1,· · · , dnn be real numbers . Let a < b be fixed real numbers. Define n
Mn(t) := L dni{I(Xni E (a, t ]}{ 1 - Pni(a, t]}i=l
1,
n
Rn(t)
:=
Ldni{I(Xni E (t ,b] - Pni(t ,b]}{1- Pni (t, b]}- l , t E JR, i=l
where
Pni(S, t] := Fni(t) - Fni( s) , 0 ::; s ::; t ::; 1, 1 ::; i ::; n . Let T 1 C [a , oo),T2 C (- oo,b] be such that M n (t)[R n (t)] is welldefined for t E T1[t E T 2] . Let
F1n(t) := a - field{I(Xni E (a, sD, a::; s ::; t , i = 1, · · · ,n}, t E T 1, F2n(t) :=a-field{I(XniE( s ,bD, t ::; s ::;b, i = 1, · · · ,n }, tET2.
2. Asymptotic Properties of W.E.P. 's
68
Martingale Lemma. Under the above setup, 'in 2: 1, {Mn(t) , Fln(t) , t E Tt} is a martingale and {Rn(t) , F2n(t), t E T 2} is a reverse martingale. Proof. Write qi(a, s] = 1 - Pi(a, s]. Because {Xd are independent , for a ::; s ::; t ,
E{ Mn(t)IFln(S)} n
Ldi1(Xi E (a,s]) i= l
x
[E{ [I (Xi EE (a, t]) - Pi(a, t]] IXi E (a, s]} qi(a, t]
+f(Xi ~ (a, sl )E{ [f(Xi E (a, t) -
p,(a, tl]
[x, ~ (a, sl] }]
n
Ldi {1(Xi E (a,s]) i=l
+
1(X.I 5'=, d (a s])
i=l
d,
' ,
- Pi(a, t l }
}
qi(a, t] 1(Xi
n
L
{Pi(S q-(a ,ts
E
(a,s]) -qi(a ,s] ()
qi a,s
A similar argument yields the result about Rn .
= M n (s).
o
Note; The above martingale lemma is well known when the r.v.'s {X ni} are i.i.d. and dn i == n- 1 / 2 . In the case {X n ;} are i.i.d. and {d n i } are arbitrary, the observation about {M n} being a martingale first appeared in Sinha and Sen (1979). The above martingale Lemma appears in Vanderzanden (1980, 1984). Theorem 2.4.1 above generalizes a result of Finkelstein (1971) for the ordinary empirical process to W .E.P.'s of i.i.d. r.v.'s .. In fact , the set K is the same as the set K of Finkelstein . DO
3 Linear Rank and Signed Rank Statistics 3.1
Introduction
Let {X ni , Fnd b e as in (2.2.23) and {cnd be p x 1 real vectors. The rank and the ab solu te rank of the i th residual for 1 ::; i ::; n , U E W , ar e defined , resp ectively, as n
(3.1.1)
Riu
=
L
I (X nj - U'Cnj ::; X ni - U'Cni),
j= l n
Rtu
= L I (IX nj -
u ' cnj l ::; IXn i - u 'cni l)'
j= l
Let sp be a nondecreas ing real valued fun cti on on [0, IJ and define
for
U
E W , where
cp+(s ) = cp((s + 1)/2 ), 0 ::; s ::; 1;
s(x) = I (x > 0) - I (x < 0).
H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
70
3. Linear Rank and Signed Rank Statistics
The processes {Td(cp, u), u E Il~.P} and {Tt(cp , u) , u E ffi.p} are used to define rank (R) estimators of f3 in the linear regression model (1.1.1). See, e.g., Adichie (1967), Koul (1971) , Jureckova (1971) and Jaeckel (1972). One key property used in studying these R-estimators is the asymptotic uniform linearity (AUL) of Td(CP , u) and Tt(cp , u) in u over bounded sets. Such results have been proved by Jureckova (1969) for Td(cp, u) for general but fixed functions ip ; by Koul (1969) for Tt(I, u) (where I is the identity function) and by van Eeden (1971) for Tt (cp , u) for general but fixed cp functions. In all of these papers {X n i } are assumed to be i.i.d . In Sections 3.2 and 3.3 below we prove the AUL of Td(cp , .), Tt(cp , .), uniformly in those cp which have Ilcplltv < 00 , and under fairly general independent setting. These proofs reveal that this AUL property is also a consequence of the asymptotic continuity of certain W .E.P.'s and the smoothness of {Fnd . Besides being useful in studying the asymptotic distributions of R-estimators of f3 these results are also useful in studying some rank based minimum distance estimators, some goodness-of-fit tests for the error distributions of (1.1.1) and the robustness of R-estimators against certain heteroscedastic errors.
3.2
AUL of Linear Rank Statistics
At the outset we shall assume
(3.2.1) cp E C
:=
{cp : [0, 1J -+ JR, cp E VI[O , 1], with
Ilcplltv :=
cp(1) - cp(O) = 1}.
Define the W.E.P. based on ranks, with weights {dnd , n
(3.2.2) Zd(t , u)
:=
L dnJ(R
i u ::;
nt) , 0::; t ::; 1, u E JRP .
i=1
Note that with ndn
(3.2.3) Td(.p; u)
= 2::7=1 c:
J -J
cp(nt j(n + 1))Zd(dt, u) Zd((n + 1)tjn, u)dcp(t) + ndncp(1) .
3.2. AUL of linear rank statistics
71
This representation shows t hat in order to prove the AUL of Td(cp , ') , it suffices to prove it for Zd(t ,' ), uniformly in 0 ::; t ::; 1. Thus, we shall first prove t he AUL prop erty for the Zd-process. Define, for x EIR, O ::;t ::;1 , u E W , n
(3.2.4)
Hnu(x )
:=
n -
1
L tix;
-
C~iU
::;
x ),
i= l
n
.-
n-
1
L
Fni( x
+ C~iu) ,
i= l
inf{ x ; Hnu( x) 2: t} , inf{ x; Hu(x) 2: t} . Not e that H o is the H of (2.2.23) . We shall write t hat for any d.f. G ,
tt;
for H no. Recall
G( G- 1(t)) 2: t , 0 ::; t ::; 1 and G-1 (G(x )) ::; x , x E lR. This fact a nd t he relati on n Hnu(X i t ::; 1, 1 ::; i < n ,
-
c~u )
== Mu yield that V 0 ::;
For t echnica l convenience , it is desirable to cente r t he weigh ts of lin ear rank statistics appropriately. Accordingly, let (3.2.6) Then , with Zw den oting the Zd wh en weights are {Wni} ,
Hen ce, for all 0 ::; t ::; 1, u E IRP , (3.2 .7) Nex t define, for ar bit rary real weights {dnd , and for 0 ::; t 1, U E IRP, n
(3.2.8)
Vd(t , u )
:=
L dni I (Xni i= l
C~iU ~ H;~ ( t )) .
<
3. Linear Rank and Signed Rank Statistics
72
By (3.2.5) and direct algebra, for any weights {dnd ,
(3.2.9)
sup IZd(t, u ) - Vd(t , ul] ::; 2 max Idnil · f ,u
t
Cons ider t he condit ion
(N 3) In view of (3.2.7) and (3.2.9), (N 3) implies t hat t he problem of proving t he AUL for t he Zd-process is reduced to provin g it for t he Vw process. Next , recall the definition (2.3.1) and define
(3.2.10) Note t he basic decomposition : for any real numbers {d nd and for all 0 ::; l ,u E IRP,
i -:
provided H is strict ly increas ing for a ll n 2 1. This decomposit ion is basic to t he following proof of t he AUL prop erty of Zd. Theorem 3.2.1 Suppose that {Xni , Fnd satisfy (2.2.24) , (N 3) holds, and {end satisfy (2.3.6) and (2.3.7) with d ni = Wni ' I n addition, assume tha t (C*) holds with d ni = Wni , H is strictly in creasing, the densities {J nd of { Fnd satisfy (2.3.5) an d that
(3.2.12)
lim lim sup max
sup
n
IH( x )- H(y)l <J
o-tO
Th en , f or eve ry 0
(3.2.13)
t
Ifni (X) - f ni (y )1=
o.
< b < 00 ,
sup ITw(t , u ) - Yw(t , 0 ) - J1w(H H;;J(t ), 0 ) + J1w(t, 0 ) I = op(I) ,
whe re . the supremum is being taken ov er 0 ::; t ::; 1,
Hull::; b.
Befor e proceeding to p rove t he t heo re m , we prove t he followin g lemma which is of indep endent int erest . In t his resul t , no ass umptio ns ot her t han independence of {Xnd are be ing used .
3.2. AUL of linear rank statistics
73
Lemma 3.2.1 Let H , H n, H u and H nu be as in (3.2.4) above. Assuming only (2.2.24), we have
IIHn
(3.2.14)
-
Hlloo -+ 0,
a.s ..
If, in addition, (2.3.6) holds and if, for any (3.2.15)
°< b <
IH(x) - H(y)1 -+ 0,
sup
(m n
Ix-yl:S2m n b
00 ,
= max Ileill), ~
then, (3.2.16)
sup IHnu(x) - Hu(x)1 -+ 0, a.s .. Ixl
Proof. Note that Hn(x) - H( x) is a sum of centered independent Bernoulli r.v .'s . Thus E[Hn(x) - H(x)J4 = O(n- 2 ) . Apply the Markov inequality with the 4t h moment and the Borel - Cantelli lemma to obtain
IHn(x) - H(x)1 -+ 0, a.s., for every x E lit Now proceed as in the proof of the Glivenko - Cantelli Lemma (Loeve (1963), p. 21) to conclude (3.2.14). To prove (3.2.16), note that Ilull ::; b implies that
The monotonicity of H nu and H u yields that for
lIull ::; b,
Hn(x - bm n) - H(x - bm n) + H(x - bm n) - H(x
x E JR,
+ bmn)
::; Hnu(x) - Hu(x) ::; Hn( x
+ bmn) -
+ bmn) +H(x + bmn) H(x
H( x - bmn) .
Hence (3.2.16) follows from (3.2.15) and the following inequality: L.H.S. (3.2.16) ::; 2 sup IHn(x) - H(x)1 Ixl
+
sup
IH(x) - H(y)l .
Ix-yl:S2mn b
o
74
3. Linear Rank and Signed Rank Statistics
Proof of Theorem 3.2.1. From (3.2.11), for all 0 :; t :; 1, u E JRP,
Tw(t , u)
=
[Yw(HH;;~(t) , u) - Yw(H H;;~(t), OJ +[Yw(H H;;~ (t) , 0 - Yw(t, O)J +Yw(t , O) - [J.L w(t, u) - J.L w(t, 0) - u'vw(t)J +[J.Lw(H H;;~ (t), u) - J.L w(H H;;~ (t), 0) -u'vw(HH;;~(t))J
+J.Lw(H H;;~ (t) , 0) - J.Lw(t , 0)
+ u'vw(t).
Therefore L.H.S. (3.2.13)
< sup IYw(t, u) - Yw(t, 0) - Y(t , 0)1
+ sup IYw(HH;;~ (t) , 0) - Y(t , 0)1 + 2 sup lJ.Lw(t , u) - J.L w(t , 0) - u'vw(t)1 + sup lu'[vw(HH;;~(t)) - v w(t)J1 = Al + A 2 + A 3 + A 4 , say ,
(3.2.17)
where, as usu al, the supremum is being taken over 0 :; t :; 1, lIull :; b. In what follows, the range of x and y over which the supremum is being taken is JR, unless specifi ed otherwise. Now, (2.3.5) implies that IH(x) - H(y)1 :; Ix - ylk . This and (2.3.6) together imply (3.2.15). It also implies that sup Ifni(Y) - fni( X)1 :; Ix - y l<8
Ifni(Y) - fni( X)I ·
sup IH(x) -H (y) l
for all 1 :; i :; n and all fJ > O. Hence, by (3.2.12) , it follows that {fnd satisfy (2.3.4) . Now apply Lemma 2.3.1 and (2.3.25) , with dni = Wni, 1 :; i :; n , to conclude that
(3.2.18)
j = 1,3.
Next , observe that x ,u
+supIHu(x) - H( x)l , x ,u
sup IHu( x) - H( x)1 :; sup IH(x x ,u
x
+ mnb) - H( x - mnb)l.
3.2. AUL of linear rank statistics
75
Hence, in view of Lemma 3.2.1, we obtain
sUPIHH;J(t) - tl-7 0, a.s.
(3.2.19)
t ,u
(We need to use the convergence in probability only) . Now, fix a 0 > 0 and let B~ = [SUPt,u jHH;J(t) - tl < oj. By (3.2.19) ,
((B~) C)
limnsup P
(3.2.20)
Next, observe that Yd( ' ,O) = as in (3.2.17), for every fJ > 0,
Wd'O
= O.
of (2.2.23) . Hence , with A 2
limsupP (IA 2 1 ~ fJ) n
S limsupP ( sup n
It-sl <8
jW~(t) - W~(s)1 ~ n, B~) .
Upon letting 0 -7 0 in this inequality, (2.2.25) implies (3.2.21) Next , we have (3.2.22)
lim lim sup sup Ilvw(t) - vw(s)11 0-+0
<
It- sl
n
lim lim sup max 0-+0
n
~
Ifni(Y) - f ni(X)1
sup IH(x) -H(y)l<8
n
xL II wi
Cil1
i=l
=0
,
by (3.2.12) and (2.3.7).
From (3.2.22) and (3.2.20) one obtains, similar to (3.2.21) , that A 4 = op(l) . This completes the proof of the theorem. 0 From a practical point of view, it is worthwhile to state the AUL result in the i.i.d. case separately. Accordingly, we have Theorem 3.2.2 Suppos e that X n1, ' " , X nn are i. i.d. F . In addition, assum e that (Fl) , (F2) , (N3) , (2.3.6) and (2.3.7) with dni == Wni hold. Then, V 0 < b < 00 , n
L
I
(3.2.23) sup IZd(t, u) - Zd(t, 0) - u' wnicniq(t) = op(l), o:::;t:Sl,llull:Sb i= l
3. Linear Rank and Signed Rank Statistics
76
ITd(
(3.2.24) sup
+ u'
'PEc,lIull:Sb
t
WniCni
i=1
J
qd
= op(1) .
where q = !(F- 1 ) . Proof. Take Fni == F in Theorem 3.2.1. Then (F1) and (F2) imply that q is uniformly continuous on [0, 1] and ensure the satisfaction of all assumptions pertaining to F in Theorem 3.2.1. In addition, /-lw(t , 0) = 0, t ::; 1. Thus, Theorem 3.2.1 is applicable and one obtains sup ITw(t, u) - Yw(t, 01 = op(1)
°::;
t ,u
which in turn yields (3.2.25)
sup ITw(t, u) - Tw(t, 0)1 = op(1). t,u
Next , let p = 2:7=1 WniCni· From (3.2.7), L.H.S. (3.2.23)
= sup IZw(t, u)
- Zw(t, 0) - u' pq(t)l .
t ,u
Hence, L.H.S.(3.2.23)
< sup {IZw(t, u) - Vw(t, ul] + IZw(t, 0) - Vw(t, 0)1 t ,u
+ITw(t, u) - Tw(t, 0)1 =
+ l/-lw(t , u)
- u' pq(t)I}
op(1) ,
by (3.2.9), (3.2.10), (N3), (3.2.25) and Lemma 2.3.1 applied to Fn i == F, d n i == Wni . To conclude (3.2.24), observe that by (3.2.23), the uniform continuity of q and (2.3.7) with dn i == Wni , L.H.S.(3.2.24)
::;
sup {IZd(t, u) - Zd(t, 0) - u' pq(t)1 t ,u
-l-ju'pllq((n + 1)t/n) - q(t)l} op(l) .
0
Remark 3.2.1 Theorem 3.2.2 continues to hold if F depends on n, provided now that the {q} are uniformly equicontinuous on [0,1] . 0
3.2. A UL of linear rank statistics
77
Remark 3.2.2 An analogue of Theorem 3.2.2 was first proved in Koul (1970) under somewhat stronger conditions on various underlying entities. In Jureckova (1969) one finds yet another variant of (3.2.24) for a fixed but a fairly general fun ction
(a)
F has an absoluiels) continuous density f whose a.e. derivative f satisfies
o < I(f) < 00 ,
I(f) :=
(b)
{wnd satisfy (N3).
(c)
1.
I: 7=:1 (Cni -
2.
maxi(cni - cn )2 = 0(1),
(d)
(e)
Then,
cn )2 ::;
M
< 00 ,
J(j I
2
f) dF.
(recall here Cni is 1 xl) ,
cn = n - 1 I:~=:l Cni ·
Either (dni - dnj) (Cni - Cnj) ;::: 0, \:j 1 ::; i ,j ::; n , or (d ni - dnj)(Cni - Cnj) ::; 0, \:j 1 ::; i,j ::; n . \:j
0
< b < 00 , n
sup Td(
where b(
:= -
+uL
Wni Cnib(
i=:l
f~
0
The strongest point of Theorem 3.2.3 is that it allows for unbounded score functions , such as the "Normal scores " that corresponds to
being the d.f. of a N(O , 1) r.v . However, this
3. Lin ear Rank and Signed Rank Statistics
78
is balan ced by requiring (a), (cl ) and (e) . Not e that (b) and (cl ) together imply (2.3.7) with dn i = Wn i , 1 ~ i ~ n . Moreover , Theorem 3.2.2 does not require anyt hing like (e) . Claim 3.2.1 (a) implies that
f is Lip(I/2) .
First , from Haj ek - Sidak (1967), pp . 19-20, we recall that (a) implies th at f (x ) --t 0 as Ixl --t 00. Now, absolute continuity and nonnegati vity of f implies t hat
~
If (x ) - f (y )1
i
Y
l(j/ J)ldF, x < y. < y,
Therefore, by the Cauchy-Schwarz inequality, for x
{i (j / J) 2dF . [F (y ) _ F (X)]} 1/2 Y
(3.2.26) If( x) - f (y )1
<
(3.2.27)
< /1 / 2(1 ).
Letting y
--t 00
in (3.2.27) yields
(3.2.28) Now (3.2.26) and (3.2.28) together imply
A similar inequ ality holds for x > y , thereby giving
If( x) - f (y )! ~ J3/4(1)l y - x \1/2, V x, Y E JR, and pr oving the claim . Consequ ently, (a) implies (F1). Not e t hat f can be uniformly cont inuous, bounded , positive a .e. , yet need not sa tisfy / (1 ) < 00. For exa mple, consider
f( x)
.-
f( x)
o~ x
~ 1, 2 (x - 2j + 1)/ 2 + , 2j - 1
(1 - x )/ 2,
j
+1-
.-
(2j
.-
f (- x) ,
x)/2j+2 , 2j x
~
o.
~
x
~
x
~
~
2j
2j
+ 1,
j 2: 1;
3.2. AUL of linear rank statistics
79
The above discussion shows that both Theorems 3.2.2 and 3.2.3 are needed. Neither displaces the other. If one is interested in the AUL property of, say, Normal scores type rank statistics, then Theorem 3.2.3 gives an answer. On the other hand if one is interested in the AUL property of, say, the Wilcoxon type rank statistics, then Theorem 3.2.2 provides a better result . The proof of Theorem 3.2.3 uses contiguity and projection technique a La Hajek (1962) to approximate T d (
I2 n- /
(3.2.29)
n
L
Ilcnill
= 0(1) .
i= 1
Then , V 0
(3.2.30)
< b < 00,
sup o::;tSI,lluI ISb
In
l 2 / (H H;
where YI , VI etc . are Yd , Cons equently,
Vd
J (t) - t)
+ YI(t ,O) + u'VI(t)1
of (2.3.1) , (2.3.10) with dn i
= op(1),
== n- I / 2 .
(3.2.31)
sup In l / 2 (H H ; J (t ) - t)1 = Op(1) . OSt::;1,lluII Sb
Proof.
Write Yd ') ,I-lI(-) for YI(',O),I-lI(-, 0) , resp ectively. Let I
80
3. Linear Rank and Signed Rank Statistics
denote the identity fun ction and set ~nu := n l / 2 (H nuH ;;J -1) . Then n l / 2 (H H;;~ - 1)
nl /2(HH;;~ - HuH;;~ - HnuH;;~) =
-[J-LI (HH;;J , u ) - J-LI(HH;;~) - u'vI(HH;;~)] -u'[vI(HH;;~) -
(3.2.32)
+ ~nu
VI -
+ ~nu
Y1
-[YI(HH;;~ ,u) - YI(HH;;~)] - [YI(HH;;~) - Y1 ] .
Now, not e that
SUPt,u
l~nu(t)1 ::; n- I / 2 . Hence
sup Inl / 2(H H;;~(t) - t) t ,u
+ YI(t) + U'VI (t)1
< sup 1J-LI(t, u) - J-LI(t) - u'vI(t)1 t ,u
+B sup IlvI(H H;;~ (t)) t ,u
(3.2.33)
VI (t)
II
+ sup IY1 (t, u) t ,u
- Y1 (t) I
+suplwt(HH;;~(t)) - wt(t) l, t ,u
where we have used th e fact that YI(t) = Wt (t ) of (2.2.23). The first term on th e r.h.s. of (3.2.33) tends to zero by Lemma 2.3.1 when applied with d ni = n- I / 2 . The third term tend s to zero in probability by (2.3.25) applied with d ni = n - I / 2 . To show th at th e other two terms go to zero in pr obabili ty, use Lemma 3.2.1, (2.2.25) and an ana logue of (3.2.22) for VI and an ar gument similar to th e one th at yielded (3.2.21) and (3.2.23) above. Thus we have (3.2.30). Since SUP t,u IYI(t, 0 ) + u'vI(t)1 = Op(l ), (3.2.31) follows. 0 Lemma 3.2.3 In addition to the assumptions of Th eorem 3.2.1 and (3.2.29), suppose that for every 0 < k < 00, (3.2.34)
max l
n l / 2ILni(t) - Lni( s) - (t - s )£ni(s )1
sup It-s l:S kn-
1/ 2
= op(l)
where Lni := FniH- 1, £ni := f ni(H -1) /h(H - 1), 1 < h := n- l I:~l f ni. Moreover, suppose that, with n
w(t)
:= n -
l
2:= Wni£ni (t ), i= l
0 ::; t ::; 1,
2
< n , with
81
3.2. AUL of linear rank statistics sup n 1 / 2 !w(t)! = 0(1).
(3.2.35)
°9 :'S1
Th en, V 0 < b <
00 ,
(3.2.36) sup IlLw( H H;J(t)) - lLw(t ) + {Ydt)
+ U'Vl (t)}n 1/ 2w(t)I
= op( l) ,
where ILw(t) , Y1 (t) stand for ILw(t , 0), Y1 (t , 0), respectively, and where the supremum is being tak en over 0 :S t :S 1, lI ull :S b. Proof. Let M u := ILw(H H;;J) - ILw . From (3.2.31) it follows that V E > 0, 3 K € and N It such t hat (3.2.37)
where
A~ = sUPIHH;J(t) - t l:S K€n - 1 / 2 ]. t ,u
By assumption (3.2.34) , there exists N 2€ such that n 2: N 2€ implies (3.2.38)
m9X l
n 1/ 2I L i(t) - Li(S) - (t - s )fi(s )1 <
sup It- sl:'SJ(,n-
1/ 2
Define
In view of (3.2.38) and (3.2.37) , (3.2.39)
P
(m9X sup nl/2 IZ~i (t)1 > E) t,u
< E,
l
for all n 2: N € := N 1€ V N 2€ . Moreover ,
«:
Mu I(A~) n
+ Mu I((A~) C)
L Wi Z~i + Zuo + n i= 1
where
1
/
2[ HH;J - I ]n 1/ 2w,
E.
82
3. Linear Rank and Signed Rank St atisti cs
Note that P (sup IZ~ol t,u
f= 0) ::; P ((A~) C) < E,
\:j
n
> N c.
By t he C-S inequality, (N 3) and (3.2 .39) , p
(St~.f ~ WiZ~i(t) > E ) "E, V n > N, .
These derivations together with Lemma 3.2.2 and (3.2.35) complete 0 t he pr oof of (3.2.3 6) . We combine Theorem 3.2.1, Lemmas 3.2.2 and 3.2.3 to obtain th e following Theorem 3.2.4 Under th e no tation and assum ptions of Th eorem 3.2.1, Lem m as 3.2.2 and 3.2.3, \:j 0 < b < 00 , n
(3.2.40)
sup IZd(t , u ) - Zd(t , 0) -
u'2~)dni - dn(t)) Cniqni(t) I i= l
= op( l) ,
(3.2.41)
su p !Td(
+u'
Ji»:-
dn(t)) Cniqni(t)d
I
i =l
= op( l),
where th e supremum in (3.2.40) is over 0 ::; t ::; 1, IIull ::; b, in (3.2.4 1) over sp E C, IIull ::; b, and where dn(t) := n - 1 I:~l dni€ni(t), qni := f ni(H- 1 (t)) , 0 ::; t ::; 1, 1 ::; i ::; n .
Proof. Let p(t) := I:~l(di - d(t)) Ci qi(t ). Not e th at th e fact that n - 1 I:~=l €i(t) == 1 implies t hat p (t ) = I:~=l (Wi - W(t))Ci qi(t) , where {wd are as in (3.2.6) . From (3.2.7)-( 3.2.9) ,
(3.2.42) L.H .S.(3.2.40)
<
sup IZw(t , u ) - Zw(t , 0) - u' p (t )1 o9 :S 1 ,llulI :Sb 4 max IWil + sup !Vw(t , u ) - Vw{t, 0) - u' p(t )l. ~ o:St:Sl,lI ull:Sb
83
3.3. AUL of linear signed rank statistics
Now, from Theorem 3.2.1 and Lemma 3.2.3, uniformly in 0 ::; t ::; 1, Ilull ::; b, (3.2.43)
sup ITw(t, u) - Yw(t)
+ {Yt{t) + Vl(t)u}n 1/ 2w(t)!
= op(l) , where Yd(t) stands for Yd(t, 0) for arbit rary weights {dnd . Therefore, sup IVw(t, 0) - Vw(t, 0) - u' p(t)1
= sup ITw(t, u) - Tw(t, 0) + fLw(t, u) - fLw(t, 0) ::; sup ITw(t , u) - Tw(t, 0)
+ sup IfLw(t, u)
+
- u' p(t)1
1 2w(t)1 U'VI (t)n /
- fLw(t , 0) - u'vw(t)1
= op(l) ,
by (3.2.43) and Lemma 2.3.1 and the fact that p(t) = v w(t) vt{t)n 1 / 2w(t) . This completes the proof (3.2.40). The proof (3.2.41) follows from (3.2.40) in the same fashion as does that (3.2.24) from (3.2.23) .
of of 0
Remark 3.2.3 As in Remark 2.2.3, suppose we strengthen (N3) to require
(BI) Then (C*) and (3.2.35) ar e a priori satisfied by L w .
0
Remark 3.2.4 If one is interested in the i.i .d . case only, then The0 orem 3.2.2 gives a better result than Theorem 3.2.4.
3.3
AUL of Linear Signed Rank Statistics
In this section our aim is to prove analogs of Theorems 3.2.2 and 3.2.4 for the signed rank processes {Td' (.p; u) , u E W} , using as many results from the previous sections as possible. Many details are quite similar. Define , for u E W , 0 ::; t ::; 1, x 2: 0, n
(3.3.1)
zt(t , u) :=
L dniI(R~ ::; nt) S(X ni i=l
n
Jnu(x) := n-
1
L I(IXni -
c~iul ::; x)
i= l
= Hnu(x) - H nu( - x) ,
C~iU) ,
3. Lin ear Rank and Signed Rank Statisti cs
84 n
Ju (x ) .- n- 1 L[Fni( X + C~iU) - Fni (- x i=1 = Hu( x) - H u( - x) ,
+ C~iU)]
n
.-
L dniI(IXni - c~iul :S J;~ (t )) S(X ni - C~iU) , i=1
s t(t , u ) .-
L dniI (! Xn i - c~iul :S J - 1(t )) S(X ni - C~iU) , i=1
J.L d (t , u ) .-
L dni J.L~i (t , U) = ESt (t , u }, i=1 Fni(J - 1(t) + CniU) + Fni( -J- 1(t) + C~iU)
v t(t , u)
n
n
J.L~i( t , u)
.-
-2Fni(C~iU) ,
1 :S i :S n.
In the above and sequel, J and I n stand for Jo and J nO' resp ectiv ely. We also need
(3.3.2)
Yd+(t , u )
:=
st(t , u ) - J.L d (t , u ),
f ;j (t , u )
:=
v t (t , u ) - J.Ld(t , u ),
O:S t :S 1, u E IW .
An alogous to (3.2.11), we have the basic decomp osition : For 0 :S t :S 1, u E ffiP ,
Now, not e that , w.p . 1, for all 0 :S t :S 1, u E IRP ,
(3.3.4) Y/ (t , u ) =
Yd(H J - 1(t ), u ) +Yd(H (_J- 1(t)), u ) - 2Yd(H( O) , u ),
where Yd is as in (2.3.1). Therefore, by Theorem 2.3.1 (see (2.3.25)) , under the assumptions of that theorem and strictly increasing nature of J and H ,
(3.3.5)
sup IY/ (t , u ) - Y/ (t , 0)1= op( l). t, u
On e also has , in view of the cont inuity of {Fnd , a relation like (3.3.4) between J.L d a nd J.L d. Thus by Lemma 2.3.1, un der the ass ump ti ons t here, (3.3.6)
sup lJ.Ld (t , u) - J.L d(t , 0) - u'v+(t)1 = op(l ). t, u
3.3. AUL of linear signed rank statistics
85
where, for 0::; t ::; 1, n
vd(t)
:=
Ldni ni [Jni(J C
1
(t )) + fni(-J- 1(t)) - 2fni(0)].
i=l
We also have an analogue of Lemma 3.2.1: Lemma 3.3.1 Without any assumption except (2.2.24), sup IJn(x) - J(x)1 -+ 0 a.s . OS;xS;oo
If, in addition, (2.3.6) and (3.2.15) hold, then IJnu(x) - Ju(x)j -+ 0 a.s ..
sup oS;xS;oo,llullS;b
Using this lemma, arguments like those in Theorem 3.2.1 and the above discussion, one obtains Theorem 3 .3.1 Suppose that {X ni , Fnd satisfy (2.2.24) , (2.3.5) and that {dni , cnd satisfy (N1), (N2), (2.3.6) and (2.3.7). In addition, assume that (3.3.7)
lim lim sup max n
0-+0
l
sup
Ifni(x) - fni(y)1 = 0
lJ(x)-J(y)l
and that H is strictly increasing for every n. Then, for every 0 < b < 00 , (3.3.8) sup ITt (t, u) - Y/ (t , 0) - fhd(J J;~ (t) , 0) e
+ fhd (t, 0)1 0
We remark here that (3.3.7) implies (3.2.12) . Next , note that if {Fi } are symmetric about 0, then
fhd(t,O) = 0, 0::; t ::; 1, n ~ 1.
(3.3.9)
Upon combining (3.3.9) , (3.3.8) with (3.3.6) one obtains Theorem 3.3.2 In addition to the assumptions of Theorem 3.2.1, suppose that {Fni, 1 ::; i ::; n} are symmetric about O. Then , for every 0 < b < 00, n
(3.3.10)
sup 099,lIull:Sb
= op(I) ,
Izt(t , u) - zt(t , 0) - u'
L dnicnw;;i(t) I i=l
3. Linear Rank and Signed Rank Statistics
86
n
(3.3.11) sup Ir/(
+ u' L
1 1
dniCni
i=l
v;t(t)d
I
0
= op(l),
where v;t(t)
:=
2[jni(J- 1(t)) - fni(O)], 1:::; i :::; n ,
0:::; t :::; 1,
and where the supremum in (3.3.11) is taken over
= 0(1),
!
Thus (3.3.9) follows from this, (3.3.8), (3.3.7) and (3.3.6). The conclusion (3.3.11) follows from (3.3.9) in the same way as (3.2.24) follows from (3.2.23) . 0 Because of the importance of the i.i .d. symmetric case, we specialize the above theorem to yield Corollary 3.3.1 Let F be a d.j. symmetric around zero, satisfying (F1), (F2) and let X n 1 , ' " ,Xnn be i.i.d. F. In addition, assume that {d ni, cnd satisfy (N1), (N2) , (2.3.6) and (2.3.7) . Then, for every 0 < b < 00 ,
(3.3.12)
sup Izt(t , u) - zt(t, 0) - U'L;idniCniq+(t) I o:St:Sl,llull:Sb = op(l),
(3.3.13) sup Ir/(
where q+(t)
:=
+ L;idnic~iu
e
q+(t)d
Jo
2[j(F- 1((t + 1)/2)) - f(O)], 0 :::; t :::; 1.
0
Remark 3.3.1 van Eeden (1972) proved an analogue of (3.3.13) without the supremum over ip ; but for square integrable
3.3. AUL of linear signed rank statistics
87
Now, we return to Theorem 3.3.1 and expand the J.l!-terms further so as to recover an extra linearity term. Define , for 0 ~ t ~ l ,u E W ,
y; (t , u)
n
L dni[I(IX ni - cniul ~ J-1(t)) - Fit(J-l(t))) ,
:=
i= l
n
v;t(t)
:=
LdniCnilfni(J-1(t)) - fni(-J-1(t))] , i= l
where
Not e the relation: For arbitrary {dnd , (3.3.14)
From (3.3.14) and (2.3.25) applied with dni = n (3.3.15)
1/ 2 ,
we obtain
sup IYt(t , u) - Yt(t , 0)1 = op(I) . t ,u
Note that in the case dni == n- 1/ 2 , (2.3.7) reduces to (3.2.29) . Next , under (3.3 .7) and (2.3.7), just as (3.2.22) , (3.3 .16)
limlimsup sup IIv;t(t) - v;t(s)11 = 0, n It-sl
0---+0
for the given {dnd and for dni == n- 1 / 2 . Using (3.3.15) , (3.3.16) and calculations similar to those don e in the proof of Lemma 3.2.2, we obtain Lemma 3.3.2 Und er th e condit ion s of Theorem (3.2.) and (3 .2.29) (3.3.17)
sup Inl/2(JJ;~(t) - t) t ,u
+ Yt(t , 0) + u'vi(t)1
= op(I) .
Cons equently,
(3.3 .18)
o
Similarly arguing as in Lemma 3.2.3, we obtain the following Lemma 3.3.3. In it J.l!(t) , J.lt(t) etc. stand for J.l!(t ,O),J.lt(t ,O) etc . of (3.3.1) .
3. Linear Rank and Signed Rank Statistics
88
In addition to the assumptions of Theorem 3.2.1, (3.2.29), assume that for every 0 < k < 00 ,
Lemma 3.3.3
(3.3.19)
sup nl/2ItL~i(t) - tL~Js) - (t - s)e~i(s)1 It-sl:Skn- 1j 2 = 0(1)
max t
where {tL~i} are as in (3.3.1), e~i(S) := [fni(J- 1(S)) - fni(-J- 1(s))Jlh+(J -l(s)) , 0
S s S 1,
n
h+(x) := n- 1 L[fni(X) - fn i(-x)],
x 2: O.
i=1
Moreover, with dt(t) := n- 1 I:~=1 dnie~i(t) , 0 S t S 1, assum e that sup Inl/2d~(t)1 = 0(1) .
(3.3.20)
°9:S 1
Then, for every 0 < b <
00,
(3.3.21) sup ItLt(JJ;~(t)) - tLt(t)
+ {yt(t) + u'v;:(t)}nl /2d~(t)1
= op(l) ,
where the supremum is taken over the set 0 S t S 1, Ilull s b. Finally, an analogue of Theorem 3.2.3 is Theorem 3.3.3 Under the assumptions of Theorem 3.3.1, (3.2.29) , (3.3.19) and (3.3.20), for every 0 < B < 00 , (3.3.22)
sup
09:S 1 ,llull:Sb = op(l),
(3.3.23)
0
Izt(t , u) - zt(t , 0) - u'[vt(t) - v;:(t)nl/2d~(t)]1
sup
IT/(
+u' 10 [vt(t) = op(l) .
v;:(t)nl/2d~(t)]d
Remark 3.3.2 Unlike the case in Theorem 3.2.3, there does not appear to be a nice simplification of the term vt - vin 1/ 2dt. However,
3.4. Weak convergence of rank and signed rank W.E.P. 's
89
it can be rewritten as follows:
n
=
L diCi[Ji(J-1(t)) + Ji( -J-1(t)) -
2Ji(O)]
i=l
This representation is somewhat revealing in the following sense. The first term is due to the shift U'C i in the LV. Xi and the second term is due to the nonidentical and asymmetric nature of the distribution of X i, 1 ::; i ::; n . 0 Remark 3.3.3 If one is interested in the symmetric case or in the i.i.d, symmetric case then Theorem 3.3.2 and Corollary 3.3.1, re0 spectively, give better results than Theorem 3.3.3.
3.4
Weak Convergence of Rank and Signed Rank W.E.P. 'so
Throughout this section we shall use the notation of Sections 3.2 - 3.3 with u = O. Thus, e.g., Zd(t) , zt(t), etc . will represent Zd(t ,O), zt(t , 0), etc. of (3.2.2) and (3.3.1) , i.e., for 0 ::; t ::; 1, n
(3.4.1)
Zd(t) =
L dniI(Rni ::; nt) , i= l
n
zt(t) =
L
dniI(R~i ::; nt)s(Xni) ,
i =l
n
Vd(t) =
L i=l
n
dniI(Xni < H;;l(t)) ,
f1d(t) =
L dniLni(t) , i=l
where Rni (R~i) is the rank of X ni (IXnil) among X n1, . . , , X nn ( IXn11 , ... , IXnnl ). We shall first prove the asymptotic normality of Zd and zt for a fixed t, say t = v, 0 < v < 1. Note that this corresponds to proving the asymptotic normality of simple linear rank and signed rank statistics corresponding to the score function r.p that is degenerate at v.
90
3. Linear Rank and Signed Rank Statistics
To begin with consider Zd(V). In the following theorem v is a fixed number in (0,1) . Theorem 3.4.1 Suppose that {Xnd, {Fnd, {Lnd, L d are as in (2.2.23) and (2.2.24) . Assume that {dnd satisfy (N1), (N2) and that H is strictly increasing for each n. Also assume that lim limsup[Ld(V
(3.4.2)
0--+0
n
+ <5) -
Ld(v - <5)] = 0,
and that there are nonnegative numbers £ni(v), 1 ~ i ~ n , such that for every 0 < k < 00 ,
(3.4.3)
max I
sup n 1 / 2IL ni(t) - Ln i(v) - (t - v)£n i(v)1 It-v l::;kn- 1 / 2
= o(1). Denoting n
dn(v)
.-
n- 1
L
dni£ni(v),
i=l
n i=l
assume that
(3.4.4) liminf a~(v) >
(3.4.5)
n
o.
Then,
The proof of Theorem 3.4.1 is a consequence of the following three lemmas. In these lemmas the setup is the same as in Theorem 3.4.1. Lemma 3.4.1 Under the sol e assumption of (2.2 .24), sup IHH;;l(t) -
tl
= op(l).
09::;1
Proof. Upon taking u = 0 in (3.2.19) , one obtains sup IHH;;l(t) O
tl
~
sup IHn(x) - H(x)1 - oo::;x::;+oo
+ n- 1 =
op(l) ,
3.4. Weak convergence of rank and signed rank W.E.? 's
91
by (3.2.14) of Lemma 3.2.1. 0 Lemma 3.4.2 Let Yd(t) denote the Yd(t , 0) of (2.3.1). Then, under (3.4.3), for every E > 0, limlim supP( sup IYd(t ) - Yd(v )! > n Jt- vl
0-+ 0
E) =
0.
Proof. Apply Lemma 2.2.2 to Ilni = H (Xnd , G ni = L ni , to obtain t hat Yd == Wd of t hat lemma and th at
p( It-vl sup IYd(t )
Yd(v )1 >
E)
< IiE-2 [Ld(v + 0) - Ld(V - - : + P(lYd(V - 0) - Yd(v)1 > E/2) +P(lYd(V + 0) - Yd(v - 0)1>
E/4)
< (K+ 20)E- 2 [L d(v +0) - Ld(v -o) ] , (by Chebyshev) . The Lemm a now follows from t he assumption (3.4.3) . Lemma 3.4.3 Under (3.4.3), for every E > 0,
limsu pP(!Yd(HH;l(V)) - Yd(v)1 > E)
= O.
n
Proof. Follows from Lemmas 3.4.1 and 3.4.2.
o
Remark 3.4.1 Lemmas 3.4.2 could be deduced from Coro llary 3.3.1 wh ich gives t he tight ness of t he process Yd un der stronger cond it ion (C*). Bu t here we are interested in the behavior of Yd only in the neighb orh ood of one point v and th e above lemma proves t he continuity of Yd at th e point v at which (3.4.3) holds. Simil arly, many of the approxima tions that follow could of course be dedu ced from pr oofs of Theorem 3.2.1 and 3.2.2. Bu t these th eorems obtain results uniformly in 0 :S t :S 1 un der rather st ronger conditions t ha n would be needed in the pr esent case. Of course vari ous decompositi ons used 0 in t heir proofs will be useful here also. Proof of Theorem 3.4.1. In view of (3.2.9) and (N 2) , it suffices to prove that {ad(v )}-lTd(v ) --+ d N(O, 1), where
3. Linear Rank and Signed Rank Statistics
92
But, from (3.2.11) applied with u = 0,
Td(V)
+ ILd(HH;;l(v)) -lLd(V), Yd(V) + op(1) + ILd(HH;;I(v)) -lLd(V) ,
Yd(HH;;I(v)) =
Apply the identity (3.2.32) with u = n- 1j 2 to obtain ,
w.p.1 by (3.4.5) .
°and Lemma 3.4.3 with di ==
- YI (H H;;I (v)) - YI ( v)
+ op(1)
+ "»(1).
Since Yr(v) --'td N(O ,v(1 - v)) , WI (v)1 = Op(1) . Again, argue as for (3.2.36) with u == 0, t == v (i.e., without the supremum on the L.H.S. and with u == 0 , t == v), to conclude that
Combine this with (3.4.6) to obtain
(3.4.7) Td(V) Yd(V) -
n l / 2 d(v )Y1(v ) + op(1)
n
2:)dni -
d(v)){I(Xni :::; H -1(v)) - Lni(v)}
+ op(1).
i=1
The theorem now follows from (3.4.5) and the fact that {ad(v)}-Ix { leading term in the RHS of (3.4.7) } - t d N(o, 1) by the L-F CLT , 0 in view of (Nl) and (N2) . Remark 3.4.2 If {Fnd have densities {fnd then eni (v ) can be taken to be f ni(H-1(v)) jh(H -1(v)) , just as in (3.2.34) . However , if one is interested in th e asymptotic normality of linear rank statistic corresponding to the jump score function, with jump at v , then we need {L nd to be smooth only at that jump point. The above Theorem 3.4.1 bears strong resemblance to Theorem 1 of Dupac-Hajek (1969). The assumptions (Nl) , (N2) , (3.4.3), and (3.4.5) correspond to (2.2) , (2.13) and (2.22) of Dupac - Haj ek. Condition (3.4.2) above is not quite comparable to condition (2.12) of Dupac-Hajek but it appears to be less restrictive. In any case, (2.12) and (2.13) together imply th e boundedness of {ei(v)} and
3.4. Weak convergence of rank and signed rank W .E.P. 's
93
hence the condition (3.4.4) above. Taken together, then, the assumptions of the above theorem are somewhat weaker than those of Dupac - Hajek . On the other hand, the conclusions of the Dupac Hajek Theorem 1 are stronger than those of the above theorem in that it asserts not only {Zd(V) - J.Ld(v)}ai1(v) -+d N(O, 1) but also that E [ai1(v)(Zd(V) - J.Ld(V)W -+ 0, for r = 1,2 , as n -+ 00 . However , if one is only interested in the asymptotic normality of {Zd(V)} then the above theorem appears to be more desirable. Moreover , in view of the decomposition (3.2.11) , the proof presented below makes the role played by conditions (3.4.2) and (3.4.3) transparent. The assumption about H being strictly increasing is not really an assumption because, without loss of generality, one may assume that {Fi } are not flat on a common interval. For, if all {Fi } were flat on a common interval, then deletion of this interval would not change the distribution of R n1,' . . , R nn and hence of {Zd} . 0 Next, we turn to the asymptotic normality of zt(v) . Again, put u = 0 in the definition (3.3.1) to obtain, n
(3.4.8)
:L dniI( IXnil S J;l(t))s(Xnd ,
vt(t)
i= l n
st(t) =
:L dniI(IXni
!
S J -1(t))s(Xnd ,
i=l
J.L~i(t) =
Fni(J-l(t))
+ Fni( -J-1(t)) -
2Fni(0) ,
n
J.Lt(t)
=
:L dniJ.L~i(t) ,
Os t
S 1,
i=l
y+ = d
s d+ - J.Ld+'
Like (3.2.9), we have (3.4.9)
sup Iz t (t ) - vt(t) 1 OStSl
s 2 max Idil· t
Because of (N2) , it suffices to consider vt only. Observe that
3. Linear Rank and Signed Rank Statistics
94
where Yd is as in (2.3.1). Rewrite
Y/(t)
(3.4.10)
=
{Yd(H J-1(t)) - Yd(H(O))} -{Yd(H(O)) - Yd(-J-1(t))}
Yd1(t) - Yd2(t ), say . This representation motivates the following notation as it is required in the subsequent lemma. Let Pi := Fi(O) , qi := 1 - Pi and define for o:::; t :::; 1, 1 :::; i :::; n, (3.4.11)
L~(t)
.-
L1;(t) .-
{Fi(J-l(t)) - pd/qi,
qi > 0,
0,
qi = 0;
{Pi - Fi(-J -1(t))}/pi ,
Pi > 0,
0,
Pi = 0;
Observe that f-Lt(v) = qiL~(v) - PiL1;(v), 1 :::;
n
Ldl(t)
.-
i:::; n . Also define ,
n
Ld;lqiL~(t) ,
L~ (t)
:=
i=l
L d;PiL~(t) , 0 ~ t ~ l. i=l
Argue as for the proof of Lemma 2.2.2 and use the triangle and the Chebbychev inequalities to conclude Lemma 3.4 .4 For every
p( It -sup IYdj(t) vl
E
> 0 and 0 < v < 1 fixed,
Ydj(v)1 >
E)
~ ('" + 20)E - 2 [Lt (v + 6) - Lt(v - 6)], j = 1,2,
uihere r: does not depend on E,6 or any other underlying entities. 0
Theorem 3.4.2 Let X n1, '" ,Xnn be independent r.v. 's with respective continuous d.j. 's Fn1, . .. ,Fnn and d n1, .. . , d nn be real numbers. Assume that {dnd satisfy (Nl) , (N2) . In addition, assume the following.
3.4. Weak convergence of rank and signed rank W .E.? 's
95
For v fixed in (0,1 ), (3.4.13) lim lim sup ILJ.(v 8-+ 0
n
!1
+ 8) -
LJ.( v - 8)1 'J
= 0,
j
= 1,2.
(3.4.14) There exist num bers {ft(v ), 1 ::; i ::; n ; j = 1, 2} such such that max 1
< k < 00 , j
0
\j
sup It- vi:Sk n -
= 1,2 ,
n 1/2ILt (t) - Lt (v) - (t - v)ft(v)1 = 0(1). 1/2
With n
dt(v)
.-
n-
1
Ldni{qi f~( v) -Pi f 7;(v)} , i= l
2( T V) .-
n
L
{d~i [L~i(V)
-
{1t~i(v)} 2]
i= l
(3.4.15)
(a)
liminfT 2 (v) > O.
(b)
lim sup n 1 / 2 Idt
n
(v)1< 00.
n
Th en, (3.4.16)
where ItJ is as in (3.4.8). Proof. The proof of thi s th eorem is similar to th at of Theorem 3.4.1 so we shall be bri ef. To begin with, by (3.4.9) and (N2) it suffices to prove that {T(V)}-lrt(v) --'td N(0 ,1) , where r t(v) := Vd( v) - ItJ(v) . Apply Lemma 3.4.1 above to the LV. 'S jXn11,... ,IXnnl , to conclude that sup IJ(J;l(t )) - tl = op(1). O
From t his, (3.4.10), (3.4.12) and (3.4.13),
rt(v)
Yd+(JJ; l (v )) + ItJ (JJ;l (v )) - ItJ( v). Yd+(v ) + [lt J (JJ;l (V)) - ItJ (v )] + op(1).
3. Linear Rank and Signed Rank Statistics
96
Again, apply arguments like those that yielded (3.4.6) to {IXnil} to obtain n 1/2[JJ;1(V) = v] = -yt(V) + op(l), where yt(v) is as in (3.3.19) with t
r;j(v)
= Yd+(v)
= v and u = O.
- nl/2d~(v)yt(v)
Consequently,
+ op(l) = Kt(v) + op(l)
where
Kt(v) n
L
{dndI(J(IXnil) 'S v)s(Xnd -
/-l~i(v)]
i=l
-d~(v)[I(J(IXnil) 'S v) - L~i(V)]}. Note that Var(Kt(v)) = r 2(v) . The proof of the theorem is now completed by using the L-F CLT which is justified, in view of (Nl) , (N2), and (3.4.15a) . 0 Remark 3.4.3 Observe that if {Fi } are symmetric about 1/+ :::::: 0:::::: d+n and r 2(v ) = 'f:, I·d2L+(v) t"l nz nl .
a then 0
Remark 3.4.4 An alternative proof of (3.4.16), using the techniques of Dupac and Hajek (op .cit.}, appears in Koul and Staudte, Jr. (1972a) . Thus comments like those in Remark 3.4.1 are appropriate here also . 0 Next , we turn to the weak convergence of {Zd} and {zt}. These results will be stated without proofs as their proofs are consequences of the results of the previous sections in this chapter. Theorem 3.4.3 (Weak convergence of Zd). Let X n1, ... , X nn be independent r, v. 's with respective continuous d.j. 's Fn1, ' " ,Fnn . With notation as in (2.2.23), assume that (Nl), (N2) , (C*) hold. In ad-
dition , assume the following: There are measurable functions {£ni, 1 'S i 'S n} on [0,1], such that V a < k < 00 , (3.4.17) max I
sup n 1/ 2ILni(t) - Lni(s) - (t - s)£ni(s)1 = O. It- sl::;kn- 1 / 2
Moreover, assume that (3.4.18)
lim sup sup n 1 / 2 Idn (t )1 < n
O::;t::;l
00 ,
~,4 .
Weak convergence of rank and signed rank W.E.P. 's
(3.4.19)
lim lim sup sup n 1/ 2Idn( t ) - dn(s )1= 0, n
0-.0
(3.4.20)
97
it-s id
liminf0"2(t) > 0,
0 < t < 1.
n
Finally, with Kd(t) assume that
:= 2:~=1 (dni -
dn(t)){I(Xni ~ H-1( t )) - Lni(t)} ,
(3.4.21) C(t , s) lim COV(Kd(t), K d(s )) n
n
li~
'L
(dni - dn(t ))(dni - dn(s)) L ni(S)( 1 - Lni(t )),
i= l
exists for all a ~ s ~ t ~ 1. Then, Zd - fL d converges weakly to a mean zero, covariance C continuous Gaussian process on [0,1 ], tied down at a and 1. 0 Remark 3.4.5 In (3.4.17), withou t loss of generality it may b e assumed t hat n - 1L;i£ni( S) = 1,0 ~ s ~ 1. For , if (3.4.17) holds for some {£ni, 1 ~ i ~ n} , then it also hold s for £ni (S) == £~i ( s ) := n 1f2[Lni(S + n - 1/ 2) - Lni( s)], 1 ~ i ~ n , a ~ s ~ 1. Because
n- 12:;=1 Lni(s) == s, n - 1 2:~=1 £~i (s ) == 1. 0 Remark 3.4.6 Condit ions (C *), (N l ) and (3.4.18) may b e repl aced by t he condit ion (B) , b ecause, in view of t he previous rem ark, n
n 1/ 2Idn (t )1= In-1/2 'Ldni£ni(t )1 ~ n 1 / 2m ?-x jdn il, 1
i= l
a ~ t ~ 1.
0
Remark 3.4.7 In t he case Fni have density f ni, one ca n choose
£ni = f ni(H- 1)/n- 1
n
'L f nj(H- 1),
1 ~ i ~ n.
j= l
Remark 3.4.8 In t he cas e Fni == F, F a cont inuous and st rict ly increasi ng d.f., Lni(t ) == t , £ni (t ) == 1, so t hat (C *) and (3.4.17) (3.4.20) are t rivia lly satisfied . Moreover , C(s, t) = s(1 - t) , a ~ s ~ t ~ 1, so t hat (3.4.21) is satisfied . Thus Theor em 3.4.3 includes Theor em V.3.5.1 of Haj ek and SId ak (1967). 0 Theorem 3.4.4 ( Weak convergence of zt ). Let X n 1 , ' " , X nn be independent r.v. 's with respective d.f. 's F n 1 , " ' , Fnn and let dn 1 ,
98
3. Lin ear Rank and Signed R ank Statistics
... , dnn be real num bers. A ssume that (N 1) and (N 2) hold and that the following hold. (3.4.22) lim lim sup n
8-t0
sup
0
[Ld·(t 'J
+ 0) -
(3.4.23) There are measurable functions
1
sup It-sl:Skn -
(3.4.25)
= 0,
j
= 1,2.
i t , 1 ~ i ~ n , j = 1,2 , 00 ,
n 1/2I L ;j(t) - L;j(s ) - (t - s)it(s) 1= 0(1). 1/ 2
(3.4.24) lim sup sup n 1 /2I d~(t) 1 n
'J
such that for any 0 < k <
on [0, 1] max
Ld·(t)]
< 00 ,
0
lim lim sup sup n1 /2I d~(t) - d~ ( s)1 = n It - sl <6
o.
8-t0
(3.4.26) lim infT 2 (t) > 0, n
0
< t < 1.
(3.4.27) lim Cov(Kd (s ), K d (t )) = C+(s , t) exists, 0 n
< s ~ t ~ l.
Then, Zd - f.l d converges weakly to a continuous mean zero covariance C+ Gaussian process, tied down at O. 0 Remark 3.4.9 Rem arks 3.4.5 through 3.4.7 are applicable here also, with ap pro priate modifi cations. 0 Remark 3.4.10 Su pp ose t hat Fni n - 1 / 2 . T hen
== F, F cont inuous, and dni
sup IZd(t) - f.l d (t )1 0
sup n 1 / 2 1{H n (x ) - Hn(O)} - {Hn( O) - Hn(- x )}
o<x
-{F( x) - F(O)} - {F(O) - F (-x )} /, pr ecisely the st atisti c T~ prop osed by Smirnov (1947) to tes t t he hypothesis of symme try about F. Smirno v considered only t he null distribu tion. Theorem 3.4.4 allows one to st udy its asy mptotic distribution un der fairly general indep endent alternatives. For arbit rary {dnd , subject to (N 1) and (N 2) , t he stat ist ic sup{ IZd(t) - f.l d (t )I;O :S t :S I } may be considered a generalized Smirnov stat istic for testing symmet ry.
4 M, R and Some Scale Estimators 4 .1
Introduction
In th e las t four decad es stat ist ics has seen t he emerge nce and consolid ation of many compe titors of t he Least Squ ar e estimator of {3 of (1.1.1) . The most prominent are th e so-called M- and R- estimators. The class of M-estimators was introduced by Huber (1973) and its computational asp ects and some robustness properties are available in Huber (1981). The class of R-est imators is bas ed on th e ideas of Hod ges and Lehmann (1963) and has been develop ed by Adi chie (1967), Jureckova (1971) and J aeckel (1972). One of the attractive features of these est imators is th at they are robust aga inst certain outliers in erro rs. All of these estimators ar e translation invariant , whereas only R-estimators are sca le invariant. Our purpose here is to illustrate the usefulness of the results of Chapter 2 in der ivin g the asymptotic distributions of these estimators under a fairly general class of heterosced astic errors . The Section 4.2.1 gives the asymptot ic distributions of M-estimators while those of R-estimators are given in Section 4.4. Among ot her things, th e resul ts obtained ena ble one to study their qualitative robustness against an array of non-identical error d.f.'s converging to a fixed error d .f. T he sufficient cond iti ons given here are fairly general for H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
100
4. M, R and Some Scale Estimators
the underlying score functions and the design variables.
Efron (1979) introduced a general resampling procedure, called the bootstrap, for estimating the distribution of a pivotal statistic. Singh (1981) showed that the bootstrap estimate B n is second order accurate, i.e. provides mor e accurate approximation to th e sampling distribution Gn of the standardized sample mean than the usual normal approximation in the sense that sup{IGn(x) - Bn(x)l ;x E 1R} tends to zero at a fast er rat e than that of th e square-root of n. This kind of result holds mor e generally as noted by Babu and Singh (1983, 1984).
Section 4.2.2 dis cusses similar results pertaining to a class of Mest imators of {3 when th e errors in (1.1.1) ar e i.i.d. It is not ed that th e th e distribution of the bootstrap est imators obtained by resampling the residuals according to a W .E.P. is second order acc urate . A similar result is shown to hold for Shorack's (1982) modified bootstrap est imators .
In an attempt to make M-estimators scale invariant one often needs a pr eliminary robust sca le est imator . Tw o such esti mators are th e MAD (median of absolute deviations of residuals) and the MASD (median of absolute symmetrized deviations of residuals) . The asymptotic distributions of th ese estimators under heteroscedasti c errors appear in Section 4.3.
In carrying out th e analysis of variance of an experimental design or a linear model based on ranks one needs an estimator of the asymptotic variance of certain rank statistics, see, e.g., Hettmansperger (1984). These variances involve the functional Q(J) = J fd rp(F) where sp is a known functi on , F a common error d.f. having a density f · Some est imators of Q(J ) under (1.1.1) are pr esented in Section 4.5. Again, the results of Cha pter 2 are found useful in proving th eir consiste ncy.
4.2. M-ESTIMATORS
4.2 4.2.1
101
M-Estimators First order approximations: Asymptotic normality
This subsection contains th e asymptot ic distributions of M - est imators of (3 when the errors in (1.1.1) are het eroscedastic. The following subsect ion 4.2.2 gives some results on t he boot st rap approximat ions to t hese distributions . Let the mod el (1.1.1) hold . Let 'ljJ be a nondecreasing functio n from JR to JR. The correspo nding M- est imato r .6. of (3 is defined to be a zero of the M - score J 'ljJ (y )V( dy , t) , where V is as in (1.1.2) . Our objec t ive is to invest igate t he asym ptot ic be havior of A - 1(.6. - (3) when the errors in (1.1.1) ar e het erosced astic. Our meth od is st ill the usu al one , v.i.z., to obtain the expansion of the M - score uniformly in t E {t ; IIA - 1(t - (3)11 ~ b}, 0 < b < 00, to observe t ha t there is a zero of the M - score , .6. , in this set and then to apply this expansion to obtain the approxima t ion for A -1(.6. - (3) in te rms of the given M - score at the true (3 . To make all this pr ecise, we need to standardize the M - score . For t hat reason we need some more not ation. Let (4.2.1)
.- diag (fn 1(y) , ' . . , fnn(Y) ), Y E JR, C .- AX' A * (y)d'ljJ (y )XA,
A *(y)
J A!
T( 'ljJ , t)
.- _C - 1
T( 'ljJ , t)
.-
'ljJ (y )V (dy , t) ,
A ~1(t - (3 ) - T( 'ljJ ,{3), tEJRP .
An approximat ion to .6. is given by the zero A of T('ljJ , t ), v.i .z., A - 1(A - (3)
(4.2.2)
= T('ljJ ,{3).
A basic resu lt needed to make this pr ecise is t he AUL of T( 'ljJ , t) in A -1(t - (3) , given in Theorem 4.2 .1 below. Let (4.2.3)
W :=
{'ljJ: JR I--7JR, 'ljJ E'DI(JR) , !'ljJ! tv bounded }.
Theorem 4.2.1 Let {(x~i ' Ynd , 1 ~ i < n}, {3 , {Fni , 1 ~ i ~ n } be as in the model (1.1.1) satisfying all conditions of Th eorem 2.3.3. In
4. M, R and Some Scale Estimators
102 addit ion, assume the follo wing:
lim supllC- 1 l1 oo <
(4.2.4)
00 .
n
Th en, V 0
(4.2.5)
< b < 00 ,
sup IIT(7P,,8
+ Au) - T( 7P,,8 + Au)11
= op(l) ,
l/J ,u
where the supremum is tak en over all 7P E w and Ilull ::s: b.
Proof. Rewrite, afte r int egration by parts ,
T( 7P, t) - T( 7P, t )
=
J
C - 1 A [V(y, t)
-V(y ,,8) - fdH( y))A - l (t - ,8)Jd7P(Y). Now (4.2.5) rea dily follows from this and (2.3.35). 0 In ord er to use this theorem , we must be able to argue th at IIA - l (Li - ,8)11 = Op(l) . To t hat effect , define J-li
.-
E7P (ed,
T? = V ar{ 7P (ei )},
1 ::S: i ::s: n ,
n
b.,
.-
ET(7P ,,8) = _C- 1 A L
XiJ-li ,
i=l
and observe that n
EllA - l (a -,8) - b n l12 = C- 1 LX~(X'X) -lxiTlc-1 = 0(1) , i=l
by (4.2.3), (4.2.4) and t he fact that 2::7=1 x~(X'X)-lX i == P < Hence by th e Markov inequ ality, V f. > 0, 3 0 < K € < 00 , 3 ,
00 .
Thus, assuming that n
(4.2.6)
LXiJ-li = 0, i=l
and arguing via Bro uwer 's fixed point theorem as in Huber (1981, p. 169), one concludes , in view (4.2.5), that IIA-l (A - ,8)11 = Op(l ).
4.2.1. Asymptotic normality of M-estim ators
103
A rout ine app lication of (4.2.5) enables one to conclude t hat
A -1(.6. - (3) = T ('IjJ ,{3 ) + op(l) .
(4.2.7)
Not e th at , un der (4.2.6), wit h T o('IjJ , (3 )
= CT ('IjJ , (3 ),
n
ETo('IjJ , {3) T~ ('IjJ , {3) = A LX~Xi TlA = AX'TXA i=l where T = diag (Tf , " . , T~) . Moreover , for any A E IRP , P
n
n
A'To ('IjJ ,{3) = L L Aj dij'IjJ (ed = LA'Ax~ 'IjJ ( ei ) i=l j=l i=l where {dij} are as in (2.3.30). In view of (2.3.31) and (2.3.32), (NX) and (4.2.6) imply t hat A'T o('IjJ ,{3) is asymptoti cally nor ma lly distrib ut ed wit h mean 0 and t he asymptot ic vari an ce A' AX'TXAA. T hus by the Cram er - Wold device [Theorem 7.7, p. 49, Billingsley (1968)], (4.2.4) and (4.2.7), :E- 1/ 2A - 1(.6. - (3 ) ---7d N( O, I p x p ) ,
(4.2.8) with :E as a
:=
C - 1 AX'TXAC - 1 . We summarize the above discussion
Proposition 4.2.1 Suppose tha t th e d.f. 's {Fnd of th e errors an d th e design m at rix X of (1.1.1) satisfy {4.2.4}, {4.2.6} an d the assumptions of Th eorem 2.3.3 including that H is stric tly increasing f or each n 2: 1. Th en {4.2.8} holds.
Now, consider t he case of the i.i .d. errors in (1.1.1) wit h Fn i Then ,
(4.2.9) Tl C
J (J
'ljJ 2 dF -
(J
jd'IjJ) I p x p ,
'ljJdF )2 = :E =
2 T ,
(J
j d'IjJ) - 2 T2 I p x p .
t XiJ 'ljJdF = O. i= l
F.
(say), 1 ::; i ::; n ,
Con sequent ly (4.2.4 ) is equivalent to requiring observe t hat (4.2.6) becomes (4.2.10)
==
J j d'IjJ > O.
Next ,
4. M, R and Some Scale Estimators
104
Obviously, this is satisfied if either I:~=l Xi = 0, i.e. if X is a centered design matrix or if f'lj;dF = 0, the often assumed condition. The former excludes the possibility of the presence of the location parameter in (1.1.1) . Thus to summarize, we have Proposition 4.2.2 Suppose that in (1.1.1), F ni
==
F. In addition,
assume that X and F satisfy (NX), (Fl), (4·2.10) and that f fd'lj;
>
O. Then ,
o Condition (4.2.10) suggests another way of defining M- estimators of (3 in (1.1.1) in the case of the i.i. d. errors. Let n
(4.2.11 )
Xnj
.-
l
n- L
Xnij,
1::; j
::; p;
i=l
x~
.-
X'
.- [x n ,"' , xn]nxp , X c := X - X .
Assume (NXl)
(Xnl,"', xnp ),
(X~XC)-I exists for all
n 2: p , maxi X~i(X~Xc)-IXni = 0(1).
Let n
(4.2.12) T*('lj;,t)
.-
Al L(Xni - xn)'lj;(Yi - x~it) , t E W , i=l
Al
.-
(X~Xc)-I/2 .
Define an M-estimator
a * as a solution t
of
T*('lj;, t) = O.
(4.2.13)
Apply Corollary 2.3.1 p times, lh time with dni = i th element of the i" column of X cA I , 1 ::; i ::; n,1 ::; j ::; p , to conclude an analogue of (4.2.5) above , v.i .z., sup ljI,IIA j l (t-{3)II :Sb
IIT*('lj;, t) - T*('lj; , t)11 = op(l)
105
4.2.1. Asymptotic normality of M-estimators where
T*(1/J , t) := A 1I (t - (3) - ( / f d1/J) -1 T*(1/J,{3) . The proof of this claim is exactly similar to that of (4.2.5) with appropriate modifications of replacing X by X , and A by Al and using Fni == F in the discussion there. Now, clearly, Fni == F implies that ET*(1/J , (3) = 0 , n
EIIT*(1/J,{3)11 2= 2:)Xi -x)/A~Adxi -X)T 2 =
0(1).
i= I
Hence , IIT*(1/J,{3)11 = Op(l). If A" is the zero ofT*(1/J, ·), then
Argue, as for (4.2.7) and (4.2.8) to conclude the following Proposition 4.2.3 Suppose that in (1.1.1), Fni == F . In addition, assume that X and F satisfy (NX1) , (F1) and (F2) . Then,
Remark 4.2.1 Note that the Proposition 4.2.2 does not require the condit ion J 1/JdF = O. An advantage of this is that d * can be used as a pr eliminary est imator when constructing adaptive estimators of
{3 . An adaptive estimator is one that achieves the Hajek - Le Cam (Hajek 1972, Le Cam 1972) lower bound over a large class of error distributions. Often a minimal condition required to const ru ct an adaptive estimator of {3 is that F have finite Fisher information, i.e., that F satisfy (3.2.a) of Theorem 3.2.3. See, e.g., Bickel (1982), Fabian and Hannan (1982) and Koul and Susarla (1983). Recall, from Remark 3.2.2, that this implies (F1) . On the other hand, the condition (NX1) does not allow for any location term in the linear regression model.
0
4. M, R and Som e Scale Estimators
106
So far we have been dealing with the linear regression model with kn own scale. Now consider the model (2.3.36) where"( is an unknown sca le param et er . Let s be an n l / 2 - consiste nt estimator of "(, i.e. (4.2.14) Define an M-estimat or (4.2.15)
t
x i7j!((Yi -
.6.. 1 of f3
as a solut ion t of
x~t)S-l) =
0 or
J
7j! (y )V (sdy , t) = O.
i= l
To keep exposition simple, now we shall not exhibit 7j! in some of the fun cti ons defined below. With C is as in (4.2.1) above, define, for an a > 0, t E W , (4.2.16)
f ' (y ) .-
(!I( y) , '" , f n(Y)),
Cl
.-
« :"? AX'
S(a , t)
.-
A
S( a , t)
.-
A - l (t -
J
y f (y )d7j! (y ),
J
7j! (y )V (a dy, t ),
f3h- l + C - lC l n l / 2( a
-
"(h- l
-C-lSh, (3) , Not e t hat by (NX) , (Fl) , (F3) , and (4.2.3), (4.2.17)
IICll1 = 0(1 ).
The following theorem is a direct consequence of Theorem 2.3.4. In it N; := {( a , t) : a > 0, t E W , IIA -l(t - (3)11 :::; lYy , Inl / 2( a- "()1 :::; lYy }, O < b < 00 . Theorem 4.2.2 Let {(x~i ' Y n i ) , 1 :::; i :::; n }, f3,"( , {Fn i , 1 :::; i :::; n} be as in {2.3 .36} satisfying all the condi tions of Th eorem 2.3.4. Moreover, assum e {4.2.3} and {4.2.4} hold. Th en, fo r ever y 0 < b < 00 ,
(4.2.18)
sup IIS(a , t) - S( a , t)11 = op (l ).
where the supremum is taken over all 7j! E W, and (a , t')' E Ni .
0
4.2.1. Asymptotic normality of M-estimators
107
Now argue as in the proof of the Proposition 4.2.1 to conclude Proposition 4.2.4 Suppose that the design matrix X and d.f. 's
{Fnd of {Eni} in {2.3.36} satisfy {4.2.5}, {4.2.6} and the assumptions of Theorem 2.3.4 including that H is strictly increasing for each n
:2:
1. In addition assume that there exists an estimate s of s satisfying
{4.2.14}. Then (4.2.19)
A -1(A 1 - 13),-1
= C- 1 Sb ,13) - C- 1C 1n 1/ 2 (s where
.6. 1
')'),-1
+ op(l), o
now is a solution of (is).
Remark 4.2.2 In (4.2.6), F; is now the d.f. of Ei, and not of ')'Ei, 1 i
~
n.
~
0
Remark 4.2.3 Effect of symmetry on
.6. 1 .
As is clear from
(4.2.19) , in general the asymptotic distribution of:0 1 depends on s . However, suppose that for all 1
d'ljJ(y)
=
~
i
~
n, y E JR,
-d'ljJ(-y) , /i(y) == /i( -y).
Then J y/i (y)d'ljJ (y) = 0, 1 ~ i ~ n, and, from (4.2.16), C 1 Consequently, in this case,
Hence, with
~
= o.
as in (4.2.8) , we obtain
Note that this asymptotic distribution differs from that of (4.2.8) only by the presence of ')'-1. In other words, in the case of symmetric errors
{Ei}
and the skew symmetric score functions {'ljJ} , the asymp-
totic distribution of M-estimator of 13 of (2.3.38) with a preliminary n 1/2 - consistent estimator of the scale parameter is the same as that of ')'-1 x M - estimator of 13 of (1.1.1). 0
108
4.2.2
4. M, R and Some Scale Estimators
Bootstrap approximations
Befor e discussin g t he specific b oot strap approximations we shall describe the conce pt of Efron 's b oot strap a bit more gen erall y in the one sample set up . Let 6 ,6, ' .. , ~n be n i.i .d . G LV . 's, Gn be t heir emp irical d.f. and Tn = Tn(en, G) be a fun cti on of e' := (6 ,6, ' " , ~n) and G such tha t Tn(e , G) is a LV. for every G. Let ( 1 , (2,' " , ( n den ot e i.i.d. Gn LV .'S and ( ~ := ((1, (2 , ' " ,(n). The bootstrap d.f. B n of Tn(e , G) is the d.f of Tn((n, G n) under G n. Efron (1979) showed , via numeri cal studies, that in several exam ples B n provides better approximat ion t o the d.f. I'n of Tn(en' G) under G than the normal approximation. Singh (1981) substantiated this obse rvat ion by provin g that in the case of the standardized sample mean the b ootstrap est imate B n is secon d order accurate , i.e.
(4.2.20)
supj lf' n(x) - Bn(x) l; x E ~} = o(n - l / 2 ) , a.s..
Recall that t he Edgeworth expansi on or t he Berry-Esseen b ound gives that sup [ I' n(x) - cI> (x )l; x E ~} = O(n- l / 2 ) , wh ere cI> is the d .f. of a N(O , 1) r .v. See, e.g., Feller (1966) , Ch. XVI ). Babu and Singh (1983, 1984), among ot hers , po inted out that this phen omen on is shared by a large class of statistics. For fur ther reading on boot st rapping we refer the reader to Efron (1982), Hall (1992). We now turn to the probl em of bootstrapping M-estimators in a lin ear regr ession model. For the sake of clarity we shall restrict our attention to a si m ple lin ear regr ession model only. Our main purpose is to show how a certain weighted emp irica l sampling distribution naturally help s t o overco me some inherent difficulties in defining boot strap M-estimators. What follows is based on the work of Lahiri (1989). No proofs will be given as they involve intricate te chnicalities of the Edgeworth expans ion for indep endent non-identicall y dist ributed LV .'s. Accordingly, ass ume that {ei , i ~ I } are i.i.d . F r. v.'s, {Xni' 1 ~ i ~ n} are t he known design points, {Yni , 1 ~ i ~ n } are obse rvable
4.2.2. Bootstrap approximations
109
r.v.'s such that for a (3 E JR, (4.2.21) The score function 'ljJ is assumed to satisfy (4.2.22)
J'ljJdF=O.
Let An be an M-estimator obtained as a solution t of n
(4.2.23)
L xni'ljJ{Yni -
Xni t ) = 0,
i=1
and Fn be an estimator of F based on the residuals eni := Yni XniAn , 1 ::; i::; n. Let {e~i' 1::; i::; n} be i.i.d. Fn r.v.'s and define (4.2.24)
1 ::; i ::; n.
The bootstrap M-estimator
~~
is defined to be a solution t of
n
(4.2.25)
L xni'ljJ{Y;i - Xni t ) =-= O. i=1
Recall, from the previous section, that in general (4.2.22) ensures the absence of the asymptotic bias in An . Analogously, to ensure the absence of the asymptotic bias in ~~ , we need to have Fn such that (4.2.26) where En is the expectation under Fn . In general, the choice of Fn that will satisfy (4.2.26) and at the same time be a reasonable estimator of F depends heavily on the forms of 'ljJ and F . When bootstrapping the least square estimator of (3, i.e., when 'ljJ{x) = x, Freedman (1981) ensure (4.2.26) by choosing Fn to be the empirical d.f of the centered residuals {eni - en., 1 ::; i ::; n} , where en. := n- 1 '2:/]=1 enj' In fact , he shows that if one does not center the residuals, the bootstrap distribution of the least squares estimator does not approximate the corresponding original distribution. Clearly, the ordinary empirical d.f. tt; of the residuals {eni; 1 ::; i ::; n} does not ensure the validity of {4.2.26} for general designs
110
4. M, R and Some Scale Estimators
and a general '1/;. We are thus forced to look at appropriate modifications of the usual bootstrap. Here we describe two modifications. One chooses the resampling distribution appropriately and the other modifies the defining equation (4.2.6) a La Shorack (1982) . Both provide the second order correct approximations to the distribution of standardized An .
Weighted Empirical Bootstrap Assume that the design points {xnd are either all non-negative or all non-positive. Let W x = L~=I IXnil be positive and define n
(4.2.27)
F1n(y) := w;1
L
jXnilI(eni :S y) , Y
E R
i= l
Take the resampling distribution Fn to be FIn. Then, clearly, TI
EIn'l/;(e~l) = w;1
L
I Xnil'l/;U~nd
i=1
n
sign(xt}w;1
L Xni'I/; (Yni -
Xni A) = 0,
i= 1
by the definition of An . That is, FIn satisfies (4.2.26) for any '1/; .
Modified Scores Bootstrap Let Fn be any resampling distribution based on the residuals . Define the bootstrap estimator ~ns to be a solution t of the equation n
(4.2 .28)
L Xni['I/; (Y; i -
Xni t ) - En'l/;(e~i)J = O.
i=1
In other words the score function is now a priori centered under Fn and hence (4.2.26) holds for any Fn and any '1/; . We now describe the second order correctness of these procedures. To that effect we need some more notation and assumptions. To begin with let r; := L~=I x~i and define n
TI
bi« >
L x~dr~, i= l
bx
:=
L i=l
IX~il/r~ .
111
4.2.2. Bootstrap approximati ons For a d .f. F and any sampling d .f. F n , define, for an x E JR,
I(X)
.- E 'ljJ (el - x) ,
w(x) = (J2(x ) := E { 'ljJ(el - x) - I(x)}2, In(X) := En 'ljJ ( e~l - x) ,
Wl(X) .-
E{'ljJ (el - x) - I(x)}3 ,
wn(x)
.-
(J~ (x) = En{'ljJ(e~ l - x) - In(x)}2 ,
Wln(X)
.-
En{ 'ljJ ( e~l - x) - ,n(x)}3 ,
{i: 1 ~ i ~ n , IXni l > cTx bx }, /1;n(c):= #An(c),
An( c) .-
C> 0.
For any real valued fun cti on g on JR, let g, 9 denot e its first and second derivat ives at wh enever t hey exist, resp ectively. Also, write In, Wn et c. for IN (O),WN(O), etc. Finally, let a := - -y/ (J, an := --Yn/(In and, define for x E JR, H 2(x) := x 2 - 1, and
°
Pn(x )
:=
In InWn .. " } X2
(x) - bix [{ - - - 3- -2 2 (In (In an
+
WIn ] -6 3 H 2(x ) ep (x ). (In
In t he following t heore ms , a .s, means for almost a ll sequences {ei; i 2: I} of i.i.d. F r. v.'s. Theorem 4.2.3 Let the model (4.2.21) hold. In addition, assum e
that 'ljJ has uniformly continuous bounded second derivative and that the following hold: (a) ---t 00. (b) a > O. (c) There exists a constant 0 < C < 1, such that livr; = O(K;n(c)) . (d) m xlnTx = O(Tx) . (e) There exist constants 0 > 0, 8 > 0, and q < 1, such that
T;
sup[IEex p{it'ljJ( el - x)}1 : [z ] < (f)
2:~=1 exp ( -
Th en, with
D.~
8, ltl > OJ < q.
AW;/ T;) < 00, V A > O. defined as a solution of (4.2.25) with Fn = FIn,
sup IPIn( anTx (D.~ - Lin) ~ y ) - 'Pn( y)1 = o(mx/Tx ), y
sup IPIn(a Tx(Lin - {3) ~ y) - PIn (Tx (D.~ - Lin) ~ y)1 y
= o(mx /T x)'
a.s .,
where PIn denot es the bootstrap probability under FIn, and where the supremum is over y E JR. 0
4. M, R and Some Scale Estimators
112
Next we state the analogous result for b. n s .
Theorem 4.2.4 Suppose that all of the hypotheses of Th eorem 4.2.3 except (J) hold and that b. n s is defined as a solution of {4.2.28} with Fn =
ti;
the ordinary em pirical of th e residuals . Th en,
sup IFn (anTx (b. ns - A n) ~ y) - Pn(y)1 = o(mx/Tx)' y
sup IPn (a Tx (A n,B) ~ y) - Pn(Tx(b. ns - A n) ~ y)1 y
=
where
i;
o(mx/Tx)'
a.s .,
denot es th e bootstrap probability under
it.;
o
The proofs of t hese t heorems appear in Lahiri (1989) where he also discusses analogous results for a non -smooth 'lj;. In th is case he chooses th e sampling distribution to be a smooth est imator obtained from the kernel type density est imator. Lahiri (1992) gives exte ns ions of th e above theorems to multiple linear regression models. Here we br iefly comment about the assumptions (a) - (f) . As is seen from th e pr evious section , (a) and (b) ar e minimally required for the asymptotic normality of M-estimators. Assumptions (c), (e) and (f) ar e required to carry out the Edgewort h expansions while (d) is slightly stronger than Noether's cond ition (N X) applied to (4.2.21). In particular, X i == 1 and X i == i sa tisfy (a) , (c) , (d) and (f). A sufficient cond ition for (e) to hold is that F have a positive density and 'lj; have a cont inuous positive derivative on an open int erval in JR.
4.3
Distributions of Some Scale Estimators
Here we shall now discuss some robust sca le estimators. Definitions. An est imator !1(X, Y ) based on th e design matrix X and the observat ion vector Y of f3 is said to be locatio n in vari ant if (4.3.1)
!1(X, Y
+ Xb) = !1(X , Y ) + b , V b E JRP.
It is said to be scale in variant if
(4.3.2)
!1(X ,aY) =a!1( X, Y), VaE JR, a#O .
113
4.3. Distributions of some scale estimators
A scale estimator 8{X , Y) of a scale parameter , is said to be location invariant if
8{X, Y
(4.3.3 )
+ Xb) = 8{X , Y) ,
V bE 1W.
It is sa id to be scale invariant if
8{X ,aY) = a8 {X , Y ), V a > O.
(4.3.4 )
Now observe that M-estimators Li and a * of f3 of Section 4.2.1 are location invariant but not scale invariant. The est imators Lil defined at (4.2.15) , ar e location and scale invariant whenever 8 satisfies (4.3.3) and (4.3.4) . Not e that if 8 does not satisfy (4.3.3) then Lil need not be location invari ant. Some of the candidat es for 8 ar e (4.3.5)
8
.-
{{n -
n
p )-1
2)Yi - x~/3)2}
1/ 2
,
i= 1
81
.-
med { IYi - x~/3I ; 1 < i
82
·-
med{ IYi -Yj -{Xi - Xj )'/3 1;1 ::; i < j ::; n } ,
::; n} ,
where /3 is a pr eliminary est imato r sa t isfying (4.3.1) and (4.3.2). Estimator 8 2 , wit h /3 as the least square est imator , is the usual est imator of the error var iance, ass uming it exists. It is known to be non-robust against outl iers in the errors . In robustness studies one needs scale estimators th at are not sensit ive to outliers in t he err ors . Estimator 8 1 has bee n mentioned by Huber (1981, p. 175) as one such candidate . The asymptotic properties of 8 1, 8 2 will be discussed shortl y. Here we just mention that each of these est imators est imat es a different scale paramet er but that is not a point of concern if our goal is only to have location and scale invariant M-estimators of f3 . An alte rn ative way of having location and scale invariant Mest imators of f3 is to use simultaneous M-estimation method for est imating f3 and , of (2.3.36) as dis cussed in Huber (1981). We ment ion here, without giving det ails, that it is possible to st udy the asymptotic joint distribution of these est imators under heteroscedastic errors by using th e results of Ch apter 2.
4. M, R and Some Scale Estimators
114
We shall now study the asymptotic distributions of 81 and 82 under th e model (1.1.1) . With F, denoting the d.f, of ei , H = n - 1 L:Z:::1 r; let (4.3.6)
pdy ) .- H(y ) - H( -y) ,
(4.3.7)
P2( y)
J[H(Y + x) - H( - y + x )JdH (x ), y
.-
~ O.
Define 1'1 and 1'2 by th e relations
P1hd =
(4.3.8)
1/2,
Not e that in th e case F, == F , 1'1 is median of the distribution of 1e1 1and 1'2 is median of the distribution of lei - e21· In general, 1'j , Pj , etc . depend on n , but we suppress this for th e sake of convenience. The asymptotic d istribution of S j is obtained by th e usual method of connecting the event { 8 j :S a} with certain events based on certain empirical processes, as is done when studying th e asy mptot ic distribution of th e sample medi an , j = 1,2. Accordingly, let , for y ~ 0, n
(4.3.9)
S( y)
'-
L I (IYi - x~,B1 :S y) , L L I (IYi - Yj i= l
T(y )
:=
(Xi -
xj)',B1:S y )
.
l :Si :Sj :Sn
Then , for an a > 0,
+ 1)2- 1},
(4.3.10) {Sl :S a}
=
{S( a) ~ (n
{S(a) ~n2-1}
C
{81 :Sa} ~ {S(a) ~nr1 -1} , neven.
n odd,
Simi larly, wit h N := n(n - 1)/2, for an a > 0, {T(a) ~ (N
(4.3.11)
{S2:S a} {T(a) ~ N2- 1}
C
+ 1)r 1},
N odd {82 :S a} ~ {T(a) ~ N2- 1 - I} , N even
Thus, to study the asymptotic distributions of 8 i - j = 1,2 , it suffices to st udy thos e of S(y) and T (y ), y ~ O. In what follows we shall be using the notation of Ch apter 2 with t he following modifi cations. As before, we shall write S~ , J-L~ etc . for
115
4.3. Distri bu tions of some scale estimators s~ , IL~ et c. of (2.3.2) whe never dni we shall take
== n- 1/ 2 . Moreover, in (2.3 .2),
(4.3.12) With these modifications, for all n
~
1, Y
~
0,
Sr(y , v ) - Sr( - y, v ) = n- 1/ 2
S (y)
n
L I(l ei -
c~v l :S y) ,
i= l
J
[S r(y + x , v ) - Sr( - y + x, v )JSr (dx , v) - 1,
with probability 1, where v = A - 1 (fJ - (3). Let (4.3.13) IL~(Y , u )
1L1(H(y) , u ),
y10 (y , u )
Y1(H(y ), u ),
J
K(y , u )
[y10(y
W(y , u ) 9i(X)
'-
hi( x)
.-
y
E IH;;
+ X, u ) - y 10( -y + X , u )JdH (x ),
y10(y , u ) - y 10( - y, u ), Y ~ 0, u E {fih'2 + x ) - Ji( + x)},
-,2 {fih'2 + x ) + Ji( -,2+ x )} ,
IH;P,
for 1 :S i :S n , x E R We shall write W(y) , K (y) etc . for W(y , O) , K (y,O) et c. Theorem 4 .3. 1 Assume that (1.1.1) holds with X and {Fnd satis-
fying (NX ) , (2.3.4) and (2.3.5) . Moreover, assume that H is strictly in creasing for each n and that lim lim sup
o~O
n
sup
O ~ s ~l - o
(4.3.14)
About {fJ} assume that (4.3.15)
[H(H - 1(s + 15) ± , 2) - H (H - 1(s) ± ,2)J = O.
4. M, R and Some Scale Estimators
116 Then, V a E JR, (4 .3.16)
P(n 1 / 2 ( s l
-
"n) ::; a"/'l)
= P ( W("/'l) + n- 1 / 2 ~ x~A{fk/'l) - Ji( -')'t}} . v
~ -a· ')'In-1 t[Ji(')'t} + Ji( -')'t}]) + 0(1), i= 1
(4.3.17) P(n / 2(s2 1
=
P (2K(')'2)
')'2) ::;
a')'2)
+ n- 3 / 2 L ~ Cij
9i(X)dFj(x) . v
J
1
~ -')'2 an - 1
J
tJ
hi( X)dH(X))
+ 0(1) .
1=1
where
Cij
=
(Xi -
Xj)' A,
1 S i , j ::; n .
Proof. We shall give the proof of (4.3.17) only ; that of (4.3.16) being similar and less involved . Fix an a E JR and let Qn(a) denote the left hand side of (4.3.17) . Assume that n is large enough so that an := (an- 1 / 2 + 1)')'2 > 0. Then , by (4.3.11), N odd
It thus suffices to study P(T(a n ) ~ N2- 1 + b), bE JR. Now, let
.- n- 1 / 2[2n- 1T(y) + 1] - n 1 / 2p2(Y) , y ~ 0, kn .- (N + 2b)n- 3 / 2 + n - 1 / 2 - n 1 / 2p2(an) .
Tt{y)
Then , direct calculations show that (4.3.18) We now analyze k n : By (4.3.8) ,
4.3. Distributions of some scale estimators
117
But
n 1/ 2(p2(an) - P2(,2)] =
n
1 2 /
J [{ H(a n + x) - H ('Y2
+ x)} -{H (-an + x) - H (
-,2 + x)} ]dH(x).
By (2.3.4) and (2.3.5), t he sequence of distributi ons {P2} is tight on (IR, B), implying that ,2 = 0( 1), n- 1/ 2 , 2 = 0(1). Consequently, 1 2
n /
J {H(±an + x ) - H(±' 2 + x )} dH (x) a,2 n - 1
t
J h(±, 2 + x )dH (x ) + 0(1),
i= l
- a, 2n - 1
(4.3.19 ) kn
t
J[h (,2
+ x) + fi(
-,2+ x) ]dH(x)
i =l
+ 0(1). Next , we approximate T1 (an ) by a sum of independ ent LV . ' S . The pr oof is similar to t he one used in approximating linear rank statistics of Secti on 3.4. From t he definition of T 1 , T 1 (y)
n - 1 / 2 J [Sr(y
+ x,v) -
Sr( - y + x ,v )]Sr (dx , v )
n - 1 / 2 J[YIO(y + x, v ) - y 10 ( - y + x ,v )]Sr (dx , v )
+ n- 1 / 2 J[f-L~ (Y + x, v ) - f-L~( - Y + x, v)]y1o(dx , v) + n- 1 / 2 J[f-L~(Y+ x,v)
-
f-L~( -y+ x,v)]f-L~(dx,v) - n 1/ 2p 2 (y)
Bu t E 3 (y)
.-
n- 1 / 2 J
[f-L~ (Y + x, v ) - f-L~( -Y + x , v)]f-L~ (dx, v ) -
n
1/ 2p2(Y)
4. M, R and Some Scale Estimators
118
n- 3/2
t 2; !
{Fl (y + X +
C~jv) -
-Fi(y + x) =
n- 3/2
Fi(-y + x
+ C~jv)
J
2=1
t L C~jV ! i =1
+ F1(-y + x) }dFj(X)
[Ji(y + x) - Ji( -y + x)]dFj(x)
+ u p (1),
j
by (2.3.3) , (NX) and (4.3.15). In this proof, up(1) means op(1) uniformly in Iyl :::; k , for every 0 < k < 00. Integration by parts, (4.3.15) , (2.3.25) , H increasing and the fact that J n -1 / 2f-l~ (dx , v) = 1 yield that
E2(Y) .-
n- 1/2 n - 1/2
!
! {JJ,~(Y + ! + {y10(y
x , v) -
f-l~( -y + x , v)}YI0(dx , v)
x , v) - y 10(-y + x, v)}f-l~(dx, v)
{y10(y + x ) - y 10(-y + x)}dH(x)
+ u p (1).
Similarly,
(4.3.20)
Et{y) =
n - 1/2
!
{YI0(y
+ x)
- y 10(-y + x)}S~(dx)
+ up (1).
Now observe that n-1/2S~ = H n , the ordinary empirical d.f of the errors {ei} ' Let
Eu(Y)
.- !
{y10(y + x) - Y?( -y + x)}d(Hn(x) - H(x))
Z(y) - Z( -y), where
Z(±y) := We shall show that (4.3.21)
!
y 10(± y + x)d[Hn(x) - H(x)], y 2: O.
4.3. Distributions of some scale estim ators
119
But
IZ(±an )
(4.3 .22)
-
Z(±'Y2)1
I![Yt{H(±an + x)) -
Yl(H(±,2 + x ))]
xd(Hn(x) - H(X)}I
S 2
sup 1Y1(H(y)) - Yl(H(z))1 Iy-s!:::; lain -1 / 2,
= op(l) ,
bec ause of (2.3.4}-(2.3.5) and Corollary 2.3.1 applied with dni n- l / 2 . Thus, to prove (4.3.21) , it suffices to show that
(4.3.23) But
IZ(±,2}!
111 <
[Yl (H(±'2 + H;;l(t)}} - Yl (H(±,2 + H-l(t)})]dtl
sup O
IY (H (± , 2 + H-l(H H;;l(t)}} - Yt{H(±'2 + H -l(t})}1 l
op(l }. by the assumption (4.3.14) , Lemma 3.4.1 and Corollary 2.3.1 applied with dni == n- l / 2 . This proves (4.3.21) . Cons equently, from (4.3.20) and an argument like (4.3.22) , it follows that
Et{a n} =
! !
{ylo(a n + x } - YlO( -an
+ x )}dH (x ) + op(l}
{Yl° (,2 + x } - YlO(-,2
+ x )}dH (x } + op(l} .
From these derivations and the definition (4.3.13), we obtain
Tt{a n} =
2K(,2}
+n- 3 / 2 ~ c~jA
!
{fi(,2
+ x} -
f i(
-,2 + x)}dFj(x} . v
2,J
(4.3.24)
+ op(l} .
4. M, R and Some Scale Estimators
120
Now, from the definition of k n and (4.3.19), it follows that the lim c., does not depend on b. Thus the limit of the l.h .s. of (4.3.18) is the same for b = -1 ,0,1/2, and, in view of (4.3.18), (4.3.19) and (4.3.24) , it is given by the first term on the r.h .s. of (4.3.17). 0 Remark 4.3.1 Observe that , in view of (4.3.8), n
Whd
n- l / 2
2:)I(l eil ::; gd
!
i=l
=
K(2)
{Hh2
+ x)
- 1/2} ,
- H(-"(2
+ x)}dYlO(x)
n
n- l / 2
L {Hh2 + ed -
H( -"(2
+ e.) -
1/2}.
i=l
Thus, Whl) and K(2) are the sums of bounded independent centered LV .'S and by the L-F CLT one obtains
where
ai .=
Var(Whd)
l n-
n
L {Fihd -
Fi( -"(1)}{1 - Fihl)
+ Fl (-"(I)},
i =l
a~
.-
Var(Kh2)) n-
l
t
![Hh2
+ x)
- H( -"(2
+ x )fdFl (x ) -
(1/4) .
0
i=l
Remark 4.3.2 If {F i } are all symmetric about zero, then from (4.3.25), (4.3.16) and (4.3.17) , it follows that the asymptotic distribution of
51
and
52
does not depend on the initial estimator
{3. In fact , in this case we can deduce that
(4.3.26)
/3 of
121
4.4. R-ESTIMATORS where
n
h(x)
:=
n- 1
L h(x) .
o
i= 1
Remark 4.3.3 i.i.d. case. In t he case F,
== F , t he asy mptotic
dist ribut ion of 8 1 depe nds on 13 unl ess F is symmetric around zero. However , t he asy mptotic dist ributi on of 82 does not depend on 13. This is so becau se in t his case t he coefficient of v in (4.3.17) is
n - 3 / 2 ~ ~(Xi ~
X j )' A
J[j b 2 + x ) - f( -,2
+ x )]dF (x ) =
0.0
J
T hat t he asy mptotic dist ribution of 82 is independent of 13 is not surpris ing because 82 is essent ially a symmetrized variant of 8 1. We summarize t his property of 8 2 as Corollary 4.3.1 If in the model (1.1.1), Fni == F, F satisfies (Fl) , (F2 ) and X satisfies (NX ), then T 2- 1 n 1/ 2(82- ' 2) -7d N(O , 1), where
now J[F(g2 + x ) - F(
-,2 + x )j2dF (x ) - 1/ 4
(J fb2 + x)dF(x))2 Not e t hat ,2 is now the med ian of t he distribution of leI Also, observe t hat t he cond it ion (4.3.14) now is equivalent to
[P (F (e1
sup
-
y)
0::;8::;1-8
e21 ·
~ 8 + 8) - P (F (el - y) ~ 8)] -7 0,
as 8 -7 0, VY E JR, which is implied by t he ass umptions on F .
4.4
R-Estimators
Consider t he model (1.1.1) and t he vecto r of linear rank statistics n
(4.4.1)
T (t ) := A L (Xni - xn ) cp ( ~tI (n + 1)), t E W , i= 1
o
4. M, R and Some Scale Estimators
122
where Al is as in (4.2.12) and Rit is the rank of Yni - x~it among {Ynj - Xnjt , 1 ~ j ~ n} . One of the classes of R-estimators of f3 is defined by the relation p
(4.4.2)
i~f IT(t)ll = IT(.8l)ll =
L
ITj(t)1 = 0,
j=l
T j being the lh component of T of (4.4 .1).
The estimators .81 were initially studied by Adichie (1967) for the case p = 1 and by J ureckova (1971) for p 2': 1. Another class of R-estimators can be defined by the relation (4.4 .3)
inf IIT(t)1I = IIT(.8)II · t
Yet another class of estimators , introduced by Jaeckel (1972) , is defined by the relation (4.4.4) n
J(t)
:=
L(Yni - x~ it)
Jaeckel (op .cit .) showed that for every observation vector (Yl , . . . , Yn ) and for every n 2': p, L~=l
°
°
4.4. R-Estim ators
123
where Tj (t ) is a Td(cp , u ) - statistic of (3. 1.2) wit h
x;
(4.4.5)
Yni - x~d3 ,
u = A
1
1
(t - (3 ),
a(j)(Xni - x n ), Cni = A l (Xni i" column of A I , 1 ~ j ~ p.
X n ),
1 ~ i ~ n;
T hus sp ecializing T heorem 3.2.4 to t his case readily yields
Lemma 4.4.1 Suppos e that (1.1.1) holds with Fni as a d.f. of eni , 1
~
i
~
n. In addition, assume that (X~Xc ) -1
(N X c)
exists for all
n 2: p ,
maxI::;i::;n(Xni - xn )'X~Xc ) - I (Xni - x n ) = 0(1).
A bout {Fni} assume that H is stri ctly increasing for each nand that (2.3.5) , (3.2.12) , (3.2.34), (3.2.35) hold and that for all j = 1""
, p,
(4.4.6)
lim lim sup
8 ~0
where f or
0
~
n
S
~
sup
[Lj( s
0::; s9 - 8
1, 1
~
+ 8) -
j ~ p,
n
L j( s)
2
L
:=
L j( s)] = 0,
( a(j)( Xni - X n )) Fni(H-I( s)) .
i= l
Th en, for every 0 < b < (4.4.7)
00 ,
IIT(t ) - T ((3 )
sup
+ K nA1 I (t -
(3 ) 11 = op( l)
K n := Al
n
rL
Jo i = 1
(Xni - xn( S))( Xni - x n)' qni (s )dcp(s )AI n
x n(s ) := n- I L
Xnif ni(S),
i= 1
o
4. M, R and Some Scale Estimators
124
In order to prove the asymptotic normality of 132, we need to show that IIA1 I(132 - ,8)11 = Op(l). To this effect let J-L
.-
S .-
Al t(Xni i== 1 T(,8) - J-L.
xn )
J
Fni(H-I)drp ,
Observe that the distribution of (132 -,8) does not depend on,8 , even wh en {end are not identically distributed.
Lemma 4.4.2 In addition to the assumptions of Lemma
4.4.1
sup-
pose that
liS + J-LII = Op(l) ,
(4.4 .8) (4.4 .9)
lim inf inf le'KneJ > ex for an ex n liell==1 -
(4.4 .10)
K~I ex is ts for all n 2': p ,
Th en , for eve ry
E
> 0,0 < z < 00 ,
> 0,
IIK~111 = 0(1).
there ex is t a 0
< b < 00
it and N,
such tha t
(4.4 .11)
P( inf IIT(AIu + ,8)112': z ) 2': 1 Ilull >b
E,
n 2': Ns,
Proof. Fix an E > 0,0 < z < 00 . Without loss of generality assume ,8 = O. Observe that by the C-S inequality inf IIT(A Iu)11 22': inf (e'T(rA Ie))2. lI ull>b Ilell==I,lr l>b thus it suffices to prove that there exist a 0 that
(4.4 .12)
P (
inf (e'T(rA Ie))2 2': Ilell==I ,lrl>b
z) 2':
1-
00
E,
and N, such
n>n..
Let , for t E II~.P , T(t) := T(O) - KnA1It , so that, by (4.4.7) for every 0 < b < 00 ,
(4.4 .13)
sup le'T(rAIe) - e'T(rAIe)1 = op(l) . Ilell==I ,lrISb
125
4.4. R-Estimators But e'T (rAIe)
= e' (S + p,) -
e'Kner.
By (4.4.8), t here exist a K E and an N I < such that
Choose b to sat isfy I, b ~ (K•, + z l/2)",u.
(4.4.14)
a
. (4.4.9) . as In
Then (4.4.15)
P (
inf (e'T(rA Ie))2 Ilell=I ,lr l>b
~ z)
> P (liS + p,11 :S _ zl /2 + b inf le'Kn el) lI ell=1
> P (liS+ ILII :S K t )
~ 1-
E/ 2, \fn
~
NI, ·
Therefore by (4.4.13) ad (4.4.15) t here exist N, and b as in (4.4.14) such t hat (4.4.16) But
where di = e'Al (Xi-X) , Rir is the rank of rj - r (xi-x)AIe. But such a linear rank statistic is nondecreasing in r , for every e . See, e.g., Hajek (1969; Theorem 7E , Chapter II) . This together with (4.4.16) 0 enables one to conclude (4.4.12) and hence (4.4.11). Theorem 4 .4.1 Supp ose tha t (1.1 .1) holds an d th at th e design matrix X an d th e error d.j. 's { Fn i } satisfy th e assumptions of Lem mas 4·4 · 1 and
4·4·2 above.
Th en
(4.4.17) Proof. Follows from Lemmas 4.4.1 and 4.4.2.
o
4. M, R and Some Scale Estimators
126
Remark 4.4.1 Arguing as in J aeckel combined with an argument of Lemma 4.4 .2, one can show t hat II A l (.82 - .83)1/ = Op(l) . Con-
1
sequently, under t he condit ions of Lemmas 4.4 .1 an d 4.4 .2, .82 and th e Jaeckel estimator .83 also satisfy (4.4.1 7). Remark 4.4.2 Cons ider the case when Fn i
0
== F, F a d.f. satisfying
(F l) , (F 2) . Then J.L = 0 and S = T ({3 ). Moreover, under (N X c ) all ot her ass umptions of Lem mas 4.4 .1 and 4.4.2 ar e a priori satisfied. Note t hat here
Moreover , from Theorem 3.4 .3 above, it follows that S converges in distribut ion to a
N( o, a~ Ip xp) ,
a~
=
LV . ,
where
t' r.p (u )du ) 2 . Jt'o r.p2(u )du - (Jo
We summar ize t he above d iscussion as Corollary 4.4.1 Suppo se that (i .i .i ) with Fn i
== F holds. Suppos e
that F and X satisfy (Fl ) , (F2 ), and (NX c ) . In addit ion, suppose that r.p
is nondecreasing bound ed on [0, 1] and
J f dr.p (F ) > O.
Then
A 1l (.82 - (3 ) =
(J
f dr.p (F )) - IT({3)
+ op(l) .
Hence,
T his resul t is qui te general as far as t he conditio ns on the design mat rix X and F are concerned but not that general as far as t he score function ip is concerne d . 0 Remark 4.4.3 Robustness against heteroscedastic gross errors . F irst we give a working definit ion of qualitative rob ustness .
Estim ation of QU )
127
Consider the model (1.1.1). Suppose that we have modeled the errors {eni' 1
~
i ~ n } to be i.i.d. F whereas their actua l d.f. 's ar e
{Fni , 1 ~ i ~ n }. Let p n := TI ~=l F, Qn := TI ~=l Fni denote the cor respond ing product probability measure. Definition 4.4.1. A sequence of est imators 13 is sa id to be qualit atively robust for f3 at F against Qn if it is consistent for f3 under P " and un der those Qn that sat isfy 'Dn
:=
max i SUPy IFni (Y) - F( y)1 -t O.
The above definition is a variant of that of Hampel (1971). On e could use the notions of weak convergence on product probability spaces to give a bit more general definition. For example we could insist that the Prohorov dist an ce between Qn and P " should tend to zero inst ead of requiring 'Dn -t O. We do not pursue this any further here. The result (4.4.17) can be used to st udy t he qu ali tativ e robust-
132
ness of
against certain heteroscedasti c errors. Consider, for ex-
amp le, t he gross errors model where , for some 0
~
6n i
~
1, wit h
maxi bni -t 0,
and , where G is d .f. having a uniformly cont inuous a.e. positive density. If, in addit ion, {6n d satisfy n
(4.4.18)
II A1 I)Xni - Xn)6nill =
0 (1 ),
i=l
t hen one can readily see that IIK;l ll = 0(1 ) and IIILII = 0(1). It follows from (4.4.17) t hat 132is qualitatively robust against the above het eroscedastic gross errors at every F that has uniformly continuous a.e. positiv e density. Examples of bni satisfyin g (4.4.18) would be
6n i == n- 1/ 2 or 6n i = p- l /2I1 At{x n i
-
xn )lI ,
1 ~ i ~ n.
It may be arg ued t hat the lat ter choice of contaminating prop ortions
{bni } is more natural to linear regression t ha n t he former. A similar remark is applicable to 131and 133' 0
4. M, R and Som e Scale Estimators
128
4.5
Estimation of Q(f)
Consid er the mod el (1.1.1) with Fni de nsity J on JR. Define (4.5.1)
QU ) =
JJ
F , where F is a d .f. with
drp(F ),
where sp E C of (3.2.1). As is seen from Corollary 4.4.1, the paramet er Q appears in the asy mptotic vari ance of R - est imato rs. The comp lete rank an alysis of the mod el (1.1.1) requires an est imate of Q. This est imate is used to standardize rank test stat ist ics when carrying out the ANOVA of linear models using J aeckel' s dispersion .J of (4.4 .4). See, for example, Hettmansp erger (1984) and refer ences t herein for the rank based ANOVA . Lehmann (1963) and Sen (1966) give est imators of Q in t he one and two sa mple location mod els. These est imators are given in te rms of lengths of confide nce interva ls based on linear rank statistics. Cheng and Serft ing (1981) discuss several est imators of Q when obs ervations are i.i.d . F , i.e., when t here are no nuisance pa rameters . Some of t hese est imators ar e obtained by replacin g J by a kernel ty pe dens ity estimator and F by an emp irical d. f. in Q . Schewede r (1975) disc usses similar estimates of Q in t he one sa mple location mod el. In this sect ion we discuss two ty pes of est imators of Q . Both use a kernel typ e density est imator of J based on t he residuals and the ordinary residual em pirical d. f. to estimate F. The d iffer ence is in the way the window widt h and the kernel are chosen. In one the window width is parti ally based on the data and is of the orde r of square root of n and the kernel is the histogram typ e whereas in the ot her the kernel and the window width are arbit rary. It will be observed t hat the AUL result abo ut the residual empirical process of Corollary 2.3.5 is t he basic too l needed to prove t he consistency of t hese est imat ors. We begin with th e class of est ima tors wher e the wind ow wid th is partly based on the data. Define (4.5.2)
p(y)
:=
J
[F(y
+x) -
F (- y + x )]drp (F (x )),
y
~ O.
4.5. Estimation of Q(J )
129
Since cp is a d.f., p(y) = P (le- e*1 ::; y), where e, e* ar e indep endent r.v .'s with respective d.f.'s F and cp(F) . Con sequ ently, under (Fl ), t he densi ty of pat 0 is 2Q . This suggests that an estimate of Q can be obtained by est imating t he slope of p at O. Recall the definiti on of t he residual empirical pr ocess H n (y, t ) from (1.2.1). Let fJ be an est imato r of f3 and define (4.5.3) A natural est imator of p is obtained by substituting
it;
for F in p ,
v.i.z. ,
Let - 00 = e(O) < e( l ) ::; e(2) ::; . . . ::; ern) < e (n + l ) = 00 denote t he ordere d residuals {ei := Yi - x~fJ , 1 ::; i ::; n}. Since cp(H n ) assigns mass {cpUI n ) - cp(j - 1) In))} to each e(j ) and zero mass to each of the intervals (e (j - l ) ' e(j) ), 1 ::; j ::; n + 1; it follows that 'Ii y E JR, (4.5.4)
Pn (y) n
L {cpU In) -
cpU - 1)l n ) }[Hn(y + e(j) ) - H n( - y + e(j))]
i= l
From t his one sees t hat Pn (y) has the following int erpret ation. For each j, one first computes the proportion of {e(i) } falling in th e interval [-y + e(j), y + e(j) ] and then Pn(y) gives the weight ed average of such proportions. Formula (4.5.4) is clearly suitable for computations . Now, if {h n } is a sequence of positive numbers te nding to zero , an est imator of Q is given by
T his estimator can be viewed from t he density est ima tio n po int of view also. Consider a kern el-type density est imator In of I base d on
4. M, R and Some Scale Estimators
130 the residuals {ei}:
n
fn(x)
:= (2nh n)-1
L 1(lx - eil :S hn ), i=l
which uses the window wn(x) = (1/2) . 1(lxl :S h n) . Then a natural estimator of Q is
Scheweder (1975) studied the asymptotic properties of this estimator in the one sample location model. Observe that in this case the estimator of Q does not depend on the estimator of the location parameter which makes it relatively easier to derive its asymptotic properties. In Qn, there is an arbitrariness due to the choice of the window width h n . Here we recommend that h n be determined from the spread of the data as follows . Let 0 < 0: < 1, t~ be o-th quantile of Pn and define the estimator Q~ of Q as (4.5.5) The quantile t~ is an estimator of the o-th quantile to: of p . Note that if cp(s) := s, then to: is the o:-th quantile of the distribution of leI - e21 and t~ is the o-th quantile of the empirical d .f. Pn of the LV . 'S {lei - ejl , 1 :S i,j :S n}. Thus, e.g., t~ = 82 of (4.3.5) . Similarly, if cp(8) = 1(8 ~ 0.5) then to: (t~) is o-th quantile of the d.f. of lei I (empirical d.f. of leil, 1 :S i :S n). Again, here t:/! would correspond to 81 of (4.3.5) . In any case , in general, to: is a scale parameter in the sense of Bickel and Lehmann (1975) . Note that the estimator Q~ is location and scale invariant in the sense of (4.3.3) and (4.3.4) , as long as is location and scale invariant in the sense of (4.3.1) and (4.3.2). This follows from the fact that under (4.3.1) and (4.3.2), Pn(Y, aY + Xb) = Pn(y/a , Y), t~(aY + Xb) = at~(Y) , for all y E~, a > 0, bE W. The consistency of Q~ is asserted in the following
/3
4.5. Estimation of QU)
131
Theorem 4.5.1 Let (1.1.1) hold with Fni (NX), (Fl) and (F2), assume that
/3
==
F.
In addition to
is an estimator of (3 satis-
fying (4.3.15). Then, (4.5.6)
sup IQ~
- QU)I =
op(1) .
The proof of (4.5.6) will be a consequence of the following three lemmas.
Lemma 4.5.1 Under the assumptions of Theorem 4.5.1, V 0 ::; a < 00
(4.5.7) Consequently, V 0 ::; a
< 00 ,
(4.5.8)
Proof. We shall apply Corollary 2.3.5 . Let
V= A -1(/3 - (3),
n
b~ = n- 1 / 2 LX~iA . i=1
Then, from (2.3.43) , (4.5.3) and (4.3.15) , we obtain (4.5.9) where n
Hn(y) == Hn(y , (3) == n - 1
L I(eni ::; y) ,
y E lit
i=1
Also , we will use the notation (2.3.1) with u = (4.5.10)
1 2 d n~. = - n- / ,
X' n~ -
"\7 . J. n~
X'ni(3'
°
and
Fn~· = - F.
Then Yl(t ,O) == n 1 / 2 [H n (F - 1 (t )) - t], 0 ::; t ::; 1. Write Y1 (-) for Y1 (-,0). Now, (4.5.9) and cp bounded imply that,
4. M, R and Some Scale Estimators
132
n 1 / 2 {Pn(Y) - p(y)} 1 2
n / /
{Hn(y
+ x)
- H n( -Y + x)}dip(Hn(x))
+ b~v /[J(Y+X)
- f(-y+x)]dip(Hn(x))
- n 1 / 2 p(y) + up (l )
+ R n2(y) + R n3(Y) + up ( l ),
Rnl(Y)
(4.5.11)
where u p (l ) stands for a sequence ofrandom processes that converge to zero, uniformly in Y E JR, ip E C, in probability, and where
R n1(y)
/{Y1(F(Y+X)) - Y1(F(-y+x))}dip(Hn(x))
b~v /[J(Y+x)-f(-y+X)]dip(Hn(X)) 1 2
n / { / [F(y
- /[F(Y
+ x)
- F( -y + x)]dip(Hn (x))
+ x)
- F( -y + x)]dip(F(x))},
for Y E R From (Fl) , (F2), the boundedness of ip ; and the asymptotic continuity of Y1 , which follows from Corollary 2.2.1, applied to the quantities given in (4.5 .10) , we obtain, with k = 2allflloo , (4.5.12)
IRn t{ n - 1 / 2 z)1 <
sup 0:Sz:Sa,
sup
1Y1(t)-Y1(s)1
It-sl :Skn - 1 / 2
op(l) . Again, (Fl) and the boundedness of ip imply, in a routine fashion, that (4.5 .13)
sup
IRn l ( n - 1/ 2 z)1=
op(l) .
0:Sz:Sa,
Now consider R n3. By the MVT, (Fl) and the boundedness of .p, the first term of R n 3(n - 1/ 2z ) can be written as
133
4.5. Estimation of Q(J)
where {~xzn} ar e real numbers su ch that I~xzn - z ] ::; an - 1 / 2 . Do the sa me with the second integral and put two to gether to obtain
Rn 3(n - 1/ 2z ) =
2z { / fd
fd
- /
+ u p (l )
{1 [q (FH;l (t )) - q(t )]d
2z
u p( l) .
But, (4.5.14) sup IFH;l(t ) °9 :::;1
tl <
n- 1
+ sup IHn(y) -
F( y)1
y
=
op(l) ,
by (4.5.9) and the Glivenko - Cantelli Lemma. Hen ce, q being uniformly continuous , we obtain
This together wit h (4.5.11) - (4.5.14) completes t he proo f of (4.5.7) whereas t hat of (4.5.8) follows from (4.5. 7) and the fact In 1/ 2p(n- 1/ 2 z ) - 2z Q (J )1 -+ 0,
sup 0:::; z :::; a ,ep EC
which in t urn follows from t he uniform cont inuity of
f.
o
Lemma 4.5.2 Und er the assump tions of Th eorem 4.5.1, Vy ~ 0,
sup IPn(Y) - p(y)1 = op(l ). ep EC
Proof. Proceed as in the proof of the pr evious lemma to rewrite
where Anj = n - 1 / 2 Rnj , j = 1,2 ,3 , with Rnj defined at (4.5.11). By Corollary 2.2.2 applied to the quantities given at (4.5.9), IJYliioo = Op(1) and hen ce f , sp bounded t rivially imply that sup ep EC,y?, O
IAnj(y)1 = op( l) , j = 1, 2.
4. M, R and Some Scale Estimators
134 Now, rewrite A n 3 (y) =
[ / F(y
+ x )dep (Hn(x )) -
/ F( y + x )dep (F (x ))]
- [ / F (- y + x )dep (Hn(x )) - / F (- y+ x )dep(F (x ))] An(y)
+ A n( - V),
say
Bu t , V Y E IR,
1 1
An(y)
= =
{ F (Y + F - 1 (FH;;I (t ))) - F(y
+ F- 1 (t ))} dep(t)
op(l ),
because of (4.5.14) and because, by (F l) and (F 2) , V y 2: 0, F (y + F - 1 (t )) is uniformly continuous function of t E [0, 1]. 0 Lemma 4.5.3 Und er the conditio ns of Th eorem 4.5 .1, "Y'
> 0,
E
Proof. Ob serve t hat t he event
[Pn(1 - E)t
Q )
< 0: :S Pn(l + E)t ~
[(1 - E)t
Q
Q ) ]
:s t~ :s (1 + E)t
Q ] •
Hence, by two applications of Lemma 4.5.2, once wit h y = (1 + E)t and once wit h y = (1 - E)t we obtain that
Q ,
Q
,
lim inf P (l t ~ - t Q I :S Et Q , V sp E C) n
> P(p( l - E)t =
Q )
< 0: :S p(( l + E) t
Q ) ,
Vep E C)
o
1.
Proof of Theorem 4.5.1. Clearly, V ep E C,
I Q~ - Q(f)1
=
(2t~)- 1 I nl /2Pn (n- l/2 t~ ) - 2t~Q(f)I ·
By Lemma 4.5.3, V E > 0,
P (o < t~ :S (1 + E)t
Q ,
Vep E C)
----1
1.
Hence (4.5.6) follows from (4.5.8) applied wit h a = (1 + E)t 4.5.3 and t he Slutsky Theorem .
Q ,
Lemma 0
4.5. Estimation of Q(J)
135
Remark 4.5.1 The estimator
Q~
shifts the burden of choosing the
window width to the choice of a . There does not seem to be an easy way to recommend a universal a . In an empirical study done in Koul, Sievers and McKean (1987) that investigated level and power of some rank tests in the linear regression setting, a to be most desirable.
= 0.8 was found 0
Remark 4.5.2 It is an interesting theoretical exercise to see if, for
some 0 < <5 < 1, the processes {n1 /2(Q~ - Q(J)), <5 ::; a ::; 1 <5} converge weakly to a Gaussian process . In the case cp(t)
== t ,
Thewarapperuma (1987) has proved under (F1) , (F2) , (NX) , and (4.3.15), that V fixed 0 < a < 1, n1 /2(Q~ - Q(J)) -td N(0 ,a 2), where a 2 = 16{J f3(x)dx - (J f2(x)dx)2}.
0
{t~ ,
(p E C} provides a class
Remark 4.5.3 As mentioned earlier,
of scale estimators for the class of scale parameters {to: , cp E C}. Recall that
81
and
82
of (4.3.5) are special cases of these estimators.
The former is obtained by taking cp(u) == I(u :::: 0.5) and the latter by taking cp(u) == u . For general interest we state a theorem below , giving asymptotic normality of these estimators. The details of proof are similar to those of Theorem 4.3 .1. To state this theorem we need to introduce appropriately modified analogues of the entities defined at (4.3.13) : K 1(y)
.-
K 2(y ) .-
K(y) where
.-
J J
[Y10(y + x) - Y10( -y + x)]dcp(F(x)), Y10(X)[J(y
+ x)
- f( -y + x)]{f(x)} - l dcp (F (x )),
K 1 (y) - K 2(y) ,
Y :::: 0,
yt is as (4.3 .13) adapted to the i.i.d . errors setup. It is easy to
check that K(tO:) is n- 1 / 2x { a sum of i.i.d. LV.'S} with EK(tO:) and 0 < (aO:)2:= Var(K(tO:)) < 00, not depending on n.
== 0
0
Theorem 4.5.2 In addition to the conditions of Theorem 4.5.1, assume that either cp( t) = I (t :::: u), 0
< u < 1,
fixed or sp is uniformly
136
4. M, R and Some Scale Estimators
differentiable on [0, 1] . Then ,
\:j
°< a < 1,
n1 /2(t~ - t a ) -+d
(v a )2 :=
((ja)2
J
{t a
[J(t a
N(o , (zp)2) ,
+ x) + f( _t a + X)]dCP(F(X))} -2
0
We now turn to the arbitrary window width and kerneltype estimators of Q. Accordingly, let K be a probability density and {ed be as on JR., h n be a sequ ence of positive numbers and before. Define, for x E JR.,
i3
in(x)
Qn
.:=
_1
nhn
J
'tK(-e i ) , .
l=l
hn
n
x - ei fn( x):= nh LK(-h-) ' 1 '""'" n i=l
n
in (x)dcp(iI n(x)).
Theorem 4.5 .3 Assum e that the model {i .l.l} with Fni == F holds.
In addition, assume that (FI), (F2) , (NX) and (4.3.15) and the following hold: (i) h n > 0, h n -+ 0, n 1/2 h n -+ 00. (ii) K is absolutely continuous with its a.e. derivativ e K satisfy ing
J KI
< 00 .
Then, (4.5.15)
sup IQn - Q(/) I = Op(l).
Proof. First we show in approximates f. This is done in several steps. To begin with, summation by parts shows that
so that ~ - f nlloo ~ (n 1/2 h -1 . lin 1/2 (H~ - H Ilfn n) n n)lIoo .
Henc e, by (4.5.9) and the fact that (4.3.15) , it readily follows that
lIin - f nll oo
Ib~
vi
JIKI. ·
= Op(l) guaranteed by
= Op(n1/ 2hn )- 1) = op(1) .
137
4.5. Estimation of QU) Now, let
Yn(x)
:=
h;;l
J
K((x - y)jhn)J(y)dy.
Note that integration by parts shows that
so that
Ilfn -Ynlloo
(4.5.16)
:S (nl /2hn)-11Inl/2[Hn - F]lIoo op(1),
=
by (i) and by the fact that Iln 1 / 2(Hn
llYn -
JIKI
flloo:S
-
F)lloo =
Op(1). Moreover,
sup If(y) - f(x)1 = 0(1), Iy- xl::;h n
by (Fl).
Now, consider the difference
a. - QU) Let q(t)
=
J
=
Dn1 + D n2, say .
= f(F -l(t)) .
(in - J)dcp(Hn) +
J
fd[cp(Hn) - cp(F)]
Then
sup IDn21:S sup Iq(F(H;l(t))) - q(t)1 = op(1) O::;t::;l
~EC
by the uniform continuity of q and (4.5.14) . Also, from the above bounds we obtain sup IDn11 :S ~EC
thereby proving (4.5.15).
Ilin - flloo
= op(1) ,
5 Minimum Distance Estimators 5.1
Introduction
The practice of obtaining estimators of parameters by minimizing a certain distance between some functions of observations and parameters has long been present in statistics. The classical examples of this method are the Least Square and the minimum Chi Square estimators. The minimum distance (m .d.) estimation method , where one obtains an estimator of a parameter by minimizing some distance between the empirical d.f. and the modeled d.f., was elevated to a general method of estimation by Wolfowitz (1953, 1954, 1957). In these papers he demonstrated that compared to the maximum likelihood estimation method, the m.d. estimation method yielded consistent estimators rather cheaply in several problems of varied levels of difficulty. This methodology saw increasing research activity from the mid 1970's when many authors demonstrated various robustness properties of certain m.d. estimators. Beran (1977) showed that in the i.i.d. setup the minimum Hellinger distance estimators, obtained by minimizing the Hellinger distance between the modeled parametric density and an empirical density estimate, are asymptotically efficient at the true model and robust against small departures from the model, where the smallness is being measured in terms of the H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
5.1. INTRODUCTION
139
Hellinger metric. Beran (1978) demonstrated the powerfulness of minimum Hellinger distance estimators in the one sample location model by showing that the estimators obtained by minimizing the Hellinger distance between an estimator of the density of the residual and an estimator of the density of the negative residual are qualitatively robust and adaptive for all those symmetric error distributions that have finite Fisher information. Parr and Schucany (1979) empirically demonstrated that in certain location models several minimum distance estimators (where several comes from the type of distances chosen) are robust. Millar (1981, 1982, 1984) proved local asymptot ic minimaxity of a fairly large class of m.d. estimators , using Cramer - von Mises type distance, in the i.i.d. setup . Donoho and Liu (1988 a , b) demonstrated certain further finite sample robustness properties of a large class of m.d. estimators and certain additional advantages of using Cramer - von Mises and Hellinger distances. All of these authors restrict their attention to the one sample models or to the two sample location model. See Parr (1981) for additional bibliography on m.d.e. through 1980. Little was known till the late 1970's about how to ext end the above methodology to one of the most applied models , viz., the multiple linear regression model (1.1.1) . Given the above optimality properties in the one and two-sample location models, it became even more desirable to extend this methodology to this model. Only after realizing that one should use the weighted, rather than the ordinary, empiricals of the residuals to define m.d . estimators was it possible to ext end this methodology satisfactorily to the model (1.1.1). The main focus of this chapter is the m.d. estimators of (3 by minimizing the Cramer - von Mises type distances involving various W.E.P. 's. Some m.d. est imators involving the supremum distance are also discussed. Most of the estimators provide appropriate extension of their counterparts in the one - and two - sample location models. Section 5.2 contains definitions of several m.d. estimators. Their finit e sample properties and asymptotic distributions are discussed in Sections 5.3, 5.5, respectively. Section 5.4 discusses an asymptotic
5. Minimum Distance Estimators
140
theory about general minimum dispersion estimators that is of broad and independent interest. It is a self contained section. Asymptotic relative efficiency and qualitative robustness of some of the m.d. estimators of Section 5.2 are discussed in Section 5.6. Some of the proposed m.d. functionals are Hellinger differentiable in the sense of Beran (1982) as is shown in Section 5.6. Consequently they are locally asymptotically minimax (LAM) in the sense of Hajek - Le Cam.
5.2
Definitions of M.D. Estimators
To motivate the following definitions of m.d. estimators of (:J of (1.1.1) first consider the one sample location model where Y1 e,' " ,Yn - e are i.i.d. F , F a known d.f. Let n
(5.2.1)
Fn{y)
:=
n- 1 LI{Yi::; y),
YER
i=l
If e is true then EFn{y + e) = F{y), Vy E lIt This motivates one to define m.d. est imator iJ of () by the relation
iJ = argmin{T{t) ; t E IR}
(5.2.2)
where, for aGE lDI{IR), (5.2.3)
T{t)
:=
n ![Fn{y + t) - F{y)J2dG{y) ,
tER
Obs erve that (5.2.2) and (5.2.3) actually define a class of estimators
iJ, one corresponding to each G. Now suppose that in (1.1.1) we model the d.f. of eni to be a known d.f. H ni' which may be different from the actual d.f. Fni ,l ::; i ::; n. How should one define a m.d. estimator of {:J? Any definition should reduce to iJ when (1.1.1) is reduced to the one sample location model. One possible extension is to define (5.2.4)
/31 = argmin{K 1(t ); t
where , for atE JRP ,
E IRP } ,
141
5.2. D efinitions
If in (1.1.1) we take p = 1, Xni1 == 1 and Hni == F then clearly it reduces to the one sample location mod el and /31 coincides with of (5.2.2). Bu t this is also t rue for the estimato r /3x defined as follows. Recall t he definition of {ltj } from (1.1.2). Let , for y E JR, t E lW,
e
n
(5.2.5)
Zj (Y, t ) .- ltj(y , t ) Kx (t )
where Z'
:=
:=
J
L Xnij Hni(Y),
1 ~ j ~ p,
i= 1
Z' (y, t )(X'X)-1 Z (Y, t )dG(y),
(Z1 ,'" , Zp ) and define
(5.2.6)
/3x = argmin{K x (t ), t E JRP }.
Which of the two est imato rs is the right ext ens ion of e? Since {ltj , 1 ~ j ~ p} summarize t he data in (1.1.1) wit h pr obability one un der t he cont inuity ass umption of {eni' 1 ~ i ~ n} , /3x should In Sect ion 5.6 we shall see be considered t he right extension of t hat /3x is asymptotically efficient amo ng a class of est imators {/3D} defined as follows.
e.
Let D = ((d nij)) , 1 ~ i ~ n , 1 and define, for y E JR, t E JRP,
~ j ~
p , be an n x p real matrix
n
(5.2.7) Vjd(y, t ) .-
L dnij I(Yni ~ y + x~it ),
1~ j
i= 1
Define a class of estima tors, one for each D , (5.2.8)
/3D := argmin{K D(t) , t E JRP}.
If D = n- 1 / 2 [1, 0,' " ,O]nxp t hen /3D = /31 and if D = XA then /3D = /3x , where A is as in (2.3.30) . The above menti oned optimality of /3x is stated and proved in Theorem 5.6a.1 below.
Anoth er way to define m.d. est imators in t he case t he modeled
5. Minimum Distance Estimators
142
error d.j.
'8
are known is as follows. Let , for s E 0,1], Y E
~,
t E ~P ,
ns
(5.2.9)
M( s, y , t} .- n- 1 / 2 L {I (Yni ~ y} - Hni(Y - x~it)}, i=1
1J 1
Q(t)
{M(s , y ,t)}2dG(y}dL( s},
:=
where L is a d.f. on [O,lJ . Define
13 =
(5.2 .10)
argmin{Q(t} , t E W}.
The estimator 13 with L(s} == s is essentially Millar's (1982) proposal. Now suppose {Hnd ar e unknown. How should one define m.d. of /3 in this case.? Again, let us examine the one sample location model. In this case () can not be identified unless the errors ar e symmetric about O. Suppose that is the case. Then the r. v.'s Pi _.(), 1 ~ i ~ n} have the sam e distribution as {- Yi +(), 1 ~ i ~ n} . A m.d. estimator ()+ of () is thus defined by the relation ()+ = argmin{T+(t} , t E ~}
(5.2.11)
where
An extension of ()+ to the model (1.1.1) is Let , for y E ~, t E W , 1 ~ j ~ p, n
(5.2.12) Y j(y ,t)
:=
L
Xnij{I(Yni
/3*
defined as follows:
~ y+X~it}
i= 1
-I(-Yni
y+
K~(t)
.-
.-
< y - X~it)} ,
(V/, ... ,V/ )',
J
Y +(y , t}(X'X} -ly+(y, t} dG(y} ,
and define the est imator (5.2.13)
/3*
= argmin{K:i(t} , t E W} .
5.2. Definitions
143
More generally, a class of m.d. estimators of 13 can be defined as follows. Let D be as before. Define, for y E JR, t E W , 1 ::; j ::; p, n
(5.2.14) Y/(y,t)
2:dnij{I{Yni::;y+X~it)
:=
i=1
- I (- Yni < y - X~i t) },
Yi)
(YI+, . . . , y/ )/,
J
Y D+' {y, t)Y D+' (y, t) dG{y) .
+ KD(t)·and
f3fj by the relation f3fj
(5.2.15)
= argmin{K6{t), t E
JRP} .
Note that 131 is f3fj with D = XA . Next, suppose that the errors in (1.1.1) are modeled to be i.i.d. , i.e. H ni == F and F is unknown and not necessarily symmetric. Here , of course the location parameter can not be estimated. However, the regression parameter vector 13 can be estimated provided the rank of X, is p, where X , is defined at (4.2 .11) . In this case a class of m.d. of (5.2.8) provided we assume that estimators of 13 is defined by
i3D
n
(5.2.16)
2:dnij = 0, i= 1
An interesting member of this class is obtained upon taking D XcA I , Al as in (4.2 .12) . Another way to define m.d. est imators here is via the ranks of the residuals. With R it as in (3.1.1) , let n
(5.2.17)
Tj(s , t)
.-
2: dnijI{Rit ::; ns) ,
s E [0,1], 1::; j ::; p,
i= 1
Ki>{t)
:=
J
Tb(s, t) TD{S, t) dL(s) ,
t E JR,
where Tb = (TI, '" ,Tp ) and L is a d.f. on [0,1]. Assume that D satisfies (5.2.16). Define (5.2.18)
f3b
= argmin{Ki> (t), t E W}.
5. Minimum Distance Estimators
144
Observe that {,Bn}, {,Bi;} and {,B} are not scale invariant in the sense of (4.3.2). One way to make them so is to modify their definitions as follows. Define , for t E lRP , a 2: 0,
J
P
(5.2.19) Kn(a , t)
.- L
[Vjd(ay, t) -
j=l
Ki)(a , t)
n
L dnijHni(y)fdG(y) , i=l
J
+ Y n+' (ay, t)Yn(ay, t)dG(y).
:=
Now, scale invariant analogues of ,Bn and ,Bti are defined as , 0
(5.2.20)
.- argmin{Kn(s , t) , t E lRP } ,
,Bn
,Bti° .-
argmin{Kn(s , t), t E lRP } ,
where s is a scale estimator satisfying (4.3.3) and (4.3.4) . One can modify {,B} in a similar fashion to make it scale invariant. The class of est imators {,Bn} is scale invariant because the ranks ar e. Now we define a m.d. estimator based on the supremum distance in the case the errors are correctly modeled to be i.i.d. F, F an arbitrary d.f. Here we shall restrict ourselves only to the case of p = 1. Define n .(5.2.21) Vc(y , t) t, Y E lR, I:( Xi - x)I(Y; ::; y + txd , i= l
V~(t)
.-
sup{Vc(Y, t) ; Y E lR} ,
V;;(t)
.-
- inf{Vc(Y, t) ; y E lR} ,
V n (t)
.-
max{V~(t), V;;(t) ,V;;(t)}
sup{lVc(Y, t)l ; y E lR} ,
t
E lR.
Finally, define the m.d. estimator
/38
(5.2.22)
:=
argmin{Vn(t); t E lR} .
Section 5.3 discusses some computational asp ects including the existence and some finite sample properties of the above estimators . Section 5.5 proves th e uniform asymptotic qu adraticity of K n, Ki) , K and Q as processes in t. These results are used in Section 5.6 to study the asymptotic distributions and robustness of the above defined estimators.
n
5.3. FINITE SAMPLE PROPERTIES
5.3
145
Finite Sample Properties
The purpose here is to discuss some computational aspects, the existence and the finite sample properties of the four classes of estimators introduced in the previous section. To facilitate this the dependence of these estimators and their defining statistics on the weight matrix D will not be exhibited in this section. We first turn to some computational aspects of these estimators. To begin with, suppose that p = 1 and G(y) = y in (5.2.7) and (5.2.8). Write /3, Xi , di for 13, XiI, dil, respectively, 1 ~ i ~ n. Then
(5.3.1)
K(t)
~
J[t
d;{I(Y;
r
~ Y + Xit) - Hi(yll
I=I=didj! {I(Yi i=l j=l
~ Y+Xi t) -
x {I(Yj
dy
Hi(Y)}
~ Y + Xjt) - Hj(Y)} dy.
No further simplification of this occurs except for some special cases. One of them is the case of the one sample location model where Xi == 1 and Hi == F, in which case
K(t)
~ J[tdi{I(Y; ~ y) - F(y - Illr dy.
Differentiating under the integral sign w.r.t. t (which can be justified under the sole assumption: F has a density f w.r.t . ,X) one obtains
K(t)
=
2! I=
di{I(Yi
~ Y + t) -
F(y)}dF(y)
i=l
n
=
-2
L di{F(Yi -
t) - 1/2}.
i=l
Upon taking d; == n- 1! 2 one sees that in the one sample location model Bof (5.2.2) corresponding to G(y) = y is given as a solution of n
(5.3.2)
L i=l
F(Yi - B) = n/2.
146
5. Minimum Distance Estimators
Not e that this {) is pr ecisely the m.l. e. of () when F (x ) = {I exp( - x )} - 1, i.e., when the errors have logistic distribution! Another simplification of (5.3.1) occurs when we assume
+
n
and ~
Fix atE JR and let c := max{ Yi - x it; 1
f [;
(5.3.3) K (t )
i ~ n }. Then
diI (Y; :S Y + Xi tl ]' dy
tt J didj
I [ max(Yj
-
Xjt , Yi - Xit )
i=1 j=1
~ Y < c] dy n
=
-
n
L L di dj max (Y
j -
Xjt, Yi - Xit ).
bl,
a, s « JR,
i=1 j=1
Using t he relati onship (5.3.4)
2 ma x(a , b) = a + b+
la -
and t he assumption L ~=1 d; = 0, one ob tains (5.3.5)
K (t )
=- 2 L L 1 ~i
If di
= Xi -
did jlYj - Yi - (Xj - xi )t l·
X in (5.3.5) , th en the correspo nding ~ is asy mptoti-
cally equivalent to the Wilc oxon typ e R-estimat or of f3 as was shown by Willi am son (1979). This result will also follow from the general asymptotic theory of Secti ons 5.5 and 5.6 below. If d; = Xi - X , 1 ~ i ~ n, and Xi = 0, 1 ~ i ~ r; Xi = 1, r + 1 ~ i ~ n t hen (1.1.1) becomes the two sample location mod el and r
K (t ) = - 2
n
L L
IYj - Yi - tl +
a
LV.
constant in t .
i=1 j=r+1
Consequent ly here ~ = m ed{I Yj - Yi l, r + 1 ~ j ~ n , 1 ~ i ~ r} , t he usual Hodges - Lehmann estimator. The fact t hat in t he two
5.3. Finite sampl e properties
147
sample location model the Cr amer - von Mises type m.d. est imator of the location paramet er is the Hodges - Lehmann est imator was first noted by Fine (1966) . Note that a relation like (5.3.5) is true for general p and G. That is, suppose that p ~ 1, G E VI { ~) and (5.2.16) holds, then \j t E JRP, (5.3.6) K {t ) p
=
-2:L:L:L dij dkjIG {(Yk - x~t) _ ) - G( (Yi - x~t )_)I· j=1 19 < k~ n
To prove this pro ceed as in (5.3.3) to conclude first that p
K(t) = -2:L:L :L dijdk jG(max(Yk - x~t , Yi - x~t)_) j=1 1 ~ i < k~ n Now use the fact t ha t G(aVb)_ ) = G(a_ )VC{L) , (5.2.16) and (5.3.4) to obtain {5.3.6}. Clearly, this formula can be used to compute /3 in genera l. Next , consider K +. To simplify the expos it ion , fix a tE ~p and let r i := Yi - xit , 1 :::; i :::; n j b := max j r, - r i j 1 :::; i :::; n} . Then from {5.2.14} we obtain
Ob serve that the int egrand is zero for y > b. Now expand the qu adratic and integrate term by term, noting that G may have jumps, to obtain p
n
n
:L:L:L dijd kj{ 2G(r i V - rk)} - 2J (ri ) j=1 i=1 k=1
where J (y ) := G{y } - G{y_ ), t he jump in G at y E JR. On ce aga in use t he fact that G(aVb ) = G{a)VG {b ), {5.3.4} , t he invari an ce of the double sum under permutati on and t he definiti on of {rd to conclude
148
5. Minim um Distance Estim ators
that
(5.3.7) K +(t) n
p
=
n
L L L dij dkj [!G( Yi -
- G( -Yk
+ x~t)1
-~{ I G(Yi - x~t) _ ) -
G(Yk -
j= l i= l k=l
x~t)
+ IG(-
Yi
+ x~ t ) -
x~t)_ ) 1
G(- Yk + x~ t )
- J (Yi -
I}
x~t )] .
Before proceeding further it is convenient to recall at t his time the definition of symmet ry for a GE 1JI( ~) . Definition 5.3.1. An ar bitrary G E 1JI ( ~) , including a o - finite measur e on t he Borel line (~, B), is sa id to be sym m etric aro und 0 if
(5.3.8)
IG(y) - G (x )1 =
vx, Y E R
IG(-x - ) - G(- y_ )1,
Or (5.3.9)
dG (y ) = - dG (-y) ,
Vy E R
If G is cont inuous then (5.3.8) is equivalent to
(5.3.10)
IG(y ) - G(x )1 = IG( - x ) - G( -y) 1,
v x,y E R
Conversely, if (5.3.10) holds th en G is symmetric around 0 and continuous. Now suppose t hat G satisfies (5.3.8). Then (5.3.7) simplifies to
(5.3.11)
K+ (t ) p
n
n
L L L dijdkj [IG(Yi - x~t) - G(-Yk + x~t) 1 j =l i=l k=l
- IG(- Yi
+x~t ) -
G (-Yk +xkt )1
-J(Yi -
x~t)] .
5.3. Finite sample properties
149
And if G satisfies (5.3.10) then we obtain the relatively simpler expression (5.3.12)
K +(t) p
n
n
j==l
i == l
k==l
L L L dijdkj [IG(Yi -
x~t)
-IG(Yi -
- G(-Yk + x~t)1
x~t)
- G(Yk - xkt)IJ·
Upon sp ecializing (5.3.12) to the case G(y) and Xi == 1 we obtain
=
y, p
= 1, di
n- 1 / 2
K+(t) = n -
1
n
n
LL i== l
{IYi
+ Yk -
2tl-IYi - Ykl}
k== l
and the correspond ing minimizer is the well celebrated medi an of the pairwise means {(Yi + Yj )/2 ; 1 ::; i ::; j ::; n}. Suppose we spec ialize (1.1.1) to a complete ly randomized design with p treatments, i.e., take Xij
=
+ 1 ::; i
1,
m j -1
0,
ot herwise,
::; mj ,
where 1 ::; nj ::; n is the l h sa mple size, ma = 0, m j = n1 + ... + nj, l ::; j ::; p , m p = n. Then , upon taking G(y) == y, dij == Xij in (5.3.12) , we obtain K+(t) =
p
nj
nj
j == l
i ==l
k== l
LLL {IYij + Ykj -
2t jl-IYij - Ykjl} , t E W,
where Yij = the i t h observation from t he i" treatment , 1 ::; j ::; p . Consequently, f3+ = (13t ,· ·· ,13: ), where 13t = med{ (Yij + Ykj)2-1,1 ::; i::; k ::; nj} ,l ::; i : p . That is, in a complete ly randomized design with p treatments, f3 + correspo nding to the weights d , = X i and G( y) == y is the vector of Hodges - Lehmann est ima tors. Simil ar remark applies to the randomiz ed blo ck, factorial and other similar designs . The class of est imators also includes t he well celebrate d least absolut e deviat ion (LAD) est imator. To see this, assume that the
e:
5. Minimum Distance Estimators
150
errors are con tinuous. Choose G = 00 - the measure degenerate at 0 in K + , to obtain
K +(t)
t [t, t (t,
d;jsgn( Y; -
r
:s 0) -
l (Y; - x;t
> O)}
><:t))',
w. p .l, V
t E IR".
dij{I (Y; - x;t
Upon choosing d , == Xi , one sees that this expression is precisely the square of the norm of a .e. differential of the sum of absolute devi ations V(t) := L ~=::l lY'; - x~tl , t ], t E JRP . Clearly the minimizer of V( t) is also a minimizer of the ab ove K + (t) . Anyone of the express ions amo ng (5.3.7), (5.3.11) or (5.3.12) may b e used to com pute {3 + for a general G. It is also ap pare nt from the above dis cussion that both class es {,B} and {{3+} include rather int er esting est imators . On the on e hand we have a smooth unbounded G, v.i.z. , G(y ) == y, giving ris e to Hod ges - Lehmann ty pe estimators a nd on t he ot her hand a highly discrete G, v.i.z., G = 00 giving rise to t he LAD estimator. Any distributi on t heo ry should b e gene ral eno ug h to cover both of these cases. We now add ress the qu esti on of the exis tence of these est imat ors in the case p = 1. As befor e wh en p = 1, we write unb old letters for scala rs and di , Xi for di1, X il , 1 ::; i ::; n . Befor e stating the result we need t o define n
L
r(y) :=
I (x i = 0) d, {I(Y'; ::; y) - I( -Y'; < y)} ,
Y E JR.
i=:: l Ar guing as for (5.3.7) we obtain, with b = max{Y';, -Y'; ; 1 ::; i ::; n} , (5.3.13)
J
IfldG n
<
L i= l
<
00 .
I (xi
= 0) Idil [G(L)
- G(Y'; - ) + G(b_ ) - G( - Y';)]
151
5.3. Fini te sample properties Mor eover , directly from (5.3.7) we can concl ude that
Jr
(5.3.14)
2
dG <
00 .
Both (5.3.13) and (5.3.14) hold for all n :2: 1, for every sample {Yi} and for all real numbers {di } . Lemma 5.3.1. A ssum e that (1.1.1) with p = 1 holds. In addit ion, assume that either V 1 :s: i
(5.3.15)
:s:
n,
or (5.3.16)
V1
:s: i :s: n .
Th en a m in imizer of K + exis ts if either Case 1: G (JR) = 00, or Cas e 2: G (JR) < 00 and d; = 0 when ever Xi = 0,1 :s: i :s: n . If G is con tinuous then a minimizer is m easurable. Proof. The pr oof uses Fatou 's Lemm a and t he D.C.T. Sp ecialize (5.2.14) to t he case p = 1 to obtain
K +(t)
r
~ J[t, d;{ I(Y; $ y + x;l) - I( - Y; < y - x;t))
dG( y) .
Let K,+(y , t ) denote t he integrand wit ho ut t he squa re. Then
K,+(y , t ) = r( y ) + K,*(y , t ), where
K,*(y , t) n
=
L I( Xi > O)di{I (Yi :s: y + Xi t ) -
I( - Yi < y - Xi t )}
i= l
n
+ L I (xi < O)di{I (Y; :s: y + Xi t ) - I (-
Y; < y - Xit )}.
i= l
Clearly , V y , t E JR, n
1K,*(y , t)j
:s:
L I (xi -1= O)ldil i= l
=: a, say.
5. Minimum Distance Estimators
152 Hence (5.3.17)
f{y) - a ::; I(+{y , t) ::; f{y)
+a,
V y, t E llt
Suppose that (5.3.15) holds. Then, it follows that Vy E 1(* (y, t)
so that V y E (5.3.18)
~,
-+ ±a as t -+ ±oo,
~,
I(+{y, t) -+ f(y) ± a , as t -+ ±oo.
Now consider Case 1. If a = 0 th en either all X i == 0 or do = 0 for those i for which X i i= O. In either case one obtains from (5.3.14) and (5.3.17) that Vt E ~,K+{t) = Jf 2dG < 00 and hen ce a minimizer trivially exists. If a > 0 then, from (5.3.13) and (5.3.14) it follows that
J
(f(y) ± a)2dG{y) =
00 ,
and by (5.3.17) and the Fatou Lemma, lim inf K+{t) = t---7±OO
00 .
On the other hand by (5.3.7), K +(t) is a finite number for every real t , and hence a minimizer exists. Next , consider Case 2. Here, clearly I' == O. From (5.3.17), we obtain
and hence
K+{t) ::; a2G{~) , Vt E llt By (5.3.18) , I(+(y ,t) -+ ±a, as t -+ ±oo. By the D.C .T. we obtain
thereby proving the existence of a minimizer of K+ in Case 2. The continuity of G together with (5.3.12) shows that K+ is a continuous fun ction on ~ thereby ensuring the measurability of
5.3. Finite sample properties
153
a minimizer, by Corollary 2.1 of Brown and Purves (1973). This completes the proof in the case of (5.3.15). It is exactly similar when 0 (5.3.16) holds, hence no details will be given for that case. Remark 5.3.1. Observe that in some cases minimizers of K+ could be measurable even if G is not continuous. For example, in the case of LAD estimator, G is degenerate at 0 yet a measurable minimizer exists. Also, note that (5.3.15) is a priori satisfied by the weights d i = X i. The above proof is essentially due to Dhar (1991a) . Dhar (1991b) gives proofs of the existence of classes of estimators {,B} and {,8+} of (5.2.8) and (5.2.15) for p 2: 1, among other results. These proofs are somewhat complicated and will not be reproduced here. In both of these papers Dhar carries out some finite sample simulation studies and concludes that both, ,B and d " corresponding to G(y) = Y, show 0 some superiority over some of the well known estimators. Now we discuss ~ of (5.2.10). Rewrite
Q(t) =
n-
1
tt J
x~t)}
{I(Yi < y) - Hi(Y -
Lij
i==l j=l
X {I(Yj
::; y) - Hj(Y - xjt) } dG(y)
where L ij = 1- L(i V j)n- 1 ) , 1 ::; i ,j ::; n. Differentiating Q w.r.t. t under the integral sign (which can be justified assuming {Hd have Lebesgue densities {h i} and some other mild conditions) we obtain n
= 2n-
Q(t)
1
n
LL
Lij
J
{I(Yi ::; y) - (Hi(y -
x~t) }
i=l j=l
xhi(y - xjt) dG(y) Specialize this to the case G(y) integrate by parts , to obtain n
Q(t) =
-2n- 2
= y, L( s) = s, p
=
1,Xi
Xj.
= 1 and
n
L L min(n -
i , n - j){Hi(Yi - t) - 1/2}
i=l j=l
n
=
_n- 2 L(n - i)(n + i - l){Hi (Yi - t) - 1/2}. i==l
5. Minimum Distance Estimators
154
Now suppose fur ther t hat H i == F. Then
fJ
is a solution t of
n
(5.3.19)
2:: (n - i )(n + i - l ){F(Yi - t) - 1/ 2} =
o.
i=l
Compare th is fJ wit h iJ of (5.3.2). Clearly fJ given by (5.3.19) is a weight ed M-estimator of t he location parameter whereas iJ given by (5.3 .2) is an ordi nary M-estimator. Of course , if we choose L (s ) = 1(s 2: l) ,p = 1, Xi == 1, G(y) = y t hen iJ = (3. In genera l {3 may be obtained as a solut ion of Q (t ) = o. Next , consider f3* of (5.2.18 ). For the time be ing focus on the case p = 1 and di == X i - ii , Assume, without loss of genera lity, that t he data is so arranged t hat Xl ~ X2 ~ . . . ~ Xn . Let y := {(Yj - Yi)/(Xj - Xi); i < i , Xi < Xj }, to := min {t ; t E Y} and t l := max{t ;t E Y} . Then for Xi < Xj, t < to implies t < (Yj -Yd/ (x j -xd so t hat ~t < R jt . In ot her words the resid uals {Yj - tx j ; 1 ~ j ~ n} are naturally ord ered for all t < to, w.p .1., assum ing t he continuity of th e errors. Hence, with T( s , t ) denoting the Tl d(S, t) of (5.2.17) , we obtain for t < to,
T( s , t) =
'\"' L.., ik=l di ,
kin
0,
~
s < (k
+ l )/ n , 1 ~ k ~ n o~ s < lin, s =
1, 1.
Hence, K*(t) =
n~Wk -l {~k di }2
Similarly usin g the fact I.: ~=l di
= 0, one obtains
As t crosses over to only one pair of adjacent residuals change their ranks. Let Xj < Xj+! denot e t heir resp ective regression con-
155
5.3. Finite sample properties stants. Then
W(to_) -
r k~~#j {t,d;r I: {t r-{ ~ n
~Wk {t,d
W(to+)
i
-Wj {dj+l
Wk
+
di
}2
~=1
~
Wj [
di
dj+l
j
j- l
i= 1
i= 1
+
s.
L di < dj+l + L di < O. Hence K *(to_) > K *(to+) . Similarl y it follows that K *(t l+) K *(tl _) ' Con sequ ently, (31 and (32 are finit e, where
=
K *(~)} ,
(31 .-
min{t E YIK *(t+)
(32
max{t E Y ,K*(L) = inf K *(~)},
.-
inf
.6.EYC
>
.6.EY c
yc
and where denotes the complement of y . Then (3* can be uniquely defined by the relation (3* = ((31 + (32 ) / 2. This (3* corresponding to L(a) = s was studied by Willi amson (1979, 1982). In genera l t his estimator is asymptotically relatively mor e efficient than Wilcoxon type R-estimators as will be seen lat er on in Sect ion 5.6. There does not seem to be such a nice charac terizat ion for p 2:: 1 and general D sat isfying (5.2.16). However , pro ceeding as in the derivation of (5.3.6), a computat iona l formula for K * of (5.2.17) can be obtained to be
This formula is valid for a general used to compute {3 *.
(J" -
finit e measure L and can be
5. Minimum Distance Estimators
156
We now turn to t he m.d. est imator defined at (5.2.21) and (5.2.22). Let di == Xi - ii , The first obse rvat ion one ma kes is t ha t for t E JR, n
V n(t ) := sup yE IR
L di I(~ ~ y + tdd i=l
n
=
L
sup dJ( Rit ~ ns) 0'S s'S l i=l
Proceedings as in th e above discussion per tainin g to 13* , assume , without loss of generality, that the dat a is so ar ra nged that Xl ~ X2 ~ . . . ~ Xn so t hat d1 ~ d2 ~ . . . ~ dn· Let Y1 := {(Yj - ~ )/ ( dj - di);d i < O,dj 2 0, 1 ~ i < j ~ n }. It can be proved that V;i(D~) is a left cont inuous non-decreasing (right continuous non-increasing) st ep fun cti on on JR whos e points of discontinuity are a subse t of Y1. Moreover , if - 00 = to < ii ~ t2·· . ~ t m < t m+ 1 = 00 denote t he ordered members of Y1 then V ;i(t1 - ) = = V ;;(t m+ ) and v ;i(tm+) = 2:7=1 d; = V~ ( t1- ) , where dt == max (di , 0) . Consequent ly, t he following ent it ies are finite:
°
13s1 .-
inf{t E JR; V;i(t) 2 V ; (t )},
13s2 .-
sup] t E JR; V;i (t) ~ V ; (t )}.
Note t hat 13s2 2 13s1 w.p. l.. One can now take f3s = (131 + 13s2)/ 2. Willi am son (1979) pr ovides t he pr oofs of t he abov e claims and obtains t he asy mptotic dist ribut ion of 13s . This est imator is the pr ecise generalization of th e m.d. est imator of th e two sam ple location par ameter of Rao, Schuster and Lit tell (1975). Its asymptotic distribut ion is t he sa me as that of t heir estimator. We shall now discuss some add it ional dist ri bu tional prop erties of th e above m.d. estima tors . To facilit ate this discussion let /3 denote anyone of the est imators defined at (5.2.8), (5.2.15), (5.2.18) and (5.2.22). As in Section 4.3, we shall write /3(X , Y ) to emphasize the depe nde nce on the data { (x~ , Y i ) ; 1 ~ i ~ n} . It also helps to t hink of th e definin g dist an ces K , K + , etc . as fun cti ons of residu als. Thus we shall some times write K (Y - Xt) etc. for K (t ). Let K stand for eit her K or K + or K * of (5.2.7), (5.2.14) and (5.2.17). To begin with , obse rve t hat (5.3.21)
K (t - b ) = K(Y
+ Xb -
Xt ), \I t , b E W,
5.3. Finite sample prop erties
157
so that (5.3.22)
!3(X, Y
+ Xb) =
!3(X, Y)
+ b,
Vb E W .
Consequentl y, the dist ri bu ti on of !3 - {3 does not depend on {3. The dist an ce measure Q of (5.2.9) do es not satisfy (5.3.21) and hen ce the distribution of (:J - {3 will generally depend on {3. In general, the classes of est imators {,8} and {{3 +} are not scale invariant . However , as can be readily seen from (5.3.6) and (5.3.7) , the class {,8} corr esp onding to G (y) = y, H i = F and those {D} that satisfy (5.2.16) and the class {{3+} corres ponding to G(y) = y and general {D} ar e scale invariant in the sense of (4.3.2). An interesting property of all of the above m.d. estima tors is that th ey are invari ant under nonsin gular tran sformat ion of the design matrix X . That is, !3(XB , Y) = B - 1!3 (X, Y) V p x p nonsin gular matrix B. A similar st at ement holds for (:J. We sha ll end t his sect ion by discussin g t he symmetry pr operty of th ese est ima tors . In the following lemma it is implicitly ass umed t hat all integrals involved are finite. Some sufficient condit ions for t ha t to happen will unfold as we proceed in this chapter. Lemma 5.3.2. Let (1.1.1) hold wi th th e actual and the modeled d.f. of e; equal to H i , 1 ::; i ::; n . (i) If either (ia) {Hi ,l ::; i ::; n } an d G are sym me tri c aroun d 0 an d {Hi , 1 ::; i ::; n } are con tinuous, or (ib) dij = -dn- H1 ,j , Xij = - Xn- i+l ,j and H i = F V I::; i ::; n, 1 ::; j ::; p , th en
,8
an d {3* are symmetrically distributed aroun d {3 , whenever th ey exist uniquely.
(ii) If {Hi , 1 ::; i ::; n } and G are symmetric around 0 an d eit her { H i , 1 ::; i ::; n} are cont inuous or G is con tinuous, th en
5. Minimum Distance Estimators
158
13+
is symmetrically distributed around 13, whenever it exists uniquely. Proof. In view of (5.3.22) there is no loss of generality in assuming that the true 13 is O. Suppose that (ia) holds. Then /:J(X, Y) =d /:J(X, - Y). But, by definition (5.2.8) , /:J(X, - Y) is the minimizer of K( - Y - Xt) w.r.t. t . Observe that V t E W , K(-Y - Xt)
=
t J[t
t J[t
t J[t
r r r
dii{ I( -Yi :<: y + x:t) - Hi(y)}
dG(y)
dii{ 1 - I(Y; < -y - x:t) - Hi(y)} d;i{ I(Y; < Y - x:t) - Hi(Y-)}
dG(y)
dG(y)
by the symmetry of {Hd and G. Now use the continuity of {Hd to conclude that , w.p.L, K( - Y - Xt)
= K(Y + Xt) ,
V t E
~P ,
so that /:J(X , - Y) = -/:J(X, Y) , w.p.l , and the claim follows because -/:J(X, Y) = argmin{K(Y + Xt); t E ~P} . Now suppose that (ib) holds. Then
t J[t t J[t
r
K(Y +Xt)
~d
dn-i+U { I (Y; :<: y + x:.-iH t) - F(y) } dn- iH ,i{I(Yn - i+1 :<: y +
x~-Hlt) -
dG(y)
F(y) }] 2
xdG(y) K(Y - Xt),
V t E W.
This shows that -/3(X, Y) 13* is similar.
=d
/:J(X, Y) as required. The proof for
159
5.4. A general case Proof of (ii) . Again, ,8+ (X, Y) symmetry of {Hd . But,
=d
,8+(X, - Y), because of the
K+(-Y - Xt)
tf
[~dij{ I(-Y; S Y +x;t)-1 +I(-Yi:S
t f [~dij{
I(Y; < y +X;t)-1 + I(Yi
=
K+ (Y + Xt) ,
-Y+X~t)}rdG(Y)
Vt EW,
w.p .l , if either {Hd or G are continuous.
5.4
r
< -Y + x~t) } dG(y) o
A General M. D. Estimator
This section gives a general overview of an asymptotic theory useful in infer ence based on minimizing an objective function of the data and parameter in general models. It is a self contained section of broad interest. In an inferential problem consisting of a vector of n observations Cn = ((nl, ' " , (nn)', not necessarily independent, and a p - dimensional parameter 0 E W, an estimator of 0 is often based on an objective function Mn(Cn , 0) , herein called dispersion. In this section an estimator of 0 obtained by minimizing Mn (Cn, .) will be called minimum dispersion estimator. Typically the sequence of dispersion M n admits the following approximate quadratic structure. Writing Mn(O) for Mn(Cn 'O) , often it turns out that Mn(O) - Mn(Oo) , under 0 0 , is asymptotically like a quadratic form in (0 - ( 0 ), for 0 close to 0 0 in a certain sense, with the coefficient of the linear term equal to a random vector which is typically asymptotically normally distributed . This approximation in turn is used to obtain the asymptotic distribution of the corresponding minimum dispersion estimators.
160
5. Minimum Distance Estimators
The two class ical examples of the above typ e are Gauss's least sq uare and Fisher 's maximum likelihood estimators . In t he former th e disp ersion M n is t he error sum of squ ar es whil e in th e lat ter M n equals -log L n , L n denoting t he likelihood function of () based on CnIn the least squares method, M n( (}) - Mn((}o) is exac tly quadratic in (() - (}o), uniformly in () and (}o. The ra ndom vector appearing in t he linear te rm is typi cally asymptotically normally distributed. In the likelih ood method , the well celebrated locally asy mp totically normal (LAN) models of Le Cam (1960, 1986) ob ey t he above typ e of approxima te qu adrati c structure. Other well kn own examples include the least absolute deviation and th e minimum chi-squa re estimators. The main purpose of this section is to unify the basic structure of asy mptot ics underlyin g th e minimum dispersion est imators by exploiting t he above type of common asymptotic qua dratic struct ure inh erent in most of t he dispersions. The formulati on given below is general enough to cover some irr egular cases where t he coefficient of linear term may not be asy mptotically nor mally distribut ed . We now formulate genera l conditions for a given dispers ion to be uniformly locally asy mptotically quadratic (ULAQ). Accordi ngly, let 0 be an ope n subs et of IRP and M n , n ~ 1, b e a sequ ence of rea l valued functions defined on IRn x 0 such t hat M n ( · , (}) is meas urable for each (). We shall often suppress t he Cn coord inate in M n and wri te M n( (}) for Mn( Cn, (}). In orde r to state general cond it ions we need to define a sequence of neighb orh oods Nn( (}o) := { () E 0 , Idn((}o )((} - (}o)1 ~ b}, where (}o is a fixed par am eter valu e in 0 , b is a finit e number and {d n ((}o)} is a sequence p x p symmet ric positive definite matrices with norms Ildn((}o)11 tending to infinity. Since (}o is fixed , write dn, N n for dn((}o ), N n((}o), resp ectively. Similarly, let Pn denote the pr ob ability distribu tion of Cn when () = (}o ·
Definition 5.4.1. A sequence of disp ersions {M n ( (} ) , () E N n }, n ~ 1, satisfying condit ion (Ai) - (A3) given below is sa id to be uniformly locally asymptotically quadratic (ULAQ) . (Ai) There exist a sequence of p x 1 random vector S n((}O) and a seque nce of p x p , poss ib ly random , matrices W n((} o), such that ,
161
5.4. A general case for every 0
< b < 00,
Mn( O) Mn( Oo)
+ (0 -
OO)'S n(OO) + ~ (O - Oo)'Wn(OO)( O -
( 0)
+up (I ), where "u p (I )" is a seq uence of stochastic processes in 0 converging to zero, uniformly in 0 E N n , in Pn - pr obabili ty. (A2) There exists a p x p non-singular , poss ibly random, matrix W (O o) such that
(A3) There exists a p x 1
LV.
Y (O o) such t hat
where L n , L denote jo int probability dist ributions under Pn and in t he limit , respectively. Denote the conditions (Ai) , (A2) by (AI) and (A2) , respectively, whenever W is non-random in t hese conditions. A sequence of dispe rsions {Mn } is called uniformly locally asymptotically normal quadratic (ULANQ) if (AI) , (A2) hold and if (A3) , instead of (A3) , holds, whe re (A3) is as follows: (A3) T here exists a positive definite p x p matrix ~ (O o ) such t hat
If (Ai) holds without t he uniformity requirement and (A2), (A3) hold t hen we call t he given sequence M n LAQ (locally asymp totically qua dratic). If (A1) holds without the uniformity requirement and (A2), (A3) hold then t he given sequence M n is called locally asymptotically normal quadratic (LANQ). In t he case Mn( O) = - fn Ln(O ), t he cond it ion non-uniform (Al) , (A2) , (A3) wit h lI15n ll = O(n 1 / 2 ), determine t he well celebrated LAN models of Le Cam (1960, 1986). For t his particular case, W (O o), ~ (Oo ) an d t he limit ing Fisher information matrix, F (Oo), whenever it exists, are t he same.
5. Minimum Dist ance Estimators
162
In the above general formulation, M n is an arbitra ry dispersion satisfying (Ai) - (A3) or (AI) - (A3) . In the lat ter the three matrices W (9 0), ~ (9 0 ) and F (9 0) are not necessarily identical. The LANQ dispersions can thus be viewed as a generalizat ion of the LAN models. Typically in the classic al i.i.d. setup the normalizing matrix ~n is of the order square root of n whereas in t he linear regression model (1.1.1) it is of the ord er (X'X)1/2. In general ~n will depend on 9 0 and is determined by the ord er of the asymptotic magnitude of Sn(9 0). We now t urn to th e asymptot ic distribution of th e minimum dispersion estimators. Let {Mn } be a seq uenc e of ULAQ disp ersions. Define (5.4.1)
On = ar gmin{Mn(t), t E D}.
Our aim is to investigate th e asymptotic behavior of On and Mn( On)' Akin to th e study of th e asy mptotic distribution of m.l.e.'s, we must first ensure th at th ere is a On satisfying (5.4.1) such th at (5.4.2) Unfort unate ly th e ULAQ assumptions ar e not enough to guarantee (5.4.2). One set of additional assumptions th at ensures this is th e followin g. (A4) V E > 0 :3 a 0 < Zf < 00 and N 1f such th at
Pn(IMn( 9 0)1 :S Zf) ?: 1 (A5) V E > 0 and 0 and 0:) such t hat
E,
V n?: Nl{-
< 0: < 00, :3 an N 2f and a b (depending on
E
It is conveni ent to let , for a 9 E W ,
Qn(9 , 9 0)
:=
(9 - 90)'S n(90) + ~(9 - 9 0)'W n (9 0)(9 - 9 0),
and On := argmin{Qn(9 , 9 0) , 9 E W }. Clearly, relation (5.4.3)
On must
satisfy the
5.4. A general case
163
where B n := 8~ lwn8~1, where W n = Wn(l~O) ' Some generality is achieved by making the following assumption. (A6) 118 n(On - ( 0)11 = Op(I). Note that (A2) and (A3) imply (A6). We now state and prove Theorem 5.4.1 Let th e dispe rs ions M n satisfy (Ai) , (A4) - (A6). Th en , under ti;
(5.4.4)
I(On - On)'8 nB ndn(On - On)1 = op(I) , inf Mn(O) - Mn(Oo)
(5.4.5)
O ED.
= -(1 /2)(On -
Oo)'Wn(On - ( 0) + op(I) .
Con sequently, if (A6) is repla ced by (A2) and (A3), th en
8n(O n - ( 0) ~d {W(OO)} -ly(OO),
(5.4.6) (5.4.7)
inf Mn(O ) - Mn(Oo) O ED.
= -(1 /2)S~(00)8~113~18~ISn(00)
+ op(I) .
If, ins tead of (Ai) - (A3), M n satisfi es (AI) - (A3), and if (A4) and (A5) hold th en also (5.4.4) - (5.4.7) hold and
(5.4.8) whe re
Proof. Let
Zt
be as in (A4). Choose an a >
[ IMn(Oo)1 ~
Zt,
inf Mn(Oo Ilhll>b
C [ inf Mn(Oo +
Ilhll:Sb
C [ inf
IIhll>b
Mn(Oo
Zt
in (A5). Then
+ 8~lh) 2:: a]
8~lh) ~ Z t , Ilhll>b inf Mn(Oo + d~lh)
+ 8~lh) >
inf Mn(Oo Ilhll :Sb
+ 8~lh)]
2:: a ]
.
Hence by (A4) and (A5), for any E > 0 there exists a 0 (now dep ending only on E) su ch that V n 2:: NIt V N2t ,
00
5. Minimum Distan ce Estimators
164
This in turn ensures the validity of (5.4.2) , which in turn together with (Ai) yields (5.4.9) From (A6), the inequality inf [Mn(Oo)
OEN n
<
+ Qn(O, 00)] I
sup IMn(O) - [Mn(Oo)
+ Qn(O, 0 0 ) ] I
OEN n
and (Ai) , we obtain (5.4.10) Now, (5.4.9) and (5.4.10) readily yield
Qn(On , OO) =
a.»: 0
0)
+ op(l) , (P'l) ,
whi ch is pr ecisely equivalent to the statement (5.4.4). The claim (5.4.5) follows from (5.4.3) and (5.4.10). The rest is obvious. 0 R em a rk 5.4.1. Rou ghly speaking, the assumption (A5) assures t ha t th e sma llest value of Mn(O) for 0 outside of N n can be made asy mptot ically arbitrarily large with arbitrarily large probability. The ass umpt ion (A4) means that th e sequ ence of LV .'S {Mn(Oo)} is bounded in probability. This assumption is usually verified by an applicat ion of the Markov inequality in th e case EnIMn(Oo)1 = 0(1) , where En denotes the expec tat ion under Pn. In some applications Mn(Oo) converges weakly to a LV. which also implies (A4) . Oft en the verification of (A5) is rendered easy by an applicat ion of a vari ant of the C-S inequality. Examples of this appear in the next section when deal ing with m.d. estimators of t he pr evious sect ion . We now discuss the minimum dispersion tests of simple hypothesis , bri efly without many details. Consider the simple hypothesis H o : 0 = 0 0 . In the sp ecial case when M n is - £n L n , the likelihood ratio stat istic for testing H o is given by -2 inf{Mn(O)-Mn(Oo); 0 En}. Thus, given a general disp ersion fun ction M n , we ar e motivated to base a test of H o on th e st atisti c
5.4. A general case
165
wit h lar ge values of Tn being sign ificant . To study t he asymptotic null distribution of Tn, note t hat by (5.4.7), Tn = S~~;;- IB;;- I~;;- ISn + op(I) , (Pn) . Let Y , W etc. stand for Y (8 0), W (8 0), etc. Proposition 5.4.1. Under (Ai) - (A3) , (A4) , (A5) , the asymptotic null distribut ion of Tn is the same as that of Y'W -l y. Under (AI) - (A5) , the asymptotic null dist ribution of Tn is the sam e as that of Z'B Z where Z is a N( O, I pxp) r.u. and B = ~1 / 2W - I~ I /2 . 0 Remark 5.4.2. Clearly ifW (8 0) = ~ (8 0 ), then the asymptotic null distribution of Tn is X~ . However, if W(8 0) i= ~ (8 0 ), t he limit distribution of Tn is not a chi-square. We shall not discuss t he dist rib ut ion of Tn under alte rnatives. 0 Here we sha ll now discuss a few examples illust rati ng the above th eory. EXAMPLE 1: THE ONE SAMPLE L l. D . SETUP. Let e be an open interval in IR and {Fo; 0 E e} be a family of distribution functions on R Let 9 be real valued fun ction on IR x e such that g(·,O) is measurable for each 0 E e. Let Xl , ' " , X n be i.i.d. from an Foo ' where 00 is fixed value of 0 in e. Define n
Mn(O)
:=
L g(Xi, 0) ,
oE e.
i= l
We make t he following additional assumptions: Let A be an open neighborhood of 00 . a. g(x , 0) is absolutely continuous in 0 E A and for almost all x (Fo o)' wit h g(x,') denoting t he a .e. der ivative of g(x , ') satisfying (5.4 .11)
J
g2(x,0) dFoo(x) <
00,
b. T he fun ction ..\(0) := J g(x ,O) dFoo(x) is differentiable on A and the der ivative ..\ satisfies (5.4 .12)
c. The function s t--+ J[g( x , s) - g(x ,OoW dFoo(x) is cont inuous at 00 .
166
5. Minimum Distance Estimators
°
Under the above assumptions, for every < b < 00 and for every sequence of real numbers On E N n := {t: tEe, n 1 / 2 1t - 00 1~ b}, n
(5.4.13)
Mn(On) = Mn(Oo) - (On - ( 0 ) Lg(Xi,Oo) i=l
Proof. Let D n denote the difference between Mn(On) and the leading r.v .'s on the right hand side of (5.4.13), F = F oo' E, V denote the expectation and variance with respect to F , Un := On - 00 , and let t n := n 1 / 2 Un ' To prove (5.4.13), it suffices to prove (5.4.14)
ED n
-t
0,
First , suppose Un > 0. Consider n
E[Mn(On) - Mn(Oo) - (On - ( 0 ) Lg(Xi,Oo)] n n
J Jl
i=l
[g(x, On) - g(x . ( 0 )
(On - ( 0 ) g(x , ( 0 ) ] dF(x)
un
[g(x,00
l J l x l un
n
-
[g(x. eo
+ s) - g(x ,Oo)] ds dF( x) , + s) - g(x , ( 0 ) ] dF(x) ds ,
un
n
n 1/ 2
[
(00
+ s) - >.(( 0 ) ] ds
tn
[>'(0 0
+ sn - 1/ 2 ) -
>.(00 ) ] ds .
In the above , the second equality follows from a and (5.4.11) while the third uses Fubini. Note that t;.j2 = J~n s ds. Hence
167
5.4. A general case
<
b2(bn- 1/ 2) -
-+
0,
1
bn- 1 / 2
1
1~(Oo + s) - ~(Oo)1
ds
by (5.4.12), thereby proving the first part of (5.4.14) when To prove the second part in this case , similarly we have
V(D n)
nV [g(X , en) - g(X, ( 0 )
-
Un
> 0.
(On - Oo)g(X,( 0 ) ]
< nE[g(X, On) - g(X,Oo) - (On - Oo)g(X,Oo)f
J[l Jl
un
n
< n
{g(x , s + ( 0 )
un
u -,
Tnnl/21un
-
g(x , Oo)} ds
f
dF( x)
{g(x , s + eo) - g(x , Oo)} 2 dsdF(x)
J
{g(x , s + ( 0 )
-
g(x , ( 0 ) } 2 dF( x)ds
< b2(bn- 1/ 2) - 1
xl! s+ ( bn -
1/ 2
{g(x ,
-+
0,
0) -
g(x , Oo)}2dF(x)
by the assumption c .
This completes the proof of (5.4.13) in the case On > 00. The proof is similar in the opposite case , thereby completing the proof of (5.4.13) . Actually we can prove mor e under the same assumptions . Let
We sh all now show , under the same assumptions as above , that for every 0 < b < 00 , (5.4.15)
E(SUItl:SbP ID
n(t)I)2
-+ o.
To prove this , because of a , we can rewrite with probability 1 for all t 2': 0 and for all n 2': 1,
5. Minimum Distance Estimators
168
n
Dn(t)
=
L
[g(Xi,(JO+ tn- l / 2 )
-
g(Xi , ( 0)
i::::l
2
. ] t A(Oo) 2n
- tn" 1/2 g(Xi , ( 0) -
=
=
rt
Jo
n- l / 2
t
[g(Xi , 00 + sn- l / 2 )
-
g(Xi , ( 0)
i :::: l
- sn- l / 2 ~(OO) ] ds
it t
[g(Xi , 00 + sn- l / 2 )
n- l / 2
o
-
g(Xi , ( 0)
i::::l
- A(Oo + sn- l / 2 )
+ A(Oo) ] ds
t
+ 1
[
n l / 2[A(00
+ sn- l / 2 )
-
A(Oo)
- sn- l / 2 ~(Oo) ] ds say . Hence ,
E( sup ID
(t)12 )
nl
O
b
<
E(1
In- l / 2 t . [ g ( Xi,00+ sn- 1/ 2)-g(Xi ,00) _ A(Oo
<
+ sn- l / 2 ) + A(Oo) ] I dS) 2
blob E(n- 1!2t. [!i(Xi ,OO+sn-1!')-!i(Xi ,OO)
_ A(Oo
+ sn- l / 2 ) + A(Oo) ]) 2 ds
b
=
b1
V (g(X , 00 + sn - l / 2 ) - A(Oo
2(bn- l
< b
2)-11 /
bn -
-
g(Xi , ( 0)
+ sn- l / 2 ) + A(Oo) ) ds
1/ 2
E[g(X,00+U)-g(X,00)]2du
5.4. A general case
169
--+
by c.
0,
Similarly,
1
bn- 1 / 2
sup IDn2(t)1 099
b2(bn- 1 / 2 ) - 1
<
I~(eo + s)
-
~(eo)1 ds
by b.
0(1),
This completes the proof of (5.4 .15) for t ~ O. The details are exactly similar for the case t ::; O. Hence (5.4.15) is proved, which in turn together with the Markov inequality implies that for every 0 < b < 00, SUPltl::;b IDn(t)1 = op(I) . 2. Now consider a more general setup where the parameter (J is p-dimensional real vector, the observations X ni , 1 ::; i ::; n, are independent r.v.'s, each having possibly a different distribution depending on the true parameter value (Jo under a sequence of probability measures Pn := TI~=l Fni . Here , Fni denotes the d.f of X ni, 1 ::; i ::; n, that will typically depend on (Jo. Also, let Cni, 1 ::; i ::; n, be arrays of p-vectors in W, e s;;; W, and EXAMPLE
n
Mn((J) :=
L g(Xni , C~i(J),
(J E
e.
i=l Let C denote the n x p matrix whose i t k row is c~i' 1 ::; i ::; n , and Ani(t) := Eng(Xni , t), t E JR, where En is the expectation under Pn · About the constants Cni, assume the following: d. The vectors Cni are such that maxl::;i::;n Ilcnill = 0(1), and C'C = I pxp. e. The function 9 is as in a satisfying (5.4.16)
:t J IIc nil1
i=l
(5.4.17)
2
g2(x,
C~i(JO) dFni(x) =
lim lim sup max n-+oo l::;t::;n
s-e O
J
{g(x , C~i(JO
0(1),
+ s)
-g(x, C~i(JO) } 2 dFni(x) = O. f. The functions {And are absolutely continuous with their a.e.
170
5. Minimum Distance Estimators
derivatives {~ni} satisfy the following: For any sequence ani of positive numbers with maxl :::;i:::;n ani = 0(1) , (5.4.18) Using the ar guments of the typ e used in Example 1 above, one can prove that under d - f the followin g holds: For every 0 < b < 00 , n
(5.4.19) Mn( O) = Mn(Oo) - (0 - ( 0
)'2: cnig(Xni , C~i OO ) i=1
-
~(O -
n
(0
)'2: CniC~i ~ni (C~iOO)(O -
( 0)
i=1
where u p ( l ) is a sequence of pr ocesses in () that converges to zero uniformly over the set 110 - 0 0 II ~ b, in probability. Consider as a sp ecial case the M-dispersion used by Huber (1981) to construct M-estimators in the linear regr ession model (1.1.1) . Here one takes Cni == X ni (X ' X )- 1/ 2, and n
M n({3) =
2: p(Yni i=1
n
x~d3) =
2: p(eni -
Cni(X' X )I/2(,8 - ,80)) ,
i=1
where p is measurable fun ction from JR to JR. Upon identifying X ni == eni, 0 = (X'X)I / 2(,8 _,80) ' 0 0 = 0, one readily obtains the following: Suppose (1.1.1) holds with the errors eni i.i.d. F; p is ab solutely continuous wit h a .e. der ivative 'ljJ su ch t hat
!
1'ljJ( x
+ y)ldF( x) < 00 , V Y E JR,
!
'ljJdF = 0,
!
'ljJ2dF
< 00.
Let IT(t ) := f['ljJ( x - t) - 'ljJ (xW dF( x) , t E JR, r = 1, 2. Not e that upon ident ifying g(y , t ) of the EXAMPLE 1 with p(y - t) , one sees that g(y , t) == - 'ljJ(y - t) and that 1 1 = >'(0) - >'(t) . Suppose, additionally, that 1 1 is continuously differ entiable at 0 and that 1 2 is cont inuous at O. Then , under (NX) , it follows that Huber 's dispersion M n is
171
5.4. A general case ULANQ with n
=
00
~n
0,
= (X'X)I/2,
Sn(j3)
= - LXi1/J(J!i -
x~{3),
i=1
W
= -\(O)X'X, W = -\(O)Ipxp,
n({3)
:E = J'ljJ2dFlpxp .
If p is additionally assumed to be convex , then using a result in Rockafeller (1970), to prove the ULANQ, it suffices to prove that the score statistic Sn({3) is asymptotically linear in (X'X)I/2({3 - {30) , for {3 in the set II (X'X)I/2({3 - {30)11 :::; b, in probability. See Heiler and Weiler (1988) and Pollard (1991) for details. For p(x) = Ixl and F continuous, 'ljJ(x) = sgn(x) and Ir(t) = 2r IF (t ) - F(O)I. The condition on II now translates to the usual condition on F in terms of the density f at O. For p(x) = x 2, 'ljJ(x) == 2x , 11(t) == 2t , so that II is trivially continuously differentiable with ')'1 (0) = 2. Note that in general W =I :E unless -\(0) = J 'ljJ2dF which 0 is the case when 'ljJ is related to the likelihood scores.
3. An example where the full strength of (Ai) - (A3) is realized is obtained by considering the least square dispersion in an explosive AR(I) process where X o = 0 and for some 101 > 1, EXAMPLE
i = 1,2, ' " ,
where it is assumed that e, are mean zero finite variance i.i.d. r.v .'s . Let 1 n Mn(t) := 2 (Xi - tXi _ 1)2, ItI > 1.
L
i=1
Then the least square estimator of 0 is en := argmintMn(t). We now verify (Ai) - (A3) for this dispersion. Let 00 denote the true value. Rewrite, for a 101 > 1, 1 n
Mn(O)
= 2 L (s, -
(0 - 00)X i _ 1)2
i=1
1
n
n
2L i=1
(0 - 00 )
L i=l
ciXi - l
+
1
2(0 - 00 ) 2
n
L i=l
xLI
5. Minimum Distance Estimators
172
So one read ily sees that what Sn and W n should be. Now t he question is what should be On , Wand Y . F irst note that one can rewrite j
x, =
L O~ - i ei, i= l
so t ha t j
ooj x, =
L «' e, i= l
is a mean zero square int egrabl e martingale. Not e also that , with a 2 = E eI , and because 05 > 1, as j -7 00,
sup)· E ( 4)2 =
The above derivati ons also show t hat
T
2
°0 exists theorem , t here
Hence, by t he martingale convergence with mean zero and vari an ce T 2 such th at a.s. and in
£2 ,
as j
< a
00 .
LV.
Z
-7 00 .
This implies that IXjl explodes to infinity as j -+ 00. Also not e that becaus e ~!=1 00iei is a sum of i.i.d . mean zero finit e vari an ce r.v.'s , by the CLT , Z is a N(O , (05 _1) -1) LV. Now let Zi := 00 iXi . Then
Oo 2nW n =
n
L 002i Z~ _ i
-7
Z 2/(05 - 1),
a.s .
i=1
This suggests that th e On
= O~
and W
= Z 2/ (05 -
n
00 n s;
= L Oo iZn-i i= 1
en-HI ,
1). Now consider
173
5.5. Asymptotic uniform quadraticity
Because of the indep end ence of Ei from X i-I , Sn can be verified to be a means zero squa re int egrable martingale. By the martingale cent ral limit theorem it follows that
where ZI is a N (O , (115 - 1)-1 ) LV. , indep endent of Z . So here Y = !ZIZI. 0 The next secti on is devoted to verifyin g (A1) - (A5) for vari ous disp ersion introduce in Sect ion 5.2.
5.5
Asymptotic Uniform Quadraticity
In this section we sh all give sufficient conditions under which K D , KiJ of Sect ion 5.2 will satisfy (5.4.Al) , (5.4.A4) , (5.4.A5) and K and Q of Section 5.2 will satisfy (5.4.AI). As is seen from t he pr evious sect ion this will bring us a step closer to obtaining the asy mptot ic distributi ons of various m.d. est imators introduced in Sect ion 5.2. To begin wit h we shall focus on (5.4.A l) for KD ,KiJ and K Our bas ic objective is to study t he asy mptotic dist ribu tion of when t he actual d.f.'s of { e n i ' 1 ~ i ~ n} are {Fni , I ~ i ~ n} bu t we mod el them to be {Hn i , 1 ~ i ~ n} . Similarly, we wish to st udy t he asy mptotic distribution of 13i) when actually t he errors may not be symmetric but we model t hem to be so. To achieve these object ives it is necessary to obtain t he asy mptot ic resul ts un der as genera l a setting as possible. This of course makes the expos it ion t hat follows look somewha t complicated. The results thus obtained will ena ble us to study not onl y th e asymptotic distributions of th ese estimators at the true model but also some of their robustness pr op erties. With this in mind we pr oceed to state our assumptions.
n
n·
/3D
(a) X
satisfies
(NX).
(b) With d(j) denoting t he j -t h column of D , IId(j) 11 2 > 0 for at least one j ; Ild(j)1 12 = I for all t hose j for which Ild(j) 1I 2 > 0, 1 < j
<».
(c) {Fn i , 1
~ i ~
n} admit densiti es
{fni ,
1 ~ i ~ n} W.Lt. A.
5. Minimum Dist ance Estimators
174
(d) {G n } is a sequence in 'DI(IR). (e) With d~i
= (dni1,' " , dnip), the i-th
/t
Iidni l12 Fni (1 -
row of D , 1 ::; i ::; n ,
FnddGn
= 0(1) .
i= l
(f) W ith "in
:=
~7=1
Iidn i l12 L« ,
lim sup n
ibn/
"in (y + X )dG n (y )dx = 0
Jan
for any real sequences {an} ,{bn} ,an
< bn, bn - an -+ O.
(g) With dnij = d~ij-d;;ij ' 1 ::; j ::; p; Cni = AXni , /'\,ni i ::; n, V 6
lim sup n
:=
Ilcnill, 1 ::;
> 0, V IIvll ::; b, p
n
j= l
i= l
L / [L d;ij {Fni (y + v' Cni + 6/'\,nd - Fni (y + v ' Cn i
6/'\,ni)}
-
r
dGn (y)
where k is a constant not dep ending on v and 6.
(h). With
n., :=
t j=l
~7=1 dnij xndni' Vnj := ARnj , 1 ::; j ::; p ,
J[JL~j(Y, u ) - JL~j(Y' 0) -
U 'V n j
2
(y)J dG n (y) = 0(1).
(j) Wi th mnj := ~ ~=1 dnij[Fni - H ni ], 1 ::; j ::; p; m np),
mi:>
= (m n1 , ' . . ,
5.5. Asymptotic uniform quadraticity
175
(k) With r~(y) := (Vnl(Y), '" ,vnp(Y)) = D'A*(y)XA, where A* is defined at (4.2.1) , and with I'n := JrngndG n , where gn E Lr(Gn), r = 1,2, n ~ 1, is such that gn > 0,
°< li~inf
/ g;dGn
~ limnsup /
and such that there exists an a
g;dGn <
00,
°
> satisfying
liminfinf{e'I'ne; e E 1W ,lIell = n
1} ~ a .
(l) Either (1) e'dniX~iAe ~ 0, V 1 ~ i ~ n and VeE W , lIell = 1.
Or
(2) e'dniX~iAe ~ 0, V 1 ~ i ~ n and VeE W , Ilell = 1.
In most of the subsequent applications of the results obtained in this section, the sequence of integrating measures {Gn } will be a fixed G. However , we formulate the results of this section in terms of sequences {G n} to allow ext ra generality. Note that if Gn == G, G E VI(JR) , then th ere always exist a agE Lr(G) , r = 1,2, such that g > 0, < J ldG < 00 . Define , for y E JR, u E W , 1 ~ j ~ p ,
°
(5.5.1)
sJ(y , u)
.-
Yjd(y , Au) ,
yjO(y, u)
.- SJ(y , u) - f-L1(y , u).
Note that for each j , SJ , f-L1 , ~o are the same as in (2.3.2) applied to X ni = Yni , Cni = AXn i and dni = dnij , 1 ~ i ~ n , 1 ~ j ~ p . Notation. For any functions g , h : W+l -+ JR,
Igu -
hvl~ := /
{g(y , u) - h(y, v)}2dG n(y) .
Occasionally we shall write Igl~ for Igo I~ · Lemma 5.5.1. Let Yn 1 , ' " , Y n n be independ ent r.v. '8 with respective d.f.'s F n 1 , ' " , Fn n . Then (e) implies p
(5.5.2)
E
I: Ilj~l~ = 0(1) . j=l
5. Minimum Distance Estimators
176
Proof. By Fubini's Theorem, n
P
E
L IYJol; = / L j=l
2
IIdi ll Fi (l - Fi)dGn
i=l
and hence (e) implies the Lemma. 0 Lemma 5.5.2. Let {Ynd be as in Lemma 5.5.1. Then assumptions (a) - (d) , (f) - (j) imply that, for every 0 < b < 00, P
L
E sup IYj~ Ilull:Sb j=l
(5.5.3)
Proof. Let N(b) .- {u E VUE N(b),
jRP;
-
Yj~l; = 0(1).
IIuli :S b}. By Fubini 's Theorem,
P
L
E
I}~?u - Yj~I;
j=l
:S /
t
IIdi ll
r:(/
2lFi(Y
+ c~u) - Fi(y)ldGn
i =l
:S where b.,
= b max, K,i , In P
(5.5.4)
In(Y + x)dGn(y))dx
as in (f). Therefore, by the assumption (J),
ELI Yjou -
Yj~ I~ = o(1), VuE
jRP.
j =l
To complet e the proof of (5.5.3), because of the compactness of N(b) , it suffices to show that V E > 0, :3 a 8 > 0 such that V v E N(b) , P
(5.5.5)
lim sup E n
L
sup IL j u - Ljvl :S lIu- vll :So j =l
E,
where Lj u
:=
IYj~
-
Yj~I; ,
u E
JRP , 1:S j :S p .
Expand th e quadratic, apply the C-S inequality to the cross product terms to obtain , for all 1:S j :S p ,
177
5.5. Asymptotic uniform quadraticity ~
Moreover, for all 1 (5.5.6)
IYj~
j ~ p,
< 2{ISJu - SJvl~ + IltJu - ItJvl~}, ISJu - SJvl~ < 2{lstu - stl~ + ISju - Sjvl~}' IltJu - ItJvl~ < 2{llttu - Ittvl~ + Iltju - ItJvI~} ,
-
Yj~I~
are the SJ, ItJ with dij replaced by dtJ := max(O, dij), dij := dt; - dij , 1 ~ i ~ n , 1 ~ j ~ p . Now, Ilu - vii ~ 6, nonnegativity of {dtJ}, and the monotonicity of {Fd yields (use (2.3.17) here) , that for all 1 ~ j ~ p , where SJ'
It]=
r
± - Itjv ± 1n2 Itju
I
s;
J[t, d~
{F,(y + c;v +OK,) - F;(y + c;v - OK,)}
dGn(y).
Therefore, by assumption (g), p
(5.5 .7)
lim sup
sup
L IltJu - ItJvl~ ~ 4k6
2
.
Ilu-vll~o j=l
n
By the monotonicity of SJ and (2.3.17) , for all 1 ~ j ~ p, Y E ~,
Ilu-vll
~
0 implies that
n
- "d*I(-6K L lJ l' < 1':l - v' c.l - Y < - 0) i= l
~ SJ(Y, u) - SJ(Y , v) n
~
L dtJI(O < Yi -
V'Ci -
Y ~ 6Kd·
i=l
This in turn implies (using the fact that a ~ b ~ c implies b2 ~ a2+c2 for any reals a, b, c)
{ SJ(Y, u) - SJ(Y, v) } 2
< {
t
dtJ I (0 < Yi - Y - v'c,
l=l
+
{t l=l
~ 0Ki) }
dtJI( -OKi < Yi - Y -
2
V'Ci
~ 0)}2
5. Minimum Distance Estimators
178
<
2
{t d~I
(-<5Ki < Yi
- Y-
V'Ci S; <5Ki)}2 ,
z=1
for all 1 S; j S; P and all Y E lit Now use the fact that for a, b real, (a + b)2 S; 2a2 + 2b2 to conclude that , for all 1 S; j S; P,
ISTu - stl~ <
4
J{t, d~
[I( -OKi < Y; - Y - v' Ci S OKi) - Pity, v , oj]
r
x dGn(y )
J{t, d~
+4 4 {Ij
+ II j } ,
Pi(Y, v,
OJ}
2
dGn(yj
(say),
wh er e Pi(Y, v , <5) = Fi(y + v' ci<5 Ki) - Fi(y + V'Ci - <5Kd · But (d~)2 S; drj for all i a nd j im plies that p
p
n
L / L(dm 2pi(Y, v , <5)(1 - Pi(Y, v , <5))dGn(y) j=1 i=1
ELIj j=1
< /
t
l:
IldiIl 2 pi(Y, v , <5)dGn(y)
i=1
<
n (/
1n(Y+S)dG n(Y)) ds ,
by (c) and Fubini , where an = (-b - <5) maxi Ki, b., = (b + <5) maxi Ki a nd where "[ri is defined in (f). Therefore, by the assumption (1) , p
ELIj = 0(1). j=1 From the definition of IIj above and the assumption (g), p
lim sup
n
L I t, S; k<5
2
.
j=1
From the above bounds, we r eadily obtain p
(5.5 .8)
limsupE
n
sup
L
lIu- vll:S" j=1
IYj Ou
-
Yj~I~ S; 40k<5 2 .
5.5. Asymptotic uniform quadraticity
179
Thus if we choose 0 < 0 :S (E/40k)1/2, then (5.5.5) will follow from (5.5.8), (5.5.7), (5.5.6) and (5.5.4). This also completes the proof of (5.5.3) . 0 To state the next theorem we need to introduce KD(t)
L J{Y/(y , 0) + t'Rj(y) + mj(Y)} P
:=
2
dGn(y)·
j==l
In (5.5.9) below, the G in KD is assumed to have been replaced by the sequence G n , just for extra generality. Theorem 5.5.1 Let Yn 1 , ' " , Ynn be independent r.v. 's with respective d.f. 's F n 1 , ' . . , Fnn . Suppose that {X , Fni, H ni , D , Gn} satisfy (a) - (j) . Then, for every 0 < b < 00 , (5.5.9)
E sup IKo(Au) - Ko(Au)1 = 0(1). IlullSb
Proof. Writ e K , K etc . for Ko , Ko etc . Not e that K(Au)
L J[SJ(y , u) P
J-l~(Y) + mj(Y)] 2 dGn(y)
j ==l
L J[Y/(y , u) - Y/(Y) + Y/(Y) + U'Vj(Y) + m j(Y) P
j==l
+J-l~(Y, u) - J-l~(Y) -
U'Vj(y)f dGn(y)
where yjO(y) == yjO(y , 0) , J-l~(Y) == J-lJ(Y,O) . Expand the quadratic and use the C-S inequality on the cross product terms to obtain
(5.5.10)
IK(Au) - K(Au)1 P
<
L
{IY~Fu - Yj~I~ + IJ-lJu -
J-lJ -
u'vjl~
j ==l
+ 21Yjou - Yj~ln [IYjO + U'Vj + mjln + IJ-lJu - J-lJ - U'Vjln] + 21Yjo
+ U'Vj + mjln 'IJ-l~u - J-l~ -
U'Vjln} .
5. Minim um Distance Estim ators
180
In view of Lemmas 5.5.1, 5.5.2 and assumpt ions (h) and (j) , (5.5.9) will follow from the above inequ ality, if we pr ove p
(5.5.11)
sup L lIull:SB j= l
°
IfLJu -
fLJ - u'vj l; = 0(1).
°
2 l c . .I n VIew . Let <'JU . - IfL ju - fL j - U "1 v J n' :S J' :S P, U E T!llp IN>: . 0 f th e compac t ness of N (b) and th e assumption (i), it suffices to pr ove that V E > 0, 3 a 0 > 0 3 V v E N(b ),
(5.5.12)
lim sup n
sup s~p ~ju Ilu- vll:S8 j = I
- ~jv l
:S
E-
But
I~ju - ~jvl < 2{lfLJu-/-lJvl; + Ilu - vll 2 l1 vjll; + ~;~2 [1J-lJu + IfLJu -
fLJvln + Ilu -
VIl IlVj lln]
fLJvlnllu - VIIIIVjlln}.
Hen ce, from (5.5.7) and t he ass um ption (i) , the left hand side of (5.5.12) is bo unded above by
2 { 4k02 + 02(a + 2k 1 / 2 a 1/ 2 ) } = k10 2 where a = lim sup., L~=l IIvjll; . Thus up on choosing 02 :S Elk1 one obtains (5.5.12), hence (5.5.11) and t herefore t he Theorem . 0 Our next goal is to obtain an analogue of (5.5.9) for Ki) . Before stat ing it rigorously, it helps to rewr ite Ki) in terms of standardized processes {Y/} and {fLJ } defined at (5.5.1). In fact , we have
Ki)(Au ) =
P
L
j= l
!
]2
n
[SJ(Y,u) - Ldij +SJ (-y, u ) dGn(y) i= l
P
=
L ! [yjO(y, u ) - yjO(y)
+ Y/( - V, u ) -
Y/( - V)
j= l
+ fLJ(y , u) - fLJ(y) - U'Vj(Y) + fLJ( - V, u ) - fLJ( - V) - U'Vj (Y) + u'vj (y) + W/(y) + mj(y)f dGn(y),
5.5. Asymptotic uniform quadraticity
181
where
n
mj(y)
.-
Ldij{Fi(Y) - 1 + Fi(-y)} i=l
n
=
p,J(y)
+ p,J(-y)
- L
dij ,
Y E JR, 1
~
j
~ p.
i=l
Let p
(5.5.13)
ki)(Au) = L
j[w/ +mj +u'vj]2dGn,
u E IW.
j=l
Now proceeding as in (5.5.10) , one obtains a similar upper bound for IKi)(Au) - ki)(Au)\ involving terms like those in R.B .S. of (5.5.10) and the terms like IYj~ -lj~l-n, Ip,Ju - p,J -u'Vjl-n , Ilvjl!-n, IYjOI- n , where for any function h : W+1 ~ JR,
It thus becomes apparent that one needs an analogue of Lemmas 5.5.1 and 5.5.2 with G n (-) replaced by G n ( _ .) . That is, if the conditions (e) - (j) are also assumed to hold for measures {G n ( _ · )} then obviously analogues of these lemmas will hold . Alternatively, the statement of the following theorem and the details of its proof are considerably simplified if one assumes Gn to be symmetric around zero, as we shall do for convenience. Before stating the theorem, we state Lemma 5.5.3. Let Y n1, ' " , Y nn be independent r.v. 's with respective d.f. 's Fn1,'" , Fnn · Assume (a) - (d) , (f) , (g) hold , {G n} satisfies (5 .3.8) and that (5.5.14) hold, where
(5.5.14)
j t IldniI1 2
{ Fni(-y)
+1-
Fni(y)}dGn(y) = 0(1).
i =l
Then, p
(5.5.15)
E
L j=l
IYj~l=-n
= 0(1),
5. Minimum Distance Estimators
182
and \:j 0 < b <
00 , p
L
E sup IYj~ - Yj~I=-n = 0(1). Ilull:Sb j=:1
(5.5.16)
o
This lemma follows from Lemmas 5.5.1 and 5.5.2, since under (5.3.8), LHS's of (5.5.15) and (5.5.16) are equal to those of (5.5.2) and (5.5.3) , respectively. The proof of the following theorem is similar to that of Theorem 5.5.1. Theorem 5.5.2 Let Yn1, ' " , Ynn be independent r.u. 's with respective d.f. 's Fn1, ' " , Fnn. Suppose that {X, Fni , D , G n} satisfy (a)(d), (f) - (i), (5.3.8) for all n 2': 1, (5.5.14) and that
L J{mj(y)}2dG n(y) = 0(1), p
(5.5.17)
j=:1
Then,
\:j
0
< b < 00 , E sup IKri(Au) - kri(Au)1 = 0(1). Ilu} :;b
(5.5 .18)
Remark 5.5.1. Recall that we are interested in the asymptotic distribution of A -1(/3n - (3) which is a minimizer of Kn({3 + Au) w.r.t. u . Since i3n satisfies (5.3 .22), there is no loss of generality in taking the true {3 equal to O. Then (5.5.9) asserts that (1/2)Kn satisfies (5.4.Al) with (5.5.19)
(Jo
0,
s; =
A -I ,
s, A-I :Tn, W n = ABnA , a; '- - rn(Y){Y~(Y) + mn(y)}dGn(y) ,
e; .-
J Jrn(y)r~(y)dGn(Y) ,
yg
where rn(y) = AX'A*(y)D ,A* as in (4.2.1), := (y10 , .. . , YpO) and m6 := (ml ,' . . , m p ) . In view of Lemma 5.5.1, the assumptions (e) and (h) of this section imply that EKn(O) = 0(1) , thereby ensuring the validity of (5.4.A4).
5.5. Asymptotic uniform quadraticity
183
Similarly, (5.5.18) asserts that (1/2)Ki!; satisfies (5.4.Al) with (5.5.20)
= =
(Jo
Sn
:J~ .B+n
.-
0, an = A-I ,
A-I :J+ W n = AB~ A , n'
-Jr~(y){W~(y) + Jr
m o(y)}dGn(y) ,
+' (y)dGn(Y), n(y) r n
where r~(y) := AX'A+(y)D ,A+(y) := A*(y) + A*( -y) ,y E ffi.P , WtJ' := (wt ,··· , W:) and (mt , · · , ,mt) ·
mt::=
In view of (5.5.12), (5.5.14) and (5.3.8) it follows that (5.4.A4) is satisfied by Ki!;( O) . Theorem 5.4.1 enables one to study the asymptotic distribution of /:Jo when in (1.1.1) the actual error d.f. F n i is not necessarily equal to the modeled d.f. H ni ,1 :::; i :::; n . Theorem 5.4.2 enables one to study the asymptotic distribution of f3tJ when in (1.1.1) the error d.f. F n i is not necessarily symmetric around 0, but we model it to 0 be so, 1 :::; i :::; n. So far we have not used the assumptions (k) and (l). They will be now used to obtain (5.4.A5) for K o and Ki!; . Lemma 5.5.4. In addit ion to th e assumptions of Theorem 5.5 .1 assume that (k) and (l) hold. Th en, 'i E > 0, 0 < z < 00, 3N (dep ending only on E) and a b E (0,00) (d epending on E, z) such that,
(5.5.21)
P( inf Ko( Au) 2 z) lIull >b
> 1-
E,
'in 2 N ,
(5:5.22)
P( inf K o( A u) 2 z ) > 1 Ilull>b
E,
'in 2 N,
P roof. As before write K , K etc . for K o , K o etc. Recall the definition ofI'n from (k) . Let kn(e):= e'I'ne, e E ~p . By the C-S inequality and (k) , p
(5.5.23)
sup Ikn(eW :::; Ilell=I
Ilr nll2
:::;
L
j =I
II vj ll; · Ign!;
= 0(1).
5. Minimum Distance Estim ators
184 Fix an
E
> 0 and a
z E (0,00) . Define, for t E IRP , 1 ~ j ~ P,
~(t)
.-
!
Vj( t)
.-
! [Vjd(Y, t ) -
{Y/
+ t'Rj + mj }gndGn,
t
dnij Hni(Y)] gn(y)dGn(y) ·
i=I
Also, let , ,
'
V := (VI , '"
,
,
,Vp ) , V := (VI, '"
Write a u E IRP with t he C-S inequality,
2
.
, Vp ) , "[n := Ignln, T := lim sup jc . n
lIuli > b as u =
re ,
Irl > b, lIell
= 1. Then , by
inf K (Au ) Ilull>b
>
inf (e'V(r A e))2 l t« , Irl>b.llell=I
inf K( A u) Ilull>b
>
in f (e'V(rA e))2 h Irl>b,lIell=I
n,
It thus suffices to show t hat B abE (0, 00) and N 3 , V n 2: N , (5.5.24) (5.5 .25)
p( Ir l>b,lIell=I inf (e'V (r A e))2 h n 2: z) p( Ir l>b,llell=I inf (e'V(r A e))2 hn 2: z)
>
1-
E,
> 1-
E.
But V UE IRP , IIV(Au) - V (Au)11 P
~ 2Tn
I: {IYj~ -
Yj~ l; + IflJu -
flJ- u'vjl;}·
j =I Thus, from (k), (5.5.3) and (5.5.10), it follows t hat V B E (0, (0) , (5.5.26)
sup IIV (Au) - V (Au)1I = op( l). IlullSb
Now, let T j := ! {Yl and rewrite
+ m j}g ndGn,
1 ~ j ~ Pi T' := (TI
e'V(rAe) = e'T
+ r k n( e) .
, '"
,Tp ) ,
5.5. Asymptotic uniform quadraticity
185
Again, by the C-S inequality, Fubini, (5.5.3) and the assumptions (j) and (k) it follows that :3 N; and bl , possibly both depending on 10, such that
P(IITII :::; bt}
(5.5.27)
~ 1-
(E/2),
Choose b such that (5.5.28) where a is as in (k). Then, with an := inf{lkn(e)l ; IIeil = I},
(5.5.29)
p(
,
2
inf (e'V(1'Ae)) Irl=b,llell=l "[n. P(le'Y(1'Ae)1
>
~
z)
~ (z"Yn)l /2,
P(lle'TI - 11'llkn(e)11
V Ilell = 1,11'1
= b)
~ (z"Yn)l /2, Vllell
= 1,11'1 = b)
> p(IITI:::; -(z"Yn)l /2 + ban) >
p(IITII:::; _(z"Y)l/2 + ba)
> p(IITII:::; bl ) ~ 1- (E/2),
Vn
~
Nl
·
In the above, the first inequality follows from the fact that Ildl-lell :::; jd+cl , d, e real numbers; the second uses the fact that [e'T] :::; IITII for all IIell = 1; the third uses the relation (-00 , _(z"Y)l /2 + ba) c (-00 , - (Z"Yn)l /2+ban) ; while the last inequality follows from (5.5.27) and (5.5.28) . Observe that e'Y (1' Ae) is monotonic in r for every II e II = l. Therefore, (5.5.29) implies (5.5.25) and hence (5.5.22) in a straight forward fashion . Next , consider e'V(1'Ae) . Rewrite e'V(1'Ae) =
J
i)e'dd[I(Yni :::; Y + 1'X~iAe)) - Hni(y)]gn(y)dGn(y) i=l
5. Minimum Distance Estimators
186
which, in view of the assumption (l), shows that e'V(r Ae) is monotonic in r for every lIell = 1. Therefore, by (5.5.26) :3 N 2 , depending on E, 3
P (
(e'V(rAe))2 >
inf Irl>b,lIell=1
/'n
z) P ( inf (e'V(rAe))2 2: z) - (E/2), Irl=b,lIell=1
> P(
(e'V(rAe))2 >
inf
Ir/=b,llell=1
>
z)
/'n
-
"[n.
>
1-
2: N 2
\j n
E,
V N1,
by (5.5.29). This proves (5.5.24) and hence (5.5.21) . 0 The next lemma gives an analogue of the previous lemma for KiJ. Since the proof is quite similar no details will be given . Lemma 5.5.5. In addition to the assumptions of Theorem 5 .5.2 assume that (k+) and (l) hold, where (k+) is the condition (k) with
T'n replaced by
r;t
:=
(vi , ' " ,vt ) and where {vt} are defined just
above (5.5 .13) . Then , \j E > 0, 0 < Z < 00, :3 N (depending only on (depending on E, z) 3 \j n 2: N ,
Z)
> 1 - E,
P( inf kiJ(Au) 2: z)
> 1 - E.
P ( inf KiJ(Au) 2: lIull>b
lIull>b
E)
and a b
o
The above two lemmas verify (5.4.A5) for the two dispersions K and K+. Also note that (5.5.22) together with (e) and (j) imply that IIA -1(A - ,8)11 = Op(I), where A is defined at (5.5.31) below . Similarly, Lemma 5.5.5, (e) , (5.5.17) and the symmetry assumption (5.3.8) about {G n } imply that IIA -1(A + - ,8)11 = Op(l) , where A + is defined at (5.5.35) below . The proofs of these facts are exactly similar to that of (5.4.2) given in the proof of Theorem 5.4.1. In view of Remark 5.5.1 and Theorem 5.4.1, we have now proved the following theorems. Theorem 5.5.3 Assume that (1.1.1) holds with th e modeled and actual d.f. 's of the errors {eni' 1
:s: i :s: n}
equal to {Hni , 1
:s: i :s: n}
187
5.5. Asymptotic uniform quadraticity
and {Fni, 1 ::; i ::; n}, respectively. In addition, suppose that (a)(l) hold. Then
(5.5.30) where
A satisfies
the equation
(5.5.31) If in addition,
(5.5.32)
J3 n -1 exists for all n ~ p,
then ,
(5.5.33)
:rnand B n
are defined at (5.5.19) . 0 Theorem 5.5.4 Assume that (1.1.1) holds with the actual d.j. 's of the errors {e ni ,l ::; i ::; n} equal to {Fn i , l i S n} . In addition, suppose that {X, Fni , D, Gn } satisfy (a) - (d), (1) - (i), (5.3.8) for all n ~ 1, (k) , (I) and (5.5.14). Then , where
<
(5.5.34) where ~ + satisfies the equation
(5.5.35) If, in addition,
(5.5.36)
(J3~)-1 exists for all n ~ p,
then,
(5.5.37) where
a: and B~ are defined at (5.5.19) .
0
Remark 5.5.2. If {Fi } are symmetric about zero then mt == 0 and f3t is consistent for f3 even if the errors are not identically distributed . On the other hand, if the errors are identically distributed,
5. Minimum Distance Estimators
188
f3ri
but not symmet rically, then will be asymptotically biased . This is not sur prising becau se here t he symmetry, rather t han t he identically distributed nature of the errors is relevant. If {Fi } are symmetric ab out an unknown common point then that point can be also est imated by the above m.d . method by simply augment ing t he design matrix to include t he colum n 1, if not 0 pr esent alrea dy. Next we turn to t he K and f30 (5.2.17) an d (5.2.18). F irst we state a theorem giving an ana logue of (5.5.9) for K Let Yj, J-Lj be Yd,J-Ld of (2.3.1) with {dnd replaced by {d nij} , j = 1, .. · ,p, X ni replaced by Y ni and Cni = Al (Xni - x n ), 1 ::; i ::; n , where Al and x n are defined at (4.3. 10). Set
o
o'
n
R j (s ) := I ) dnij - dnj(S))( Xni - Xn )qni (S), i :::1
where, for 1 ::; j < p , dnj(s) := n - I 2:~1 dnij eni(S), 0 ::; S ::; 1, with {lnd as in (3.2.34) and qni == f ni( H- I ) , 1 ::; i ::; n . Let
In t he assumptions of t heorem be low, L in KlJ is supposed to have been replaced by L n. Theorem 5.5.5. Let Yni , ' " , Ynn be independent r.v. 's with respective d.f. 's Fnl , '" , Fnn . Assume {D , X , Fnd satisf y (a)-(c) , (2.3.5) , (3.2.12), (3.2.34) and (3.2.35) with ui; = dij , 1 ::; j ::; p, 1 ::; i ::; n . Let {L n } be a sequence of d.f. 's on [0, 1] and assume that (5.5.38)
LP 10t J-L]( s , O)dL n( s) = 0( 1). j :::1
Th en, for every 0 < b <
0 00 ,
sup IK o( A u) - K D(A u)1 = op( I) . lI u ll :Sb Proof. The pro of uses the AUL result of T heorems 3.2.1 and 3.2.4. 0 Det ails are left out as an exercise.
5.5. Asymptotic uniform quadraticity
189
This res ult shows t hat t he dispersion K (5.5.39)
80 W
o satisfies (5.4.Al) wit h
an= A l l , s, = A l l :T~ ,
0,
A~B~A l ,
n
-1r~ (S){YD (S) + 1
:T~
.-
B~
.-
1r~(s)r~
IlD(s)}dLn( s) ,
1
(s)d Ln(s) ,
wher e r~ ( s ) = A~XcA ( s )D ( s ) , D (s ) := ((dnij - dnj( s)) , 1 ::; i ::; n , 1 ::; j ::; p; A(s) as in (2.3.30), 0 ::; s ::; 1; x, as in (4.2.11); Y D := (Yl , ' " , Yp),Il~ = (Il l ,' " , J-Lp) with 1j(s) == Yj(s , O) ,llj(s) == Ilj( s , O). Call t he cond ition (k) by t he na me of (k*) if it holds when (F n, Gn) is replaced by (f~ , L n) . Analogous to Theorem 5.5.4 we have Theorem 5.5 .6 As sum e that (1.1.1) holds with the actual d.f. 's of the errors {eni' 1 ::; i ::; n } equal to {Fni , 1 ::; i ::; n }. In addition, assume that {D , X , Fnd satisf y (N X *), (b), (c), (2.3.5), (3.2.12), (3.2.34)' (3.2.35) with ui, = dij , 1 ::; j ::; p , 1 ::; i ::; n , (k*) and (I). Let {L n} be a sequence of d.f. 's on [0, 1] satisf ying (5.5.38). Then
where
a * satisfi es the equation
(5.5.40)
If, in addition, (5.5.41)
( B~)- l exists for n ~ p,
then, The proof of this theorem is simi lar to t hat of Theorem 5.5.3. The details are left out for int erested readers. 0 Remark 5.5.3. Discussion of the assumptions (a) - (j). Among all of these assumptions, (g) and (i) are relatively harder to verify.
190
5. Minimum Distance Estimators
First, we shall give some sufficient conditions that will imply (g), (i) and the other assumptions. Then, we shall discuss these assumptions in detail for three cases, v.i.z ., the case when the errors are correctly modeled to be i.i.d. F , F a known d.f., the case when we model the errors to be i.i.d . F but they actually have heteroscedastic gross errors distributions , and finally , the case when the errors are modeled to be i.i.d . F but they actually are heteroscedastic due to difference in scales . To begin with consider the following assumptions. For any sequence of numbers {ani , bnd , ani
(5.5.42)
< bni, with
max (bni - ani) ---+ 0,
l~i ~n
limnsu p
!~!l~xn bni ~ ani l:~i J{fni(Y + z) -
fni(y)}2
xdGn(y)dz = O. (5.5.43) Claim 5.5.1. Assumptions (a) - (d), (5.5.42), (5.5.43) imply (g) and (i) . Proof. Use the C-S inequality twice , the fact that (d~)2 ::; dlj for all i , j and (b) to obtain P
L
J[Ld~{Fi(Y + c~v + o~d - + < t Ii Jt O~i fbi + < mfx(2o~d-1 l~i J + n
j=l
Fi(y
Ci V -
O~i)}] dGn(y) 2
i =l
dil12
2
z= l
z=l
4p202
f;(y
z)dzdGn(y)
fl{y
z)dGn(y)dz ,
at
where a; = -~iO +
+ c~v , 1 < < (k
which shows that (g) holds.
(by Fubini) ,
n . Therefore, by
= lim sup max IIiI;), n
z
5.5. Asymptotic uniform quadraticity
191
Next, by (b) and two applications of the C-S inequ ality, t he
L.R.S. (i)
~ f [t, d;j{ F;(y + c;u) - F; (y) <
pit
{ Fi(y +
i= l
=
[t u
p{f E;
(J;(y
f
+ Ei [<
2p
c~u) -
f
(J;(y
c~UJi(Y)} 2 dGn(y)
Fi(Y) -
+ z) -
C;Uj;(y))] 2 sa; (yl
f ;(y))dZ] 2 dCn(y)
+ z) -
j;(Y))dZ] 2 dCn(y)}
I [Etc~u I
~
< [m?-x - \I
t:
ciu . - 1C; ul
1
Ji(y)}2dz]dGn(Y)
•
l {h (Y + z ) - h(y)} 2dGn(Y)dz] n
x 4p 2)c~u) 2 , i= l
where Et (Ei) is th e sum over those i for which c~ u 2: O( c~ u < 0). Since E~=l ( c~ u)2 ::; pB for all U E N(b ), (i ) now follows from (5.5.42) and (a). 0 Now we consider th e three sp ecial cases mentioned above. Case 5.5.1. Correctly modeled i.i.d. errors: Fni _ F H ni , Gn == G. Suppose t hat F has a density f w.r.t. A. Assume t hat 2dG (5.5.44) (a) 0 < fdG < 00 , (b) 0 < f < 00 . (5.5.45)
I
(5.5.46)
(a)
I
I
F(l - F)dG <
(b)
lim
I I
z-tO
lim
z-t O
f(y
00 .
+ z )dG(y ) =
f2(y
+ z )dG)y ) =
I I
fdG , f 2dG.
192
5. Minimum Distance Estimators
Claim 5.5.2 . Assumptions (a) , (b) , (d) with G n == G, (5.5.44) (5.5.46) imply (a) - (j) with G n == G. This is easy to see. In fact here (e) and (J) are equivalent to (5.5.44a) , (5.5.45) and (5.5.46a) ; (5.5.42) and (5.5.43) are equivalent to (5.5.44b) and (5.5.46b) . T he LHS (j) is equa l to O. Note that if G is abso lute ly cont inuous then (5.5.44) implies (g). If G is p ure ly di screte and f cont inuous at the p oints of jumps of G then (5.5.46) holds. In part icul ar if G = bo, i.e. if G is degen er ate at 0, 00 > f (O ) > 0 and f is cont inuous at 0 then (5.5.44), (5.5.46) are t rivially sa t isfied . If G (y ) == y, (5.5.44a) and (5.5.46a) are a priori sat isfied whil e (5.5.45) is equivalent to assuming that El e1 - e21 < 00 , e1, e2 i.i.d . F . If dG = {F (1 - F) }-1 dF , the so called Darling - Anderson measure , th en (5.5.44) - (5.5.46) are satisfied by a class of d.f. 's that inclu des normal , logistic and double expo nent ial dis t ributions . Case 5.5.2. Heteroscedastic gross errors: H ni == F, Fni == (1 bndF + bniFO· We sha ll also ass ume t hat G n == G . Let f a nd f o be cont inuous densities of F and Fo . Then { Fnd have densities f ni = f + bni(JO - 1) , 1 :::; i :::; n . Hence (c) is satisfied. Cons ide r t he ass umption (5.5.47)
o :::; bni :::;
(5.5.48)
J!Fo-
max bni -+ 0,
1,
z
Fl dG <
00 .
Claim 5.5.3. Suppose that fo , f , F satisfy (5.5.44) (5.5.46), and (5.5.45), and suppose that (a ), (b) and (d) hold. Then (5.5.47) and (5.5.48) imply (e) - (i). Proof. The relation
Vj -
Ii == f + bi(JO- 1) implies that
n
n
i=1
i=1
L dijcd = L dij cibi(Jo -
1) ,
and n
"[n. -
L i=1
n
2
II d i l1 f
=
L i=1
IIdi l12 bi(Jo -
1) .
5.5. Asymptotic uniform quadraticity
jbn(Y+x) -
193
tlldi I12 f (y + x )]dG(y) i=l
< pmrxoi IjUo(y + x ) - f(y + X)]dG(y)1' Therefore, by (5.5.47) , (5.5.44a) and (5.5.46a) , it follows that (J) is satisfied. Similarly, the inequality
~J
n
Vj -
L dijcd
2
dG
i=l
ensures the satisfaction of (h) . The inequality
j
t
Ildi ll 2 { Fi (1 -
Fd - F(l - F)}dG
i=l
~ 2pmrxoi j!Fo -
FldG ,
(5.5.45) , (5.5.47) and (5.5.48) imply (e). Next ,
j {Ji(y
+ x ) - Ji(y)}2dG(y)
< 2(1 + 2of) j U(y + x) - f(y)}2dG(y) +4of j Uo(y
+ x) -
fO(y)}2dG(y).
Note that (5.5.44b) , (5.5.46b) and the continuity of
f imply that
lim jU(y + x ) - f(y)}2dG(y) = 0
x--+o
and a similar result for fo. Therefore from the above inequality, (5.5.46) and (5.5.47) we see that (5.5.42) and (5.5.43) are satisfied. By Claim 5.5.1, it follows that (g) and (i) are satisfied. 0
194
5. Minimum Distance Estimators
Suppose that G is a finit e measure. Then (Fl) implies (5.5.44) - (5.5.46) and (5.5.48) . In particular these assumpt ions are sat isfied by all those 1's that have finit e Fisher inform ation. The assumpt ion (j), in view of (5.5.48) , amounts to requiring that
(5.5.49)
But
n
p
n
n
n
L(L dij od = L L d~OidkOk :S (L Ild 2
j=1 i=1
i=1 k=1
d2 .
illo
i=1
This and (b) suggest a choice of Oi == p-1/21Id ill will satis fy (5.5.49). Note that if D = XA then IIdl1 2 == X~(X/X)-1xi '
/3x
When studyin g the robustness of in the following section, 0; == p- 1x~(X/X) -1xi is a natural choice to use. It is an an alogue of n -1/2- contaminat ion in the i.i.d. set up. 0 Case 5.5.3. Heteroscedastic scale errors: H ni == F, Fni( y) == F( TniY) , G n == G. Let F have continuous density f . Consider t he condit ions
(5.5.50) (5.5.51)
Tni == IJni + 1; IJni > 0, 1 :S i :S n ; m~XIJni ---+ 0. lim
s~ 1
JIylj
fk( sy)dG( y) =
JIylj
t
f k(y)dG( y) ,
j = 1, k = 1, j = 0, k = 1,2.
Claim 5.5.4. Under (a ), (b), (d) with G n == G , (5.5.44) (5.5.46) , (5.5.50) and (5.5.51) , the assumpt ions (e) - (i) are satisfied.
Proof. By the Lemmas 9.1.5 and 9.1.6 of t he App endix below, and by (5.5.23), (5.5.27) , and (5.5.31), (5.5.52)
lim lim sup max
x~o
lim
x~o
n
J
If(y
t
J
+ x) -
If(T1( y + x )) - f( y + x W dG(y) = 0,
f( yW dG( y) = 0, r = 1,2.
5.5. Asymptotic uniform quadraticity
195
Now,
! t IIdIl
2{F (1 - Fd - F (l - F )}dG i
i= l
-r] W(TIY) - F (y)ldG(y) 2p mrx i ! lylf(sy)dG(y)ds = 0(1),
< 2p
Ti
<
by (5.5.30) and (5.5.31) wit h j = 1, r = 1. Hence (5.5.45) implies (e). Next ,
!
'Yn(Y + x )dG(y) -
t
<
2 Ii dill Ti
i= l
!
t ! Iidi ll 2
fdG
~= l
{ lf h (Y + x )) - f(y + If(y
+ x )1
+ x ) - f(y)1 }dG(Y) +
mrx ow ! fdG .
Therefore, in view of (5.5.30) , (5.5.52) and (5.5.44) we obtain (1). Next , consider
!
U i(Y + x ) - Ji(y)} 2dG(y)
< 4Tl ! { !f(Ti(Y + x )) - f(y + x)J2 + [j (y + x ) - f(yW
+ [j (TiY) - f(y)J2 } dG(Y)
T herefore, (5.5.50) and (5.5.32) impl y (5.5.21), and hence (g) and (i) by Claim 5.5.1. Note t hat (5.5.32) and (5.5.23b ) imply (5.5.22). F inally, n
p
L! I v j - L j=l
mrx ! {Tilh y) - f(y)}2dG(y) 2p2 mrx Tl B i 9 [! Uhy ) - f(y)}2dG(y) + ! f 2dG] = 0(1),
< p2 <
dij cilll2dG
i= l
5. Minimum Distance Estimators
196
by (5.5.50) , (5.5.46b) , (5.5.52) . Hence (5.5.46b) and the fact that
o
implies (h). Here , the assumption (j) is equivalent to having n
p
j[Ldij{F(TiY) - F(y)}fdG(y) = 0(1) .
L
i= l
j=l
One sufficient condit ion for this, besides requiring F to have density
f satisfying lim j(Yf( sy))2dG(y) = j(yf(y))2dG(Y) <
(5.5.53)
00 ,
s-+l
is to have n
Lor = 0(1) .
(5.5.54)
i= l
{od
or
One choice of satisfying (5.5.54) is == n- 1/ 2 and the other choice is = x~(X'X)-lXi , 1 ~ i ~ n . Again, if f satisfies (Fl) , (F2) and G is a finite measure then (5.5.44) , (5.5.46) , (5.5.51) and (5.5.53) are a prior i satisfied. Now we shall give a set of sufficient conditions that will yield (5.4.Al) for the Q of (5.2.9). Since Q does not satisfy (5.3.21), the distribution of Q under (1.1.1) is not independent of f3 . Therefore care has to be taken to exhibit this dependence clearly when formulating a theorem pertaining to Q. This of course complicates the presentation somewhat. As before with {H ni }, {Fnil denoting the modeled and the actual d.f.'s of {end , define for 0 ~ S ~ 1, y E JR, t E W ,
or
m n(s , y)
.-
n- 1/ 2
ns
L {Fni(y -
x~if3) - Hni(y - x~if3)} ,
i=l ns
Hn( s , y , t)
.- n- 1 L Hni(y - x~it) , i= l ns
M 1n( s, y)
.-
n- 1/ 2 I)I(Yni ~ y) - Fni(Y - x~if3)} , i= l
do:n( s , y) .- dLn(s)dGn(y)
197
5.5. Asymptotic uniform quadraticity Observe that
Q(t) =
/[M1n (S, y) - n
+ mn(s , y)
1/2 -
-
2
{H n (s, y , t) - H n (S , y, ,8)}] dan (S , y) ,
where the integration is over the set [0, 1] x lit Assume that {Hnd have densities {hni} w.r .t. A and set, for S E [0,1], Y E JR, ns
(5.5.55)
Rn(s , y)
.-
n- 1/ 2
L Xnihni(Y -
x~i,8) ,
i=1 n
h~(y)
.-
n- 1
L h~i(Y -
x~d3) ,
i=1
v; .-
ARn,
B in := / vnvn,da n·
Finally define, for t E W ,
Theorem 5.5.7. Assume that (1.1.1) holds with th e actual and th e modeled d.f. 's of the errors {eni ' 1 S; i S; n} equal to {Fni ' 1 S; i S; n} and {Hni , 1 S; i S; n}, respectively. In addition, assume that (a) holds, {Hni , 1 S; i S; n} have den sities {hni' 1 S; i S; n} w.r.i. A, and the following hold. Ih~ln
(5.5.56)
vv
E N(b) , V 6
= 0(1).
> 0,
where ani = -6/'l,ni - c~iv , bni AXni' 1 S; i S; n .
=
6/'l,ni - C~iv , /'l,ni
= IIcn il/,
Cni =
5. Minimum Distance Estimators
198
vu
E
N(b),
f{
n 1/2 [H n(s,y ,{3 + Au) - H n(s ,y,{3)]
+u'vn } 2 dan(s, y)
f t x~i{3)(1 f m~(s,y)dan(s,y) n- 1
Fni(y -
- Fni(y -
= 0(1).
x~i{3))dGn(Y) =
0(1).
i=l
= 0(1) .
Then \i 0
< b < 00 , E sup IQ({3 + Au) - Q(Au)1 = 0(1). lIull:Sb
The details of the proof are similar to those of Theorem 5.5.1 and are left out as an exercise for interested readers. An analogue of (5.5.34) of 73 will appear in the next section as Theorem 5.6a.3. Its asymptotic distribution in the case when the errors are correctly modeled to be i.i.d. will be also discussed here . We shall end this section by stating analogues of some of the above results that will be useful when an unknown scale is also being estimated. To begin with, consider Kn of (5.2.19). To simplify writing, let K~(s , u) := Kn((1
+ sn- 1/ 2 ) , Au) ,
s E IR, u E JRP.
Write as := (1 + sn- 1/ 2 ) . From (5.2.19),
where Hi is the d .f. of e. , 1 ::; i ::; n, and where J-lJ , Yjo are as in (i) and (5.5.1)) , respectively. Writing J-lJ(y) , YjO(y) etc. for J-lJ(y , 0), Yi(y, 0)
199
5.5. Asymptotic uniform quadraticity etc., rewrite K~(s, u) p
L j=l
=
j {Y/(ya s, u) - YjO(y) + J-L~(yas) - J-L~(Y) - syv;(y) + YjO(y) + U'Vj(y) + syv;(y) + mj(y) + J-L~(yas , u) - J-L~(yas) - J-L'Vj(ya s) + u'[Vj(ya s) - Vj(y)]} 2 dGn(y)
where Vj are as in (h) and v;(y) := n- 1 / 2 E~=l dnijfni(y) , 1 :::; j :::; p. This representation suggests the following approximating candidate:
k~(s , u)
p
:=
j {Y/ + U'Vj + syv; + mj}2dGn.
L )=1
We now state Lemma 5.5.5 . With In as in (J) , assume that 'lisE JR, (5.5.57)
+ sn- 1/ 2)y + x)dGn(y)
lim limsupj,n((l
x---+o
n
= limnsup
j
In (y )dGn (y)
< 00 ,
and
(5.5.58)
limlimsupj lyl,n(Y
s---+o
n
= limnsu p
+ zy)dGn(y)
j IYiJn(y)dGn(y) <
00
Moreover, assume that V(s, v) E [-b, b] x N(b) = : C1 , 0 and '118 > 0 p
n
j [Ld;ij{Fni(ya s + C~iv + j=l i=l
lim sup L
n (5.5.59)
-Fni(ya s + CniV
-
8(n -
1
/
00,
2 IYI + "'nd)
8(n - 1/ 2IYI + "'ni))}
for some k not depending on (s , v) and 8.
r
dGn(y) :::; k8 2,
5. Minimum Distance Estimators
200 Th en , V 0
< b < 00 ,
L J{Y/ (l + sn p
(5.5.60)
Esup
I 2 / )y, u )
j =I
_Y/(y)} 2 dGn(y) = 0(1), where th e supremum is tak en over (s , u ) E CI . P roof. For each (s, u ) E CI , with as = 1 + sn-
I 2 / ,
p
E.L J {Y/(ya s, u ) -
Y/(y)} 2dGn(y)
j=I
<
i:nnJ
'"Yn(ya s + z )dGn(y )dz
J
+ ibbnn IYhn (y + zy)dGn(y)dz where E n = bmaxi IIKill, bn = bn (5.5.58), for every (s, u ) E CI ,
I 2 / .
Therefore, by (5.5.57) and
p
E .L J {yjo(ya s , u ) - YjO(y)}2dG n(y) = 0(1). j=I
Now pro ceed as in the proof of (5.5 .3), usin g the monotonicity of Vjd(a, t ), f.L~ (a , u ) a nd the compactness of CI to conclude (5.5 .60). Use (5.5 .59) in place of (g). The details are left out as an exercise. D
The proof of the followin g lemma is quite simi lar to that of (5.5.10) . Lemma 5.5.6. Let G~(y) = Gn(y /a T ) . Assume that for each fixed (r , u ) E CI , (h) and (i) hold with G n replaced by G~ . In add it ion, assume th e following : p
L J(yvj (y))2dGn(Y) = 0(1) . j=I
t J{f.L~(yas) j=I
-
f.L~(y) -
ryvj(y)}2dG n(y) = 0(1), V
lsi :s; b.
5.6. Asymptotic distributions, efficiency & robustness
201
< b < 00,
Th en , V 0
t!
sup
{J-LJ(ya s , u ) - J-LJ(aTY)
(S,U)ECI j=I
-U'Vj (aTY)} 2 dGn(y ) = 0(1), p
sup L
!{J-LJ(ya s )
-
J-LJ(y ) - Tyvj (y)}2 dGn (y) = 0(1).
Isl:Sb j = I
Theorem 5.5.8 Let Y n I , .. . , Y n n be in depen dent r.v. 's with respective d.f. 's Fn I , ' .. , Fn n . A ssum e (a) - (e) , (h) , (j) , (5.5.57) - (5.5.59) an d th e con ditions of Lem ma 5.5.6 hold. M oreover assume that for each lsi ::; b p
L ! Ilvj( ya s )
-
v j( y) 11 2dGn(y ) = 0(1).
j=I
Th en, V 0
< b < 00 ,
where the supremum is tak en over (s , u ) E CI . The proo f of t his th eorem is qui te similar to that of Theorem 0 5.5.1.
Asymptotic Distributions, Efficiency &
5.6
Robustness 5.6.1
Asymptotic distributions and efficiency
To begin with consider th e Case 5.5.1 and th e class of estim ators {,8 D}' Recall that in this case the err ors {end of (1.1.1) are correct ly mod eled to be i.i.d. F , i.e., H n i == F == F n i . We sha ll also take Gn == G, G E JD)I(lR). Assume that (5.5.44) - (5.5.46) hold. The various quantities appearing in (5.5.19) and Theorem 5.5.3 now take t he following simpler forms.
r n(y ) = AX'Df (y ), y E lR,
(5.6.1)
e;
=
AX'DD'XA! f dG ,
a; =
-AX'D !
Y~fdG .
202
5. Minimum Distance Estimators
Not e that 8 ;;1 will exis t if and only if the rank of D is p. Not e also that
(5.6.2)
1 8n ..,/T n
- (D'XA)- 1 J
Y~fdG/ (J f 2dG)- 1
(D 'X A J f 2dG )-1
t
d i [1j1(ei ) - E 1j1 (edJ,
i=1
where 1j1(y) = J~oo f dG , y E ~. Because Gn == G E JI))I(~ ) , th ere always exists agE L ; (G) such that 9 > 0, and 0 < J g2dG < 00 . The conditi on (5.5.k ) with gn == 9 becomes (5.6.3)
lim inf inf le'D'XAel 2:: n Ilell=1
0,
for some
0
> O.
Cond ition (5.5.l) implies th at e'D'XAe 2:: 0 or e'D'XAe < 0, VileII = 1 and V n 2:: 1. It need not imply (5.6.3) . The above d iscussion together wit h t he L-F Cra mer-Wold Theorem lead s to Corollary 5 .6a.1. Assume that (1.1.1) holds with the error r.v. 's correctly modeled to be i.i. d. F, F known . In addition, assume that (5.5.a), (5.5.b), (5.5.l) , (5.5.44) - (5.5.46), (5.6.3) and (5.6.4) hold, where (5.6.4)
(D 'X A) - 1 exists for all n 2:: p.
Then,
(5.6.5)
A-I (!3D - (3)
(D'XA J f 2dG) -1
t
i=1
If, in addition, we assum e (5.6.6)
then
(5.6.7)
d ni[1j1(end - E 1j1(endJ+ op (l ).
5.6.1. Asymptotic distributions, efficiency & robustness
203
whe re
~D
'=
.
(D'XA)-1D'D(AX'D )-1
2 ,T
= Va7"lp( ed
(J j2dGF'
0
For any two square matrices L 1 and L 2 of the same order, by L 1 ~ L 2 we mean t hat L 1 - L 2 is non-negative definite. Let L and J be two p x n matrices such tha t (LL')-1 exists . The C-S inequ ality for matrices states that
(5.6.8) with equality if and only if J ex: L. Now note that if D = XA then ~D = I p x p . In general, upon choo sing J = D' , L = AX' in this inequality, we obtain
D'D
~
D'XA . AX'D or
~D ~ I p x p
with equality if and only if D ex: XA . Fr om t hese obse rvat ions we deduce Theorem 5.6a.l ( Opt imality of fJx ). Suppose that (1.1.1) holds with th e error r. v. 's correct ly modeled to be i.i. d. F . In addi tion, assume that (5.5 .a) , (5.5.d) with G n == G , (5.5.44) - (5.5.46) hold. T hen , among the class of estim ators {fJ D; D satisfying (5.5.b), (5 .5.l) , (5.6.3) , (5.6.4) an d (5.6.6) }, th e estim ator th at minimizes the asymptotic variance of b ' A - 1(fJn - (3), f or every b E IRP , is fJx- the fJn with D = XA . 0 Ob serve that under (5.5.a) , D = XA a priori sati sfies (5.5.b), (5.6.3) , (5.6.4) and (5.6.6). Conse que ntl y we obtain Corollary 5.6a.2. (A sym pt ot ic n ormalit y of fJx) . A ssume that (1.1.1) holds with th e er ror r.v. 's correctly modeled to be i.i. d. F . In add it ion, assu me that (5 .5.a) an d (5.5.44) - (5.5.46) hold. Th en , A
- 1
((3x - (3) A
-'td
2
N( O, T Ip x p ) .
o
Remark 5.6a.1. Wri te fJn (G) for fJn to emphasize the dependen ce on G. The above t heorem proves the op t imality of fJx (G) among a class of estimators {fJn (G) , as D varies}. To obtain an asy mpt ot ically efficient est imator at a given F amo ng t he class of estimato rs
5. Minimum Distance Estimators
204
{.8x (G) , G varies} one must have F and G satisfy the following relation. Assume that F satisfies (3.2.a) of Theorem 3.2.3 and all of the derivatives that occur below make sense and that (5.5.44) hold. Then, a G that will give asymptotically efficient .8x (G) must satisfy the relation
-fdG
(I/I(J) . d(j / J),
I(J) :=
J
(j / J) 2 dF.
From this it follows that the m.d . estimators .8x(G), for G satisfying the relations dG(y) = (2/3)dy and dG(y) = 4doo(y), are asymptotically efficient at logistic and double exponential error d.f. 's , respectively. For .8x(G) to be asymptotically efficient at N(O,I) errors, G would have to satisfy f(y)dG(y) = dy . But such a G does not satisfy (5.5.58) . Consequently, under the current art of affairs , one can not estimate f3 asymptotically efficiently at the N(o , 1) error d.f, by using a .8x (G) . This naturally leaves one open problem, v.i.z., Is the conclusion of Corollary 5.6a.2 true without requirinq f f dG < 00,0 < f f 2 dG < oo? 0 Observe that Theorem 5.6a .l does not include the estimator .81 - the .8D when D = n 1 / 2 [1, 0"" ,O]nxp Le., the m.d. estimator defined at (5.2.4) , (5.2.5) after Hni is replaced by F in there. The main reason for this being that the given D does not satisfy (5.6.4). However, Theorem 5.5.3 is general enough to cover this case also . Upon specializing that theorem and applying (5.5.31) one obtains the following Theorem 5.6a.2. Assume that (1.1.1) holds with the errors correctly modeled to be i.i.d. F. In addition, assume that (5.5.a), (5.5.44) (5.5.46) and the following hold. (5.6.9)
Either
n-l/2elx~iAe 2: 0 for all 1 ~ i ~ n, all
lIell
= 1,
or
n-l /2elx~iAe ~ 0 for alII ~ i ~ n, all lle] = 1.
(5.6.10)
5.6.1. Asymptotic distributions, efficiency & robustness where
xn
(5.6.11)
is as in (4.2a.11) and
n
1/2-'
xnA . A
-1
fit
205
is the first coordinate of O. Then
A
_
({31 - {3) -
Zn J j2dG + "» ( 1) ,
where n
Zn
= n- 1 / 2 I)1/J(e nd - E1/J(end} ,
with
1/J as in (5.6.2) .
i=1
Consequently, nl/2x~(,Bl - {3) is asymptotically a N(O, 72) r.v. 0 Next, we focus on the class of estimators {{3i;} and the case of i.i. d. symmetric errors. An analogue of Corollary 5.6a.1 is obtained with the help of Theorem 5.5.4 instead of Theorem 5.5.3 and is given in Corollary 5.6a.3. The details of its proof are similar to those of Corollary 5.6a.1. Corollary 5.6a.3 . Assume that (1.1.1) holds with th e errors correctly modeled to be i.i.d. symmetric around O. In addition, assume that (5.3.8), (5.5.a), (5.5 .b), (5 .5.d) with G n == G, (5.5.44) , (5.5.46), (5.6.3), (5.6.4) and (5.6.12) hold, where
1
00
(5.6.12)
(1 - F)dG <
00
Then ,
(5.6.13) =
A -1({3i; - {3)
-{2AX'D
J
f 2dG} -1
J
W +(y)f+(y)dG(y)
+ op(l) ,
where f +(y) := f(y) + f(-y) and W+(y) is W+(y,O) of (5.5.13). If, in add it ion , (5.6.6) holds, then
(5.6.14)
o
Consequently, an analogue of Theorem 5.6a.1 holds for {3j{ also and Remark 5.6a.1 applies equally to the class of estimators {{3j( (G) , G varies} , assuming that the errors are symmetric around O. We leave it to interested readers to state and prove an analogue of Theorem 5.6a.2 for {3t .
5. Minimum Distance Estimators
206
o}
Now consider the class of estimators {,B of (5.2.18) . Recall the notation in (5.5.39) and Theorem 5.5.6. The distributions of these estimators will be discussed when the errors in (1.1.1) are correctly modeled to be LLd. F, F an arbitrary d.f. and when L n == L. In this case various entities of Theorem 5.5.6 acquire the following forms:
fni(s) == 1; D(s) == D , under (5.2.21) ; D q(s ), q = f(F- 1);
0;
A 1X c
= B*n
=
-AIX~D /
YDqdL
(AIX~DD'XcAd/
= AIX~D
t
dniepo(F(end ;
i=1
q2 dL ,
where X, and Al are defined at (4.2.11) and where (5.6.15)
epo(u)
l
:=
u
q(s)dL(s), O:S u :S 1.
Arguing as for Corollary 5.6a.1, one obtains the following Corollary 5.6a.4. Assume that (1.1.1) holds with the errors correctly modeled to be i.i. d. F and that L is a d.j. In addition, assume that (Fl) , (NX c ), (5.2.16), (5.5.b), and the following hold. liminf inf le'D'X cA 1 el 2: a > O.
(5.6.16)
lIell =1
n
(5.6.17)
Either
e'dni(xni -
x n )' A 1e 2: 0, VI :S i:S n, V [e] = 1,
or
(5.6.18)
(D'X c A 1 )
-1
exists for all n 2: p .
Then,
(5.6.19)
1
A1
CBo- 13)
1 1
(D'X cA 1
o
q2dL)-1
t
i =1
dniep(F(end)
+ op(l) .
5.6.1. Asymptotic distributions, efficiency & robustness
207
If, in addition, (5.6.6) holds, then
(5.6.20) where ~D = ( D/Xc Al) -l D'D(AlX~ D)-l , and 2 _
(5.6.21)
ao -
Varrpo(F(ed) (fol q2dL)2 '
with rpo as in (5.6.15) . Consequently, (5.6.22) and {,8x e } is asymptotically efficient among all {,8D' D satisfying o above conditions .
Consider the case when L(s) 2
J J[F(x /\ y) -
ao =
== s. Then
F(x)F(y)]j2(x)f2(y)dxdy (J P(x)dx)2
It is interesting to make a numerical comparison of this var iance with that of some other well celebrated estimators. Let alad' a~ and a;s denote the respective asymptotic variances of the Wilcoxon rank, the least absolute deviation, the least square and the normal scores estimators of ,8. Recall, from Chapter 4 that
a;,
a:; ~ als
=
{12(J f'(X)dX) ' } a2;
a;s
- 1
a1'd ~ (2f(OW' ;
= {[J f 2 (x )/ rp (
- l (F ))Jdx } -2 ;
where a 2 is the error variance. Using these we obtain the following table.
F Double Exp. Logistic Normal Cauchy
Table 1 a 02 a w2 1.2 1.333 3.0357 3 1.0946 tt /3 2.5739 3.2899
2 a/ad 1
4 7f/2 2.46
2 a ns 7f2 7f 1
a2 2 7f2/3 1 00
208
5. Minimum Distance Estimators
It thus follows that the m.d. estimator j3xJL), with L( s) == s, is superior to the Wilcoxon rank estimator and the LAD estimator at double exponential and logistic errors, respectively. At normal errors, it has smaller variance than the LAD estimator and compares favorably with the optimal estimator. The same is true for the m.d. estimator x (F) . Next , we shall discuss 13 . In the following theor em the framework is the same as in Theorem 5.5.7. Also see (5.5.82) for the definitions of ti.; BIn etc. Theorem 5.6a.3. In addition to the assumptions of Theorem 5.5.7
/3
assume that (5.6.23) lim inf inf n
11011 =1
J
I v~danOI ~
a , for some a > O.
Moreover, assume that (5.6.9)) holds and that 8 1;
(5.6.24)
exists for all n ~ p.
Then, (5.6.25)
A -1(13 - 13)
JJ
-8 1;
vn( s , y){Mln( S, y)
+ mn(s , y)}dan(s, y) +op(l) .
Proof. The proof of (5.6.25) is similar to that of (5.5.33), henc e no 0 det ails are given . Corollary 5.6a.5. Suppose the conditions of Theorem 5.6a.3 are satisfied by Fni == F == Hni' Gn == G, L n == L , where F is supposed to have continuous density f . Let
JJ11 1
C
=
o
ns
1
[{An-
0
1
nt
L L x ixjfi(y)fj(y)A}(s l\t) i= l j = 1
X
(F(Y
1\
z) - F(y)F(z))] dai« , y)da(t , z) ,
where Ii(y) = f(y - x~j3) , and da( s ,y) = dL(s)dG(y) . Then the asymptotic distribution of A - 1(13 _ 13 ) isN(o, '£0(13)) where '£0(13) = 1CB - I B0 In In '
209
5.6.2. Robustness
Because of the dependence of ~o on f3 , no clear cut comparison between f3 and iJx in terms of their asymptotic covariance matrices seems to be feasible. However, some comparison at a given f3 can be made. To demonstrate this, consider the case when £(s) = s, p = 1 and /31 = O. Write Xi for XiI etc . Note that here, with = I:~=1
T;
BIn
=
T;2 inr n-
1
n
ns ns L Xi L xjds . i=1
1
C
xl,
T;211 o
0
X
1
J2
j dG ,
j=1
ns nt n- 1 LXi L Xj(s i=1
1\
t)dsdt
j=1
J J[F(Y 1\ z) - F(y)F(z)]d'IjJ(y)d'IjJ(z).
Consequently
L;o(O) =
Recall that T2 is the asymptotic variance of Tx(Sx - {3). Direct integration shows that in the cases Xi == 1 and Xi == i , Tn -+ ~~ and ~~ , respectively. Thus, in the cases of the one sample location model and the first degree polynomial through the origin, in terms of the asymptotic variance, Sx dominates 7J with £(s) = s at /3 = o. 0
5.6.2
Robustness
In a linear regression setup an estimator needs to be robust against departures in the assumed design variables and the error distributions. As seen in the previous section, one purpose of having general weights D in iJn was to prove that iJx is asymptotically efficient among a certain class of m.d. estimators {iJn,D varies}. Another purpose is to robustify these estimators against the extremes in the design by choosing D to be a bounded function of X that satisfies all other conditions of Theorem 5.6a.1. Then the corresponding iJn would be asymptotically normal and robust against the extremes in
5. Minimum Distance Estimators
210
the design, but not as efficient as fJx . This gives another example of the phenomenon that compromises efficiency in return for robustness. A similar remark applies to {,sri} and {,sD}' We shall now focus on the qualitative robustness (see Definition 4.4.1) of fJx and ,si . For simplicity, we shall write 13 and ,s+ for fJx and ,si, respectively, in the rest of this section. To begin with consider 13. Recall Theorem 5.5 .3 and the notation of (5.5.19) . We need to apply these to the case when the errors in (1.1.1) are modeled to be i.i.d. F, but their actual d.f.'s are {Fnd, D = XA and G n = G. Then various quantities in (5.5.19) acquire the following form.
rn(y)
where for 1
,
on
AX'A*(y)XA,
13 n = AX'! A*IIA*dGXA,
!r
+ An(y)]dG(y)
~
n(y)AX'[on(Y)
=
z, + b.,,
say,
i ~ n, y E JR,
.- (a nl,an2 , '
"
, ann), A~:= (.0. nl,.0.n 2, · · · ,.0.nn );
II .- X(X'X)-lX';
bn
:=
! r n(y)AX' An(y)dG(y).
The assumption (5.5.a) ensures that the design matrix X is of the full rank p . This in turn implies the existence of 13;;-1 and the satisfaction of (5.5.b) , (5.5.1) in the present case. Moreover , because c; = G, (5.5.k) now becomes (5.6.26)
liminf inf kn(e);:: T ' for some T > 0, n Ilell=l
where
kn(e) :=
e'AX'
!
A*gdGXAe,
°
Ilell
= 1,
and where 9 is a function from JR to [0,00], < J gr dG < 00, r = 1,2. Because G is a (J - finite measure, such a 9 always exists. Upon specializing Theorem 5.5 .3 to the present case, we readily obtain Corollary 5.6b.1. Assume that in (1.1.1) the actual and modeled d.J. 's of the errors {eni , 1 ~ i ~ n} are {Fni , 1 ~ i ~ n} and F ,
5.6.2. Robustness
211
respectively. In addition, assume that (5.5.a), (5.5.c) - (5.5.j) with D = XA, Hni == F, Gn == G, and (5.6.26) hold. Then A -1(/3 - {3) = -B~l{Zn
+ b n} + op(l) .
0
Observe that B~lbn measures the amount of the asymptotic bias in the estimator /3 when F ni i= F . Our goal here is to obtain the asymptotic distribution of A - 1(/3 - {3) when {Fnd converge to F in a certain sense. The achievement of this goal is facilitated by the following lemma. Recall that for any square matrix L , IILlioo = sup{IIt'LIl :s: I}. Also recall the fact that (5.6 .27)
IILlloo
:s:
{tr .LL'}1/2,
where tr. denotes the trace operator. Lemma 5.6b.1. Let F and G satisfy (5.5.44). Assume that (5.5.e) and (5.5.j) are satisfied by G n == G,{Fnd ,Hni == F and D = XA. Moreover assume that (5.5.c) holds and that (5.6.28)
Then with I = I pxp, (i) IIB n - If f 2dGIIoo = 0(1). (ii) IIB~l - I(f f 2dG)-1I1oo = 0(1). (iii) Itr.B n - p f f 2dGI = 0(1). (iv) I Z=~=1 f II vjll2dG - p f j 2dGI = 0(1). (v) IIb n - fAX' An(y)j(y)dG(y)} = 0(1). (vi) IIZn - fAX'O:n(y)j(y)dG(y)1I = op(l) . (vii) sUPllell=llkn(e) - f jgdGI = 0(1). Remark 5.6b.1. Note that the condition (5.5.j) with D = XA, G n == G now becomes (5.6 .29)
Proof. To begin with, because AX'XA == I, we obtain the relation rn(y)r~(y) - j2(y)I
= AX'[A*(y) - j(y)I]XA ·A X ' [A *(y) - f(y)I]XA AX'C(y)XA· AX'C(y)XA 1)(y)1)'(y) ,
y E IR,
5. Minimum Distance Estim ators
212
where C(y ) := A *(y ) - If (y ), 'D(y ) := AX'C (y )XA , y E lit Therefore, II S n
(5.6.30)
I
-
J
f 2dGli oo
<
sup Il tl l~ l
= 'D'D'.
1It''D(y)'D'( y)lIdG(y)
J
< where L
J
{t r.LL'}1/2dG
Note t hat , by t he C-S inequa lity,
tr.L L ' = tr.'D'D''D''D ::; {t r.'D'D'}2 .
Let Oi =
Ii - t,
1 ::; i ::; n . Then n
(5.6.31)
Itr·'D'D'1 =
tr.
n
2:: 2:: AXiX~A . AXjxjA .s.s, i=l j=l
n
n
i= l
j=l
n
n
2:: 2:: OiOj (xj A A x d2 <
2:: 2:: IOiOjl ' lIx~ A I1 2 . II x jA II 2 i=l j=l
=
(t II
AXiIl 21&il) 2 = p"
Consequent ly, from (5.6.30) and (5.6.31),
IIBn
-
I
f
f'dG II 00
<
f (t II
AXi Il)
' If; -
f l'dG
~ 0(1),
by (5.6.28). This proves (i) while (ii) follows from (i) by using the determinant and co-facto r form ula for th e inverses. Next , (iii) follows from (5.6.28) and the fact that Itr.S n
-
P
J
f 2dGI =
I
J
tr .'D'D'dG I ::; Pn ,
by (5.6.31) .
To prove (iv) , note t hat with D = XA ,
2:: JII v j ll 2dG p
j=l
=
t t Jx~AAxkX~AAxdi (y)fk(y)dG(y) . i=l k= l
5.6.2. Robustness
213
Note that t he R.H.S. is p J f 2 dG in t he case
~
I II
v j ll' dG - p
I
I'dG =
I
Ii == f . Thus
tr.v v 'dGI S Pn·
This and (5.6.28) prove (iv) . Sim ilarly, with d j (y) denoting t he j-th row of 1) {y ), 1 ::; j ::; p,
Ilb
n -
!
AX'
~nfdGl12
= =
I ! 1)AX~dGII 2 t {! ~n (y)dG(y) d j {y)AX'
}2
j= l
< Pn ! IIAX' ~n (y) 11 2dG(y)
(5.6.32) and (5.6.33)
II Zn -
!
A X 'O:n(y)f (y )dG(y )112
::; Pn
!
II A X' O:nIl 2dG .
Moreover , (5.6.34)
E!
II AX'cx n l12dG =
! t Ilx~A11 2
Fi (1 - Fi)dG .
1= 1
Cons equently, (v) follows from (5.6.28) , (5.6.29) and (5.6.32) . The claim (vi) follows from (5.5.e), (5.6.28), (5.6.33) and (5.6.34). F inally, with 1)1/ 2 = AX' C 1 / 2 , V eE W ,
Therefore,
1I ~\l~, Ik,,(e) - I I9dGI
<
I {t, II
< Pn
A x; II' lh
{! g2dG}
1/ 2
- I I} gdG
= 0(1). by (6). 0
214
5. Minimum Distan ce Estimators
Corollar y 5.6b.2. Assume that (1.1.1) holds with the actual and the modeled d.f. 's of {eni, 1 ~ i ~ n } equal to {Fni , 1 ~ i ~ n} and F , respectively. In addition , assum e that (5.5.a) , (5.5.c) - (5.5.g), (5.5.i) , (5.5.j ) with D = XA , H ni = F, Gn = G; (5.5.44) and (5.6.28) hold. Then, (5.5.h) and (5.5.b) are satisfied and (5.6.35)
where Z n .- / AX'o n{y )d'ljJ {y ) =
A t x ni['ljJ (end - / 'ljJ {x )dFni{x) ], i == l
bn
.-
/ A X 'A n{y )d'ljJ (y ) = / t A x ni[Fni - F]d'ljJ , t== l
with 'ljJ as in (5.6.2). 0 Consider Zn. Note t hat wit h (T~i := Var {'ljJ {eni)lFnd, 1 ~ i ~ n ,
One can rewrite
By (5.5.44), 'ljJ is nondecreasin g and bounded. Hence max i IlFni Flloo -7 0 readily imp lies th at maXi(T~i -7 (T2 , (T2:= V ar{'ljJ{ e) IF}. Moreover , we have t he inequ ality
It t hus readily follows from t he L-F CLT th at (5.5.a) implies t hat 2
'
Zn -7d N{ O, (T I p x p ) , If max i IlFni - F ll oo -7 O. Cons equently, we have A
5.6.2. Robustness
215
Theorem 5.6b.1. (Qualitative Robustness). Assume the same setup and conditions as in Corollary 5.6b.2. In addition, suppose that (5.6.36)
max t
(5.6.37)
IlFni - Flloo IIAlloo
=
0(1),
=
0(1).
/3
Then, the distribution of under rr~l Fni converges weakly to the degenerate distribution, degenerate at (3. Proof. It suffices to show that the asymptotic bias is bounded. To that effect we have the inequality
From this, (5.6.35) , and the above discussion about {Zn}, we obtain that V 7J > 0, :J KTJ such that pn(ETJ) --t 1 where P" denotes the probability under rr~l Fni and ETJ = {IIA -1(/3 - (3)11 :::; KTJ} ' Theorem now follows from this and the elementary inequality 11/3 - (311 :::;
IIAlloollA -1(/3 - (3)II.
0
Remark 5.6b.2 . The conditions (5.6.28) and (5.6.36) together need not imply (5.5.g), (5.5.i) and (5.5.j) . The condition (5.5.j) is heavily dependent on the rate of convergence in (5.6.36) . Note that
IIbnl1 2
(5.6.38)
:::;
min {1j;(oo) /
II AX'a II 2 d1j;,
(/ f 2 dG) /II Ax 'a Il2dG} . This inequality shows that because of (5.5.44), it is possible to have IIbn l12 = 0(1) even if (5.6.29) (or (5.5.j) with D = XA) may not be satisfied. However , our general theory requires (5.6.29) any way. Now, with ip = 1j; or G, (5.6.39)
/
IIAX'a11 2 d
=
/
t t x~AAxjtljtljd
5. Minimum Distance Estimators
216 Thus, if n
2: II A XiII lFi{Y) -
(5.6.40)
F)y)1 ::; kb.~{y), y E]R,
i=l
where k is a constant and (5.6.41)
b.~
limnsup
is a function such that
J(b.~)2drp <
00,
then (5.6.29) would be satisfied and in view of (5.6.38), Ilbnll = 0(1). The inequality (5.6.40) clearly shows that not every sequence {Fni } satisfying (5.6.28), (5.6.36) and (5.5.c) - (5.5.i) with D = XA will satisfy (5.6.29). The rate at which F ni ===} F is crucial for the validity of (5.6.29) or (5.6.40) . 0 We now discuss two interesting examples. Example 5.6b.1. Fni = (1 - JndF + JniFo, 1 ::; i ::; n . This is the Case 5.5.2. From the Claim 5.5.3, (5.5.e) - (5.5.i) are satisfied by this model as long as (5.5.44) - (5.5.46) and (5.5.a) hold. To see if (5.6.28) and (5.6.29) are satisfied, note that here
J(ti=l IIAxill) mfx J
Pn
< 2 and
Jlp2 .
2 Jilf
(f2
n
n
i=l
i=l
- foll
2dG
+ f6)dG ,
2: IIAxillIFi - FI = 2: IIAxillJilF - Fol · Consequently, here (5.6.28) is implied by (5.5.44) for (f , G) , (fo, G) and by (5.5.47) , while (5.6.29) follows from (5.5.48), (5.6.39) to (5.6.41) upon taking b.~ == IF - Fol , provided we additionally assume that n
2: II Ax
i liJi
(5.6.42)
= 0(1) .
i=l
There are two obvious choices of {Jd that satisfy (5.6.42). They are: (5.6.43)
(a) Jn i = n- 1/ 2 or
(b)
Jni = p-l /2I1Ax ni ll ,
1::; i ::; n.
5.6.2. Robustness
217
The gross error models with {od given by (5.6.43b) are more natural than those given by (5.6.43a) to linear regression models with unbounded designs. We suggest that in these models, a proportion of contamination one can allow for the i t h observation is p-1/21IAxill. If Oi is larger than this in the sense that 2:7==1 IIAxilloi ~ 00 then the bias of /3 blows up. Note that if G is a finite measure, f uniformly continuous and {oil are given by (5.6.43b) then all the conditions of the above theorem are satisfied by the above {Fi} and F . Thus we have Corollary 5.6b.3. Every /3 corresponding to a finite measure G is qualitatively robust for {3 against heteroscedastic gross errors at all those F's whi ch have uniformly continuous densities provided {Oi} are given by (5.6.43b) and provided (5.5.a) and (5.6.37) hold. Example 5.6b.2. Here we consider {Fnil given in the Case 5.5.3. We leave it to the reader to verify that one choice of {O"nd that implies (5.6.29) is to take O"ni = IIAxnill, 1 :::; i :::; n . One can also verify that in this case , (5.5.44) - (5.5.46), (5.5.50) and (5.5.51) entail the satisfaction of all the conditions of Theorem 5.6b .1. Again, the following corollary holds. Corollary 5.6b.4. Every /3 corresponding to a finite measure G is qualitatively robust for {3 against heteroscedastic scale errors at all those F ' s which have uniformly continuous densities provided {O"nil = IIAxnill, 1 :::; i :::; n , and provided (5 .5.a) and (5.6.37) hold. As an example of a 0" - finite G with G(l~.) = 00 that yields a robust estimator, consider G(y) == (2/3)y . Assume that the following hold .
(i)
F, F o have continuous densities
0< (ii)
J
Jf J 2
d>. ,
F(l - F)d>' < 00.
t, fo;
fJd>. < (iii)
00.
JIF -
Fold>. < 00 .
Then the corresponding /3 is qualitatively robust at F against the heteroscedastic gross errors of Example 5.6a.1 with {ond given by (5.6.43b) . Recall, from Remark 5.6a.1, that this /3 is also asymptotically efficient at logistic errors. Thus we have a m.d. estimator /3 that is
5. Minimum Distance Estimators
218
asymptotically efficient and qualitatively robust at logistic error d.f. against the above gross errors models!! We leave it to an interested reader to obtain analogues of the above results for {3+ and {3*. The reader will find Theorems 5.5.4 and 5.5.6 useful here. 0
5.6.3
Locally asymptotically minimax property
In this subsection we shall show that the class of m.d. estimators {{3+} are locally asymptotically minimax (LAM) in the Hajek - Le Cam sense (Hajek (1972), Le Cam (1972)) . In order to achieve this goal we need to recall an inequality from Beran (1982) that gives a lower bound on the local asymptotic minimax risk for estimators of Hellinger differentiable functionals on the class of product probability measures. Accordingly, let Qni , Pni be probability measures on (JR, B) , f-Lni , Vni be a o-finite measures on (JR, B) with Vni dominating Qni, Pni ; qni := dQni!dvni , Pni := dPni!dvni; 1 ~ i ~ n. Let Qn = Qn1 x .. . x Qnn and P" = Pn1 X ... X Pnn and let Il" denote the class of n-fold product probability measures {Qn} on (JRn , Bn) . Define, for a c > 0 and for sequences 0 < Tln1 -t 0, 0 < ttnz -t 0,
1i n (p n , c) = { Qn E
rr- ,
t J(q~(2
-
p~(2)2dvni ::; c2},
i=l
ICn(pn ,c, TIn) = {Qn E rrn ;Qn E 1i n (p n , c),
J J
mr-x mr-x
(qni - Pni)2df-Lni 1/2
~ Tln1,
1/2 2
}
(qn i - Pni ) dVni ~ Tln2 ,
where TI~ := (Tln1 , Tln2)' Definition 5.6c.1. A sequence of vector valued functionals {Sn Il" -t JRP , n 2: 1} is Hellinger - (H - ) differentiable at {p n E Il"] if there exists a triangular array of P x 1 random vectors {eni' 1 ~ i ~ n} and a sequence of P x P matrices {An , n 2: 1} having the following properties: (1)
J enidPni = 0, J IIenil12dPni < 00 , 2:~=1 J enie~idPni == I p x p .
1 ~ i ~ n;
5.6.3. Locally asymptotically minimax (2) For every
°<
c
219
< 00, every sequence n« -+ 0,
where the supremum is over all Qn E 'Hn(pn, c, fJn) .
(3) For every
€
°
> and every a
t
E
W with
lIall
!(a'enifI(la'enil > €)dPni
= 1,
= 0(1).
i= l
Now, let X n1, '" , X nn b e independent r .v.'s with Qnl , '" , Qnn denoting their respective distributions and s, = Sn(Xn1"" , X nn ) be an est imat or of Sn (Qn). Let U be a nondecreasing bounded fun ction on [0,00] to [0, (0) and define the risk of est imat ing Sn by s, to be
where En is the exp ectation under Qn.
Theorem 5.6c.1. Suppos e that {Sn : TIn -+ W , n 2: 1} is a sequ en ce of H-d ifferentiable functionals and that the sequence {pn E 1S
rr}
su ch that
(5.6.44) Th en ,
(5.6.45) where Z is a N(O , I p x p ) r. v.. Sketch of a proof. This is a reformulation of a result of Beran (1982) , pp . 425-426. He actually proved (5.6.45) with lCn(pn , c, fJn) replaced by 'Hn(p n , c) and without requiring (5.6.44) . The assumption (5.6.44) is an assumption on the fixed sequence {pn} of probability measures. Beran's proof proceeds as follows :
5. Minimum Distance Estimators
220
Under (1) and (2) of the Definition 5.6c.1, there exists a sequence of probability measures {Qn(h)} such that for every 0 < b < 00,
L/
sup n IIhll::;b i=1
{ 1/2
qni (h) -
1/2 v.; -
1/2
I
(1.2)h eniPni
}2 dVni = 0(1).
Consequently,
and for n sufficiently large, the family {Qn (h), IIh} :::; b, h E JRP} is a subset of Hn(pn ,(b/2)). Hence , V c > O,V sequence of statistics {Sn} , (5.6.46)
lim inf inf n
s,
sup QnEHn(pn ,c)
2: lim inf inf sup n
Sn
Ilhll ::;2c
R n (Sn, Qn)
u; (Sn, Qn (h))
Then the proof proceeds as in Hajek - Le Cam setup for the parametric family {Qn(h) , Ilhll :::; b}, under the LAN property of the family {Qn(h) , IIhll :::; b} with b = 2c. Thus (5.6.45) will be proved if we verify (5.6.46) with Hn(pn , c) replaced by Hn(pn , c, 1]n) under the additional assumption (5.6.44). That is, we have to show that there exist sequences 0 < 1]nl -70,0 < 1]n2 -7 0 such that the above family {Qn(h), Ilhll :::; b} is a subset ofHn(pn , (b/2) , 1]n) for sufficiently large n . To that effect we recall the family {Qn (h)} from Beran. With eni as in the Definition 5.6c.1, let ~nij denote the j-th component of ~ni, 1 :::; j :::; P, 1 :::; i :::; n. By (3) , there exist a sequence En > 0, En .j,. 0 such that
Now, define , for 1 :::; i :::; n, 1 :::; j :::; P,
221
5.6.3. Locally asymptotically minimax Note that (5.6.47) For a
Ilenill :::; 2pE n ,
°<
b<
00,
j enidPni = 0,
1:::; i
:::; n.
Ilhll :::; b, 1 :::; i :::; n , define En
< (2bp)-1,
En ~ (2bp)-1.
Pni,
Because of (5.6.47) , {qni(h) , Ilhll :::; b,l :::; i :::; n} ar e probability density functions . Let {Qni(h) ; Ilhll :::; b, 1 :::; i :::; n} denote the corresponding probability measures and Qn(h) = Qnl(h) x ... x Qnn(h). Now, note that for IIhll :::; b, 1 :::; i :::; n ,
0,
Consequently, since En -!- 0, En
< (2bp)-1 eventually, and
sup max j(qni(h) - Pnd2dJ.Lni Ilhll ~b
:::;
l
(2pE n)2b2 mr x j
P~idJ.Lni = : 1]nl ·
Similarly, for a sufficiently large n , 1/2(h) - Pni 1/2)2dV m· :::; 2bPEn -_ .. 1]n2 · sup max j( qni
Ilhll~b
l
Because of (5.6.44) and because En -!- 0,maX{1]nl,1]n2} -+ 0. Consequently, for every b > and for n sufficiently large, {Qn (h) , Ilhll :::; b} is a subset of1-ln (p n , (b/2) ,1]n) with the above 1]nl ,1]n2 and an analogue of (5.6.46) with 1-l n (p n, c) replaced by lCn(pn , (b/2) ,1]n) holds. The rest is the same as in Beran. 0 We shall now show that achieves the lower bound in (5.6.45). Fix a f3 E IW and consider the model (1.1.1) . As before, let Fni be the actual d.f. of eni , 1 :::; i :::; n , and suppose we model the errors to be i.i .d. F , F symmetric around zero. The d.f. F need not be
°
e:
5. Minimum Distance Estimators
222
known. Then the actual and the modeled d.f of Yn i of (1.1.1) is Fni ( · - x~J3) , F(- - x~J3), respectively. In Theorem 5.6c.1 take Xni == Yni and {Qni , Pni, Vni} as follows:
Q~(Yni ::;.) XJ-L~(-)
x~J3) , x~d3),
Fni( · -
=
G(· -
pe (Yni ::; .) Vni
= F( · -
x~J3),
== A, 1::; i ::; n.
Also, let Q~ = Q~l X . . . x Q~n ; P~ = P~ x .. . x The absence of {3 from the sub - or the super - script of a probability measure indicates that the measure is being evaluated at {3 = O. Thus, for example we write Qn for QB (= I1~=1 Fnd and p n for P{f, etc. Also for an integrable function 9 write J 9 for J qd):
p!!n.
Let ]ni, ] denote the respective densities of Fni' F, w.r.t . A. Then q~ (.) = ] ni ( . - x~i{3), p~ (-) = ] (. - x~i{3) and , because of the translation invariance of the Lebesgue measure,
{
t, ! t, !(f~i2
Q~ E rr- ,
{ Q" E
n- ,
{(lo)' /' -
~)l/'}' :s c'}
- /1 /2)'
< C'}
u.u», c). That is the set
1in(P~ ,
c) does not depend on {3. Similarly,
{o: E n- , o: E 1i n(pn , c), mfx mfx
J J(f~{2
(fni - ]fdG ::; 'f}nl, - ]1 /2) ::; 'f}n2 }
IC n (pn, c, 'f}n)'
Next we need to define the relevant functionals . For t E IW , y E
223
5.6.3. Locally asymptotically minimax ~,1 ~
i ~ n, define
m~i(Y ' t)
bn(y, t) .-
n
L AXnim~i(Y' t), i=l
f-ln(t , Q~)
=
F' .-
f-ln(t, F)
:=
J
2
Ilbn(y, t)1I dG(y),
(Fni , ' " ,Fnn) .
Now, recall the definition of 'Ij; from (5.6.2) and let Tn (f3 , Q~) =
Tn(.B , F) be defined by the relation
T n(f3, F)
:=
/3 + (X'X X
Jt
JJ
2
dG)- 1
Xni[Fni(Y) - 1 + Fni(-y)]d'lj;(y) .
i=l
Note that , with bn(y) (5.6.48)
== bn(y , /3) ,
A- 1 (T n (/3, F ) -/3) =
(J J
2
dG) - 1
J
bn(y)d'lj;(y).
Some times we shall write Tn(F) for T n (/3 , F) . Observe that if {Fnd are symmetric around 0, then T n (/3 , F ) = /3 = T n (/3 , P~) . In general, the quantity A -l(Tn(F) - j3) measures the asymptotic bias in /3+ due to the asymmetry of the errors . We shall prove the LAM property of /3+ by showing that Tn is H-differentiable and that /3+ is an estimator of Tn that achieves the lower bounds in (5.6.45) . To that effect we first state a lemma. Its proof follows from Theorem 5.5.4 in the same fashion as that of Lemma 5.6b.1 and Corollary 5.6b .2 from Theorem 5.5.3. Observe that the conditions (5.5.17) and (5.5.k+) with D = XA , respectively, become (5.6.49)
J
IIb n(y)1I 2dG(y) = 0(1) ,
(5.6.50) lim inf inf e' AX' n
Ilell =l
J
A + gdG XA e
~ 0: ,
for an
where A + is defined at (5.5.20) and 9 is as in (5.6.3).
0:
> 0,
5. Minimum Distance Estimators
224
Lemma 5.6c.1. Assume that (1.1. 1) holds with the actual d.f. 's of {en i , 1 ~ i ~ n} equal to {Fni , 1 ~ i ~ n} and suppose that we model the errors to be i.i .d. F , F symmetric around zero. In addition, assum e that (5.3.8); (5 .5.a), (5 .5.c), (5.5.d), (5 .5.f) , (5.5.g ), (5.5.i) with D = XA , G n == G; (5.5.44) , (5.6.12) , (5.6.28) and (5.6.49) hold. Th en (5.5.h) and its varian t where th e argument y in the integrand is repla ced by -y , (5.5.14) , (5.6.50) and th e follow ing hold.
(5.6.51)
A -1 (.8 + - T n(F)) = - {2
J
f
2dG
} -1 Z~
+ op(l) ,
under {Qn} .
where n
Z~ = LAxni { 1jJ ( - end
- 1jJ(end
-
Jm~i(Y)dG(Y)} ,
1= 1
o
wi th m~i(Y ) == m~i (Y ,f3) and 1jJ as in (5.6 .2). Now , define, for an 0 < a < 00, Mn(pn ,a) { Qn E
rr- , Qn =
mfx
IT i= 1
J
Ifni - f lTdG
J[t,
mfx IlFni - Fll oo --T 0,
F ni ,
--T
II A x nill lFni -
0, r = 1,2
Fi]'
dG S
a'}.
Lemma 5.6c.2 . A ssum e that (1.1.1) holds with the actual d.f. 's of {eni, 1 ~ i ~ n} equa l to {Fni , 1 ~ i ~ n} and suppose that we model th e errors to be i.i. d. F , F symmetric around ze ro. In add it ion, assume that (5.3.8), (5.5.a) , (5.5.44) and th e following hold.
(5.6.52) Th en , f or eve ry 0
(5.6.53)
G is a fi nit e m easure.
< a < 00
and sufficie n tly large n ,
225
5.6.3. Locally asymptotically minimax
where ba := (4po:)-1/2a, 0: := G(~). Moreover, all assumptions of Lemma 5.6c.1 are satisfied. Proof. Fix an 0 < a < 00. It suffices to show that
t J(f~{2
(5.6.54)
11/ 2 ) 2
-
::;
i=1
b~,
n
~ 1,
and (5.6.55)
(a) mfx (b) mfx
Ju; J(f~{2 1
j) 2dG ::; 'fInl , n 1 2 / ) 2 ::;
-
~ 1,
'fIn2, n
~ 1,
imply all the conditions describing Mn(pn , a). Claim. (5.6.54) implies
By the C-S inequality, VI::; i ::; n , x E
2
IFni(X) - F( x)1 =
I:
II:
(f~{2 - 11/ 2 ) 2
< <
4
J(f~{2
-
(fni _
I:
~,
j)1
2
(f~{2 + 11/ 2 ) 2
11/ 2 ) 2 .
Hence,
J[t
IIAxn illlFni -
i=1
<
t
2
IIAXn il1 .
i=1
< 4po: '
dG
tJ
(Fni - F)2dG
i=1
t J(f~{2 i=1
which proves the Claim.
Fir -
11/ 2 ) 2 ,
226
5. Minimum Distan ce Estimators
°
The above bound, the finiteness of G and (5.6.55b) with'T/n2 -7 imply that maxi IlFni - Flloo -7 in a routine fashion. The rest uses (5.5.42), (5.5.43) and details are straightforward. Now let tp(y) = 'l/J( -y) - 'l/J(y), y E lit Note that d'l/J( -y) == -d'l/J(y), dtp == -2d'l/J, d'l/J = fdG and because F is symmetric around 0, f sp] = 0. Let
°
0-
2
Var{'l/J(e)!F}, ~n i (Yni, ,B)
€ni
T
=J
f 2dG, P = tp/o-,
== AXniP( end ·
Use the above facts to obtain 2
J
~ eni(y,,8)(p~(y))1/2{ (~(y) = 2
t
t' -~(y))1/2
r
dy
AXni J pfl /2 (f~{2 - fl /2)
i =1
= ~ AXni {J pfni
- J
p(f~{2 _ l /2)2 }
= _0-- 1 ~ AXni{ J[Fni =
0--
1
FJdtp - J
p(f~{2 -
fl /2f}
~Axni{ 2 [v: - F]fdG - J P (J~{2 - fl /2f}·
(5.6.56)
The last but one equality follows from integrating the first term by parts. Now consider the R.H.S. of (5.6.48). Note that because F and G are symmetric around 0,
(5.6.57) J bnfdG =
J
t t
AXni[Fni(y) - 1 + Fni( -y)Jd'l/J(y)
i= 1
J
AXni[Fni(y) - F(y)
i= 1
+Fni( -y) - F( -y)Jd'l/J(y)
227
5.6.3. Locally asymptotically minimax
=
2! t
AXni[Fni - F]fdG.
i=1
Recall that by definition
Tn(f3 , p~ ) = (3.
Now take A n of (2) of the
Definit ion 5.6c.1 to be A -17(y - l and conclude from (5.6.54), (5.6.56), (5.6.57) , that
II An{Tn(/3, Q~) - 2~ L..J
-
Tn({3 , p~)}
! ~>ni ( y ,{3 )
( Pni(Y) {3
)1/2
i=1 X
t
<
{(q~(y)) 1/2 _ (p~(y))1/2 } 2 dy ll
A Xni !
! p(J~{2
- f l /2)2
t= 1
<
m ax I
IIAxnd . Ilpll oo . b~ = 0(1),
uniformly for {Qn} E 1i n (pn , ba , 'Tln) ' This prov es that th e requirement (2) of the Definition 5.6c.1 is satisfied by the fun ctional Tn with the {eni} given as above. The fact th at these {end satisfy (1) and (3) of the Definition 5.6c.1 follows from (5.3.8) , (5.5.a), (5.6.52), (5.6.53) and the symmet ry of F. This th en verifies the H-differenti ability of t he abov e m.d. functional Tn. We shall now der ive the asymptotic distribution of (3+ under any sequ ence {Qn} E M n(pn ,a) under the conditions of Lemma 5.6c.2. For t hat reason consider t he above Z ~. Not e that under Qn, (1/2) Z~ is t he sum of independent centered triangular random arrays and t he boundedness of 'IjJ and (5.5.a), imply, via t he L-F CLT, that C;;- I /2Z~ -+d N(O, I x p ) , where n
L AXniX~iAa~i
en
4- E Z~ Z~' =
a~i
i=1 Var{ 'IjJ (end lFnd, 1 ~ i ~ n .
1
But the boundedness of 'IjJ implies that maxi la~i - a 21 -+ 0, for every Qn E M n(pn,a) , where a 2 = Var{ 'IjJ( edlF} . Therefore a-1 Z~ -+d
N( O, I p x p ) .
5. Minimum Distance Estimators
228
Consequently, from Lemma 5.6c.1, lim lim sup c--+o n
=
{3
sup Qn E1C n (Pl:J ,C,7]n )
E{U (IIA n({3 + - T n ({3 , Q'h) )II IQ'h} fJ
fJ
EU (IIZII),
for every bo unded nondecreas ing fun ction U , where Z is a N( o, I p x p ) r.v. This and Lemma 5.6c.2 shows that t he seq uence of the m.d. est imato rs {{3 +} achieves the lower bound of (5.6.45) and hence is 0 LAM. Remark 5.6c.1. It is an interesting problem to see if one can remove the requirement of the finit en ess of the int egrating measure G in the above LAM result . The LAM pr op erty of tB} can be obtained in a similar fashi on . For an alte rnat ive definition of LAM see Millar (1984) where, among ot her t hings , he pr oves t he LAM pr op erty, in his sense, of {{3} for p = 1. A P roblem : To t his date an appropriate ext ension of Beran (1978) to the mod el (1.1.1) does not seem to be availab le. Such an exte nsion would pr ovide asy mptotically fully efficient est imators at every sym met ric density with finite Fisher informati on and would 0 also be LAM. Not e : T he cont ents of t his cha pter are based on the works of Koul (1979, 1980, 1984, 1985 a,b) , Williamson (1979, 1982), Koul and DeWet (1983) , Basawa and Koul (1988) and Dhar (1991a , b) .
6 Goodness-of-fit Tests Regression 6.1
•
In
Introduction
In thi s cha pte r we shall discuss th e two problems of t he goodness-of-fit. The first one pert ains to th e error d.f. of the linear model (1.1.1) and th e second one pertains to fittin g a par am etric regression model to a regression function . The proposed tes ts will be based on certain residu al weighted empiricals for th e first problem and a partial sum pro cess of t he residuals for t he second problem. Th e first five sections of t his cha pte r deal with the first pr oblem and Section 6.6, with several subsections, discusses the second prob lem . To begin with we sha ll focus on th e first problem . Consider the mod el (1.1.1) and th e goodness-of-fit hypothesis (6.1.1)
H o : F n i == Fo ,
Fo a known continuous d.f..
This is a classical problem yet not much is readily available in literature. Observe t hat even if F o is known, having an unknown (3 in th e mode l poses a prob lem in constructing tests of H o that would be imp lementable, at least asymptot ically. One test of H o could be based on b, of (1.3.3). This test statistic is suggest ed by looking at th e est imated residuals and mimicking the one sample location mod el techn ique. In general, its large sample distribution depend s on th e design matrix. In addit ion , it does not redu ce to th e Kiefer (1959) tests of goodness-of-fit in th e k-sample location pr oblem when (1.1.1) H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
6. Goodn ess-oE-fit Tests in R egression
230
is reduced to t his model. The test statistics t hat overco me t hese deficiencies are t hose t hat are based on t he W. E. P.'s V of (1.1.2 ). For example, t he two candidat es t hat will be considered in t his chapte r are (6.1.2)
D2
'-
D3
sup IWO(y , .B)I,
:= II WO(y ,.B) II ,
y
where.B is an est imator of {3 and , for y E JR, t E JRP ,
(X' X)-1 /2{V(y, t ) - X'IFo(y )} ,
(6.1.3)
(1" " , 1h xn , an d where [x] = max {lxjl; 1 ~ j ~ p}, for an y with l ' x E JRP . Other classes of tes ts are based on K'jd.B x) and inf { K~ (t) , t E I1~.P} , where K~ is equa ls t o the Kx of (1.3 .2) with W repl aced by WO in t here. Secti on 6.2.1 discusses the asymptotic null distribu ti ons of t he supremum dist an ce test statistics for Hi, when {3 is est imated arb it rarily and asymptotically efficient ly. Also discusse d in t his section ar e some asy mptotica lly distributi on free (ADF) tests for Ho. Some comment s abo ut the asymptotic power of these tests appear at t he end of this sect ion . Sectio n 6.2.2 discusses a smooth bootstrap null dist ribution of D3 . Analogous result s for t est s of H o based on L 2 - dist a nces invo lving the ord inary and weighted empirical processes appear in Sect ion 6.3. A closely related prob lem to H o is t hat of testing t he compos ite hypothesis (6.1.4)
HI : F n i ( · ) = Fo(-j a ), a
> 0, Fo a kn own d .f.
Mod ifications of various tests of H o suitable for test ing HI and t heir asy mptotic null distribution s are discussed in Sect ion 6.4 . Another pr oblem of int erest is to tes t t he compos it e hyp oth esis of symmetry of the err ors: (6.1.5)
n, : F n i =
F, 1
~
i
~
n , n 2: 1;
F a d.f, symmetric aro und 0, not necessaril y kn own . This is a mor e general hypothesis than H o. In some sit uat ions it may be of interest to test H s before testing , say, that t he errors are normally distributed . Rejecti on of H , would a priori exclud e any poss ibility of normality of t he errors . A test of H; could be based on (6.1.6)
A
D Is
+
A
:= sup IWI (Y , {3)I , y
231
6.2. Th e suprem um distance tests where, for y E IR, t E IRP , (6.1.7)
wt(y ,t) n
n - 1 / 2 L[I(Yni ::; y
+ X~it)
- I( -Yni < y - x~it)l
i= 1
with H n as in (1.2.1). Other candidates are (6.1.8)
fhs
.-
sup IW +( y,;3) I, y
o; .=
sUP IIW+(y ,;3)1I y
su p[y+' (y , ;3)(X' X)- l y+(y, ;3W /2 , y
where W+ := AY+ , y +' := (V1+ , ' " , V/ ), with n
Vj+ (y, t ) := l'j( y, t) -- L Xnij
+ l'j( -y , t ),
i= 1
for 1 ::; j ::; p , y E IR, t E IRP . Yet ot her test s can be obtained by consideri ng various L 2 - nor ms involving wt and W +. The asympto tic null dist ribution of all t hese test statistics is given in Sect ion 6.5. It will be observed t hat t he tes ts based on t he vect ors W Oand W+ of W .E.P. 's will have asy mpt ot ic distributions similar to th eir counterparts in the k-sample location models. Consequently t hese tes ts can use, at least for the lar ge sam ples, th e null distribution tables t hat are available for such problems. For t he sake of t he complete ness some of th ese tables are reproduced in the following sect ions.
6.2 6.2.1
The Supremum Distance Tests Asymptotic null distributions
To begin with, define, for 0 ::; t ::; 1, s E ffi.P, (6.2.1)
W1(t , s)
.-
n 1/ 2{ H n (Fo- l (t ), s) - t },
W(t , s)
.-
W O(FO - 1 (t) , s) .
Let , for 0 ::; t ::; 1, (6.2.2)
232
6. Goodtiess-oi-iit Tests in Regression
Clearly, if Fo is continuous then the distribution of same as that of IIWdloo,
sup{IW(t)l i 0 ~ t ~ I},
ti;
j
= 1,2,3, is the
sup{IIW(t)lI i 0 ~ t ~ I},
respectively. Consequently, from Corollaries 2.3.3 and 2.3.5 one readily obtains the following Theorem 6.2.1. Recall the conditions (F ol ) and (NX) from Corollary 2.3.1 and just after Corollary 2.3.2. Theorem 6.2 .1 Suppose that the model (1.1.1) and H o hold. In addition, assume that X and Fo satisfy (NX) and (F ol), and that [3 satisfies
(6.2.3) Then
(6.2.4)
sup IWdt , [3)
-
{WI (t, {3)
+ qo(t) . nl /2x~A . A-I ([3 - {3)}1
= op(I), (6.2.5)
sup IIW(t, [3)
where qo := fo(Fo- l
)
-
{W(t , {3)
+ qo(t)A -1 ([3 -
{3)}11 = op(I) ,
and the supremum is over 0 ~ t ~ 1.
o
Write WI (t) , W(t) for WI (t, {3), W(t, {3) , respectively. The following lemma gives the weak limits of WI and Wunder H o. Lemma 6.2.1 Suppose that the model (1.1.1) and Ho hold. Then WI =? B , B a Brownian bridge in C[0,1] .
(6.2.6)
In addition, if X satisfies (NX) , then
(6.2.7) where B l
W =? B' := (B l , .. .
,· · ·
, Bp )
, B p are independent Brownian bridges in C[O, 1] .
Proof. The result (6.2.6) is well known or may be deduced from Corollary 2.2.2. The same corollary implies (6.2.7) . To see this, rewrite n
W(t)
=
A LxndI(eni ~ FO-l(t)) - t} = AX'On(t), ;=1
where On(t):= (a nl(t), ·· · ,ann(t))', with
233
6.2.1 The supremum distance tests Clearly, under H o, (6.2.8)
EW
0,
(s 1\ t - st)Ip x p ,
(7ov(W(s), W(t))
0
~
s, t
~
1.
Now apply Corollary 2.2.2 p times, jth time to the W.E.P. with the weights and r.v .'s given as in (6.2.9) below, 1 ~ j ~ p , to conclude (6.2.7). (6.2.9)
the jth column of XA, X n i ==
d(j)
F
_
Fo, 1
~
j
eni,
and
~ p,
See (2.3.31) and (2.3.32) for ensuring the appli cability of Corollary 2.2.2 to 0 this case . Remark 6.2.1 From (6.2.5) it follows that if fJ is chosen so that the finite dim ensional asymptotic distributions of {W(t) + qo(t)A -1 (fJ - (3); 0 ~ t ~ I} do not depend on th e design matrix then the asymptot ic null distribution's of i.; j = 2,3, will also not depend on the design matrix. The classes of estimators that satisfy this requirement include M-, R- and m.d. estimators. Consequently, in these cases, the asymptotic null distribution's j = 2,3 ar e design free. of On the other hand, from (6.2.4), the asymptotic null distribution of D1 depends on the design matrix through n1 /2x~A. Of course, if x n equals to 0 zero, then this distribution is free from Fo and the design matrix.
o;
Remark 6.2.2 The effect of estim ating the parameter (3 efficiently. To describe this, assume that F o has an a.c. density 10 with a.e. derivative satisfying
10
o < 10 :=
· I Uo/lo)
and assume that the estimator
fJ satisfies
(6.2.10)
Define (6.2 .11)
(6.2.12)
2
dFo <
00 .
6. Goodness-oE-fit Tests in Regression
234
Then, th e approximating processes in (6.2.4) and (6.2.5) , respe cti vely, become (6.2.13)
WI (t) + qo(t)nl /2 x~ A · 10 1 AX'sn,
WI (t)
W (t ) + qO(t)lOI AX'sn, 0::; t ::; 1.
.-
W (t )
Using th e ind epend ence of the errors , one dir ectly obtains (6.2.14)
EW1 (s) W 1 (t) = {s (l - t ) - nx~ ( X'X )-lxnqO ( s )qo (t) lol} , EW (s )W' (t )
= {s(l
- t ) - qo(s )qo (t )l ol }I p x p ,
for 0 ::; s ::; t ::; 1. The calculations in (6.2.14) use th e fact s th at E Sn == 0 , EO: n(t)s~ == qo(t)I nxn . From (6.2. 14), Th eorem 2.2.1(i) applied to t he ent it ies given in (6.2.9), and th e uniform continuity of qo, impli ed by (6.2.10) (see Claim 3.2.1 above), it readil y follows that W =? Z := (ZI , ' " , Zp)' , where ZI , ' " , Zp are cont inuous ind epend ent Gaussian pro cesses, each having th e covariance functi on
(6.2.15)
p(s , t ) .-
s( l - t) - qo(s )qO(t )lO I ,
0 ::; s ::; t ::; 1.
Consequ ently, (6.2.16)
O2 03
==?
sup{ IZ(t) l; 0 ::; t ::; I},
==?
sup{IIZ(t)I\;O ::; t
::; I} .
This shows that t he asy mpt ot ic null distribution 's of OJ, j = 1,3 are design free when an asy mptotically efficient est ima t or of f3 is used in construc t ing t he residuals while t he sa me can not be said about 0 1 , Moreover , recall , say from Durbin (1975), th at when t esting for H o in th e one sample location mod el, th e Gau ssian pr ocess ZI with th e covariance function p appears as th e limiting process for th e analogue of 0 1 • Not e also t hat in this case , 0 1 = O2 = 0 3 . However , it is the test based on 0 3 that provides th e right exte nsion of the one sample Kolmogorov goodness-of-fit test t o th e linear regression model (1.1.1) for testing H o in th e sense th at it includes th e k-sample goodness-of-fit Kolmogorov typ e test of Kiefer (1959) . That is, if we sp ecialize (1.1.1) t o t he k-sa mple location mod el, th en 0 3 redu ces to t he Tfv of Section 2 of Kiefer modulo th e fact th at we have t o est imate f3 . The distribution of SUP{IZI (t)I; 0 ::; t ::; I} has been studied by Durbin (1976) when Fo equals N(O, 1) and some other distributions. Cons equently,
235
6.2.1 The supremum distance tests
one can use these results together with the independence of ZI, . .. ,Zp to 0 implement the tests based on D2, D3 in a routine fashion. Remark 6.2.3 Asymptotically distribution free (ADF) tests. Here we shall construct estimators of 13 such that the above tests become ADF for testing H o. To that effect, write X n and An for X and A to emphasize their dependence on n . Recall that n is the number of rows in X n . Let m = m n be a sequence of positive integers, m n :S n. Let X m be m n x p matrix obtained from some m n rows of X n . A way to choose m n and these rows will be discussed later on . Relabel the rows of X n so that its first m n rows are the rows of X m and let {e~i' 1 :S i :S m n }, {Y';i ; 1 :S i :S m n } denote the corresponding errors and observations, respectively. Define
(6.2.17)
s:r, .Tm
.-
(S~i '
l:S i:Sm n )' ,
10 AmX~s:r" 1
Am = (X~Xmr-I /2 .
Observe that under (6.2.10) and H o , (6.2.18) Consider the assumption (6.2.19)
m n :S n, m n -7
00
such that
(X~Xn)I/2(X~Xm)-I(X~Xn/ /2 -72Ip x p .
The assumptions (6.2.19) and (NX) together imply (6.2.20) Consequently one obtains, with the aid of the Cramer - Wold LF - CLT, that (6.2.21) Now use {( x~i' Y';i) ; 1 :S i :S m n } to construct an estimator such that
i3 m
(6.2.22) Note that , by (6.2.19) and (6.2.21) , IIA;-1Am 1100 = 0(1) and hence (6.2.23)
of 13
236
6. Goodness-ot-Iit Tests in Regression
Therefore it follows t hat (:J m satisfies (6.2.3) . Define
From (6 .2.5 ) and (6 .2.23) it now readil y follows that sup IIW(t ,(:J ) - K * (t)11 09:'S1
(6.2.24)
= op(l).
We shall now show t hat K*
(6.2.25)
===}
B with B as in (6.2.7).
First , conside r t he covar iance fun ction of K *. By the ind ep end en ce of the errors and by (6. 2.10) one obtains t ha t
E( {I (em. <_ F.0
(t )) _ t}
jo(e~j) ) I ( * .) J O <,
f. i . 1 ::; i ::; n , 1 ::; j ::; m n , qo(t), 1::; i = j ::; mn ,O::; t ::; 1. i
0,
=
1
Use this and direct calculat ions to obtain t hat for all 0 ::; s ::; t ::; 1,
(6.2.26) E K *(s) K*( t)' s (l - t )l pxp - [21p x p - (X'n x n )1/2(X m ' X m )-I(X'n X n )1 / 2]
x I; l qO(S)qo(t) . Thus (6.2 .19) implies t hat
E K*( s) K*(t ) ---r s( l - t )l pxp, V 0 ::; s ::; t ::; 1. Becau se of (6.2.7) and t he uniform cont inuity of qo, t he relati ve compactness of t he sequence {K *} is a priori established, t hereby complet ing t he pr oof of (6.2 .25). Consequ ently, we ob t ain t he following Corollary 6.2.1 Under (1.1.1) , H o, (N X ), (6.2.10), (6.2.19) and (6.2.22) , ---rd
sup max IBj (t )1, 09 :'SII :'S J:'Sp
,~~~, {t,B)(t l where Djm stand for the
b, with (:J
=
s.;
r
j = 2, 3.
o
237
6.2.1 The supremum distance tests
It thus follows, from the independence of the Brownian bridges {B j , 1 :S j :S p} and Theorem V.3.6.1 of Hajek and Sidak (1967), that the test
that rejects H o when o.; ~ d is of the asymptotic size 0:, provided d is determined from the relation 00
2:2) _1)i+ 1 e - 2 j 2 d 2 = 1 - (1 - o:)l/ P.
(6.2.27)
j=l
Let T p stand for the limiting LV. of fhm. The distribution of T p has been tabulated by Kiefer (1959) for 1 :S p :S 5. Delong (1983) has also computed these tables for 1 :S p :S 7. The following table is obtained from Kiefer for 1 :S p :S 5 and Delong for p = 6,7 for the sake of completeness. The last place digit is rounded from their entries .
o:\p .001 .005 .01 .02 .025 .05 .10 .15
.20 .25
1 1.9495 1.7308 1.6276 1.5174 1.480 1.3581 1.2239 1.1380 1.0728 1.0192
2 2.1516 1.9417 1.8427 1.7370 1.702 1.5838 1.4540 1.3703 1.3061 1.2530
3 2.3030 2.0977 2.0009 1.8974 1.8625 1.7473 1.6196 1.5370 1.4734 1.4205
4 2.4301 2.2280 2.1326 2.0305 1.9961 1.8823 1.7559 1.6740 1.6107 1.5579
5 2.5422 2.3424 2.2480 2.1470 2.116 2.0001 1.8746 1.7930 1.730 1.6773
6 2.6437 2.445 2.3525 2.252 2.217 2.1053 1.981 1.900 1.8352 1.785
Table 1: Values d such that P(Tp ~ d) ~ 0: for 1 :S p from Kiefer (1959) & Delong (personal communication) .
7 2.7373 2.540 2.4525 2.350 2.315 2.2031 2.0788 1.9977 1.9349 1.8825
:S 7. Obtained
Note that for p = 1, D2m and D3m are the same tests and the d of (6.2.27) is the same as the d of column 1 of Tabl e 1 for various values of 0: . The entries in Tab le 1 can be used to get the asymptotic crit ical level of D3m for 1 :S p :S 7. Thus for p = 5,0: = .05, the test that rejects H o when D3m ~ 2.0001 is of the asymptotic size .05, no matter what Fo is within the class of d.L's satisfying (6.2.10). Next , to make D1 - test ADF, let r = r« be a sequence of positive integers, r n :S n, r n --+ 00 . Let X, denote the r n xp matrix obtain from some r n rows of X n ' Relabel the rows of X; so that the first r n rows are in X, and let li o, e? denote the corresponding Yi's and ei 's. Let A r = (X~Xr) -1 / 2 .
238
6. Goodness-oE-fit Tests in Regression
Assume that (6.2.28)
(i)
= 0(1),
Iln1/2X~Arll
and
(ii) Inxn(X~Xr)-lxn - 2rnx~(X~Xr)-lxrl = 0(1).
Let
/:J r be an estimator of 13 based on {(X~i ' Yr?i)' 1 ::; i
::; r n} such that
(6.2.29)
Similar to (6 .2.26), we obtain, for s ::; t , that
EK:(s)Ki(t)
=
s(l- t) -Io1qo(s)qo(t){X~(X~Xr)-1[nxn - 2r nx r ]}.
Argue as for Corollary 6.2.1 to conclude Corollary 6.2.2 Under (1.1.1), H o, (NX), {6.2.10}, {6.2.28} and {6.2.29},
»; -+d 099 sup IB(t)l ,
(6 .2.30)
where
b;
is the
b,
/:J
with
=
/:Jr'
o
Remark 6.2.4 On assumptions {6.2.19} and {6.2.28}. To begin with note that if (6.2.31)
lim n -1 (X~Xn) exists and is positive definite, n
then (6 .2.19) is equivalent to (6.2.32)
If, in addition to (6.2.31) , one also assumes (6.2.33)
lim xn exists and is finite , n
then (6 .2.28) is equivalent to (6.2 .34)
6.2.1 The supremum distance tests
239
There are many designs that satisfy (6.2.31) and (6.2.33) . These include the one way classification, randomized block and the factorial designs, among others. The choice of m n and r« rows is, of course, crucial, and obviously, depends on the design matrix. In the one way classification design with p treatments, nj observations from the jfh treatment, it is recommended to choose the first mnj = [nj /2] observations from the jth treatment, 1 ::; j ::; P to estimate {3 . Here m n = mnl + .. . + m np = [n/2]. One chooses rnj = mnj, 1 ::; j ::; p, r« = 2:. j r nj = [n/2]. The choice of m n and r n is made similarly in the randomized block design and other similar designs . If one had several replications of a design, where the design matrix satisfies (6.2.31) and (6.2.33), then one could use the first half of the replications to estimate {3 and all replications to carry out th e test . Thus, in those cases where designs satisfy (6.2.31) and (6.2.33), the above construction of the ADF tests is similar to the half sample technique in the one sample problem as found in Rao (1972) or Durbin (1976) Of course there are designs of interest where (6.2.31) and (6.2.33) do not hold. An example is p = 1, Xni == i. Here, X~Xn = O(n 3 ) . If one decides to choose the first mn(r n) Xi'S, then (6.2.19) and (6.2.28) are equivalent to requiring (m n/n)3 -7 1/2 and (r n/n)2 -7 1/2. Thus, here D2 m or D3 m would use 89% of the observations to estimate f3 while b.; would use 71%. On the other hand, if one decides to use the last mn(rn) Xi'S, then th, D3 will use the last 21% observations while D1 will use the last 29% observations to estimate {3 . Of course all of these tests would be based on the entire sample. In general, to avoid the above kind of problem, one may wish to use, from the practical point of view, some other characteristics of the design matrix in deciding which m n , r n rows to choose. One criterion to use may be to choose those mn(r n) rows that will approximately maximize (mn/n)((rn/n)) subject to (6.2.19) ((6.2.28)). 0 Remark 6.2.5 Construction of 13 m and 13r' If Fo is a d.f. for which the maximum likelihood estimator (m.l.e.) of {3 has a limiting distribution under (NX) and (6.2.10) then one should use this estimator based on rn(m n) observations {(x~, for D1(D2 or D3 ) . For example, if Fo is the N(O, 1) d.f., then the obvious choice for 13 r and 13 mare the least squares estimators:
Yin
i3r
0-
( X ir X r )-lX Ir yo. r '
240
6. Goodness-of-fit Tests in R egression
Of course t here ar e man y d.f. 's F o t hat satisfy t he above conditions , but for which th e computation of m.l. e. is not easy. One way to pro ceed in such cases is to use one step linear approximat ion. To make t his precise, let /3m be an estimator of f3 based on {( X~i ' Y n i ), 1
:s i :s m n } such that
(6.2.35)
Define (6.2.36)
1/Jo(Y)
.-
f o(Y)/ f o(Y), Y E IR;
Sni
.-
1/Jo(Yn i
sm .13m V ;" (y,t )
-
x~i/3m ) ' 1
(sni, l:S i
:s m
:s i :s m n ;
n )'
.- /3m + IoA mAmX;"'sm Am
L x niI(Yni :s Y + X~i t ), i= 1
yE IR, t E IRP.
Then
AmX;"'sm =
f
1/Jo(y )V ;"(dY, /3m)'
From t his and (2.3.37), applied to { (x~i ' Y n i ) , 1 obt ain s
:s i :s
m n }, one readily
Corollary 6.2.3 A ssum e that (1.1.1) and H o hold. In addition, assum e tha t Fo is strictly in creasin g, sat isfi es (6.2.10) and is suc h that 1/Jo is a finite lin ear com bination of nondecreasing bounded fun ctions, X and {/3m } satisf y (NX ) and (6.2.35). Th en {13 m} of (6.2.36) sat isfies (6.2.22) f or any sequence m n -t 00 . Proof. Clea rly,
But , int egration by parts and (2.3.37) yield
AmX;"' {sm - sm}
f = -
1/Jo(y ){V ;"(dY, f3 ) - V ;"(dy , ,B )}
f
{V ;" (y, /3) - V ;"(y ,f3)}d1/Jo(Y)
A;;/(/3m - f3)
I
fo(y)d1/Jo (Y) + op(l)
- A;;,!(/3m - f3)Io + op(l) .
0
241
6.2.2. Bootstrap distributions
The above result is useful , e.g., when F o is logistic, Cauchy or double exponential. In the first case m.l.e. is not easy to compute but Fo has finite second moment. So t ake 13m to be the l.s.e. and then use (6.2.36) to obtain the final estimator to be used for testing. In the case of Cauchy, 13m may be chosen to be an R-estimator. Clearly, there is an analogue of the above corollary involving {,8r} that would satisfy (6.2.28). 0
6.2.2
Bootstrap distributions
In this subsection we shall obtain a weak convergence result about a bootstrapped W .E. P . 's and th en apply this to yield bootstrap distributions of some of the above tests. Let (1.1.1) with eni ei and H o hold . Let Eo and Po denote the expectation and probability, resp ectively, und er these assumptions. In addition, throughout this section we shall assume that (Fol) , (F o2) and (NX) hold . Recall the definition of W , W from (6.2.1), (6.2.2). Let,8 be an Mest imators of (3 corresponding to a bounded nondecreasing right continuous score function 'l/J such that
=
(6.2.37)
J 'l/JdFo
J fod'l/J > o.
= 0,
Upon specializing (4.2.7) to the current setup one readily obtains n
A -1 (,8 - (3) = -K: L
(6.2.38)
AXni 'l/J(ei)
+ op(I),
(Po).
i=1
where K: := 1/ J fod'l/J . Let the approximating process obtained from (6.2.2) and (6.2.38) be denoted by W, i.e., for 0 ~ t ~ 1, n
(6.2.39)
W(t)
:=
LAxndI(ei ~ FO- 1(t)) - t - K:qo(t)'l/J(ei)} . i=1
Define (6.2.40)
(J2
._
go(t) .-
E o'l/J 2 (ei) , E o{I(e1 ~ FO- 1(t)) - t} 'l/J(ed J I(x
~ FO- 1(t)) 'l/J(x)dFo(x) ,
0
~ t ~ 1,
242
6. Gocdness-oi-Iit Tests in R egression
and , for
°
(6.2.41)
~
t
~
u ~ 1,
Po(t,u ) .-
t(l- u) - K: [qo(t)go(u)
+ go(t)qo(u )]
+K: 2qo(t)qO (u)a 2 . Not e th at
Co(t , u)
(6.2.42)
.-
Eo {W(t) W(u) ' } = Po(t , u)I p x p ,
for all 0 < t ~ u ~ 1. Let go := (901, " ' , gop)' be a p - vector of ind ependent Gaussian processes each havin g th e cova riance fun cti on Po. Thus , Eg o(t)9o (u )' Co(t,u) . Sin ce Po is continuous, go E {C[O ,I]}P. Mor eover , from Corollar y 2.2 .1 applied p time, j t h time to the ent it ies
=
=
=
=
X ni e., Fni Fo and d ni (i , j)-th ent ry of AX , 1 ~ i ~ p, 1 and from t he uniform cont inuity of qo it readil y follows that
W ==> go in [{'O[O, 1]}P,
(6.2.43)
~
i
~
n,
pl·
Now, let in be a density estimator based on {eni := Yni-x~i.B ; 1 ~ i ~ n } and Fn be t he correspo nding d.f. Let {e~i ; 1 ~ i ~ n } represent i.i.d . i; r .v.'s, i.e., { e~i ; 1 ~ i ~ n} is a random sa mp le from t he populati on Becau se t; is cont inuous, t he resampling pr ocedures based on it are usu ally called smooth bo ots t rap procedures. Let
r;
Define t he boot strap est imator (3* to be a solution s E
jRP
of t he equat ion
n
L AX n i{1P (yr:i - x~is) - En7/J(e~i)} = O. i=l
(6.2.44)
where E n is t he expectation under Fn . Let Pn denot e the bootstrap probability under Fn . Finally, define, for 0 ~ t ~ 1, u E jRP , n
S* (t ,u)
LAxniI (e~i ~F;:l (t )+X~iAu) ,
.-
i=l
n
W *(t )
LAxni{I (y r:i - x~ i{3 * ~ F;:l( t)) - t }. i=1
.-
We also need n
(6.2.45)
W *(t ) := L AXn;{I ( e~i ~ F;: l (t )) - t}, i=l
0 ~ t ~ 1.
6.2.2. Bootstrap distributions
243
Our goal is to show that W* converges weakly to go in [{ID>[O, 1)}P, p}, a.s. Here a.s . refers to almost all error sequences [e.; i 2: I}. We in fact have the following Theorem 6.2.2 In addition to (1.1.1), H o , (F o! ), (F 02) (NX) and {6.2.37}, assume that'lj; is a bounded nondecreasing right continuous score function and that the following hold .
(6.2.46)
For almost all error sequences for almost all x E JR, n
1.
lIin - folloo -+ 0, a.s ., (Po) .
(6.2.47) Then , VO
2:
{e.; i 2: I}, in(x) > 0,
< b < 00,
(6.2.48)
sup IIS*(t , u) - S*(t , 0) - Uin(.fr;l(t))11 = op(I) , (Fn ), a.s .,
°:: ;
t ::::; 1, lIull : : ; b. where the supremum is over Moreover, for almost all error sequences {e.; i
2: I},
(6.2.49) n
Kn
L AXn;{ 'lj;(e~i) -
En'lj;(e~i)} + op(I) , (Fn ),
i=1
and
W*
(6.2.50) where Kn :=
=:}
go,
in [{V[O , 1]}P, p],
1/ J ind'lj; .
Proof. Fix an error sequences [e.: i 2: I} for which (6.2.51)
in(x)
> 0, for almost all x E JR, and lIin - folloo -+ O.
The following arguments are carried out conditional on this sequence . Observe that S* (t , u) is a p-vector of W .E.P.'s Sd(t, u) of (2.3.1) whose jth component has various underlying entities as follows: (6.2.52)
where , as usual, a(j) = j-th column of A , 1 ::::; j ::::; p .
6. Goodness-oi-fit Tests in R egression
244
Thus, (6.2.48) follows from p applications of Theorem 2.3.1, l h time applied to t he above entities, provided we ensure t he validity of t he assumpti ons of that t heorem. But , f o uniformly conti nuous and (6.2.47) read ily imply that {in, n ~ 1} satisfies (2.3.4) , (2.3.5). In view of (2.3.31), (2.3.32) and (N X) , it follows t hat all ot her ass umptions of Theorem 2.3.1 are sa tisfied . Hence, (6.2.48) follows from (2.3.8) . In view of (6.2.46) we also obtain , from (2.3.9), (6.2.53) where So*(x , u) == S* (Fn(x ), u) and where th e sup remum is over x E JR,lIuli :::; b. Now, (6.2.49) follows from (6.2.53) in precisely th e same fashion as does (4.2.7) from (2.3.9). From (6.2.48) , (6.2.49) and (6.2.62) below, we readily obtain th at , under Pn , n
(6.2.54)
W* (t ) =
L AX ni{I (e~i :::; F;;l (t» -
t - kn qn(t)
i=l
where qn := I n(F;; 1 ) . In analogy to (6.2.40) and (6.2.41), let 9n,Pn stand for go , Po after F o is replaced by r; in t hese entit ies. Thus
gn(t)
En {I(e~i :::; F;;l( t» - t } 1/!(e~i)
.-
!
I (x :::; F;;l( t» 1/!(x) dFn( x) , 0 :::;
. < 1,
and, for 0 :::; t :::; u :::; 1,
Pn(t ,u)
:=
t(1 - u ) - k n [qn(t)gn (U) - 9n(t)qn(U )] +k;,qn(t)qn(u )0-;' .
where o-~ := En[1/!(e~l - En 1/! (e~l )F. Let W*(t ) denot e t he leading r .v. in the R.H.S. of (6.2.54). Observe t hat
for all 0 :::; t :::; (6.2.55)
u :::; 1.
Claim: Pn(t ,u)
--7
Po (t ,u ),
\j
0 :::; t :::; u:::; 1.
6.2.2. Bootstrap distributions
245
To prove (6.2.55), note that (6.2.51) and Lemma 9.1.4 of the Appendix below imply that for the given error sequence {e, j i 2: 1},
(6.2.56) which, together with the continuity of
Fn , yields
sup IFO(F;l(t)) - tl ~ O.
(6.2.57)
099
Also, observe that
by (6.2.51), and th at , V 0
~
t
~
1,
IfO(F;l(t)) - fo(Fo-1(t))1 == Iqo(Fo(F;l(t))) - qo(t)l· Hence, by (6.2.57) and the uniform continuity of qo , which is implied by (FoI) ,
(6.2.58)
sup ItIn(t) - qo(t)1 ---t O.
0:;t:9
Next, let
Upon rewriting
from (6.2.51)' Lemm a 9.1.4 and the boundedness of 'l/J, we readily obtain th at
But, the inequa lit y Fo(x) - bn ~ Fn(x) ~ Fo(x) + bn for all x, implies that
19n(t) - 90(t)1
< 11'l/Jlloo! I(Fo( x) - bn ~ t
< 11'l/Jlloo2bn ,
V0 ~
t
~ 1.
Hence , by (6.2.56),
(6.2.59)
sup Ii/n(t) - 90(t)1 ~ O. O::;t ::;l
~ Fo(x) + bn)dFo(x),
246
6. Goodness-oE-fit Tests in Regression
Again by the boundedness of'l/J, (6.2.51) and (6.2.56), one readily concludes that
(6.2.60) The claim (6.2.55) now readily follows from (6.2.58) - (6.2.60). Now recall (6.2.45) and rewrite W* as n
(6.2.61)
W*(t) = W*(t) - Kntln(t)
L AXni['l/J(e~i) -
En'l/J(e~J].
i=1
Observe that because
En II
t
AXni['l/J(e~i) - En'l/J(e~i)112 = p 17~,
i=1
by (6.2.60) and the Markov inequality it follows that n
II L AXni[7jJ(e~i) - En7jJ(e~i)JII =
(6.2.62)
Op(l),
(Pn).
i=1
Apply Corollary 2.2.1 p times , conclude that lim lim sup P, ( sup
'1-+ 0
n
It-sl <'I
jth
time to the entities given at (6.2.52), to
IW*(t) - W*(s)1 >
c)
= 0,
V ep > O.
This together with (6.2.62) , (6.2.61), (6.2.58) and the uniform continuity of Fo implies that the sequence of processes {W*} is tight in the uniform metric p and all its subsequential limits must be in {C[O,l]}P . Now, (6.2.50) follows from this , (6.2.54), (6.2.47) , (6.2.46), (6.2.42) and the Claim (6.2.55). 0
Remark 6.2.6 One of the main consequences of (6.2.50) is that one can use the bootstrap analogue of ii; v.i.z.,
D;
:=
sup{IIW*(t)II ,O::; i : 1}
to carry out the test H o. Thus an approximation to the null distribution of D3 is obtained by the distribution of Dj under Pn . In practice it means to obtain repeated random samples of size n from Fn , compute the frequency distribution of Dj from th ese samples and use that to approximate the null distribution of D3 . At least asymptotically this converges to the right
6.3. L2-distance tests
247
»;
D2 can distribution. Obviously the smooth bootstrap distributions for be obtained similarly. Reader might have realized that the conclusion (6.2.50) is true for any sequence of estimators {,B} , {13*} satisfying (6.2.38) and (6.2.49) . 0 6.3
L2-Distance Tests
Let Kp and Kg, resp ectively, st and for the K 1 and K x of (5.2.5) and (5.2.5) after th e d.f.'s {Hnd there ar e replaced by Fo. Thus, for G E DI(IR), (6.3.1)
KP(t)
.-
Kg(t)
.-
! !
{wf(y, t)} ZdG(y), IIWo(y , t)llzdG(y) ,
t E IRP ,
where WO is as in (6.1.3) and (6.3.2) Let
,B be an estimator of 13 and define th e four test
st atistics
(6.3.3) The large values of these statisti cs are significant for testing He: We shall first discuss the asymptotic null distribution's of Kj , j = 1,2 . Let Wp ('), WO (-) st and for Wp (-, 13) and WO (-, 13) , resp ectively. Theorem 6.3.1 Assume that (1.1.1), H o , (NX) , (5.5.44) - (5.5.46) with F == Fo hold. (a) II, in addition, (5.6.9) and (5.6.10) hold, then (6.3.4)
K I* -_
!{ ° WI
lodG } ZdG + op(l) . (y) - lo(Y) J JWp IJdG
(b) Under no additional assumptions,
(6.3.5)
K z* =
! II
°
lodG li ZdG + op(l) . W (y) - lo(Y) JWo f IJdG
Proof. Apply Theorems 5.5.1 and 5.5.3 twice, once with D = XA , and on ce with D = n- I / Z (l , 0" " , 0] and the rest of the entities as follows: (6.3.6) The theorem th en follows from (5.5.9) , (5.6.5), (5.6.11) and some algebra. See also Claim 5.5.2. 0
248
6. Goodn ess-oi-Iit Test s in R egression
Remark 6.3.1 Perh ap s it is worth emphas izing that (6.3.5) holds without any ext ra condit ions on th e design mat rix X . Thus, at least in t his sense, Ki is a more natural statistic to use t ha n Ki for tes t ing H o. A consequence of (6.3.4) is that even if 131 of (5.2.4) is asymptotically non-unique, Ki asymp totically behaves like a unique sequence of LV . 'so Moreover , unlike t he D1 - statist ic, t he asy mptotic null dist ribution of K i does not depend on t he design matrix among all t hose designs t hat satisfy t he given conditions. The assumpt ions (5.6.9) and (5.6.10) are restri cti ve. For exa mp le, in t he case p = 1, (5.6.9) t ra nslates to requiring that eit her Xi! 2: 0 for all i or XiI::; 0 for all i . The ass umpt ion (5.6.10) says t hat x =j; 0 or can not converge to O. Compare this with t he fact th at if x ~ 0 t hen t he asy mptotic distribution of D1 does not depend on t he prelimin ar y est imator 13. 0 Next , we need a result t hat will be useful in deriving t he limiting dist ributions of certain quadratic forms involving W .E.P.'so To t hat effect, let L~ (IR, G) be th e equivalence classes of measurable function s h : JR to JRP such that Ihlb := J II hl12 dG < 00. T he equivalence classes are defined in terms of t he norm I . lb. In t he following lemm a, {a. : i 2: I } is a fixed ort honormal basis in L~ ( JR, G ). Lemma 6.3.1 Let {Z n , n 2: I} be a sequence ofp - vector stochas tic pro cess es wit h E Z n = 0 ,
f or 1 ::; i , j ::; p ,
X,
Y E JR. In addition, assume the f ollowing:
There is a covariance m atrix fun ction K (x , y ) = (( K ij(x ,y))), an d ap - vector mean zero covariance K Gaussian process Z suc h th at (i)
(ii) (iii)
2:j'=1 J Kn ij( x , x )dG (x) < 00, n 2: 1. (b) 2:j'=1J K j j (x , x )dG (x ) < 00. 2:j'=1J K nij (x , x )dG (x ) -+ 2:j'=1 J K j j (x , x )dG (x ).
(a)
For every m
2: 1,
(/
Z~aldG, ·· · , / Z~amdG) -+d ( / Z'a 1dG ,· .. , / Z'am dG) .
6.3. L 2-distance tests (iv)
For each i
~
249
1,
Then, Zn , Z belong to L~ (lR, G), and
Zn ~ Z in L~(lR, G) .
(6.3.7)
Proof. In view of Theorem VI.2.2 of Parthasarthy (1967) and in view of (iii) , it suffices to show that for any E > 0, there is an N(= N f ) such that
(6.3.8)
Because of the properties of {ad, Fubini and (i),
(6.3.9)
t1
EIZnl~ = L E
t1
EIZI~ = L
Kn;j(x, x)dG(x) =
1=1
(6.3.10)
,::::1
Kjj(x ,x)dG(x) =
1=1
E
(I Z~a;dG) (I Z'a;dG) 2
,::::1
Thus, to prove (6.3.8), it suffices to exhibit an N such that (6.3.11)
By (ii), (6.3.9) and (6.3.10) , there exists N 1f such that
By (i)(b) and (6.3.10) , there exists N(= N f ) such that (6.3.12)
;~ E (I Z'a;dG)
By (iv) , there exists N 2f such that
2
S E/3.
2,
6. Goodness -oE-fit Tests in R egression
250
Therefore, wit h N = N, := N 1, V N 2
L E( f Z~aidG)2
n ::::N i
~ ~~~ [t? (J Z'a;dG)' - ;~ (J Z~a;dG)'] +3 :::;E. Use (i) (a) to t ake ca re of the case n
< N, . This proves (6.3.7) .
0
R emark 6. 3. 2 Millar (1981) cont ains a special case of th e abov e lemma where p = 1, Zn is the standardized ordinary e.p. and Z is t he Brow nian br idge. The above lemma is an exte nsion of that result to cover more general pro cesses like the "V.E .P.'s under general ind epend ent setting . In applica t ions of th e above lemma , one may choos e {a.} to be such that t he support S, of a , has G(S il < 00 , i ~ 1 and such t ha t {ad are boundecl. 0 Corollar y 6 .3 .1 (a) Under the conditions of Th eorem 6.3.i(a),
. f{
B (Fo) - fo .
](1 - 7 d
f B (Fo)fo dG f f JdG
}2dG = : -G
1,
(say) .
(b) Under the conditions of Th eorem 6.3.i(b),
". f
K1
-7d
2
II
11 B (Fo) - fo . f Bf (Fo)fodG fJdG dG = : -G 1 , (say) .
Here B , B are as in (6.2.6), (6.2.7).
Proof. (b ) Apply Lemma 6.3.1, with a , as in the Rem ark 6.3.2 above, to
W O_ f Wofo dG . f f iJdG JO , odG . f Z = B (F.°) _ f Bf (Fo)f f J dG J O· Direct calculat ions show that E Z n
Kn( x ,y)
= 0 = EZ , and \:j x , y E JR,
:= E Z n(x) Z ~(y)
Ipxpt'(x , y )
=
K (x , y ) = : E Z (x )Z' (y ),
251
6.3. L 2-distance tests where, for x, y E JR, £(x,y)
:=
k(x,y) -a- 1 / 0(Y) _a- 1 10(Y)
k(x , y)
:=
J
k(x,s)d'ljJ(s)
J
k(y, s)d'ljJ(s)
+ a- 2
Fo(x /\ y) - Fo(x)Fo(y), 'ljJ(x) :=
JJ
k(s , t)d'ljJ(s)d'ljJ(t) ,
[ X10dG, a
= 'ljJ(oo) .
oo
Therefore, (5.5.44), (5.5.45) imply (i), (ii) and (iv). To prove (iii), let AI, . . . ,Am be real numbers. Then
"f=';. J J m
Aj
I ZnajdG
=
J
W
0'
bdG -
J ~ JW10d'ljJd'ljJ ,
bd'ljJ =: h(W O ) , (say),
where b := ~;~1 Ajaj. Because e and bdG are finite measures, h(W O) is a uniformly continuous function of W O • Thus by Lemma 6.2.1 and Theorem 5.1 of Billingsley (1968), h(W O) -'td h(B(Fo)), under n; and (NX) . This then verifies all conditions of Lemma 6.3.1. Hence Zn ==::} Z in L~(JR, G) . In particular J IIZnl12dG -'td J IIZII2dG. This and (6.3.5) proves part (a). The proof of part (b) is similar. 0 Remark 6.3.3 The r.v. G 1 can be rewritten as
=
G 1
J
B 2(F. )dG _ 0
{f B(Fo)/odGF
J IJdG
Recall that G 1 is the same as the limiting r.v. obtained in the one sample location model. Its distribution for various G and F o has been theoretically studied by Martynov (1975). Boos (1981) has tabulated some critical values of G 1 when dG = {Fo(l - FO)}-ldFo and F o = Logistic . From AndersonDarling or Boos one obtains that in this case
1 1
G1
=
B 2 * (t)(t(l- t))-ldt -
6(1
1
B(t)dt)2
L NJ Ii (j + 1) j?2
where {Nj } are Li.d. N(O , 1) r.v.'s. From Boos (Table 3), one obtains the following
6. Goodness-oF-fit Tests in Regression
252
Table II .005 1.710
a
ta
.01 1.505
.025 1.240
In Table II , t a is such that P(G 1 > t a ) Stephens (1979). The LV . G2 can be rewritten as
G2
:=
f
IIB(Fo)1I 2dO _
.05 1.046
= a . For some other tables see
IIB(;~{~~GI12
[fB 2(F, )dG _ (fB J(Fo)! odG)2] ~ !JdG ' ~
j
J
0
which is a sum of p independent r.v.'s identically distributed as G 1 . The distribution of such r.v .'s does not seem to have been studied yet . Until the distribution of G 2 is tabulated one could use the independence of the summands in O 2 and the bounds between the sum and the maximum to obtain a crude approximation to the significance level. For p = 1, the asymptotic null distribution of J(i and J(; is the same but th e conditions under which the results for J(i hold are stronger than those for lq . 0 The next result gives an approximation for kj ,j = 1,2. It also follows from Theorem 5.5.1 in a fashion similar to the previous theorem, and hence no details are given . Theorem 6.3.2 Assume that (1.1 .1) , H o, (NX), {5.5.44} - {5.5.46} with F == Fo and {6.2.3} hold. Then , (6.3 .13)
tc, = k2=
f + f IIWO(y) + [Wf(y)
n 1/ 2x A · A - 1 (13 - !3)!o(y)f dG(y) +op(l) A -1(13 -
!3)!o(y)lr dG(y) + op(l).
From this we can obtain the asymptotic null distribution of these statistics when 13 is estimated efficiently for the large samples as follows. Recall the definition of {s.] from (6.2.12) and let
+ nx'(X'X)-lXiSJo1 !o(y) , Q:i(Y) .- I(ei::; y) - Fo(y) + sJo 1!o(y) , 1 ::; i ::; n, y E IE., a (a1, '" ,an)', I' = h1,'" , I'n)'. I'i(y)
I(ei ::; y) - Fo(y)
253
6.3. L 2-distance tests Also, define, for y E JR,
n
=
(6.3.14)
n - 1/ 2
L ' iCY), i= 1
From Theorem 6.3.2 we readily obtain t he Corollary 6.3 .2 A ssum e that {1.1.1}, H o (NX ), (5.5.44) - (5.5.46) with F == Fo, (6.2.11) and (6.2.13) hold. Then,
! Z~1dG+op(1) .
(6.3.15)
k2
(6.3.16)
!IIZn211 2dG + op(l) .
Next , observe t hat for Y ::; z,
J(n1 (Y, Z) .-
=
COV(Z n1(y) ,Zn1(Z)) Fo(y )(l - Fo(z )) - nx '( X ' X )- 1x f o(Y)f o(z ) 1 0
_. en1(y , z ), K n2(y , z)
.-
E Z n2(y) Zn2(Z)' {Fo(y )(l - Fo (z )) -
fo(Y~:o(z) }I p x p
_ . ro(Y , z) I p x p , say.
Now ap ply Lemma 6.3.1 and argue just as in t he proo f of Corollary 6.3.1 to conclude Corollary 6.3.3 (a) In addition to th e conditio ns of Corollary 6.3.2, assum e that
n x' (X'X)- 1x: -7
(6.3.17) Then,
ic,
-td
!
c,
lei < 00 .
Z l (y )dG(y ),
where Z1 is a Gaussian process in L 2 (R , G ) with the covariance function
J(1(x , y)
:=
Fo(x )(l - Fo(y )) -
C
fo (x )fo(y )I 1,
o
x ::; y.
6. Goodness-oE-fit Tests in Regression
254
(b) Under th e conditions of Corollary 6.3.2,
(6.3.18) where Yo is a vect or of p independ ent Gau ssian processes in L~(lR, G) with 0 the covariance matrix TO · I p x p .
Remark 6.3 .4 Again , observe th at asy mptotic null distribution of t he test statisti c K1 based on t he ordinary empirical of the residu als is design dependent whereas th at of t he test based on th e weight ed empiricals K2 is design free. In fact , for p = 1, t he limiting L V . in (6.3.18) is t he same as th e one t hat appears in t he one sample locat ion mod el. For G = Fo = N (o, 1) d.f, Mart ynov (1976) has tabulated the distri buti on of thi s L V .. Stephens (1976) has also tabulate d th e distribution of this r .v. for G = Fo,dG = dG o = {Fo(1 - FO)}- ldFo and for Fo = N (O, 1). For G = F o , Fo = JV(O, 1) eLf., Stephens and Mart ynov's tab les generally agree up to th e two decimal places , t hough occasio na lly there is an agree ment up to three decimal places. In any case, for p = 1, one could use th ese tables to implement t he test based on K2 , at least asy mptotically, whereas the tes t based on K1 , being design depend ent , can not be read ily implemented. For th e sake of convenience we reproduce some of the Stephens (1976, 1979) t ables below. Table III
Fo = N (O, 1)
K2 \ K2 (Fo) K2 (GO)
a
0.10 .237 1.541
0.25 .196 1.281
.05 .165 1.088
.10 .135 .897
In Table III , K2 (G ) stands for t he K2 with G being t he integ rating measur e. K2 (G O) is th e K2 with the Anderson - Darling weights. Table III is, of course, useful only when p = 1. 0 As far as t he asymptotic power of th e above L 2 - test s is concerned, it is appa rent that Theorems 5.5.1,5.5.3 and Lemm a 6.3.1 can be used to dedu ce th e asymptotic power of th ese test s agai nst fairl y general alte rnatives. Here we sha ll discuss th e asy mpto tic behavior of only K; , j = 1,2 und er the
255
6.3. L 2- distance tests
heteroscedastic gross erro rs alte rnatives. More precisely, suppose F 1 is a fixed d.f., and let {8n d be numbers satisfying (6.3.19)
o :S s:
:S 1, 1:S i :S n ;
max
l ~i ~n
8n i --+ O.
Let
n
m l :=
n - 1/ 2
n
L 8ni (F
Fo), m 2 :=
I -
i=1
L A Xni8ni (F
I -
Fo ).
i=1
L emma 6.3.2 Let (1.1.1) hold with eni having the d.f. F ni given by (6.3.20) , 1 :S i :S n. Suppos e that X satisfies (NX ); (Fo , G) and (Fl , G)
satisfy (5.5.44) - (5.5.46) and that
! IFI -
(6.3.21)
FoldG
< 00,
(a) If, in addition (5.6.9) and (5.6.10) hold, then
(6.3.22)
1(* 1
= !{W O+ 1
_ r J (Wf + ml)fodG }2dG + 0 (1) ml)O J f JdG P
provided n
(6.3.23)
n-
1 2 /
L8
ni
= 0(1) .
i= 1
(b) Without an y additiona l conditions,
(6.3.24)
1(*
2
=
! Ilw
o+
O
2
- r J(W J+fJmdG2)fodG 1 1dG + 0 P(1) , m 2 )0
provid ed n
(6.3.25)
L AX
n i 8n i
= 0(1 ).
i=1
Proof. App ly Theorem 5.5.1 and (5.5.31) to Y ni == eni, H ni == Fo , {Fnd given by (6.3.20) and to D = n - 1/ 2[ 1, 0, · · · , 0], to conclude (a) . Apply th e same results to D = A X and the rest of th e ent ities as in t he proof of (a) to conclude (b) . 0
256
6. Goodness-oE-fit Tests in Regression Now apply Lemma 6.3.1 to
(6.3.26)
W o + ml
Zn Z
.-
B(Fo)
f
-
;0
f (W O + ml )jodG f 16 dG
+ al (Fl
- F o)
.f, f{ B(Fo) + al (Fl - Fo)} 1_dG -
f 16 dG
0
Corollary 6.3.4 Under the conditions 01 Lemma 6.3.2{a} ,
1\~
-+d
J
ZZdG,
where Z is as above .
Similarly, apply Lemma 6.3.1 to (6.3.27)
Zn Z
·.-
WO
f
f(W O + ffiz)/odG
+ ffiZ - ;0 B(Fo) + az(Fl f
f 16 dG
'
Fo)
f{B(Fo)
+ az(Fl -
-;0
f 16 dG
Fo)}/odG '
Corollary 6.3.5 Under the conditions of Lemma 6.3.2 (b],
1\; -+d
JIIZllzdG,
where
Z is
as in {6.3.27}.
An interesting choice of Oni = p-l /zIlAx n & Another choice is Oni == n- 1 / Z . Both a priori satisfy (6.3.19), (6.3.23) and (6.3.25) . 0
6.4
Testing with Unknown Scale
Now consider (1.1.1) and the problem of testing HI of (6.1.4). Here we shall discuss the modifi cations of ii; j = 1,2, of Sections 6.2, 6.3 that O will be suitable for HI ' With Wf, W as before, define
«;
(6.4.1)
D 1 (a, u) := sup IWf(ay, uj] ,
Dz(a, u) := sup
y
}(I(a , u) := }(z(a, u) :=
J J
IIW O(ay , u)lj,
y
{Wf(ay, u)}ZdG(y),
IIWo(ay, ullzdG(y) ,
a> 0, u E IRP.
6.4. Testing with unkn own scale
257
Let (a, 13 ) be est imators of (a, (3 ), Dj and ic, stand for Dj (a, 13 ) and Kj( a ,fJ) , respecti vely, j = 1,2 . The following two theore ms give the asym ptotic null distrib ut ion 's of t hese statistics . Th eorem 6.4.1 follows from Corollary 2.3.4 in a similar fash ion as does Th eorem 6.2.1 from Corollari es 2.3.3 and 2.3.5. T heorem 6.4.2 follows from Th eorem 5.5.8 in a similar fashi on as does Th eorem 6.3.2from Theorem 5.5.1. Recall t he conditions (Fo l) and (F 0 3) from Sectio n 2.3.
Theorem 6 .4.1 In addition to (1.1.1) and H I , assume that (NX ) , (F ol) , (F 0 3) an d th e f ollowing hold. In 1/ 2(iT - a) a - 11= Op(1). (b)
(6.4.2) (a)
IIA- 1(13 - (3) 11 = Op(1).
Th en ,
D1
=
sup !W1 (t ) + qo(t){n 1/ 2 x n C B- (3)
+n 1/ 2 (iT - a) Fo- 1(t )}a- 1!+ op(1),
D2
sup
IW(t)+ qo(t){ A - 1(13 - (3) +n 1/ 2 Axn · n 1 / 2(a - a)Fo- 1(t)}a-111 + op(1),
where now W 1(-) := Wp(aFo- 1( .) , (3) and W (-) := W O(a F o- 1 (.), (3 ).
Theorem 6.4.2 Suppos e (1.1.1) and H I , (N X) , (6.4.2), (5.5.45) with F = Fo hold. In addition , supppose F o has a con tin uous density fo su ch that
(6.4.3)
JIylj
0<
ft (y)dG(y) <
lim lim sup n
5-40
k
= 1, 2,
lim
5-40
Th en ,
J J
K1 =
J
ft(y
00,
j
= 0, k = 1, 2; j = 2, k = 2,
+ rn - I / 2 + s)dG(y) =
r E JR,
lyIJ0 (y(1 + s)dG(y ) =
J
ft dG(y ),
J
lylJo(y)dG(y ).
[W f (ay , (3 ) + fo(y){n1/2x~(fJ - (3)
r
+n 1 / 2 (iT - a)y }a- 1 dG(y) + op(1),
K2 =
J
II W O(a y , (3)
+ f o(y ){A -1(13 - (3) 2
+n 1/ 2 AX n . n 1/ 2 (a - a)y}a -111 dG(y) + op(1).
258
6. Goodness-oi-Iit Tests in R egression
Clearly, from these t heorems one can obtain an ana logue of Coro llary 6.3.2 when (0- , iJ) are chosen to be asy mptotically efficient est imators . As is t he case in the classical least squa re t heory or in t he M-est imation met hodology, neith er of the two dispersions K 1 (a, u ) and K 2 (a, u ) can be used to satisfacto rily est ima te (0-, fJ) by t he simultaneous minimi zation process. The ana logues of t he m.d. goodness-of-fit tests tha t should be used are inf{I( j (o-,u);u E /RP }, j = 1,2 . The meth odology of Section 5 may be used to obtain t he asy mptotic distributi ons of th ese statist ics in a fashi on similar to t he above. 0
6.5
Testing for the Symmetry of the Errors
Consider th e mod el (1.1.1) and t he hypothesis H; of sym met ry of th e errors specified at (6.1.5). The proposed tes ts are to be based on Dj s , j = 1,2 , 3 of (6.1.6), (6.1.7), K j (/3 ), and inf {Kj (t ), t E /RP}, j = 1, 2, where (6.5.1)
K t (t )
.-
K i (t )
.-
J J
{wt(y , t )} 2dG(y), IIW+(y , t )11 2dG(y) , t E /RP,
wit h wt and W + as in (6.1.7) and (6.1.8) . Large values of these statistics ar e considered to be significan t for H s . Although the results of Chapters 2 and 5 can be used to obtain t heir asymptotic behavio r under fairl y genera l alternatives, here we sha ll focus only on t he asymptotic null distri bu tion 's of these tests . To state t hese , we need some more notation . For a d .f. F , define F+(y) := F (y) - F (- y), y 2:
o.
Then , with F- 1 denoting th e usual inverse of a d.f, F , we have F;I (t)
(6.5.2)
-r;' (t)
=
F - 1(1
+ t) /2) ,
1
F- ((1 - t) /2 ), O :S t
:s 1,
for all F that are continuous and symmetric aro und O. Fin ally, let wt(t)
.-
w t(F;I(t), fJ),
q+(t )
.-
f (F; l (t )),
O:S
We are now ready to state and prove
W * (t ) := W + (F; I (t ),,B),
t:s 1.
6.5. Testing for symmetry of the errors
259
Theorem 6 .5.1 In addit ion to (1.1.1), H; and (NX) , assume that F in H , and th e estim ator i:J satisf y (F1) and (6.5.3) Th en,
(6.5.4)
(6.5.5)
b. , =
sup Iwt(t) 0:::;t9
Dzs =
+ 2q+(t)n1 /Zx~AA -1((3
sup IIW*(t) + 2q+(t)A - 1((3 09 9
- {3)1
+ op(l) ,
,8)11 + op(1),
and
(6.5.6)
D3 s
= sup IIW*(t) 0:::;t :::;1
+ 2q+(t)A - 1((3 -
{3)11
+ Op(l) .
Proof. The proof follows from Theorem 2.3.1 in t he following fashion. The det ails will be given only for (6.5.6) , as th ey are th e same for (6.5.5) and quite similar for (6.5.4) . Becau se F is continuous and symmetri c ar ound 0 + -1 and becau se W + (-, .) = W+(- ', ') , D 3 s =d sUPO:::; t:::;1 W (F+ (t) ,{3). But , from t he definiti on (6.1.8) and (6.5.2) , it follows t hat for a v E lEt , _
(6.5.7)
A
A
W +(F;1 (t ), v) =
t
I
AXni { ( eni
S F- 1
(1 ; t) + C~i u)
i= 1
+I(en i = S
S F- 1 C ~ t ) +C~iU)
C; ,u) C~ t
+S
t,
u) -
t
AXni ,
-I}
0 S t S 1,
where n
S(t , u )
:=
L
Axn;I(eni
S F- 1 (t) + c~iu) ,
0 S t S 1,
i =1
is a p-vector of Sd - processes of (2.3.1) with X ni == eni, Fni == F == H , Cni == AXni ,U = A - 1(v - {3) and where the jth pr ocess has the weights {d nd given by the j th column of AX. The assumpt ions abo ut F and X imply all the ass umptions of Theorem 2.3.1. Hence (6.5.6) follows from (3.2.6), 0 (6.5.3) and (6.5.7) in an obvious fash ion . Next , we state an ana logous result for t he L z - dist an ces.
260
6. Goodness-oE-fit Tests in R egression
Theorem 6.5.2 In addit ion to {l .1.1}, Hi , (NX) and {6.5.3}, assume tha t F in H s an d th e in tegrati ng measure G satisf y {5.3.8}, {5.5.44}, {5.5.46} an d {5.6.12}. Th en, K t (!3) =
K i (!3) = where
! !
[Wi (Y) + 2f(y)nl/2x~ (!3 IIW+ (Y) + 2f(y )A- 1(13 -
wi O,W + O
no w stan d f or
.8)f dG(y) + op(I), .8)11
2
+ op(I),
dG(y)
wi (-,.8 ),W+ (·,.8).
Proof. Th e pro of follows from two applica t ions of Theorem 5.5.2, once with D = n - 1 / 2[1 , 0 , ·· · , 0) and once with D = XA . In bot h cases, take Yn i and F n i of that th eorem to be equa l to e n i and F , 1 ~ i ~ n resp ectively. The Claim 5.5.2 justifies t he applica bility of that th eorem und er th e pr esent 0 assumptions . Th e next result is useful in obtaining t he asymptotic null distribution 's of t he m.d. test statist ics. Its proof uses T heore m 5.5.2 and 5.5.4 in a similar fashi on as Theorems 5.5.1 and 5.5.3 are used in t he proof of Th eorem 6.3.1, and hence no details are given. Let
KJ := inf { K!(t) ; t E
jRP} ,
j = 1, 2.
Theorem 6.5 .3 A ssum e that {l .1.1}, s ; (NX ), {5.3.8}, {5.5.44}, (5.5 .46) and (5.6.12) hold. (a) If, in addition, {5.6.9} and (5.6.10) hold, then
21 {wi 00
Kt
=
1 wi (1 00
(y) - f (y)
00
f dG
rr 1
f 2dG
dG
+op(I ). {b} Under no addit ional assumptions,
K~ =
21
00
1 w- 1
00
00
Ilw+(y) -
f (y)
fdG(
f 2dGrll1 2dG
+op(I ). To obtain t he asy mptotic null dist ributi on 's of t he given statistics from thi s t heorem we now apply Lemm a 6.3.1 to t he ap proximating processes . The details will be given for Iq only as t hey are similar for Kt . Accordin gly, let , for n ~ 1, y ~ 0, (6.5.8)
261
6.5. Testing for symmetry of the errors
To determine the approximating r.v. for K~ we shall first obtain the covariance matrix function for this Zn, the computation of which is made easy by rewriting Zn as follows. Recall the definition of'IjJ from (5.6 .2) and define for 1 :S i :S n, y E JR,
1
00
O:i(Y)
.-
I(ei:S y) + I(ei :S -y) - 1,
a'
.-
(0:1, · ·· ,O:n);
ai :=
O:id'ljJ;
1
00
a':= (al, · · · ,an);
a :=
f 2dG .
Then (6.5 .9)
Zn(Y)
= AX'[a(y) -
f(y)aa- l],
o.
y 2:
Now observe that under H s , Ecc = 0, EO: l (x)o:dy) x :S y, and , because of the independence of the errors,
= 2(1- F(y)),
0 :S
Ea(x)a'(y) = 2(1 - F(y))I pxp, 0:S x :S y .
(6.5.10)
Again , because of the symmetry and the continuity of F and Fubini, for y 2: 0,
1 1
00
E([I(el :S y)
+ I(el
:S -y) - 1J
x[I(el :S x)
+ Lie,
:S -x) - 1J)d'IjJ(x)
00
[F(x
1\
y)
+ F( -x 1\ y) -
F(y)
+ F(x 1\ -y)
+F( -x 1\ -y) - F( -y)]d'ljJ(x) 2(1 - F(y)){'ljJ(y) - 'IjJ (0)}
+ ~oo 2(1 -
2 ~oo ['ljJ(x) _ 'ljJ(O)JdF(x) = : k(y),
F(x))d'ljJ(x)
say.
The last equality is obtained by integrating the second expression in the previous one by parts. From this and the independence of the errors, we obtain
Ea(y)a' E(i
(i'
k(y)I pxp, y 2: 0,
Ipxp4100 100 (1 - F(y))d'IjJ(x)d'ljJ(y) _ . Ipxpr(F, G) , say.
262
6. Goodness-of-fit Tests in Regression
From these calculations one readily obtains that under Hi, for 0 :S x :S y , Kn(x ,y) := EZn(x)Z~(y)
[2(1 - F(y)) - k(y)f(x)a- 1
-
k(x)f(y)a- 1
+ r(F, G)] I p x p .
We also need the weak convergence of W+ to a continuous Gaussian pro cess in uniform topology. One way to prove this is as follows. By (6.5.10) , (6.5.11) From the definition (6.1.8) and the symmetry of F , we obtain n
(6.5.12)
W+(y)
=
L
AxndI(eni :S y) - I( -eni
< y)}
i =1 n
=
L
AXni{I(en; :S y) - F(y)}
i= 1
n
- LAxn;{I(-eni:S y) - F(y)} ;=1 n
+L
AXn i I
( - eni = y)
i =1
n
(6.5.13)
Wdy) - Wz(Y)
+L
AxnJ(- eni
= y) ,
say,
;= 1
for all y 2 o. Now, let U' ; = (U1 ,Uz, ··· ,Up) be a vector of independent Wiener processes on [0, 1) such that U(O) = 0 , EU == 0 , and EUj( s)Uj(t) = s 1\ t , 1 :S j :S p . Note that EU(2(1 - F( x)))U(2(1 - F(y))) = 2(1 - F(y))I p x p , 0 :S x :S y .
From (6.5.11) and (6.5.12), it hence follows, with the aid of the L-F CLT and the Cramer - Wold device, that under (NX) , all finite dimensional distributions of W+ converge to those of U(2(1 - F)) . To prove th e tightness in the uniform metric, proceed as follows. From (6.5.13) and the triangle inequality, because of (NX) , it suffices to show that WI and W z are tight . But by th e symmet ry and th e continuity of F ,
263
6.5. Testing for symmetry of the errors
But, WI (F-I) is obviously a p-vector of W.E.P.'s of the type WJ specified at (2.2.23). Thus the tightness follows from (2.2.25) of Corollary 2.2.1. We summarize this weak convergence result as Lemma 6.5.1 Let F be a continuous d.f. that is symmetric around {e n i , l :::; i :::; n} be i.i.d. F r.v . 'so Assume that (NX) holds. Then,
°
and
W+(-) :::} U(2(1 - F( ·))) in ('0[0,00], p). The above discussion suggests the approximating process for the Zn of (6.5.10) to be (6.5.14)
Z(y) oo
>
U(2(1 _ F( ))) _ fey) I o U(2(1 - F))fdG v 0. oo y j2dG ,-
._
Io
Straightforward calculations show that Kn(x ,y) = EZ(x)Z'(y),O:::; x:::; y, n ;::. 1. This then verifies (i), (ii) and (iv) of Lemma 6.3.1 in the present case . Condition (iii) is verified as in the proof of Corollary 6.3.1(b) with the help of Lemma 6.5.1. To summarize, we have Corollary 6.5.1 (a) Under the conditions of Theorem 6.5.3(a),
«;
----+d
21
00
[W 1(2(1-F(y)))
_ fey) J~oo W 1 (2(1 - F))fdG]2 dG( ). Iooo j2dG y (b) Under the conditions of Theorem 6.5.3(b),
K~
----+d
21
00
IIZI1 2 dG (y),
with
Z
given at (6.5.14) .
Remark 6.5.1 The distributions of the limiting r.v.'s appearing in (a) and (b) above have been studied by Martynov (1975, 1976) and Boos (1982) for some F and G. An interesting G in the present case is G = A. But the corresponding tests are not ADF. Also because the F in H; is unknown , one can not use G = F or the Anderson - Darling integrating measures dG = dF / {F(l - F)} in these test statistics. One way to overcome this problem would be to use the signed rank analogues of the above tests which is equivalent to replacing the F in the integrating measure by an appropriate empirical of the residuals {Yn j -
6. Goodness-of-fit Tests in Regression
264
:s j :s n} . Let «: denote th e rank of /Yni X~jU; 1 :s j :s n} , 1 :s i :S n , and define X~jU ; 1
x~iul among {/Ynj -
n
z t (t , u )
n - 1/2
.-
L I (Rt :s nt )sgn(Yni -
X~iu) ,
i= l
n
Zr(t , u)
A
:=
L xniI(Rt :s nt)sgn(Yni -
X~iu) ,
i= l
for a :s t :s 1, u E jRP resp ectiv ely, are
The signed rank analogues of Kt ,
Kf := inf{K 1 (u ); u E jRP} ,
q := inf{K 2(u ); u E
K~
statist ics,
n~n,
where for u E jRP , K 1(u)
:=
10
1
1
[zt (t, uWdL(t) ,
K2 (u ) :=10 IIZr(t, u)1I2dL(t) ,
with L E VI[O, 1]. If L (t) == t th en Kj ,j = 1,2 , ar e ana logues of the Cramer - von Mises statistics. If L is specified by the relation dL(t) = {l/t(l- t) }dt , t hen th e corres ponding tests would be t he Anderson - Darling ty pe test of symmet ry. Note tha t if in (3.3.1) we pu t d ni == n- 1/ 2, X ni == e-a. F«: == F , t hen of (3.3.1) redu ces to zt . Similarl y, Z r corresponds to a p - vector of pro cesses of (3.3.1) whose j th component has dni == (j th column of A )'Xni and th e rest of th e entit ies th e same as abo ve. Consequently, from (3.3.12) and argum ent s like t hose used for Theorem 6.5.3, we ca n dedu ce th e following
zt zt -
Theorem 6.5.4 Assume that (1.1 .1) , H; and (NX) hold ; L is a d.f. on [0,1], and F of n, satisfies (FI), (F2). (a) If, in addition , (5.6 .9) and (5.6.10) hold, th en
K S --+d 1
1
1 0
+(t) r 1 U «at. 2 [U1(t ) - q )0 1q ] dL(t ). fo1(q+)2dL
(b) Und er no additional assumptions,
q
--+d
(t) t U dL IIU(t)q q II dL (t) , 1o f (q*)2dL 1
*
*
1
2
0
o
where q*(t ) := 2[f (F - 1((t
+ 1)/2) -
1 (0)], O:S t
:s 1.
o
265
6.6. Regression model fitting
Clearly this theorem covers L(t) == t case but not the case where dL(t) = {l/t(l - tndt. The problem of proving an analogue of the above 0 theorem for a general L is unsolved at the time of this writing.
6.6 6.6.1
Regression Model Fitting Introduction
In the previous sections of this chapter we assumed the model to be a linear regression model and then proceeded to develop tests for fitting an error distribution. In this section we shall consider the problem of fitting a given parametric model to a regression function. More precisely, let X , Y be r.v.'s with X being p-dimensional. In much of the existing literature the regression function is defined to be the conditional mean function f.1(x) := E(YIX = x), x E JRP, assuming it exists , and then one proceeds to fit a parametric model to this function , i.e., one assumes the existence of a parametric family M = {m(x ,O) : x E JRP , 0 E 8} of functions and then proceeds to tests the hypothesis f.1(x)
= m(x, ( 0 ),
for some 00 E
e and Vx E I,
based on n i.i.d. observations {(Xi, Y;) ; 1 SiS n} on (X, Y) . Here, and in the sequel , I is a compact subset of JRP and 8 is a proper subset of the q-dimensional Euclidean space JRq . The use of the conditional mean function as a regression function is justified partly for historical reasons and partly for convenience. In the presence of non-Gaussian errors it is desirable to look for other dispersions that would lead to different regression functions and which may well be equally appropriate to model the effect of the covariate vector X on the response variable Y . For example, if the errors are believed to form a white noise from a double exponential distribution then the conditional median function would be a proper regression function. We are thus motivated to propose a class of tests for testing goodness-of-fit hypotheses pertaining to a class of implicitly defined regression functions as follows. Let 'IjJ be a non decreasing real valued function such that EI'IjJ(Y - 1')I < 00, for each r E JR. Define the 'IjJ-regression function ml/J by the requirement
266
6. Goodness-oi-iit Tests in Regression
that (6.6.1)
E[,¢(Y - m,p(X))IX] = 0,
a.s.
Observe that, if '¢(x) := x, then m,p = u , and if
'¢(x) := '¢o.(x) := I(x > 0) - (1 - a), for an
°<
a
< 1,
then m,p(x) := mo.(x) , the oth quantile of the conditional distribution of Y, given X = x . In general, m,p will exist if'¢ is skew symmetric and the conditional distribution of the error Y - m,p(X) , given X, is symmetric around zero. The weighted empirical process that is of interest here is n
Sn ,,p(x) := n- 1 / 2
L ,¢(Y; -
m ,p(Xd) I(£(X i ) ~ x), x E [-00,00],
i=1
where e is a known function from JRP to JR. Write mr, Sn,I, for m,p, Sn ,,p when '¢(x ) := x , respectively. Tests of goodness-of-fit for fitting a model to m1/ will be based on the process Sn,,p . This process is an appropriate extension of the usual partial sum process useful in testing for the mean in the one sample problem to the current regression setup. To see this , recall the one sample mean testing problem where Y1 , . . . ,Yn is a random sample from a population with mean v and the problem is to test v = vo. The well celebrated cumulative sum test is based on the supremum over i of the process n
L()'j - vo)I(j ~ i),
1 ~ i ~ n.
j=1
Now consider the simple linear regression model Y; = (ijn)(3 + e., 1 ~ i ~ n, (3 E JR. The analogue of the above test for testing (3 = (30 here is based on the maximum over i of the process n
L(Yj
-
Ujn) (3o)I(j
< i),
1 ~ i ~ n.
j =1
In the regression model with the uniform non-random design points j jn in [0,1], where Y; = p(ijn) + Ci , the analogue of the above test for testing p(x) = Po(x), for x E [0,1], and for some known function Po, is based on n
IL(Y 1:::;i:::;n sup
-lloUjn))IU
j
i=1
n
=
~ i)/
ILeYj - poUjn))I((jjn) ~ x)l· 0:::;x:::;1 sup
i=1
6.6.1. Introduction to model checking
267
In the general regression model the role of j In is played by X j and one readily sees that Sn,1 with £(x) = x in the case p = 1, provides the right extension of the above partial sum process to the current regression setup. The function £ gives an extra flexibility in the choice of these processes. In the sequel m", is assumed to exist uniquely, the r.v. £(X) is assumed to be continuous and (6.6.2)
Under the assumptions (6.6.1) and (6.6.2), Theorem 6.6.1 below proves the weak convergence of Sn ,,,, in D[-oo, 00], to a continuous Gaussian process S", with the covariance function K",(x, y) = E'ljJ2(y - m",(X)) I(£(X) ~ x Ay),
x, y E JR.
Since the function T~(X) := K",(x,x) = E'l/}(Y - m",(X)) I(f(X) ~ x)
is nondecreasing and nonnegative, S", admits a representation (6.6.3)
S",(x) = B(T~(X)) ,
in distribution,
where B is a standard Brownian motion on the positive real line. Note that the continuity of f(X) implies that of T", and hence that of B(T~). The representation (6.6.3), Theorem 6.6.1 and thecontinuous mapping theorem yield sup ISn,,,,(x)1 ~ xEIR
sup O::::t::::T~(OO)
IB(t)1 = T",(OO) sup IB(t)j ,
in law.
099
This result is useful for testing the simple hypothesis flo : m ", = mo , where mo is a known function as follows. Estimate (under m ", = mo) the variance T~ (x) by T;,,,,(X) := n- l
n
L
'ljJ2(Yi - mo(Xi))I(£(Xi) ~ x),
x E JR,
i=l
s;,,,
and replace m", by mo in the definition of Sn,,,, . Write for T;,,,,(OO) and let Ie := £(I) . Then, for example, the Kolmogorov-Smirnov (K-S) test based on Sn,,,, of the given asymptotic level would reject the hypothesis flo if
268
6. Goodness-oi-fit Tests in R egression
exceeds an appropria te critical value obtained from t he boundary crossing probabili ties of a Browni an motion on t he uni t interval which ar e readily availabl e. More generally, t he asymptoti c level of any tes t based on a cont inuous function of S~,~ Sn,,,, ( ( T~,1/,) - I ) can be obtained from th e distribution of t he corresponding functi on of B on [0,1], where ( T~ ,,,,) - I( t) := inf{ x E JR : T~ , ,,, (X ) 2: t} , t 2: O. For exa mple, t he asymptotic level of the test based on th e Cramer - von Mises stat ist ic
21 s~, '"
52
0
n .W
2 2
I
2
S n,,,,((Tn,,,, )- (t)) dH (t/ Sn,,,,), 1
is obtained from th e distributi on of J0 B 2 dH , where H is a d.f. on [0,1] . Now consider th e more realistic problem of testing the goodness-of-fit hyp othesis H o : m ,p(x)
= m (x , ( 0 ) ,
for some 00 E
e, x
E T.
Let On be a consistent estimator of 0 0 under H o based on {(X i , Y; ); 1 i ~ n }. Let mn(x) == m( x , On) and define
~
n
S n,,,, (X) = n- 1 / 2
I: 'l/J(Y; -
mn( Xi)) /( e(X i) ~ x) ,
x E JR.
i=1
s. ,
T he process is a weighted empi rical process, where t he weights, at e(X i ) ar e now given by t he 'l/J-residuals 'l/J(Yi - mn(Xi)) . Tests for H o can be based on an ap propriat ely scaled discrepancy of t his process. For example, an ana logue of t he K-S test would reject H o in favor of HI if sup { O"~,~ I Sn ,,,, (x) 1 : x E Ie} is too large, where O"~,,,, := n- 12::Z:I 'l/J2 (Yi mn(Xi)) . These tests , however , are not generally asy mptotically dist ribution free (ADF) . In th e next sub-section we shall describ e a transform of th e Sn,,,, pro cess whose limiting null distribution is known so th at th e tests based on it will be ADF . The description of this transformation needs some assumptions on th e null model which are also state d in the sam e sect ion. A computat ional formul as for computing some of these test s is given in t he Sub-sect ion 6.6.3. All proofs are given in t he Sub-section 6.6.4. But before pr oceeding we discuss Consistency . Here we sha ll briefly discuss some sufficient condit ions t hat will imp ly t he consiste ncy of goodness-of-fit tests based on Sn,,,, for a simple hypoth esis m", = mo against the fixed alte rnative m ", =j:. mo , where mo is a known functi on.
269
6.6.2. Transform i;
By the st ateme nt m'l/J -:j; mo it should be understood that the G-measure of the set { x E JRP ; m'l/J(x) -:j; mo(x)} is positive. Let >.(x , z) := IE{1P(Y -
m'l/J (X) + z) IX = x }, x E JRP, Z E JR. Note that 'l/J nondecreasing implies that >.(x, z) is nondecreasing in z , for eac h x . Assume that for every x E JRP , >.(x , z ) = 0, if and only if z
(6.6.4)
= O.
Let d(x) := m'l/J (x ) - mo(x), x E JRP and
D n(x) := n - 1/ 2
n
L >.(Xi , d(Xi)) I(£(X i) :::; x ),
x E JR.
i= l
An adaptat ion of t he Glivenko-Cantelli arg uments ( see (6.6.31) below) yield s sup In- 1/ 2 D n(x) - E>'(X ,d(X))I( £(X) :::; x )1 -+ 0, a.s., xE IR
where the expectation E is com pute d under the alte rnati ve m'l/J . Mor eover , by Theorem 6.6.1 below, Sn,'l/J converges weakl y to a continuous Gau ssian process. These fact s t ogether with the ass umpt ion (6.6.4) and a r outine arg ument yield the consiste ncy of t he K-S and Cramer-v on Mises t est s based on Sn,'l/J ' Note that t he condit ion (6.6.4) is t rivi ally satisfied when 'l/J (x) == x while for 'l/J = 'l/Jo:, it is equivalent to requirin g t hat zero be the un iqu e o:th per centile of the condi tional distribution of the error Y - m'l/J( X ), given X .
6.6.2
Transform
t; of Sn ,'l/J
This section first discusses the asympt ot ic behavior of t he processes introdu ced in the previous sect ion under the simple and composi te hyp otheses. Then, in the special case of p = 1, £(x ) == x, a transformation Tn is given so that the process TnSn,'l/J has the weak limit with a known distribution . Con sequentl y th e t est s based on the pr ocess TnSn,'l/J are ADF. To begin with we give a general resul t of somewhat ind ep endent interest . For each n, let (Z ni, X i) be i.i.d . r.v .'s, {X i ,l :::; i:::; n } ind ep endent of {Zni ; 1 :::; i :::; n },
IEZn1 == 0,
(6.6.5)
EZ;'l
< 00 , V n 2: 1,
and define n
(6.6 .6)
Zn(x)
= n- 1/ 2 L i= l
ZniI(£(X i) :::; x),
x E JR.
6. Goodness-oi-Iit Tests in Regression
270
The process Zn takes its value in th e Skorokhod space D( - 00, (0 ). Extend it continuously to ±oo by putting
Zn ( - 00) = 0,
and
Zn (oo) = n- 1 / 2
n
L: Zn;· ;= 1
Then Z n becomes a pro cess in V[- oo, 00]. Let (T~ := EZ~ ,1 and L denote th e d.f. of f( X ). Not e that und er (6.6.5) , the covariance function of Zn is := (T~L(x 1\ y) ,
J
x, Y E JR,
The following lemma gives its weak convergence to an appropriate process.
Lemma 6.6 .1 In addition to the above setup and (6.6.5) , assume that (T~
(6.6.7)
--+ 0'2,
for some a ~ O.
Then, Zn ==> B
(6.6 .8)
0
r 2 in th e space V[- oo, oo],
where B 0 r 2 is a continuous Brownian motion on JR with respect to time r 2, where r 2(x) == 0'2 L (x ). Proof. Apply t he Lind eberg-Feller CLT and th e Cramer - Wold device t o show that all finite dim ensional distributions converge weakly to t he right limit , und er t he assumed cond itions . The argument for t ight ness is similar to th at of the W d process of Cha pter 2.2. For convenience, write Z, for Zni and fix - 00 ~ t 1 < t2 < ta ~ 00. Th en we have
[Zn (ta) - Z n(t 2W[Zn(t2) - Zn(td F
=
n- 2
n
?
i= 1
n-
2
n
[L: Z;I (t2 < (( Xi ) ~ ta)r [L: z.i«, < f(X i) ~ t2)] L: ~i~j(k(l,
2
;=1
i ,j ,k ,l
z.t«,
with ~i = Z;I(t2 < f(X i ) ~ ta), ( i = < f (X ;) ~ t2)' Now, if the larg est ind ex among i , j , k , l is not matched by any other , th en lE{ ~i ~j(k(tl = O. Also, since th e two int erval s (t2' ta] and (t 1 , t2] are disjoint, ~i (; == O. We thus obtain (6.6.9)
E {n-
L: ~i~j (k (l} n- L: E {(i(jd } + n- L: E {~i ~j(n ·
2
i,j ,k,l
=
2
2
i,j
i,j
6.6.2. Transform
t;
271
But, by the assumed independence, n- 2
L
n
L
E{(i(j~n
i,j
k-l
E{
k=2
(L(i)
2
~~}
i=1
One has a similar bound for the second term. This together with an argument like in Section 2.2 proves the tigh tness of th e process Zn . 0 Now we consider th e asymptotic null beh avior of Sn,1/J and Sn,1/J ' For the sake of the clarity of the exposition, from now onwards we shall assume (6.6 .10)
Y - m1/J(X) is independent of X .
Lemm a 6.6 .1 applied to Zni == 7I> (Yi - m1/J(Xi)) readily gives the following Theorem 6 .6.1 Assume that (6.6.1), (6.6.2) and (6 .6.10) hold. Then , (6.6.11)
Sn,1/J
=> S1/J, in th e spac e '0[-00 ,00] .
To st udy the asymptotic null behavior of Sn,1/J , the following additional regularity conditions on th e underlying entities are needed . For technical reasons, th e case of a smooth 71> (see (WI) below) and a non-smooth 71> (see (11'2) below) are dealt with separately. All probability statements in thes e assumptions ar e understood to be made und er H o . We make th e following assumptions . About the est imator On assume (6.6.12)
About th e model under H o assume the following: There exists a function rn from ffi.P x e to ffi.q such that rile, ( 0 ) is measurable and satisfies the following: For all k < 00, (6.6.13)
sup n 1 / 2 Im (X i , B) - m(Xi , ( 0 )
-
(0 - Oo)'ril(Xi , ( 0 )1
= op(l) ,
(6.6.14)
Ellril(X , ( 0 )11 2 <
00 ,
:Eo := Eril(X, Oo)ril'(X, ( 0 )
is positive definite,
wher e in (6.6.13) the supremum is taken over n 1 / 2 110 i :::; n .
0011 :::;
k and 1 :::;
6. Goodn ess-oE-fit Tests in R egression
272
('l'1 ). (Smooth ¢ ). The fun cti on ¢ is absolute ly cont inuous with its almost everywhere derivativ e ~ such t hat the fun cti on z H EI ~ (c - z) ~ (c) 1 is continuous at O. ('l'2) ' (Non-smooth ¢ ). The function ¢ is non-decreasing , right cont inuous, bou nded and such t hat t he function
is conti nuous at O. Recall the definition of O'~( x) from t he previous section and not e that under (6.6 .10) , it is a constant fun ction , say, O'~ . Also, let
r t/J
where
f
E[~(c d ,
.- f
for smooth ¢, for non- smooth ¢ ,
f (x ) ¢ (dx ),
is the Lebesgue densi ty of th e err or d.f. F . Let Vel' ) = Em(X , Oo)I (e(X ) ~ x ), x E JR.
Note that under (6.6.14) and ('l' d or under (6.6 .14), ('l'2) a nd (F 1) these ent it ies are well-defined . We ar e now ready to formul ate an asy mptotic expansion of Sn,t/J ' which is crucial for t he subs equent resu lts and t he transformation Tn. Theorem 6.6.2 Assume that (6.6.1), (6.6.12), (6.6.13), (6.6.14), and H o hold. If, in addition , either (A) ('l'l ) holds, or (B ) ('l'2) and (F1 ) hold, then (6.6.15) Remark 6.6.1 The ass umption ('l'd covers many inte resting ¢'s including the leas t square score ¢ (x ) == x and the Hu ber score ¢ (x ) == x I(lxl ~ c) + csign(x) I( lx l > c), where c is a real constant, while ('l'2) covers t he a -quanti le score ¢ (x ) == I (x > 0) - (1 - a ). 0 Now, sup pose additiona lly, rt/J
> 0, and t he est imator On satisfies
(6.6. 16) n
(rt/J ~ o ) - l n - 1 / 2
L m (X ; , Oo)¢ (c; ) + op(l ), ;= 1
6.6.2. Transform Tn
273
where Ci == Y; - m(Xi , ( 0 ) , Then, the following corollary is an immediate consequence of the above theorem and Theorem 6.6.1. We shall state it for the smooth 'ljJ- case only. Th e same holds in the non-smooth 'ljJ. Corollary 6.6.1 Under the assumptions of Theorems 6.6.1 and 6.6.2(A), Sn ,,p ~ S ,p , in the space V[-oo , 00], where s; is a centered continuous Gaussian process with the covariance function
(6.6.17)
Under (6.6.10) , a class of M-estimators of 0 0 corresponding to a given 'ljJ defined by the relation n
On,,p
:=
argmin t lln-
1 2 /
L
m(Xi , t) 'ljJ(Y; - m(X i , t))11
i=1 generally satisfies (6.6.16). A set of sufficient conditions on the model M under which this holds includ es (6.6.13) , (6.6.14) , and th e following additional conditions. In these condit ions m i(O) stands for m(X i , 0) , for the sake of brevity. n
(6.6.18)
n- 1
L Ellmi(O + n-
1 2s) /
- mi(0)1I2
= 0(1),
s E ~q .
i=1 n
(6.6.19) (6.6.20)
L II mi(O + n-
- rlli(O)II = Op(l) , s E ~q. i=1 'tI € > 0, :3 a 8 > 0, and an N < 00,3 'tI 0 < b < 00 , n- 1 / 2
1 2s) /
IIsll :::; b, n > N ,
p( (6.6.21)
n
L
sup n- 1 / 2 Ilrlli(O + n- 1 / 2 t ) - rlli(O + n- 1/ 2 s)11:::; t-sll II ::;" i=1 2: 1 - e,
€)
e'M(O + n- 1 / 2re) is monotonic in r E ~, 'tie E ~q ,
Ilell =
1, n 2: 1,
a.s.
A proof of the above claim uses the methodology of Chapter 5.4. See also Section 8.2 below in connection with autoregressive models. Unlike (6.6.3), th e structure of K~ given at (6.6.17) does not allow for a in terms of a pro cess with a known distribution. simple representation of
s;
6. Goodness-oE-fit Tests in Regression
274
The situation is similar to the model checking for the underlying error distribution as in the previous sections of this chapter. Now focus on the case p = 1, £(x) == x. In this case it is possible to transform the process Sn,,,, so that it is still a statistic with a known limiting null distribution. To simplify the exposition further write mO = me, 0 0 ) , Set A(x) = m(y)m'(y) 1(y 2: x)G(dy) , x E JR,
!
where G denotes the d.f. of X, assumed to be continuous. Assume that
A(xo) is nonsingular for some Xo <
(6.6.22)
00.
This and the nonnegative definiteness of A(x) - A(xo) implies that A(x) is non-singular for all x ::; xo . Write A -1 (x) for (A( x)) - 1, and define, for
x ::; xo, Tf(x) = f( x) -
r
lssx
m'(s)A-'l(s) [!m(z) 1(z 2: s) f(dz)]G(ds) .
The transformation T will be applied to functions f which are either of bounded variation or Brownian motion . In the latter case the inner integral needs to be interpreted as a stochastic integral. Since T is a linear operator, T(S",) is a centered Gaussian process. Informally speaking, T maps Sn,,,, into the (approximate) martingale part of its Doob-Meyer decomposition. Moreover, we have the following fact . Lemma 6.6.2 Under the above setup and under (6.6.10),
Cov[TS",(x) ,T S",(y)] = (T~ G(x /\ y) ,
x , Y E JR,
that is, T S",/ 0'", is a Brownian motion with respect to time G.
The proof uses the independence of increments of the Brownian motion S", and properties of stochastic integrals. Details are left out for interesting readers. To convince oneself about the validity of the above lemma, consider the empirical analog of th is claim where S", is replaced by Sn,,,, . Let
Ln(x)
.-
n- 1/ 2
n
L m(X i==l
i)
1(Xi
2: x) 'l/J(Ei) , s E JR,
6.6.2. Tran sform t;
275
Notice that T Sn,t/J (X) == Sn,t/J (x ) - Un(x ). Now, because of (6.6.1), (6.6.14), and t he assumed LLd. set up, n
ELn(s)Ln(ty
=
a~ n- 1
L E m(X
i)
m ' (X i ) I (X i ~ s V t )
i=l
ELn(s)Sn,t/J (Y)
=
a~A( s vt) ,
=
a~ E[m(X ) I (X ~ s) I (X
=
0,
ify < s ,
=
a~ [v(y) - v (s)),
if y
Use th ese facts to obtain, for x
ESn,t/J (x )Un(y )
.s y) ) s.
.s y,
= l~y rn' {s j A - 1 (s)E [Ln(s) Sn ,t/J(x) )G(ds) = a~ l~x m ' (s)A - l (S) [v(x ) -
ESn,t/J (y)Un(x )
~
v (s))G(ds )
= l~xo m' (s)A -l(S) E [Ln(s)Sn,t/J(y) )G(ds) =
a~ l~x m ' (s)A -l(S) [v(y) -
v (s)) G(ds),
a;;/ EUn(x)Un(y) =
E [ l~x m' {s j A - 1 (s)Ln(s)G(ds)]
x [ ( l~x + 1
l~x l~x m ' (s)A - 1 (s )E [L n (s)Ln(t )'J A -1 (t)m(t)G(ds)G(dt) +
=
r r
i; c.:
m ' (s)A - 1 (s)E [Ln(s) Ln(t )'J A -l(t) m(t)
x G(ds)G(dt )
l~x l~x m ' (s)A - 1 (s)A(s V t) A -1 (t)m(t) G(ds)G(dt) + l~x l
=
a~ [21~x m'(s)A - 1 (S) [v(X) -
V(s))G(ds)
+ l~x m ' (s)A -1 (S) [V (y) - v (x ))G(ds )].
6. Goodness-oE-fit Tests in Regression
276
Using the above derivation one can now easily conclude that
ETSn.1/J(x)TSn.1/J(Y)
= a~ G(x /\ y).
The next result shows that the transformation T takes Sn,t/J in the limit to T Sn,t/J' and hence its limiting distribution is known. Theorem 6.6.3 (A) . Assume, in addition to the assumptions of Theorem 6.6. 2(A}, that (6.6.16) and (6.6.22) hold. Then
sup ITSn,t/J(x) - TSn ,t/J(x)
(6.6.23)
x:S;xo
I= op(l) .
If in addition, (6.6.2) holds, then (6.6.24)
TSn,1/J
==::}
TS1/J and TSn,1/J
==::}
TSt/JinV[-oo ,xo].
(B) . The above claims continue to hold under the assumptions of Theorem 6.6.1(B}, (6.6.16) and (6.6.22}. Note that T Sn ,1/J is not a statistic as the transformation T depends on the parameter 00 and the covariate d.f. G. For statistical applications to the goodness-of-fit testing one needs to obtain an analogue of the above theorem where T is replaced by an estimator Tn. Let, for x E ~, n
Gn(x)
.-
n- 1
L I(X
i
:S x) ,
i=1
An estimator of T is defined , for x :S xo , to be (6.6.25)
Tnf(x)
= f(x)
-
iXoo ril'(y ,On)A;;-1(y) x [/ ril(z ,On) I(z:::: y) f(dz)]Gn(dy).
The next result is the most useful result of this section. It states the consistency of T nSn,1/J for T Sn,1/J under the following additional smoothness condition on m . For some q x q square matrix m(x, ( 0 ) and a nonnegative function K 1 (x, ( 0 ) , both measurable in the x-coordinate, the following holds : (6.6.26)
IEllril(X, Oo)ll} K 1 (X, ( 0 ) < 00, 1E1lrii.(X, Oo)llllril(X, Oo)ll} < 00 , j = 0, 1,
277
6.6.2. Transform Tn and Y to> 0, there exists a 8
Ilril(x,O) - ril(x, 0 0 )
> 0 such that 110 - 0011 < 8 implies
m(x, 0 0)(0 - 00)11 ~
-
to
K 1 (x, 0 0 ) 110 - 0011,
for G-almost all x . We are ready to state Theorem 6.6.4 (A). Suppose, in addition to the assumptions of Theorem 6.6.3(A), (6.6.26) holds. Then, sup ITnSn,,p(X) - TSn,,p(x) I = op(l) .
(6.6.27)
x::;xo
Consequently, -1
'
(Tn,,p TnSn ,,p(') => BoG,
(6.6.28)
in V[-oo,xo].
(B). The same continues to hold under (6.6.26) and the assumptions of Theorem 6.6.2(B).
Now, let a < b be given real numbers and suppose one is interested in testing the hypothesis
H: m ,p(x) = m(x, 0 0 ),
for all x E [a, b] and for some 0 0 E O.
Assume the support of G is ffi., and A(b) is positive definite . Then, A(x) is non-singular for all x ~ b, continuous on [a, b] and A -1 (x) is continuous on [a,b] and
Ellril'(X)A -l(x)III(a < X ~ b) < 00. Thus, under the conditions of the Theorem 6.6.4, (T~,~ TnSn,,p(') => BoG, in V[a, b] and we obtain l '
(T~,,p{TnSn,,p(')
,
- TnSn,,p(a)} => B(G(·)) - B(G(a)),
in D[a,b].
The stationarity of the increments of the Brownian motion then readily implies that sup IB(u)l .
o::;u::;1
Hence, the asymptotic null distribution of any test of H based on D n is known .
278
6.6.3
6. Goodness-of-flt Testing in Regression
Computation of TnSn,ljJ
In this sub -section we discuss some comput ational aspects of the proposed statistics. It is useful for the computational purpose to rewrite TnSn,t/J as follows: For convenience, let m(x) := m(x , ( 0 ),
(6.6.29)
m(x) := m(x, ( 0 ),
mn( x) := m( x , On),
mn(x) := m(x, On),
Ei := Y; - m(X;) ,
Eni := Yi - mn(X i) ,
1:::; i
:::; n .
Let X(i) denot e the i l h order st atistic among {Xj ; 1 :::; j :::; n} , and Sni , denote the corr esponding residuals Eni' Then,
n-1
n- 1 / 2
L
[I(X(i) :::; x )
i=l n- 1
_n- 1
L
m~(X(j))A;;-l(X(j))mn(X( i))
j =l
Let I
Tli :=
L m~(X(k))A;;-l(X(k))mn(X(i)) '
1:::; l , i
:::; n - 1.
k=l If for some 1 :::; j :::; n - 1, X(j) :::; x
TnSn,t/J(X)
< X(j+l) , th en one sees that
= Sj , wher e
s,
:=
j
n-1
i=l
i=j+1
L [1- Tii] 7i'(Sni) - L
Tj i7i'(Sni) .
This is useful in computi ng various tests based on th e above transformation. Now, let 91, . .. , 9q be known real-valued G-square int egrable functions on IR and consider th e class of mod els )\.11 with m( x ,O) = 91(X)Bl
+ ... + 9q(X)()q .
Then (6.6.13) , (6.6.14) and (6.6.26) are trivially sat isfied with m(x , 0) == (gl( X), ' " , 9q(X))',
m(x ,O) == 0 == K 1(x , 0) .
279
6.6.3. Computation ofTnSn,'IjJ
In particular this model includes the simple linear regression model. First consider the case when q = 1, gl(X) = x and assume EX 2 < 00. In this case A(x) == A(x) = 1EX 2 I(X ~ x) is positive for all real x , uniformly continuous and decreasing on JR, and thus A(x) is nonsingular for every x E JR. A uniformly a.s. consistent estimator of A is n
An(x) == n- 1
L xl I(X i ~ x) . i=l
Thus a test of the hypothesis that the regression mean function is simple linear through the origin on the interval (-00, xol can be based on
where
TnSn,I(x)
=
1 2
n- /
~ (l'i -
_t j=l
X/}n) [I(Xi
~ x)
X j Xi I(X j ~ Xi 1\ x) ] L~=l X; I(Xk ~ Xj) ,
n
0";,1
=
n- 1 L (l'i - X i Bn )2. i=l
Similarly, a test of the hypothesis that the regression median function is simple linear through the origin can be based on
where
T n Sn ,.5(X)
=
n-
1 2 /
~ {I(l'i - x.e; > 0) -
.5} [ I(X i
_t
~ xo)
X j Xi [(Xj ~ Xi 1\ x) ] j=l L~=l X; I(Xk ~ Xj) ,
and
n
0";,.5 = n- 1 L{I(l'i -
x.e; > 0) -
.5}2.
i=l By Theorem 6.6.4, the asymptotic null distribution of both of these tests is free from the null model and other underlying parameters, as long
280
6. Goodn ess-oi-iit Testing in Regression
as t he estimator en is t he least squa re est imator in t he form er test and t he least abso lute deviation esti mator in t he lat ter.
=
=
In the case q = 2, 91 (x) 1, 92(X) x , one obtains ril (x , ( 0 ) and A (x ) is t he 2 x 2 symmetric matrix
A (x )
= E I(X ?
x)
(~
= (1, xy
:2 ).
Clearly, E X 2 < 00 implies A (x ) is nonsingular for every real x and A-I and A are cont inuous on lIt The matrix
). provides a uniformly a.s . consiste nt estimator of A (x). Thus one may use sUPx:SXQ ITnSn,I (x) II {(In,IGn (XO)} to test t he hypoth esis t hat t he regression mean function is a simpl e linear model on th e interval (-00, xo ]. Similarl y, one can use t he test statistic
to test the hypoth esis t hat t he regression median function is given by a simple linear funct ion. In both cases A n is as above an d one should now use t he genera l formula (6.6.30) to compute t hese statisti cs. Again, from Theorem 6.6.4 it readily follows t hat the asymptotic levels of both of t hese tests can be computed from t he distribution of sUPO:Su:Sl IB (u)l, pro vided t he estimator On is taken to be , respectively, th e LS and t he LAD. Remark 6.6.2 Theorems 6.6.1 and 6.6.2 can be exte nded to t he case where £(X i ) is replac ed by an r -vector of functions in th e definitions of th e Sn ,1/J and Sn ,1/J , for some positive integer r . In thi s case th e time parameter of th ese pro cesses is an r -dimensiona l vector. The difficulty in transforming such processes to obtain a limitin g process t hat has a known limiting distribution is similar to t hat faced in t ra nsforming t he multi variate empirical pro cess in th e i.i.d. setting . This, in turn , is related to t he difficulty of having a proper definit ion of a multi-tim e par ameter martingale. See Khm alad ze (1988, 1993) for a discussion on t he issues involved . For t hese reasons we restricted our attent ion here to t he one dimensional ~e ~~
0
281
6.6.4. Proofs
6.6 .4
Proofs of some results of Section 6.6.2
Before proceeding further , we state two fact s that will be used below repeatedly. Let {~;} be r.v.'s with finit e first moment such th at {(~i , Xi)} are i.i.d . and let ( i be i.i.d. square integrable LV .'S. Then maxl ~ i ~ n n - 1 / 2 1( i l = op (l) and n
(6.6.31)
sup !n- 1
L ~i I (X
x EIR
i= 1
i :::;
x ) - E6 I (X :::; x)l -+ 0, a.s.
The LLN's implies th e pointwi se convergence in (6.6.31). The uniformity is obtained with th e aid of t he trian gle inequality and by decompo sing each ~j into it s negative and positive par t and app lying a Glivenko-Cantelli typ e argument to each part . R ema rk 6.6.3 We are now ready to sket ch an argument for th e weak convergence of Sn , ,p ( (T~ . ,p )-I) to B und er t he hypoth esis m,p = mo. For t he sake of brevity, let b., := T~ . ,p (00) , b := T~ (00) . Fir st , note t hat sup I T~ , ,p ( (T~ . ,p ) -J (t)) - t l
max n - J 7jJ2(Yj
<
-
l ~i~n
0 9 9 ..
mo(X;))
op(l )
by (6.6.2). Next , fix an On A n,
E
> 0 and let A n := [Ibn - bl :::;
E]
and
Cf :=
1/[1 - ~] .
and sup IT~ . ,p ( (T~ . ,p ) -I ( t ) ) - t l
<
0 9 ~b ..
+
sup b.. < t ~ b n. c.
IT~ .,p ( (T~ .,p ) -I (t ) ) - t l.
The second term is further bo unded from the above, on A n, by ~~: E. But , by th e ET , P( A n ) -+ 1 . The arbit ra riness of E thus read ily implies that
We thus obtain, in view of (6.6.31), sup 0 9 ~ T~(OO )
:::;
IT~ ( (T~ . ,p ) - I (t ) ) - tl
sup I T~ ( X ) - T~ .,p (x ) 1 x EIR
op(l) .
+
sup 09 ~T~ (OO)
IT~ . ,p ( (T~ . ,p) -1 (t )) - tl
282
6. Goodness-oE-fit Testing in R egression
These observations to gether with the continuity of th e weak limit of S n,1/J implies t hat
T herefore, by Theorem 6.6.1, S~,~ Sn , 1/J«T'; , 1/J) - l) ===} B and t he limiting distribution of any cont inuous functional of S~,~ Sn, 1/J « T';,1/J )-1 ) can be obtained from t he distribution of t he corres ponding functional of B. In par ticular the asymptotic level of t he test based on the Cr am er - von Mises ty pe statistic 2
s~,~ l S n.", S?t ,1/J«T?t,1/J) -1(t) )dH(tl s~,1/J) can be obtai ned from t he distri bu tion of tion on [0,1] .
f01 B 2dH , where H is a d.f, func0
For our next lemm a , recall t he notation in (6.6.29) and Let n
'On
= n- 1 / 2 L
It/J(en;) - 'l/J (e;) - (eni -
ei ) ~ (e; ) l lIr(X; ) II .
;= 1
where r is a measurable vector valued funct ion such t hat IEllr(X)1I 2
(6.6.32)
< 00 .
Lemma 6.6.3 Under the assum ptions of Theorem 6.6.2{A) and {6.6.32},
Proof. Fix an
e; =
0:
> O. Let h; :=
0:
+ kllrh(X;)1I
and
{IIAnll :::; k ;
mfx Imn(X ; ) -
m(X;) - ffi' (X;)(O n - ( 0 )1:::;
n~2 } '
T hen by assumpt ion (6.6.12) - (6.6.14) t here exists a lar ge k < int eger N such t hat Vn >N.
00
and an
283
6.6.4. Proofs Now, on B n , we obtain for 1 :S i :S n , und er H o,
Furthermore, by the absolute conti nuity of 'l/J, on B n ,
ti; :S n- 1/ 2
n- 1 / 2
n
t;
IIr(X i)lIh(Xi) i n_ / 2 1-0(Ei - zh(X i)) - -0(Ei )ldz. 1
But , by (1JIt} , t he expected value of t his upp er bound equals - 1/ 2
E(llr(X)llh(X)) n 1/ 2 i~- 1/2 EI-0(E - z h(X )) - -0 (E )ldz = 0(1), t hereby completing the proof of Lemma 6.6.3.
o
Proof of Theorem 6.6 .2. Put , for an x E JR, n
Rn(x) := Sn,,p(X) - S n,,p(x) = n- 1/ 2 L ['l/J(Eni ) - 'l/J(Ei )]I (X i :S x ). i=1 Decomp ose R n as n
Rn (x)
=
n - 1/ 2 L['l/J (Eni ) - 'l/J (Ei) - (Eni - Ei )-0 (E;) ]I( X i :S x ) i= 1
n
- n - 1/ 2 L[mn(X i) - m (X i) i=1 n
- n - 1/ 2 L
ril' (X i ) -0 (Ei)I (X i :S x ) (On - ( 0 )
i=1
The te rm R n3(x) is equal to n
n - 1L
ril' (X i ) -0(Ei )I (X i :S x ).
i=1 By an application of (6.6.31) we readily obtain that sup IIRn3(x) - 'Y,p v(x )11= op(l ). x EIR
Due to (6.6.12), it thus rema ins to show t hat R n1 and R n2 tend to zero, uniformly in x, in probabilit y. The assertion for R n1 follows immediately
284
6. Goodness-oE-fit Testing in R egression
from Lem ma 6.6.3, because it is un ifor mly bounded by t he V n wit h r == 1. As to R n2, recall the event B n from the proof of Lemma 6.6 .3 and not e t hat on B n , n
sup IRn2(x )1 ~
O'n -
1
L
I ~(cdl
= 0(0') ,
a .s.,
i=1
x
> 0 is arbitrarily chosen, t his completes the proof
by the LLN's. Since a of part (A) .
As to the proof of part (B) , pu t , for 1
~
i
~
n , t E IRq, a E IR,
dn,i(t )
.-
mi X« , (}o + n- 1 / 2t ) - m (X i) ;
r n,i
.-
n- / (2O'
J-ln(X i , t , a)
.-
E 'IjJ(Ci -
I 2
+ Jllril(Xi)ll), a > 0, J > 0; dn,i(t) + arn ,i) '
Define, for a, x E IR and t E IRq,
n
.-
n-
I 2 /
L['IjJ (Ci - dn,i(t)
+ arn ,i)
- J-ln(.Xi, t , a)
i= 1
- 'ljJ(Ci)] I(X i ~ x ).
Write D n(x , t ) and J-ln(X i , t ) for D n(x , t , 0) and Il n(Xi , t , 0), respectively. Not e th at by the i.i.d . ass umpt ion, Var(D n (x , t , a))
< lE['IjJ (c - dn,1 (t) + ar n,d - J-ln(X , t , a) - 'IjJ (cp d ]2 ~
lE['IjJ(c - dn,I (t ) + ar n,l) - 'IjJ (c)f --+ 0,
by assumption (6.6.13) and (W2) ' Upon an application of Lemma 6.6.2 with Zni = 'IjJ (ci - dn,i(t) + arn ,i) - J-ln(X i , t , a) - 'IjJ (ci ) we readi ly obtain that (6.6.33)
V a E IR, t E IRq.
sup IDn(x , t , a)\ = op (l ), x ER
We need to prove t hat for every b < (6.6.34)
sup
00 ,
IDn( x , t)1
= op(l) .
x EIR, lItll :Sb
To that effect let en
:= {
sup Idn,i (t )1 ~ n IItll ::;b
I 2 / (O'
+ bllril(X i )II ), 1 ~ i ~ n },
6.6.4. Proofs and for an
285
Iisil
~ b,
let
sup
A n := {
II tll:5:b,lIt-sll:5:c5
Idn,i (t ) - dn,i(S) 1 ~ I'n,i' 1
By assum pt ion (6.6.13), t here is an N < that Vb < 00 and V Iisil ~ b,
00,
(6.6.35)
Vn > N.
~ i ~ n}n en'
depending only on a , such
Now, by t he monotonicity of 'If; one obtains t hat on An ' for each fixed IIsll ~ b and V IItll ~ b, with lit - sll ~ 6,
IDn(x , t) 1
(6.6.36)
< IDn(x , s, 1)1 + IDn(x , s, -1)1
+
n
In-
1 2 /
2: rPn(X i , s , 1) - Pn(X i ,
s, -1 )] I (X i ~ x) I.
i= l
By (6.6.33), t he first two terms converge to zero uniformly in x, in probability. Moreover, by t he definition of I'n,i' and by t he LLN's n
n- 1 / 2
n
2: I'n,i = n - 2: (2a + 61 Iril(X )ID= Op(a + 6). 1
i=l
i
i= l
In view of t his and (F l) , t he last term in (6.6.34) is bounded above by n-
1 2 /
n
~
/ 00 IF(y + dn ,i(S) + I'n,i) - 00
-F (y
+ dn,i(S)
- l'n,i )!'If; (dy )
n
< K n - 1 / 2 2: I'n,i = Op(a + 6) , i= l
which can be made ar bit rarily sma ller t ha n a posit ive constant multiple of a, by the choice of 6. This toget her with the compact ness of t he set {lItll ~ b} proves (6.6.34) . Next , by (6.6.1), (W2 ), (F l) , Fubini 's t heorem, and t he LLN's, we read ily readily obtain sup xEIR, lIt ll:5:b
In-
1 2 /
n
2: Pn(Xi , t) I (X i ~ x) + v' (x h ", tl = op(l) . i=l
This toget her wit h (6.6.34), (6.6.35) and the assumpt ion (6.6.12) proves (6.6.15) an d hence t he part (B) of Theorem 6.6.2. 0
6. Goodness-oE-fit Testing in Regression
286
Remark 6.6.4 By (6.6.22), Al := inf{a'A(xo)a; a E and A(x) is positive definite for all x ::; xo, Hence,
~q ,
lIall = I} > 0
and (6.6.14) implies (6.6.37)
Ellm'(X)A -1(x)III(X ::; xo) ::; Ellm(X)11 All <
00 .
This fact is used in th e following proofs repe atedly.
o
Proof of Theorem 6.6.3 . Details will be given for Part (A) only, they being similar for Part (B) . We shall first prove (6.6.28). From th e definitions of T , we obtain that (6.6.38)
TS n,t/J( x)
=
Sn,t/J( X) X
TSn,t/J( x)
Sn" t/J(x) -
[~ m'(y) A -1(y) [~oo m(t) Sn,t/J(dt)] G(dy),
[Xm'(y) A -1(y) oo
X
[~oo m(f)Sn(dt)]G(dy) .
As before, set ~n := n 1 / 2 (On - ( 0 ) . From (6.6.15) we obtain, uniformly in x
E~,
(6.6.39)
Sn,t/J( X) = Sn,t/J(x) - "Yt/J v'(x) ~n
+ op(l) .
The two int egrals in (6.6.38) differ by
!~ m'(y) A -l(y) Dn(y)G(dy) , where Dn(y)
:=
n - 1/ 2
n
L m(Xi) [¢(ci) - ¢ (cni )]I (X i :::: y) . i =1
6.6.4. Proofs
287
This process is similar to the process R n as studied in the proof of Theorem 6.6.2(A) . Decompose Ir; as
Dn(y)
=
n- 1/ 2
n
L ril(Xi) [¢(€i) -
¢ (€ni)
;=1
+ n- 1/ 2
n
L ril(Xi) [mn(Xi) -
m(Xi)
i=l -ril'(Xi) (On - ( 0 ) ] ~ (d I(X i 2 y) n
+ n- 1/ 2 L
ril(Xi) ril'(Xi) ~(€i)I(Xi 2 y)
i=l (6.6.40)
say.
Lemma 6.6.3, with r
= m and
the triangle inequ ality readily imply
supIlD n 1 (y)1I = op(l) . yEIR
This fact together with (6.6.37) yields (6.6.41)
xs~~o Ii X oo ril'(y) A -l(y) D n1(y) G(dy)1 = op(l) .
Recall B n from th e proof of Lemma 6.6.3. Then, on B n , n
(6.6.42)
< an- 1
sup IIDn 2 (y)1I yEIR
L
Ilril(Xi)III ~(€i)1
;= 1
O(a) , a.s .,
by the LLN's. Arbitrariness of a and (6.6.37) yield (6.6.43) Now consider the third term. We have n
D n 3 (y)
= n- 1 L
ril(X;) ril'(Xi) ~ (€;) I(X i 2 y).
i=l An application of (6.6.31) together with (6.6.10) yield that sup yEIR
IID n 3 (y) -
I',p A (y ) II -70, a.s .
6. Goodness-of-fit Testing in Regression
288 This tog ether with the fact
II~nll
= op(l) entails that
[
:~~J i~ ril'(y) A-I (y) D n 3(y) G(dy)
(6.6.44)
- "ft/J v'(x)]
~nl = op(l) .
The proof of the claim (6.6.23) is complete d upon combining (6.6.41) (6.6.44) with (6.6.40). Next , we t urn to the proof of (6.6.24). In view of (6.6.23) , it suffices to prove TSn ,t/J ~ TS1/J ' To this effect , not e that for each real x, TSn ,t/J(x) is a sum of centered finite variance i.i.d. r.v.'s . The convergence of the finite dimension al distributions thus follows from th e classical CLT . To verify th e tightness, because Sn,1/J is tight and has a continuous limit by Theorem 6.6.1, it suffices to prove th e same for the second term of TSn ,t/J in (6.6.38). To th at effect , let x ~ Xo.
Note that 0 is nond ecreasing, continuous and by (6.6.37), ¢>(xo) rewrite th e relevant term as Kn(x)
.-
n - 1/ 2
~ 'l/J(ci ) i
X oo ril'(y)A
-1
< 00 . Now,
(y)ril(Xi)I(X i ;::: y) xG(dy) .
Because the summands are martingale differences and because of (6.6.14) we obtain , with the help of Fubini 's th eorem , that for x < y , E[Kn (y) - Kn( x)F =
O'~
lY lY ril'(s)A-
By (6.6.14), IIAlloo obtain th at
1(
s)A(s vt)A- 1(t)ril(t)G(dt)G(ds) .
sUPxEIR IIA(x) II ~ J~oo Ilrilll 2 dG
E[Kn (y) - Kn( xW
<
00 .
<
O'~IIAlioo [lY llril' A -III dGf
=
O'~IIAll oo[¢>(Y) - ¢>(xW ·
We thus
This then yields th e tightness of the second term in (6.6.38) in a standard fashion and also completes the proof of th e Th eorem 6.6.3(A) . 0 For th e proof of Theorem 6.6.4 th e following lemma will be crucial.
289
6.6.4. Proofs
Lemma 6.6.4 Let U be a relatively compact subset of D[-00 , xoJ . Let L, L n be a sequence of random distribution functions on lR such that sup ILn(t) - L(t)1
~
0, a.s.
t:S;xo
Then
t:S;XsO~~EU I
t
!
a(x)[Ln(dx) - L(dx)JI = op(l).
-00
Its proof is similar to that of Lemma 3.1 of Chang (1990) and uses th e fact that the uniform convergence over compact families of functions follows from the uniform convergence over intervals. In the following proofs , the above lemma is used with L n == G n and L == G and mor e generally, with L n and L given by the relations dL n == hdG n , dL == hdG , wher e h is an G-integrable function. As to the choice of U , let {an} be a sequence of stochastic processes which ar e uniformly tight , i.e., for a given <5 > th ere exists a comp act set U such that an E U with probability at least 1- <5 . Apply Lemma 6.6.4 with this U and observe th at a n 't U with small probability to finally obtain
°
s~p I
t
!
an(x)[Ln(dx) - L(dx)J! = op(l) .
-00
As will be seen below, th ese types of integrals appear in the expansion of
TnS; ,t/J. Proof of Theorem 6 .6.4. Again , the details below are given for Part (A) only, and for convenience we do not exhibit (}o in K 1 and iii . First, note that by (6.6.14) n-1 /2maxillm(Xi)11 = op(l) . Then by (6.6.26) and th e LLN's , on an event with probability tending to one and for a given E > 0, n
n-
1
L
Ilm(Xi)llllmn(Xi) - m(Xi)11
i=l
n
< n-1 /2maxillm(Xi)llll~nll{ En- 1L K 1(Xi ) i=l
n
+n- 1
L
Ilm(Xi)ll}
i= l
Similarly one obtains n
n- 1
L Ilmn(Xi) - m(Xi)11 2 = op(l) . i=l
= op(l).
290
6. Goodn ess-oE-fit Testing in Regression
These bounds in t urn together wit h (6.6.31) impl y t hat sup IIAn(y) - A (y )1I yE R
n
< 2n- 1
L Ilm(Xi)l llImn(Xi) -
m (X i )1I
i= l
n
+ n- 1 L
Ilmn( X i) - m (Xi )11
2
i= 1
n
+ sup Iln-
1
y E IR
=
L m (X i )m' (Xi )I(X
i
~ y) - A(y) 1I
i= 1
op(l) .
Consequently we have sup IIA;; 1(y) - A - 1(y)11 = op(l) .
(6.6.45)
y :Sxo
Next , we sha ll prove (6.6.27) . Let
Then we have
so t hat from (6.6.38) we obtain, uniform ly in x E lR, TnSn(x) - T Sn(x) '-
- , v' (x ) ~ n
+ op(l)
[Xm' (y ) A - 1(y) U n(y) G(dy ) - [Xm' (y, On) A;; (y) U~ (y) c; (dy)
+
oo
1
oo
(6.6.46)
- ,v' ( x )~ n
+ op(l) + B ndx ) - B n2(x),
say .
We shall shortly show t hat (6.6.47)
sup IIUn(x) - U n(x ) x :Sxo
+,,, A (x ) ~nll = op(l).
App ly Lemma 6.6.2 k tim es, jth time wit h Zni == m j (X i )'lj! (c;} , where rn , is t he jth component of m , 1 j k. Then under t he assumed condit ions
:s :s
291
6.6.4. Proofs
it follows that U n is tight . Using (6.6.45) , (6.6.47) , Lemma 6.6.4, and the assumption (6.6.26) , we obtain
B n2(x ) =
iX oo
rn' A -I u, so;
- 'Y i~ rn' dG ~n + op(l) ,
i~ m'A-IUndG- 'Yl./(X)~n + op(l) , uniformly in x
~ Xo,
which in t urn togeth er with (6.6.46) implies (6.6.27) .
We shall now prove (6.6.47). Some of th e arguments are similar to th e proof of Theorem 6.6.3. Now, rewrite n
U n(y)
=
n - 1/ 2 L mn(Xi) 7/J(Sni ) I(X i ~ y ) i= 1
n
n - 1/ 2 L m n(X i) [7/J(Sni) - 7/J(Si) i =1
n
+ n - 1/ 2 L m n(X i) [m(X i) - m n(X;) i= 1
n
- n-
I
Lmn (Xi)m~ ~(si)I(Xi ~ Y) ~n i=1
n
+ n - 1/ 2 L m n(X i) 7/J (Si) I(X i ~ y) i= 1
Observe t ha t T n1, T n2 are, respectively, similar to D n1, D n2 in th e proof of Theorem 6.6.3 except th e weight s m(X;) are now replaced by mn(Xi) . We sha ll first approxima te T n1 by D n1. We obtain , for a given e > 0,
n
< n- 1
L
[lI m(X i)1I + fJ(I(X i ) ] II~nll
i= 1
h( X;)/n 1/2
X
r
) - h ( X ;)/n l / 2
=
1~(Si -
s) -
~(si) 1 ds
op(l) .
A similar , but simpler , argument using the assumpt ion (6.6.13) shows th at sUPY EIR IITn2(y) - D n2(y)11 = op(l ). Since D nl and D n2 tend to zero uni-
292
6. Gooduess-oi-Iit Testing in Regression
formly in y, we conclude that
Again, using (6.6.26) and (6.6.31) we obtain n
T n3(y)
L:: m(Xi) m~ ~(ci)I(Xi ~ y) + op(l)
=
n- 1
=
i=l -yA(y) + op(l),
uniformly in y E lit We now turn to T n4. We shall prove n
(6.6.48)
sup IITn4(y ) - n- 1 / 2 yEIR
L:: m(Xi) 'lj;(ci ) I(Xi 2: y)1I = op(l) . i=l
To that effect let gni := mn(Xi) - m(Xi)(Oo) - m(Xi)(Oo)(On - 0 0 ) and n
fn(y) := n -
1 2 /
L:: gni 'lj; (ci ) I (X i 2:Y) · i=l
Clearly, by (6.6.26), on a large set , n
sup
Ilf n(y)11 :::;
yEIR
€
kn-
1
L:: K1(X
i)
I'lj;(s.) I =
Op(€) .
i=l
But, because of (6.6.1) and (6.6.31), n
sup IlnyE IR
1
L:: m(Xi)(Oo) 'lj;(ci) I(Xi 2: y)11 = op(l). i=l
The claim (6.6.48) thus follows from these facts and the assumption that Lln = Op(l), in a routine fashion. This also completes the proof of (6.6.47) and hence that of the theorem. 0 Notes: We end this section with some historical remarks. An and Bing (1991) have proposed the K-S test based on Sn,I and a half sample splitting technique a La Rao (1972) and Durbin (1973) to make it ADF for fitting a simple linear regression model. They also discuss the problem of fitting a linear autoregressive model of order 1. See also section 7.6 of this monograph on this. Su and Wei (1991) proposed K-S test based on the Sn,Iprocess to test for fitting a generalized linear regression model. Delgado (1993) constructed two sample type tests based on the Sn,I for comparing two regression models. Diebolt (1995) has obtained the Hungarian-type
6.6.4. Proofs
293
strong approximation result for the analogue of Sn,I in a special regression setting. Stute (1997) investigated the large sample theory of the analogue of Sn,I for model checking in a general regression setting. He also gave a nonparametric principal component analysis of the limiting process in a linear regression setup similar to the one given by Durbin et at. (1975) in the one sample setting. The transformation T is based on the ideas of Khmaladze (1981). Stute, Thies and Zhou (1998), Koul and Stute (1999) discussed this in the regression and autoregressive settings, respectively. The above proofs are adapted from the latter two papers.
7 Autoregression 7.1
Introduction
The purpose of the Chapters 7 and 8 is to offer a uni fied functional approach to some aspects of rob ust est imation and good ness-of-fit testing pr oblems in auto regressive (AR) and condit ionally heteroscedasti c autoregressive (ARCH) mod els. We sha ll first focus on th e well celebrated p-th order linear AR models. For t hese models, th e similarity of t he fun cti onal approac h developed in the previous chapters in connection with linear regression models is t ra nspa rent . This cha pter t hus extends t he domain of applications of t he statistical meth odology of th e previous cha pte rs to th e one of th e most applied mod els with depend ent observat ions. Chapter 8 discusses th e developm ent of similar ap proac h in some genera l non-lin ear AR and ARCH models. As before, let F be a d.f, on JR, p 2: 1 be an integer , and Yo := (X o , X- I , ' " , X 1 - p ) ' be an observa ble rand om vecto r . In a linear AR(p) mod el th e observat ions {X;} are such t hat for some p' = (Pl ,P2, ' " , pp) E JRP, (7.1.1)
ar e i.i.d. F r.v.'s, and indep endent of Y o. Processes tha t play a fundament al role in t he robu st estimation of p in this model are th e ran dom ly weighted residu al empirical processes n
(7.1.2)
Tj (x , t )
.-
n- 1 L
gj (Y ;- l )I(X ; - t' Y;- 1
:s x) ,
;= 1
H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
295
7.1. Introdu ction
for x E JR, t E JRP, 1 :S j :S p , where 9 := (gl, ' " , gp) is a p-vector of measurable functions from W to JR and Y i - 1 := (Xi-I, ' " , X i - p ) ' , 1 :S i:S n . Let T := (T 1 , ' " , T p )' . Note t hat n
T (x , t ) = n- 1 Lg (Y i - 1)I (X i
t' Yi-l :S z ),
-
x E JRP , t E W .
i=1
The genera lized M - (GM) est ima to rs of p , are soluti on t of t he p equations (7.1.3)
Q(t ) :=
f
'l/; (x) T (dx , t ) = 0 ,
where 'l/; is a nondecreasing bounded measurable function from JR to JR. Th ese est imators are ana logues of M-estimators of 13 in linear regression as discussed in Chapter 4. Note t hat taking x I [lx l :S k] + kxl xl -
'l/; (x )
xI[lIx l1 :S k]
g (x )
1
1[1:1:1> k],
+ kxllxll-
1
I[lIx ll
x E JR,
> k],
x E JRP ,
in (7.1.3) gives th e Hub er (k) estimators and taking g (x ) == x , x E JRP and ljJ (x ) == x , x E JR, gives t he famous least squa re estimator . The minimum distance est ima tor that is an ana logue of of (5.2.15) is defined as a minimizer , w.r.t. t E JRP of
pt
(7.1.4)
I3ri
Kg(t )
tf
[n -
1 2 /
t
gj( Y i-d { I (X i :S x
+ t'Yi- d
i=l
j=1
-I (- X i
<X
-
t'Yi - l } f dG (x ).
Observe tha t Kg involves T . In fact , \:j t E JRP, Kg (t )
=
tf j=1
[n
1 2 / { T j (x , t )
-
t
i=1
gj( Y i- 1 )
f
+T j ( - x , t ) } dG( x) . Three members of this class of est ima tors are of special int erest . They correspond to th e cases g (x ) == x , G( x) == Xi g (x ) == x , G == 0o , t he meas ure degenerate at 0; g (x ) == x , G == F in t he case F is known. Th e first gives an analogue of th e Hodges - Lehmann (H-L) estimator of p , the
296
7. Autoregression
second gives the least absolute deviation (LAD) estimator, while the third gives an estimator that is more efficient at logistic (double exponential) errors than LAD (H-L) estimator. Another important process in the model (7.1.1) is the ordinary residual empirical process n
(7.1.5)
Fn(x, t) := n- 1
L I(X
i -
t'Y i - 1 ~ x),
x E JR, t E JRP.
i=1
An estimator of F or a test of goodness-of-fit pertaining to F are usually based on F n (x, jJ), where p is an estimator of p. Clearly Fn is a special case of (7.1.2) . But, both Fn and Tj, 1 ~ j ~ p , are special cases of (7.1.6) n
n- 1
L h(Yi-dI(Xi -
t'Y i -
1
~ x)
i=1
n
=
n-
1
L h(Yi-dI(si ~
X
+ (t - p),Yi-d,
i=1
for x E JR, t E JRP, where h is a measurable function from JRP to JR. Choosing h(Yi-d == gj(Yi-d in W h gives Tj, 1 ~ j ~ p and the choice of h == 1 yields Fn . From the above discussion it is apparent that the investigation of the large sample behavior of various inferential procedures pertaining to p and F, based on {T} and F n (-, p), is facilitated by the weak convergence properties of {Wh(x, p + n- 1 / 2u) , x E JR, u E JRP}. This will be investigated in Section 7.2, with the aid of Theorem 2.2.4. In particular, this section contains an AUL result about {Wh(x ,p+n- 1 / 2u) , x E R, Ilull ~ b} which in turn yields AUL results about {T(x,p+n- 1 / 2u), x E JR, lIuli ~ b} and {Fn(x, p + n- 1 / 2u), x E JR, Ilull ~ b}. These results are useful in studying GM - and R- estimators of p, akin to Chapters 3 and 4 when dealing with linear regression models. They are also useful in studying the large sample behaviour of some tests of goodness-of-fit pertaining to F. Analogous results about the ordinary empirical of the residuals in autoregressive moving average models are briefly discussed in Remark 7.2.4. Generalized M- estimators and analogues of Jaeckel's R- estimators are discussed in Section 7.3. In the same section we also discuss a class of generalized R- (GR-) estimators. In order to use GR or m.d. estimators to
7.2. AUL OF WH AND FN
297
const ruct confidence intervals one often needs consistent estima tors of t he functional QU ) of t he err or densi ty f. Appropriate analogues of estimators of QU ) of Section 4.5 are shown to be consistent und er (Fl ) and (F 2). This is also done in Section 7.3, with the help of the AUL property of {F n (x , p + n - 1 / 2 u ), x E~, Ilull ::; b}. This result is also used to prove the AUL of serial rank correlat ions of t he residu als in an AR(p) model. Such results should be useful in developing analogues of t he method of moment est imators or Yule - Walker equations based on ranks in these models . Section 7.4 investigates t he behav iour of two classes of m.d . est imators of p, including th e class of est imators {p~n. A crucial result needed to obtain th e asymptotic dist ributions of these esti ma tors is th e asymptotic uniform quadraticity of th eir defining dispersions. This result is also proved in Section 7.4. Th e regression qua ntiles of Koenker and Baset t (1978) are now accepte d as an ap propriat e extension of t he one sample quantiles in t he one sample location mode ls to th e multip le linear regression model (1.1.1). Sectio n 7.5 discusses an exte nsion of t hese regression quantiles to t he stationary ergodic linear autoregressive time series. T he closely related autoregression ran k scores are also discussed here. Section 7.6 contains appropr iat e analogues of some of t he goodness-offit tests of Chapter 6 pertaining to F , while Secti on 7.7 discusses at some length t he goodness-of-fit tests for fit tin g an auto regressive model of ord er 1. Th e prop osed tests are analogues of t he tests of Section 6.6. T hese tes ts ar e based on certain ma rti nga le transforms of a partial sum process of t he estimated innovatio ns.
7.2
AUL of W h and
r;
Recall t he notation from (2.2.26), and th e st at ement of Theorem 2.2.4. In (2.2.26), take (7.2.1)
(ni
-
Ci,
h ni == h( Y i-d , 6ni == n- 1 / 2 u 'Y i _ l ,
AnI
a - field {Yo},
Ani
a - field {Yb, C I , . . . ,ci - d, 2::; i ::; n .
Th en one readil y sees th at the corresponding Vh (x), V'; (x) are, respectively, equal to Wh( x ,p + n - 1 / 2 u ), W h(X,p) for each u E ~p and for all x E ~.
298
7. Autoregression
Consequently, if we let n
(7.2.2)
L h(Yi-dF(x + (t -
Vh(X, t) := n- 1
P)'Y i- 1 ) ,
i=1
Wh(X, t) := n I/ 2[Wh(X, t) - Vh(X, t)],
x E JR,
t E JRP,
then the corresponding U;'(x), Uh(x) are, respectively, equal to Wh(X, p), Wh(x, p+n- 1/ 2u) , for each u E JRP and for all x E JR. Recall the conditions (F1) and (F2) from Corollary 2.3.1. We are now ready to state and prove the following Theorem 7.2.1 In addition to (7.1.1), assume that the following conditions hold:
(7.2.3) (7.2.4)
n
n- ~ h 1
(
2(Y
) 1/2 i
_d
= a + op(l) ,
a a positive r .v.
n- 1/ 2 max IIh(Y i - dll = Op(l). 1::o,::on
n-
1/ 2
(7.2.6)
n-
1
(7.2.7)
F satisfies (F1) and (F2) .
(7.2.5)
max IIY i - 1 11 = Op(l).
1::0'::0" n
L IIh(Y i-dY i - t11= Op(l) . i=1
Then, for every 0
(7.2.8)
< b < 00 ,
sup
IWh(X, p
+ n- 1/ 2u)
- Wh(X , p)1 = op(l),
x EIR,lIu ll::o b
(7.2.9)
n 1/ 2 [Wh(X,p
+ n- 1/ 2u)
- Wh(x ,p)]
n
= -u'n -
1
L h(Yi-dYi -lf(x) + up(l) . i=1
where up(l) is a sequence of stochastic processes that conv erges to zero, uniformly over the set x E JR, lIuli
~
b, in probability.
Remark 7.2.1 If in the above theorem the condition (7.2.7) is weaken to (7.2.10)
F has a continuous and positive density
f on JR.
then, for every k < 00 , the analogues of (7.2.8) and (7.2.9) where the supremum over x E JR is changed to the supremum over the set x E [-k , k],
7.2. AUL OfWh &
r;
299
continue to hold . This follows from adapting the following proof to the compact subsets. Note that f continuous on JR implies that it is uniformly 0 continuous on every bounded interval of JR. Proof of Theorem 7.2.1 . In view of the discussion preceding the statement of the theorem it is clear that (2.2.32) of Theorem 2.2.4 applied to entities given in (7.2.1) above readily yields that sup IWh(X, p + n- 1 / 2 u ) - Wh(X, p)1 = op(l), xEIR
for every fixed u E JRP . It is the uniformity with respect to u that requires an extra argument and that also turns out to be a consequence of another application of (2.2.32) and a monotonic property inherent in these processes as we now show. Since h is fixed, it will not be exhibited in the proof. Also, for convenience, write W( ·), Wu (-) , W';=(-), v~(-) etc . for Wh(-, p), Wh(', p + n- 1 / 2u), Wt'(-, p+n 1 / 2u) , vt(-, p+n 1 / 2u) etc . with ± signifying the fact that h± now appears in the place of h in these processes where h+ = 0 V h, n: = h-h+ . To conserve the space, write ~i ' hi, ht for Yi-l , h(Y i- 1 ) , h±(Y i - 1 ) , respectively, i 2: 1. Thus, e.g., (7.2 .11) = n- 1 / 2
n
L ht{I(ci ~
X
+ n-l /2u/~i) - F(x + n-l /2u'~i)} '
i=l We also need the following processes:
T±(x;u,a)
(7.2.12)
:=
n- 1 / 2
n
L ht I (Ci ~
X
+ n-l /2u/~i + n-l /2all~ill) ,
i=l
m±(x; u, a) Z±
:=
:=
n- 1 / 2
n
L ht F (x + n-l /2u'~i + n-l /2all~ill)
i=l T± - m±, x E JR, u E JRP, a E JR.
Observe that if in Uh of (2.2.26) we take eni == Ci , h ni == h±(~i)' Oni == n-l /2{u'~i + all~ill} and Ani, 1 ~ i ~ n, as in (7.2.1) , we obtain
Uh(-) = Z±(,;u,a), for every u E JRP, a E JR. Similarly, if we take Oni == n- 1 / 2 u ' ~i and the rest of the quantities as above then
7. Autoregression
300
It t hus follows from two applications of (2.2.32) and t he t riangle inequ ality t hat for every u E ~P , a E ~, (7.2.13)
sup IZ±(x; u , a) - Z± (x ; u , 0)1 = op(l) ,
(7.2.14)
sup I W~ ( x ) - W ±(x)1
zE R
= op(l).
zER
Thus to prove (7.2.8), because of t he compactness of the ba ll {u E ~p; lIull b}, it suffices to show t hat for every e > 0 t here is a 6 > 0 such that for every Ilull b,
s
(7.2.15)
s
lim supP ( n
sup IIsIl9, lIs- ull:So,
IWs(x ) - Wu (x ) 1 > 4c)
< c.
zER
By th e definition of W ± and the triangle inequality , for x E ~, s , u E ~p , (7.2.16)
IWs(x ) - Wu(x) 1
<
IW:( x) - W~ (x »1
<
2 n 1/
+ IW; (x ) - W ,-;(x) l, IW!C x ) -
W~(x) 1
[ IWs± (x ) - W';=(x)1 + Iv; (x ) -
But
Iisil
s b, lI ull s b, lis-
(7.2.17)
n - l /2 u' ~i
u j]
v~(x) I ] ·
s 6 impl y t hat for all 1 SiS n ,
n -l /2611 ~ ill
-
< n- 1 / 2 s' 'oc ,. S n - l /2 u' ~i + n - l /2611 ~;/J. From (7.2.17), t he monotonicity of t he indicator funct ion and t he nonn egativity of h± , we obtain T± (x ; u , - 6) - T ± (x ; u , 0)
S
Ws± (x ) - W';= (x )
S T±(x ; u , 6) for all x E ~, (7.2.18)
Iisil s b, lis -
uj ]
T±(x ; u, 0) ,
s 6. Now cente r T ± appropriately to obtain
n 1 / 2 IW s±(x) - W';= (x )1
S IZ± (x ; u , 6) - Z±(x ; u , 0)1 +! Z ±(x; u , -6) - Z± (x;u,O)1 +Im± (x ; u , 0) - m ±(x; u , 0)1
+Im±( x; u , - 6) - m±(x; u , 0)1,
7.2. AUL
OfWh
r;
&
301
for all x E ~, Iisil ~ b, lis - ul] ~ 8. But, by (F1) , V lIuli ~ b, n
sup Im±(x; u, ±8)
(7.2.19)
- m±(x; u,0)1 ~ 811f11oon- 1 L Ilhieill,
x ER
i=l n
(7.2.20)
n 1 / 2I v';= (x ) - v~(x)1 ~
sup
811flloon- 1 L Ilhie&
I/s-ul/ :S<5,xER
i= l
= 6 and a = -8 and
From (7.2.19) , (7.2.18) , (7.2.13) applied with a the assumption (7.2.6) one concludes th at for every such that for each lIull ~ b, lim sup P (
sup Iis-ull :S<5 ,x
n
E
n 1 / 2IWs±(x) - W':;=(x)1
> 0 there is a 8 > 0
> E)
~ E/2.
From this , (7.2.20) , (7.2.16) , and (7.2.6) one now concludes (7.2.15) in a routine fashion. Finally, (7.2.9) follows from (7.2.8) and (7.2.7) by Taylor 'S expansion of F . 0 An application of (7.2.9) with h(Yi-d = gj(Yi-d , 1 ~ j ~ p , and t he rest of the quantities as in (7.2.1) readily yields the AUL property of T j - pro cesses, 1 ~ j ~ p of (7.1.2) . This tog eth er with int egration by parts yields the following expa nsion of th e M-scores gj , 1 ~ j ~ p of (7.1.3). Corollary 7.2.1 In addition to (7.1.1), (7.2.5) and (7.2.7), assume that the following conditions hold. n
(n- 1 Lg](Yi-d)
(7.2.21)(a)
1/ 2
= (X j , (Xj , 1
~
j
~ p, posit iv e r.v.' s,
i= l
(b) n- 1 / 2 max
l :S, :Sn
Ilg(Yi-dll
= op(l) .
(7 .2.22) 'l/J is nondecreasing , bounded and
J
'l/JdF = O.
n
(7 .2.23) n-
1
L Ilgj(Yi-d Yi-111 = Op(l) , 1 ~ j ~ p. i= l
Then V 0 < k, b <
00 ,
sup In1 / 2 [gj (p + n - 1 / 2u) - gj(p)] -u'n -
1
t
gj(Y i- 1)Y i- 1
J
fd'l/JI
= op(l)
i =l
where the supremum is taken over all 'l/J with j~~
11'l/Jlltv ~ k < 00 , Ilull
~
b, 1
~
0
302
7. Autoregression
We remark here that (7.2.21) is a priori satisfied whenever 9 is bounded. Upon choosing h == 1 in (7.2.9) one obtains an analogous result for the ordinary residual empirical process Fn(x, t) . Because of its importance and for an easy reference later on we state it as a separate result. Observe that in the following corollary the assumption (7.2.24) is nothing but the assumption (7.2.6) of Theorem 7.2.1 with h == 1. Corollary 7.2.2 Suppose that (7.1.1), {7.2.5), and (7.2.7) hold. In addition, assume the following. n
(7.2.24)
n-
1
I: IIYi-dl = Op(l) . i=1
Then, for every 0
< b < 00,
(7.2.25) n
-u'n-
1
I:Yi-I!(x)1
= op(l),
i=1
where the supremum is taken over x E IR, Ilull
~
o
b.
Observe that none of the above results require that the process {Xi} be stationary or any of its moments be finite . 0 Remark 7.2.2 On the assumptions (7.2.3) - (7.2.6) . If Yo and {cd are so chosen as to make {X,'] stationary, ergodic and if
then all of these assumptions are a priori satisfied. See, e.g., Anderson (1971; p. 203) . In particular, they hold for any bounded h, including the one corresponding to the Huber function h(x) == IxlI(lxl ~ k) + sign(x)I(lxl > k), k > O. Observe that (7.2 .5) is weaker than requiring the finiteness of the second moment. To see this , consider, for example, an AR(l) model where X o and C1, C2, . . . are independent r .v. 's and for some Ipl < 1,
Then,
Xi = piXO + I:pi-i Cj , j=l
v,
= Xi,
i 2: 1.
7.2. AUL
OfWh
&
r;
303
Thus, here (7.2.5) is implied by
I
max n-I/2Icil = :Si:Sn
(1).
0 P
But, this is equivalent to showing that x 2€n{1 - P(lcri > x)} -+ 0 as x -+ 00, which, in turn is equivalent to requiring that x 2 P(lcII > x) -+ 0 as x -+ 00. This last condition is weaker than requiring that Elcl12 < 00 . For example, let the right tail of the distribution of ICII be given as follows:
P(lcII >x)
< 2,
1,
x
1/(x
2€nx)
, x 2': 2.
Then, ElcII < 00 , EcI = 00 , yet x 2 P(lcll > x) -+ 0 as x -+ 00. A similar remark applies to (7.2.4) with respect to the square integrability of h(Yo) . 0
Remark 7.2.3 An analogue of (7.2.25) was first proved by Boldin (1982) requiring {Xd to be stationary, ECI = 0, E(cI) < 00 and a uniformly bounded second derivative of F . The Corollary 7.2.2 is an improvement of Boldin 's result in the sense that F needs to be smooth only up to the first derivative and the r. v.'s need not have finite second moment. Again, if Yo and {cd are so chosen that the Ergodic Theorem is applicable and E(Y o) = 0, then the coefficient n- l L~=l Y i - l of the linear term in (7.2.25) will converge to 0 , a.s .. Thus (7.2.25) becomes (7.2.26)
sup Inl / 2{Fn(x, p + n lIull9
I 2 / u)
- Fn(x , p)}1
= op(l) .
In particular, this implies that if jJ is an estimator of p such th at
then
Consequently, the estimation of p has asymptotically negligible effect on the estimation of the error d.f. F. This is similar to the fact , observed in the previous chapter, that the estimation of the slope parameters in linear regression has asymptotically negligible effect on the estimation of the error d.f. as long as the design matrix is centered at the origin . 0
7. Autoregression
304
Serial Rank Residual Correlations. An important application of (7.2.25) occurs when proving the AUL property of the serial rank correlations of the residuals as functions of t . More precisely, let R it denote the rank of Xi - eY i- 1 among X j - ey j - 1 , 1 :S j :S n, 1 :S i :S n. Define Rit = 0 for i :S O. Residual rank correlations of lag i, for 1 :S j :S p, t E IRP, are defined as (7.2.27)
S'
5 j (t )
.-
n (n + 1)) ( (n + 1)) 12 n(n2 -1) .L (Ri- jt 2 Rit 2 ' '=J+1 (51,"' , 5 p ) .
Simple algebra shows that
where an is a nonrandom sequence not depending on t, lanl
bnj(t)
:=
j)
In (n . L +L ,=n-J+1 ,=1
6(n+l) {n(n2 _
= 0(1), and
Rit ,
n
Lj(t)
:=
n- 3
L
Ri-jtRit, 1:S j :S p, t E IRP .
i=j+1 Observe that sup{lb nj (t )l; t E JRP} :S 48p/n, so that n 1 / 2 sup{lbnj(t)l; t E IRP} tends to zero , a .s. It thus suffices to prove the AUL of {L j } only, 1 :S j :S p. In order to state the AUL result we need to introduce
(7.2.28) :=
0,
U i j := Y i- j- 1F(E;)f(Ei-j) :=
0,
i :S i .
+ Yi-I!(E;)F(Ei-j) ,
i
> i, i :S j ,
Observe that {Zij} are bounded r .v.'s with EZij = J f2(X)dx for all i and j. Moreover, {Ed i.i.d . F imply that {Zij , j < i :S n} are stationary
r;
7.2. AUL OfWh &
305
and ergodic. By th e Ergodic Theorem ,
Zj -+ b(f ) :=
I
f 2(x )dx , a.s ., j = 1, '"
.v.
We are now read y to state and prove Theorem 7.2.2 Assume that (7.1.1), (7.2.5) , (7.2. 7) and (7.2.24) hold. Then for every 0 < b < 00 and for every 1 ~ j ~ p, (7.2.29)
sup In 1 / 2[L j (p + n - 1 / 2 u) - L j (p )] - u'[b(f )Y n - Uj]1 lIull:Sb
-
= op(l) .
cn
If (7.2.5) and (7.2.24) are strengthened to requiring E(lIY o Il 2 + < 00 and {X;} stationary and ergodic then Y nand U j may be replaced by their respective expectations in (7.2.29). Proof. Fix a j in 1 ~ j ~ p . For the sake of simplicity of t he exposition, write L (u ), L (O ) for Lj (p + n- 1 / 2u ), Lj (p ): respectively. Apply similar convent ion to other functions of u . Also write Ci u for e, - n- 1 / 2 u' Y i _ 1 and FnO for Fn (·, p ). With t hese convent ions R iu is now t he rank of X i - (p + n - 1 / 2 u )' Y i _ 1 = Ci u . In ot her words, R iu == n Fn(ciU', u ) and
L (u ) = n - 1
n
L
Fn(ci- ju , U)Fn(ciu, u ), u E JRP .
i= j + l
The pr oof is based on t he lineari ty properti es of Fn ( , u ) as given in (7.2.25) of Corollar y 7.2.2 above. In fact if we let
B n(x , u )
:=
Fn(x , u ) - Fn(x ) - n - 1 / 2 u'Y nf( x) ,
X
E JR
t hen (7.2.25) is equivalent to
supn 1 / 2 IBn(x, u )1
= op(l) .
All supremums, unl ess specified ot herwise, in the proof are over x E JR, 1 lIuli ~ b. Rewrite
i ~ nand / or
n 1 / 2(L(u) - L(O)) n
=
n- 1 / 2
L
{Fn(ci-ju , U)Fn(ciu, u ) - Fn(ci- j )Fn(ci )}
i =j + l n
=
1 2
n- /
L
[{Bn(Ci- jU , u ) + Fn(ci- ju ) + n -
1 2 / u' Y nf
(ci- ju )}
i = j+ l
·{B n(C iu, u ) + Fn(ciu ) + n - 1 / 2 u' Y nf( ciu)} -Fn(ci- j )Fn(ci)] .
~
7. Autoregression
306 Hence , from (7.2.5), (7.2.20) and (7.2.24), (7.2 .30)
n 1 / 2(L(u) - L(O)) n
L
=n- 1/ 2
[Fn(ci-ju)Fn(ciu) - Fn(ci)Fn(ci-j)]
i=j+1 n
+ n- 1
L
[Fn(ci-ju)!(ciu)
i=j+1
+ Fn(ciu)f(ci-ju)](U'Y n) + u p (l ), where, now, u p (l ) is a sequence of stochastic processes converging to zero uniformly, in probability, over the set {u E jRP ; lIull :::; b). Now recall that (7.2.7) and the asymptotic uniform continuity of the standard empirical process based on LLd. r.v. 's imply that sup
n 1/ 2 1[Fn(x ) - F( x)] - Fn(y) - F(y)J1 = op(l)
Ix- yl:S o
when first n -+ 00 and th en J -+ O. Hence from (7.2.5) and the fact that sup ICiu - c;/ :::; bn- 1 / 2 max IIY i- 111, i ,u
1
one readily obtains supn l / 2 1[Fn(Ciu) - F(ciu)]- [Fn(Ci) - F(ci)]1 = op(l) . I ,U
From this and (7.2.7) we obtain (7.2.31)
supn 1 / 2 IFn(ciu) - Fn(ci)
+ n- 1 / 2 u' Yi_l! (Ci)1=
op(l) .
i ,u
From (7.2.30), (7.2.31), the uniform continuity of Cantelli lemma, one obtains (7.2.32)
!
and F, the Glivenko-
n 1 / 2(L(u) - L(O))
n- 1
n
L
[F(ci-j)!(ci)
+ F(c i)!(ci_j)](U'Y n)
i=j+1 -u'n- 1
n
L i=j+1
{Yi-j-l!(ci -j)F(Ci) + Y i-l!(ci)F(Ci-j)}
7.2. AUL OfWh &
r;
307
In concluding (7.2 .32) we also used the fact that by (7.2.5) and (7.2.24) , n
sup In- 3 / 2
L
!U'Yi -j' u'Yi -
1
1
i=j+1
u
n
:S bn - 1/ 2 max , IlYi_ 1I1 n - 1
L
II Y i - j ll =
op(l) .
i=j+1
Now (7.2.29) readily follows from (7.2.32) and the notation introduced just befor e th e statement of the theorem. The rest is obvious. 0 Remark 7.2.4 Autoregressive m oving average models. Boldin (1989) and Kreiss (1991) give an analogue of (7.2.26) for a moving average model of
order q and an autoregressive moving average mod el of order (p, q) (ARMA (p, q)) , respect ively, when t he error d.f. F has zero mean , finit e second moment and bo unded second derivative. Here we shall illust rate as to how Theorem 2.2.3 can be used to yield th e same result under weaker conditions on F. For th e sake of clarity, the det ails are carried out for an ARMA (l , 1) mod el only. Let CO, C1, C2, '" , be i.i.d . F r.v.'s and X o be a r .v. ind epend ent of {ci,i 2: I }. Consider th e process given by the relation Xi = pXi- 1 + e, + (3ci - 1, i 2: 1,
(7.2.33) where
Ipl < 1, 1(3 1< 1.
(7.2.34)
One can rewrite t his mod el as
Ci
Xl - (pX o + (3co),
i = 1,
i- I
Xi -
L (- (3 )j (p + (3 )X i- j - 1
j=l +( _ (3) i- 1(pX o + (3co), i 2: 2.
Let 0 := (s,t) ' denote a point in th e open square (- 1, 1)2 and 0 0 := (p, (3)' denote t he true param et er value. Assume that O's are restricted to t he following sequen ce of neighbourhoods: For abE (0,00) , (7.2.35) Let {Ei , i 2: I } stand for the residu als {ci, i 2: I } of (7.2.34) afte r p and (3 ar e replaced by s and t , resp ect ively, in (7.2.34) . Let FnC, O) denote th e empirical process of {Ei, 1 :S i :S n} . This empirical can be rewrit ten as n
(7.2.36)
Fn(x, O)
=n-
1
L i=l
I(ci :S x
+ 6ni ), x
E JR,
7. Autoregression
308 where
(7 .2.37)
bni
.-
(s - p)Xo + (t - b)co, i = 1, i-2 -t)i (s + t) - (_ (3)i (p + (3 )]X i- j - 1
L [( j= l
+ ( _t)i -1 (sXo + tco) - (_.8)i-1] (pX o + .8co), i 2: 2.
=
<Sni t
+ 8n i 2 ,
say, i
2
2.
From (7.2.37), it follows t hat for every () E (-1 ,1) 2 sa t isfying (7.2.35) ,
< bn- 1 / 2(!Xol + Ico!), < 2bn- 1/ 2 max IXil(l -
Ibn11 max Ibnill 2:Si:S n
l :Si:S n
br.,- 1/2 -
(3)- 1
X{l + (1-1 (3!)-1}, max Ibni21 2:Si:Sn
<
2bn- 1/ 2 (1 - bn - 1/ 2
(3 )-l (IX ol+ Icol).
-
Con sequently, if n - 1 / 2 max1:Si:S n IXd = op(l) , then the {b n ;} of (7.2.37) would satisfy (2.2.28) for every () E (-1 ,1)2 . But by (7.2 .33),
(7.2.38)
Xi
= pXo + (3co + c1, i = 1, i-2
= pi- 1(pX o + (3co) +
L pi (p + (3)ci-j-1 + Ci , i 2: 2. j= l
Therefore, (2.2.28) will hold for the above { bn ;} if
(7.2.39)
n- 1/ 2 max Icd = i:Si:Sn
0 p
(1) .
We now verify (2.2.30) for the above {bn;} and with h ni == 1. That is we must show that n - 1 / 2 2:7=1 Ibnil = Op(l) . We proceed as follows. Let u = n 1/2 (s - p), v = n 1/ 2 (t - (3) and Zo := IXo l + Icol. By (7.2.35) ,
7.2. AUL
lui + Ivl
&
OfWh
r:
309
S; b. From (7.2.37), n
n- 1 / 2
L Ibnil i=1
< n- 1bZo n .i- 2 + n- 1/ 2 L I L[( -t)j (s + t) - (- [3 )j (p + [3 )]X i- j- 1! i=2 j=O n
(7.2.40)
=
+ n- 1/ 2 L I(_t) i-l (sX o + tEo ) - (_[3)i- l ZoJ, i=2 An! + A n 2 + An 3 , say.
Clearly, IAn1
1
A n2
=
= 0(1), a .s. Rewrite n i-2 n -l /2LIL[(-t)j{(u+v)n-l /2+p+ [3} i=2 j=O
- (- [3 )j (p + [3)] X i- j- 1! n
i-2
< 2bn - 1 L L Itj jIXi - j -
11
i=2 j=O n i-2 + 2n- 1/ 2 L I L [(-t )j - (- [3) j X i- j - 11 i=2 j=O 2bA n21 + 2A n22, say .
(7.2.41)
By a change of vari ables and an int erchan ge of summations one obt ains n
A n21 S; n- 1 L IX i l(l - lt l)- l . i=1
(7.2.42)
Next, use the expansion aj numbers a , c to obtain
-
cj = (a - c) 2:{:~ aj -
1
-
k
ck for an y real
n i-2 j-l A n22 S; bn- 1 L L L Itl j-l- kl[3l kIXi_ j_ll · i=3 j=1 k=O
Again , use change of vari abl es and interchange of summa tions repeatedly and th e fact that a bove by
1[31 viti < 1, to conclude that t his upp er bound is bounded n
b(l
-1[31)-1 [(1 -
Itl)- 1 + 1]n- 1 L IXiI· i=1
7. Autoregression
310
This, (7.2.40) , (7.2.41) and (7.2.42) together with (7.2.35) imply that n
A n 2 S 2bn- 1
(7.2.43)
L
IXd[(1 - bn- 1/ 2 -1,81)-1
;=1
x {1 + (1 - /,8I)-I}
+ (1 - /,81)-1]
Finally, similar calculations show that (7.2.44) From (7.2.41) , (7.2.44) and (7.2.45) it thus follows that if n
n-
1
L
IXd = Op( I) ,
;=1
th en th e {bn ;} of (7.2.37) will satisfy (2.2.30) with b-« == 1. But in view of (7.2.38) and the assumption th at Ipl V 1,81< 1, it readily follows that if n
n- I L ied =
(7.2.45)
Op( l) ,
;= 1
t hen (2.2.30) with hn ; == 1 holds for th e {bn ;} of (7.2.37). We have thus proved th e following: If (7.2.33) hold s with t he erro r d.f, F satisfying (F I) , (F 2), (7.2.39) and (7.2.45) , th en "1 0 E (- 1, 1)2, n
sup x
In- 1 / 2
1) I(E"; S x)
- Li e,
S x ) - F (x + bnd + F (x)}1
= op(l) .
;= 1
Now use an argument like th e one used in th e proof of Th eorem 7.2.1 to conclude th e following Corollary 7. 2. 3 In addition to {7. 2. 33}, assume that the error d.f. F satisfies (F I), (F2 ) , {7.2.39} and {7.2.45}. Then , "10 < b <
00 ,
n
sup In 1 / 2 [Fn (x , 0 ) - Fn(:r , ( 0 ) ]
-
n- 1 / 2
L bn;f(x) I =
Op(l) ,
; =1
where the supremum is tak en over x E IR and 0 , 0 0 satisf ying {7.2.35}. If {7.2.45} is strengthened to assuming that Elel < 00 , then sup n
- 1/ 2
~ s ~ un ; -
n
1/ 2
[(s(1 -_ p) p) +
(t - ,8)] [_ ( ) (1 + ,8) J1. - Op I ,
1
where the supremum is taken over s , t satisf ying {7.2.35} and J1. = E e.
0
7.3. GM- and R- estimators
311
Consequently, if EE: = 0 and (p, S) is an estimator of (p, (3) such that Iln - p, S- (3)11 = Op(l), then an analogue of (7.2.26) holds in the present case also under weaker conditions than those given by Boldin or Kreiss . The details for proving an analogue of Corollary 7.2.3 for a general ARMA (p, q) model are similar but some what more complicated than those given above . Clearly, an analogue of Theorem 7.2.2 also remains trues here. Ferretti, Klemansky and Yohai (1991) used this result to study the asymptotic distribution of rank estimators obtained by solving Yule-Walker type equations 0 using rank auto-covariances of the residuals in ARMA models . 1 2 / (p
7.3
GM- and GR- Estimators
In this section we shall discuss the asymptotic distributions of GM- and Restimators of p . In addition , some consistent estimators of the functional QU) will be also constructed. We begin with
7.3.1
GM-estimators
Here we shall state the asymptotic normality of the GM - estimators. Let PM stand for a solution of (7.1.3) such that Iln 1 / 2 (P M - p)1I = Op(l). That such an estimator PM exists can be seen by an argument similar to the one given in Huber (1981) in connection with the linear regression model. To state the asymptotic normality of PM we need to introduce some more notation. Let
x .-
(7.3.1)
91 (yo)
9dYd
G [
91
(;n -d
312
Ch 7. Autoregression n
:=
Sn
g'X = Z)91 (Y i-d Y
:-1.. , ,9p(Yi - p)Y :- 1)' ·
i= l
Proposition 7.3.1 In addition to (7.1.1) , (7.2.5), (7.2.7) , (7.2.21), (7.2.22) and (7.2.23) assume that (7.3.2)
n-1Sn = S + op(l) , for som e p x p non -random positive definition matrix S .
Th en If, in additi on, we assumes that (7.3.3)
n - 1G ' G
= G* + op(l) ,
G* a p x p non-ra ndom posit ive
definite m atrix, then (7.3.4)
n 1 / 2(PM - p) ---+ d N (O,v('lj; , F )J ),
J
:= S-lG*S-l ,
v ('lj;, F) := (J fd'lj;)- 2(J 'lj;2 dF ).
Proof. Follows from Corollar y 7.2.1, th e Cram er - Wold device and Lemma 9.1.3 in t he App endix applied to n 1/ 2 g (p ). 0 Again , if Yo and {cd are so chosen as to make {Xd stationa ry, ergodic and E(IIYol/2 + ci) < 00 th en (7.3.2) and (7.3.3) are a priori satisfied. See, e.g., Anderson (1971 ; p. 203). N ote : For a mor e genera l class of GM-estimators see Bustos (1982) where a result analogous to t he above corollary for smoot h score func tions 'lj; is obt ained.
7.3.2
GR-estimators
This section will discuss some exte nsions of th e analogues of J aeckel's (1972) R- est imators and general ize R- estima to rs of p and t heir asymptot ic distribution s. Let ~i(t) := X i - t'Y i- 1 , 1 ~ i ~ n . Recall th at R it is th e rank of ~i(t) among {~k(t) , 1 ~ k ~ n }, for 1 ~ i ~ n . Also, Rit == 0 for i ~ O. Let cp be a nond ecreasing score functi on from [0,1] to th e real line such t ha t n
(7.3.5)
L cp(ij(n + 1)) = O. i= 1
313
7.3.2. GR-estimators
For example, if cp(t) = -cp(l-t) for all t E [0,1], i.e., if cp is skew symmetric, then it satisfies (7.3.5) . Let g := (91, · ·· ,92)' be a p-vector of measurable functions from ffi.P to ffi. and define n
Sg(t)
.-
n- 1 Lg(Y i- 1)cp(Rit!(n + 1), i=l
n
Zg(u, t)
.-
n- 1 Lg(Yi-dI(Rit :S nu), i=l
Zg(u, t)
Z(u , t) - gu,
tEffi.P, O:S u :S 1.
Write S, Z for Sg, Zg whenever g(y) == y . The class of rank statistics Sg, one for each cp, is an analogue of the class of rank statistics T d of (3.1.2) discussed in Chapter 3 in connection with linear regression models. A test of the hypothesis P = Po may be based on a suitably standardized S(Po), the large values of the statistic being significant . We now give an analogue of the Jaeckel estimator for the AR(p) model. Accordingly, for atE ffi.P , let ~( i) (t) denote the the i t h largest residual among {~k(t) , 1 :S k:S n} , 1 :S i :S n, and set n
3(t)
L
n
cp(i/(n
+ 1))~(i)(t) == L cp(Rit!(n + 1))~i(t) .
i=l
i=l
Then Jaeckel 's estimator PJ is defined by the relation
PJ
= argmin{3(t); t E ffi.P} .
Jaeckel 's argument about the existence of an analogue of PJ in the context of linear regression model can be adapted to the present situation. This follows from the following two lemmas , the first of which is of a general interest. Lemma 7.3.1 Let d 1, d 2, . . . ,dn, V1, V2 , . .. , Vn be real numbers such that not all {d;} are the same and no two {Vi} are the same. Let riu denote the rank of Vi -udi among {Vj -ud j ; 1:S j:S n} ,u E R Let {bn(i); 1:S i:S n} be a set of real numbers that are nondecreasing in i. Let n
T(u) := L
dibn(riu) , u E ffi..
i=l
Then, T (u) is a nondecreasing step function in all those u E ffi. for which there are no ties among {Vj - udj; 1 :S j :S n} .
Ch 7. Autoregression
314 Proof. See Theorem II .7E, p. 35 of Hajek (1969).
0
Lemma 7.3.2 Assume that the model (7.1.1) holds with Yb ,Xl,X2 , · · · X n having a continuous joint distribution . Then the following hold.
,
(a) For each realization (Yb, Xl , .. . , X n ), the assumption (7.3.5) implies that J(t) is nonnegative, continuous and convex function of t with its a. e. derivative equal to -nS( t). (b) If the realization (Yb , Xl, X 2 , ·
· · , X n) is sucli that the rank of Xc is p then, for every 0 < b < 00, the set {t E jRP; J (t) ::; b} is bounded, where Xc is the X of (7.3.1), centered at the origin.
Proof. (a) For any x' = (Xl, X 2 , · · · , X n ) E jRn, let x(l) ::; x (2) ::; . .. ::; x( n) denote the ordered Xl, X2 , . . . , X n . Let II := {rr = (11"1 ,11"2, · . . , 11" n)'; rr a permutation of the integers 1,2 ,· ·· , n} , bn(i) :=
n
L bn(i)x(i),
D(x)
Drr(x) :=
;=1
k
x E JRTl,
; =1
min{l::; j ::; n ; bn(j)
.-
L bn(i)x,,;,
> O}.
Observe that J(t) = D(e(t)) , where e(t) := (6(t), · ·· Now, (7.3.5) and ip nondecreasing implies that
, ~n ( t ) ) ' .
n
D(x)
=
L bn(i)(x(i) -
x(k))
;=1
k-1
n
;=1
;=k
L bn(i)(x(i) - x(k)) + L bn(i)(x(i) -
> 0,
'tj
x(k))
x E jRn ,
because each summand is nonnegative. This proves that J(t) 2: 0, t E By Theorem 368 of hardy, Littlewood and Polya (1952), D(x) = max D,,(x),
"En
Therefore, (7.3.6)
'tj
t E W,
J(t)
'tj
x E jRn .
jR.
315
7.3.2. GR-estimators
This shows that J(t) is a maximal element of a finite number of continuous and convex functions, which itself is continuous and convex . The statement about a.e. differential being -nS(t) is obvious. This completes the proof of (a) . (b) Without the loss of generality assume b > J(D) . Write atE IRP as t = se, s E IR, e E IRP , Ilell = 1. Let d; == e'Y i - 1 . The assumptions about J imply that not all {d i } are equal. Rewrite n
J(t)
J(se) =
L bn(i)(X -
sd)(i)
i=l
n
L bn(ris)(Xi -
sdi)
i=l
where now ris is the rank of Xi - sd; among {Xj - sd j; 1 ~ j ~ n}. From (7.3.6) it follows that J(se) is linear and convex in s , for every
e E IRP , Ilell
= 1. Its a.e. derivative w.r.t. s is -
2:7=1 dibn(ris),
which by Lemma 7.3.1 and because of the assumed continuity, is nondecreasing in u and eventually positive. Hence J(se) will eventually exceed b, for every
e E IRP : lIell = 1. Thus, there exists a Se such that J(see) > b. Since J is continuous, there is an open set De of unit vectors v, containing e such that J(sev) > b. Since b > J(O) , and J is convex, J(sv) > b, 'tis ;::: Se and 'tI v E De. Now, for each unit vector e , there is an open set De covering it. Since the unit sphere is compact, a finite number of these sets covers it . Let m be the maximum of the corresponding finite set of Se. Then for all s ;::: m, for all unit vectors u, J (sv) > b. This proves the claim (b), hence the lemma.
o Note : Lemma 7.3.2 and its proof is an adaptation of Theorems 1 and 2 of Jaeckel (1972) to the present case.
0
From the above lemma it follows that if the r.v .'s Y O,X1,X2 , ' " ,Xn are continuous and the matrix n- 1 2:7=1 (Yi - 1 - Y)(Yi - 1 - V)' is a.s. positive definite, then the rank of Xc is a .s. p and the set {t E IRP ; J(t) ~ b} is a.s. bounded for every 0 ~ b < 00 . Thus a minimizer PJ of J exists, a.s ., and has the property that it makes IISII small. As is shown in Jaeckel (1972) in connection with the linear regression model , it follows from the AUL result given in Theorem 7.3.1 below that PJ and PR are asymptotically equivalent. Note that the score function e.p need not satisfy (7.3.5) in this theorem.
Ch 7. Autoregression
316
Unlike in the regression model (1.1.1), these estimators are not robust against outliers in the errors because the weights in the scores 8 are now unbounded functions of the errors. Akin to GM- estimators, we thus define GR- estimators as (7.3.7) Strictly speaking these estimators are not determined only by the residual ranks, as here the weights in 8 g involve the observations also. But we borrow this terminology from linear regression setup. For the convenience of the statement of the assumptions and results, from now onwards we shall assume that the observed time series comes from the following model. (7.3.8)
Xi
= PIXi-l + P2Xi-2 + .. . + PpXi- p + Ci, i = 0, ±1 , ±2,· ·· , P E ~p,
with all roots of the equation (7.3.9) inside the interval (-1 ,1) , where {ci, i = 0, ±1, ±2 ,· ··} are i.i.d. F r.v.'s , with (7.3.10)
Ec = 0,
Ec 2 <
00.
It is well known that such a time series admits the representation
(7.3.11)
Xi
=L
()i-kck,
i
= 0, ±1 , ±2, · · · ,
in L 2 and a.s. ,
k5,i
where the constants {()j,j 2: O} are such that ()o = 1, 'L: j 2: 0 I()jl < 00 , and where the unspecified lower limit on the index of summation is -00 . See, e.g., Anderson (1971) and Brockwell and Davis (1987, pp . 76-86). Thus {X;} is stationary, ergodic and EIIYol12 < 00 . Hence (7.2.3) implies (7.2.6). Moreover, the stationarity of {Y i - 1 } and EIIY ol12 < 00 imply that V 1] > 0, (7.3.12) n
:S {1]n 1 / 2 } -
2
L i=1
EIIY i _ 1 112 I(IIY i - I112:
1 2
1]n / )
7.3.2. GR-estimators
317
Thus (7.2.5) holds . By the same reason, the square integrability of g(Y o) will imply
n
n-
1
L Ig'(Y
i - d Yi - 1 1 =
Op(l) ,
i= l
n
rg
:=
1
plim nn- L(g(Yi-d - g)(g(Yi-d - g)' exists, i=l
n
~g := plimnn- 1 L(g(Yi-d - g)(Y i -
Y)' exists.
1 -
i= l
These observations are frequently used in the proof of Theorem 7.3.1 below, without mentioning. Let n 1
Zg(u) := n - L(g(Yi-d - g)[I(F(s i) :::; u) -
uJ,
i= l
0:::; u :::; 1,
1 1
8g
:= n- 1 t(g(Yi-d - g) [ep(F(Si» - 5], i =l
q(u) := j(F-1(u» , 0 :::; u:::; 1,
Q :=
5 =
ep(u )du,
J
j dsp .
With this preliminary ba ckground, we now state
Theorem 7.3.1 { AUL oj R-statistics} . Assume that {1.3.8}, {1.3.9} and {1. 3.1O} above hold. In add ition, assume that F satisfies (F1) , (F2) , Ellg(Yo)112 < 00 , and that ~g , r g of {1.3.13} are positiv e definite . Then, for every 0 < b < 00 , (7.3.14)
Iln1 / 2[Zg(u , p
sup
+ n- 1 / 2s)
- Zg(u)]
O~u ~I ,lIsll ~b
-~gsq(u)11
(7.3.15)
sup
IIn1 / 2{S(p
+
n- 1 / 2s)
'P E C , l l s lI~ b
-
= op(l) .
8g } + ~gS QII =
op(1).
Proof. Integration by parts shows that Sg(t) = g5
-1
1
Zg(u(n
+ l) /n , t) ep(du).
This and (7.3.14) implies (7.3.15) t rivially. The proof of the claim (7.3.14) is similar to that of Theorem 3.2.2. One uses Theorem 7.2.1 wherever (2.3.25) is used in th ere. We give det ails in brief. Drop the suffix 9 in th ese details.
Ch 7. Autoregression
318
r;
Recall the definition of T j , 1 ::; j ::; p , and T from (7.1.2) , and that of from (7.1.5). Let, for 0 ::; u ::; 1, s , t E jRP , n
T(u , t)
:=
T(F- 1(u), t) = n- 1 Lg(Yi-dI(~i(t) ::; F- 1(u» i=l n
F;;,,!(u)
.-
Z(u,s)
.-
n- 1 Lg(Yi-dI(ci ::; F- 1(u) + (t - p)'Yi-d i=1 inf{x; Fn(x , p + n- 1/ 2s) 2': u} , n
n- 1 Lg(Y i- 1)I(Ci::; F;;;}(u) +n- 1/ 2s/Y i_d i=l T(FF~l(U), p + n- 1/ 2s) .
Note also th at sup
IIZg(u , p
+ n- 1/ 2s) -
Z(u, s)11
o:::;u ::;l.lIsll:Sb
::; 2maxn -
1 2 / I1g (Y i _ 1
)11= Op(l) .
1
Thus it suffices to prove (7.3.14) with Z repl aced by Z . But this follows from th e fact Z(u ,s) T(FF~1(1l) ,p+n-1 /2s) and Theorem 7.2.1 applied p times, jlh time to h(Yi-d = 9j(Y i - d and from Corollary 7.2.2, in a 0 fashion similar to that in the proof of Theorem 3.2.1. The next result gives the asymptotic normality of the est imat or PRg '
=
Theorem 7.3 .2 In addition to the assumptions of Theorem 7.3.1, suppose the following hold : VeE JRP , lIell = 1, 1 ::; i ::; n , (7.3.16)
> 0, Y)'e < O.
E ither e'(g(Yi-d - g)(Y i - 1 - Y)'e
Or e'(g(Yi-d - g)(Y i - 1
-
Th en, n
1
/
2(PRg
2 17
- p)
~ N( 0, Q~ ~glrg~gl) .
Proof. From the methods in Sections 5.4 and 5.5, Theorem 7.3.1 and (7.3.16) imply that n 1 / 2 11 (pg - p)11 = Op(l) and that n 1/ 2(PRg - p) = (Q~g) -ln1 / 2Sg(p)
+ op(l) .
Observ e tha t n 1 / 2 Sg (p) is a vector of square integrable mean zero mar1[ tingale arrays with nESgS = (J~rg , (J~ := Jo r.p(u) - ep]2 du . Thus, by
g
319
7.3.3. Estimation of QU )
t he routine Cramer-Wold device and by Lemm a 9.1.3 in the App endix, one readily obtains the claim of t he asym ptotic normality. 0 Remark 7.3.1 Write PR for PRg when g (y ) == y . Argue either as in the Section 3.4 or as in J aeckel (1972) to conclude that IIn 1 / 2 (PR - pJ)11 = 0 + p( l) . Consequent ly by T heorem 7.3.2,
Remark 7.3.2 Recently Mukherjee and Bai (2001) have extended the
AUL result (7.3.15) to any nondecreas ing square int egrable ip for t he case g (y ) == y , pr ovided the error d.f, F has the finite Fisher informati on for location, th e condition (a) of Theorem 3.2.3. Their proof uses the cont iguity argument, approximating t he squa re integrable
7.3.3
Estimation of QU)
:=
J j dcp(F )
As is evident from T heorem 7.3.1, t he ra nk ana lysis of an A R(p) mod el via th e abov e GR- estimators will need a consistent estimator of the functional Q. In this subsect ion we give two classes of consist ent estimators of t his funct iona l in t he A R(p) mod el (7.3.8) and (7.3.9). One class of est imators is obtained by replacing f and F in Q by a kern el density estimator and the empirical d.f. bas ed on the estimated residuals, respect ively. This is analogous to th e class of estimators discussed in T heorem 4.5.3. The ot her class is an ana logue of t he class of estimators discussed in T heorem 4.5.1 in connection wit h t he linear regression set up . Accord ingly, let p be an est imator of p , K be a probability density on JR, h n be a sequence of positive numbers, h n --t 0 and define, for x E JR, n
e, .-
X i - P'Y i- 1, 1 :S; i :S; n ;
Fn( x ) := n -1L1(Ei :S; X), i =l
n
(n h n )- l L
_
n
K( x ~ Ci), f n (x ) := (n h n )- l L
~1
~1
n
Finally, let Q n :=
f
lnd
Recall t he definitio n of C from (3.2.1).
K ( x ~ Ci ). n
320
7. Autoregression
Theorem 7.3 .3 In addition to {7.3.8}-{7.3.10}, assume th at (Fl ) , (F2 ) an d th e follo wing con ditions hold.
> 0; h n -+ 0, n 1 / 2 h n -+ 00 .
(7.3.18)
hn
(7.3.19)
K is absolutely con ti nuous with it s a.e. derivative
tc (7.3.20)
satisf ying
II n 1 / 2 (p
J Ikl < 00 .
p)11 = Op(l ).
-
Th en,
sup IOn
(7.3.21)
-
Q(I)I = Op(l) .
Proof. The proof is similar to th at of Theorem 4.5.3 , so we shall be brief, indicating only one major difference. Unlike in th e linear regression set up, i.e., unlik e (4.5.9), here we have from Remark 7.2.3, (7.3.22)
sup .n 1/ 2 1F ,,(x ) - Fn(x )1 -_ op(l) . x
where F,,(;r ) F,,(x , p ). In ot her word s t he linear ity term involving n 1/ 2 (p - p ) is not pr esent in the app roximation of F". Proceedin g as in the proof of Theor em 4.5.3, (7.3.22) will yield
II]" - f"lloo <
(n
1 2 / h ,, ) - 1
op
((n
1 2 /
'IIn
1 2[ / Fn -
h,,)- I )
F"Jlloo .
! Ik l
= op(l) .
Compare t his with (4.5.16) where th e ana logous t erm is of the order Op((n 1/2 hn)- I ) instead of op((n 1/2 h,,) -1 ). The rest of t he pro of is exac t ly t he same as th ere with t he proviso t ha t one uses (7.3.22) inst ead of (4.5.9), whenever needed . 0 The reader may wish to modify th e above pr oof to see th at 0" cont inues to be consiste nt for Q even when E £ =I 0, so that th e term tha t is linear in n 1 / 2 (p - p ) is now pr esent in th e expa nsion of F" . We shall now describ e an analogue of Q~ of (4.5.5). The motivation is th e same as in Section 4.5, so we sha ll be bri ef on that also. Accor din gly, let p(y) := ![Fn (Y + x ) - F,,(-y
+ x) )dcp(F" (x)) ,
y
2: O.
Observe t hat p is an est imat or of t he d.f, of the abso lute difference 1£ - 1]1, where e and 1] are independent r.v.'s wit h resp ectiv e d.L's F and cp(F) . As in
321
7.4. Minimum distance estimation
Section 4.5, one can use the following representation for the computational purposes. For y ~ 0, n
p(y)
n
= n- 1 L[cp(j In) -
cp(j - l)ln] L /(Is(i) i=1
j=1
S(j) I
~ y) ,
where {S(i)} are the ordered residuals {s;} from the smallest to the largest. Now let i~ denote an o-th percentile of the d.f. p(y) and define
The consistency of these estimators may be proved using the method of the proof of Theorem 4.5.1 and the results given in Corollary 7.2.1. The discussion about the choice of a etc . that appears in Remark 4.5.1 is also pertinent here . Another class of estimators is obtained by modifying On by replacing Fn by the estimator F n(.r) = J In(Y)/( -00 < y ~ x)dy . The consistency of these estimators can be also proved by the help of Corollary 7.2.1. D
7.4
Minimum Distance Estimation
In this section we shall discuss two classes of m .d. estimators. They are the analogues of the classes of estimators defined in the linear regressive setup at (5.2.8) and (5.2.15). To be precise, consider the autoregressive model (7.1.1) and define, for aGE 'D/(lR), and atE lRP , (7.4.1)
Kg(t)
= LP j=1
J[n- L 1
n
gj(Yi-t}{X i
x
~ + t'Yi-t} ,
i=1
r
-F(x)} dG(x) , Kt(t) =
i: J[n- t 12 /
j=1
i=1
gj(Y i- 1){/(Xi
~ x+ t'Yi-t}
-/(-Xi < x-t'Yi-d}rdG(x). In the case the error d.f. F is known, define a class of m.d. estimators of p to be (7.4.2)
Pg := argmin {Kg(t) ;t E JRP} .
322
7. Autoregression
In the case F is unknown but symmetric around 0, define a class of m.d. estimators of p to be (7.4.3)
Pg := argmin {Kt(t) ; t E Il~.p} .
Note that the role played by the vectors
is similar to that of the vectors {d n i ; 1 :S i :S n} of Chapter 5. To put it in matrices, the precise analogue of D is the matrix n- 1 / 2 g , where g is as in (7.3 .1). The existence of these estimators has been discussed in Dhar (1991a) for p = 1 and in Dhar (1991c) for p 2: 1. For p = 1, these results are relatively easy to state and prove . We give an existence result for the estimator defined at (7.4.3) in the case p = 1. Lemma 7.4.1 In addition to (7.1.1) with p = 1, assume that (7.4.4)
Either
xg(x)
(7.4.5)
Or
xg(x)
> 0, < 0,
\i x E JR, \i x E JR.
Then, a minimizer of Kt exists if either G(JR)
= 00
or G(JR)
<
00
and
g(O) = 0. The proof of this lemma is precisely similar to that of Lemma 5.3.1. The discussion about the computation of their analogues that appears in Section 5.3 is also relevan t here with appropriate modifications. Thus, for example, if G is continuous and symmetric around 0, i.e., satisfies (5.3.10), then , analogous to (5.3.12), n
p
n
L L Lgj(Yi- 1)gj(Yk-1) j=1 i=1 k=1 X
{IG(X i
-
t'Yi-d - G( -Xk + t'Y k-dl
-IG(Xi - t'Yi-d - G(Xk - t'Yk-dl}.
°
If G is degenerate at errors, that
then one obtains, assuming the continuity of the
p
(7.4.6)
n
2
Kt(t) = L [Lgj(Yi-dsign(Xi - t'Yi-d] , W.p . 1. j=1 i=1
323
7.4. Minimum distance estimation
One has similar expressions for a genera l G. See (5.3.7) and (5.3.11). If g (x ) == x , G(x ) == x, Pg is m.l.e. of p if F is logistic, while p~ is an ana logue of t he Hodges - Lehmann est imator. Similarly, if g(x ) == x and G is degenerate at 0, t hen p~ is t he LAD est imator. We sha ll now focus on proving t heir asy mptotic norm ality. The approa ch is t he same as t hat of Secti on 5.4 and 5.5, i.e., we sha ll prove t hat these dispersions satisfy (5.4.AI) - (5.4.A5) by using t he techniques t hat are similar to t hose used in Secti on 5.5. Only the too ls ar e somewhat different becau se of t he dependence structure. To begin with we state t he ad ditiona l assumptions needed under which an asy mptotic uniform quadraticity result for a general dispersion of the above ty pe holds. Becau se here t he weights are random, we have to be somewha t careful if we do not wish to impose more th an necessar y moment condit ions on the un derlying ent ities. For t he same reason , unl ike t he linear regression set up where t he asymptotic uniform quad raticity of t he underlying dispersions was obtained in L 1 , we sha ll obtain t hese results in prob abi lity only. This is also reflected in t he formulation of t he following assumptions. (7.4.7)
(7.4.8)
(b)
'r:j
!
lIull :S
0
< E€ 2 < 00 .
b, a E JR,
Eh 2 (Y O )IF(x
+ n - 1 / 2 (u'Y o + aIlYolI)) - F( x) ldG(x)
= 0(1) There exists a constant 0 (7.4.9)
< k < 00,
3 'rf6
> 0, 'r:j Ilull :S
b,
limi~fP(! n- 1 [th ±(Yi-d{F(X +n- 1 / 2 u'Y i_ 1 + 8ni ) i= l
-F(x
+ n-
1 2 /
u'Y i _ 1
-
8ni) }
r
dG(x ) :S k82 ) = 1,
where 8ni := n-1/281 IYi_d ll and h± is as in t he proof of T heorem 7.2.1. For every lIull :S b, (7.4. 10)
!
n- 1 [ t h(Y i-d { F(x
+ n- 1 / 2 u'Yi _d - F (x )
i= l
- n - 1 / 2 u'Y
i_ 1
! (X)}
r
dG(x) = op(I) ,
7. Autoregression
324
and (5.5.44b) holds. Now, recall the definitions of W h , Vh, Wh , W±, T±, W±, Z±, m± from (7.1.6) , (7.2.2), (7.2.11) and (7.2.12). Let I . Ie denote the £2 - norm w.r .t. th e measure G. In the proofs below, we have adopted t he notation and conventions used in the proof of Theorem 7.2.1. Thus, e.g., 1 2u), Ei == Y i - 1 ; W u (-) , v u (-) stand for W h(-, p+n- / Vh( ·,p+n- 1 / 2u) , etc. Lemma 7.4.2 Suppose that the autoregression model {7.3.8} and {7.3.9} holds. Th en the following hold . Assumption {7.4.8} implies that V 0
(7.4. 11)
E ![Z±(x ; u, a) - Z±(x; u, OWdG( x)
= 0(1) , Ilull ::; b, a E lIt
Assumption {7.4.9} implies that V 0
(7.4 .12)
liminfP( n
sup
< b < 00 ,
< b < 00,
n1/2Iv; (x ,p+n-1 / 2v)
II v - ull ~o
-v;(x , p
+ n-1 /2u)l ~ ::; k<5 2 )
= 1,
Vllull ::; b.
where k and <5 are as in {7.4.9}.
(7.4.13)
Assumptions {7.4.7}. {7.4.9} and {7.4.11} im ply that
V 0 < b < 00 , lIull ::;
b,
sup ! [n 1 / 2{vh (x , p
+ n - 1! 2 u ) -
IJh ( X ,
p)}
lIull ~b
2
n
-u'n- 1 L
h(Y i - 1f( x)] dG( x) = op(I) .
i= 1
Proof. Let , for x, a E lR; u , Y E lRP , (7.4.14)
p(x , u , a; y )
:=
IF (x
+ n- 1/ 2 (u'y + ally l!) -F(x + n- 1 ! 2 u ' y )1
Now, observe that n 1/ 2[Z±(x; u , a ) - Z±(x ; u , 0)] is a sum of n r.v.'s whose i t h summand is conditionally centered, given ;:i -1 , and whose conditional variance, given F i - 1 , is E[{h±(E;)}2p (X, u ,a;Ei){l - p(x , u , a ;Ei)}], 1 ::;
Hence , by Fubini, the stationarity of
VlIull ::; b, L.H.S. (7.4.11) ::;
!
{Ed
i::; n.
and the fact that (h±)2 ::; h 2,
Eh 2 (y o )p(x , u , a ;Y o)dG (x )
= 0(1),
325
7.4. Minimum dist ance estima tion
by (7.4.8) applied with the given a an d with a = 0 and t he t rian gle inequality. To prove (7.4.12) , use t he nonnegati vity of h±, t he monotonicity of F and (7.2.17), to obtain t hat
Ilvll :S b, Ilv -
u l] :S 0 imply t ha t 'if lI u ll:S b,
n 1/ 2\lIt (X) - II~ (X)\
(7.4.15)
:S Im ±(x ;u , 0) - m±(x ; u , 0) - m±(x; u , - 0)\, 'if x EJR.. Thi s and (7.4.9) readil y imply(7 .4.12) as the r.v. in t he L.H.S. of (7.4.9) is precisely t he I.Ib of t he R.H .S. of (7.4.15) for each n 2: l. The proof of (7.4.13) is obtained from (7.4.7), (7.4.9) and (7.4.10) in the same way as that of (5.5.11) from (5.5.g), (5.5.h) and (5.5.i) , hence no details are given . 0 Lemma 7.4.3 Su ppose th at th e aut oregressive mo del {7.3.8} an d {7.3.9} holds. In addition, assume th at {7.4.8} an d (7.4.9) hold. Then, 'if 0 < b < 00 , sup ! lW : (x , p
(7.4. 16)
lIull:S b
+ n - 1 / 2 u ) - W:( x , p )]2dG(x ) =
sup !lW : (x , p + n- 1 / 2 u ) - Wh (X, pW dG(x )
(7.4.17)
lIull:Sb
op(l) .
= op(l ).
Proof. Let q(x , u ; y ) := IF(x + n- 1 / 2u /y ) - F (x )l, x E JR.; u , Y E JRP . The r .v. n l /2[W~(-) - W±(·)] is a sum of n r.v.'s whose i th summa nd is conditionally centered, given F i - 1 , and whose condit iona l varia nce, given fi-l , is E[ { h±(~i)F q(-, u ; ~i ){l- q(., U ;~i )}], 1 :S i:S n . Hence, by Fubini , t he stationa rity of {~d and th e fact t hat (h±)2 :S h2, 'if lIull :S b, EIW~-W±l b
<
<
! t n-
!
1
Eh 2(Y i_ 1IF( x
+ n-l /2u/Yi_l) - F( x)ldG( x)
i=l
Eh 2(Y o)IF(x
+ n- 1 / 2 u 'Y o) - F (x )ldG(x ).
Therefore, by (7.4.8) wit h a = 0 and t he Markov inequality, (7.4.18) Thus, to pr ove (7.4. 16), because of t he compact ness of t he ball {u E > 0 t here is a 0> 0 such
JR.p ; lI ull :S b}, it suffices to show t hat for every "l
326
7. Au toregression
that for every
Ilull ::; b,
(7.4.19)
liminfP(
sup II v - ull :'::<5
n
l.cv - . c ul <1J)
=1 ,
wher e Zj, := IW'::: - W±I ~ , u E JRP. Expand the quadratic, apply th e C-S inequality to the cross product terms, to obtain
Observe th at h± ~ 0, F nond ecreasing and (7.2.17) imply t hat
°::;
Im±(x ; u , ±8) - m ±(x ; u , 0)1 ::; m ±(x; u , 8) - m±(x ; u , -8) ,
for all x E JR,lIsll ::; b, lis - uj] ::; 8. Use this, th e second inequality in (7.2.16), (7.2.17) , (7.2.18) , and t he fact th at (a + b)2 ::; 2(a 2 + b2), a E JR, to obtain
IW;= -
W':::I ~
< 16{![Z±(x ;u,8)-Z±(x ;u,0)fdG( x)
+
!
[Z±(x ;u,-8 ) - Z±(x ;u , O)f dG (x)
+ ![m±(x ; u ,8)
- m ±( :r ; u , -JW dG( x ) + In1 / 2 (v;=
- v~ ) I~ },
for all IIvll ::; b, llv - ull ::; 8. This toge t her with (7.4.9) , (7.4.12), (7.4.13), (7.4.18), (7.4.20) and th e C-S inequality proves (7.4.19) and hence (7.4.16). The proof of (7.4 .17) follows from (7.4.16) and th e first inequality in (7.2.15) . 0 Now define, for t E JRP, Kh (t)
.- ! [n- t 1 2 /
h(Yi-d{I(X i
::; X
+ t'Y i - 1 )
-
F(y)}
i=1
Kh(t) :=
!
[Wh(X, p)
+ n l / 2 (t -
p)' n - 1
t
r
dG( x) ,
h(Yi-1)Yi-I!(x)f dG( x) .
i=1
Theorem 7.4.1 Suppos e that the auto regressive mod el (1.3.8) - (1.3.10) holds and that (5.5.45), (1.4.1) - (7.4.10) hold. Th en, 'if < b < 00 ,
°
(7.4.21)
sup IK h(p + n - 1 / 2 u) - Kh(p lIu ll :':: b
+ n- I / 2 u ))/ =
op(l) .
327
7.4. Minimum distance estimation Proof. Observe that, by (5.5.45) , (7.4.7), (7.4 .22)
E
! W~(x,p)dG(x) =
Eh 2 (y
O)
!
F(I- F)dG
< 00.
The rest of the proof of (7.4.21) follows from Lemmas 7.4.2 and 7.4.3 in a similar way as that of (5.5.9) from Lemmas 5.5.1, 5.5.2 and the result 0 (5.5.11). Now we shall apply this result to obtain the required quadraticity of the dispersion Kg and Kt. For that purpose recall the matrices X, G and e; from (7.3.1). Note that X i- j , gj(Yi -d are the (i ,j)th entries of X , G respectively, 1 :s: i :s: n , 1 :s: j :s: p . Also observe that the n
i'" row of n;
(7.4.23)
Lgj(Y i- 1)Y; -1 ' 1:S: i
is
:
p.
i= l
To obtain the desired result about Kg, we need to apply the above theorem p times , jth time with (7.4.24)
1
:s:
Now write Wj for W h when h is as in (7.4.24) and Wj(-) for W j( ·, p) , j :s: p. Note that for 1 :s: j :s: p, x E JR, n
Wj(x)
:=
n- 1/ 2 Lgj(Yi-d{I(Ci
<x ) -
F(x)}.
i= l
We also need to define the approximating quadratic forms : For t E JRP , let (7.4.25)
Kg(t) :=
t!
+ n 1/ 2 (t
[Wj(x)
- p)'
j=l
2
n
xn - 1 L
(7.4.26)
Kt(t) :=
t!
gj(Yi-dYi-l!(X)] dG(x)
i =l
[wt(x)
+ 2n 1/ 2 (t
- p)'
j=l
n
xn- 1 L
?
gj(Yi -dYi-l!(X)] ~ dG(x) ,
i=l
where , for 1 :s: j
:s: p, x
E JR, n
wt(x) := n - 1/ 2 L
gj(Yi -d{I(ci
:s:
x) - I( -C i
i=l
The following theorem readily follows from (7.4.21) .
< x)}.
328
7. Autoregression
Theorem 7.4.2 Suppose that the autoregressive model (7.3.8) and (7.3.9) holds and that (5 .5 .44(a}), (5.5.45) , (7.4.7) - (7.4.11) hold for the p functions h given at (7.4.24). Then, V 0
(7.4.27)
sup IKg(p
+ n- 1 / 2u ) -
< b < 00, Kg(p
+ n- 1/ 2u)1 = opel).
o
lIull:S:b
Lemmas 7.4.2 and 7.4.3 can be directly used to obtain the following Theorem 7.4.3 In addition to the assumptions of Theorem 7.4.2, except (5.5.45), assume that F is symmetric around 0, G satisfies (5.3.8) and that
(5.6.12) holds. Then, V 0 < b <
00 ,
(7.4.28) Upon expanding the quadratic and using an appropriate analogue of (7.4.22) obtained when h is as in (7.4.24), one can rewrite Kg(t)
=
Kg(p)
+ 2(t - p)'n-l/2B~
J
W(x)f(x)dG(x)
+(t - p)'n- 1 B~Bn(t - p)lfl~ ,
t E
jRP,
where W := {WI,' " , W p )' . Now consider the r.v.'s in the second term. Recalling the definition of 1/J from (5.6.2), one can rewrite
s;
:=
J
W(x)f(x)dG(x) = _n- 1 / 2 'tgi[1/J(C:i) - E1/J(C:)]' i=1
where
Since 9 i is a function of Y i-I, it is Fi-l - measurable. Therefore, in view of (7.4.7(a)) applied to h at (7.4.24), and by (5.5.44(a)) , {(Sn, F n - 1 ) , n ~ I} is a mean zero square integrable martingale array. Hence, it follows from Lemma 9.1.3 in the Appendix that
s;
-td
N( 0, G
•
2 T
•
,2
I p x p ) , G = Eg 1g 1 ,
T
=
Var1/J(c:)
(J j2dG)2'
By the stationarity and the Ergodic Theorem, we also obtain (7.4.29)
329
7.4. Minimum distance estimation
Consequently it follows that the dispersion Kg satisfies (5.4.Al) to (5.4.A3) with
90 = p ,
t5 n == n 1/ 2,
W n == n-18~8n ,
W
Sn == n-1 /28~Sn,
=8,
1:
= 8'G*8r 2,
and hence it is an U.L.A.N.Q. dispersion. In view of (7.4.22) applied to h as in (7.4.24), the condition (5.4.A4) is trivially implied by (7.4.7(a)) and (5.5.45). Recall, from Section 5.5, that in the linear regression setup the condition (5.4.A5) was shown to be implied by (5.5.k) and (5.5.l) . In the present situation, the role ofrn,!'n of (5.5.k) is being played by n- 1 8 n f , n- 1 8 n J fdG, respectively. Thus, in view of (7.4.29) and (5.5.44(a)), an analogue of (5.5.k) would hold in the present case if we additionally assumed that 8 is positive definite. An exact analogue of (5.5.1) in the present case is (7.4.30)
Either e' o, Y;-1 e 2 0, VI :S i :S n , VeE W', Ilell Or e' gi Y;'-1 e :S 0 , V 1 :S i :S n , Ve E ffi.P , Ilell
= 1,
= 1,
a.s.,
a.s..
We are now ready to state the following
Theorem 7.4.4 In addition to the assumptions of Theorem 7.4.2, assume that the 8 of (7.4.29) is positive definite and that (7.4.30) holds. Then , (7.4.31)
n 1 / 2 (pg - p) = -{ n- 1 8 n
Jf
2
dG } -1 s;
+ op(I) .
Consequently,
o
(7.4.32)
Let Px denote the estimator Pg when g(x) == x . Observe that in this case G* = 8 = En- 1 X' X = EY oYo . Moreover , the assumption (7.4.30) is a priori satisfied and (7.3.8), (7.3.9) and (7.4.7(b)) imply that EY oYb is positive definite . Consequently, we have obtained
Corollary 7.4.1 Suppose that the autoregressive model (7.3. 8}, (7.3.9) holds and that (5.5.44), (5.5.45), (7.4.7(b)), (7.4.9) - (7.4.11) with h(x) == x hold. Then, (7.4.33)
o
7. Autoregression
330
Remark 7.4.1 Asymptotic Optimality of Px . Because EYoY~ and Bare positive definite, and because ofn- 1 X ' X -t EYoY~, a.s., and because of (7.4.29) , there exists an No such that n- 1X' X and n- 1B n are positive definite , for all n :2 No , a.s. Recall the inequality (5.6.8). Take J = n- 1 / 2g' , L = n- 1 / 2 X' in that inequality to obtain
with equality holding if, and only if X ex: G. Letting n tend to infinity in this inequality yields
We thus have proved the following: (7.4.34)
Among all estimators {Pg}, where the components of 9 satisfy (7.4 .7(a)), (7.4.8) - (7.4.10), for the given (F, G) that satisfy (7.4. 7(b)) , (5.5.44), (5.5.45) , the one that minimizes the asymptotic variance is
Px!
Pg.
We shall now state analogous results for Arguments for th eir proofs are similar to those appearing above and, hence , will not be given .
Theorem 7.4 .5 In addition to the assumptions of Theorem 7.4.4, except (5.5.45) , assume that F is symmetric around 0, G satisfies (5.3.8) and that (5.6.12) holds . Then, n 1 / 2(pg - p) = -{n- 1 B n
(7.4.35)
Jf2dG}-lS~ +
op(l) ,
where
s; :=
J
W+(x)f(x)dG(x) = n- 1/ 2 tgi['l/J(-Ci) - 'l/J(ci)]' i=l
Consequently ,
n 1/ 2pg - p) -td N(O , (B)-lG*(B')-l r 2), n1 /2(p~ - p) -td
N(O , (EYoy~)-lr2) .
0
Obviously the optimality property like (7.4.34) holds here also .
331
7.4. Minimum distance estimation
Remark 7.4.2 On assumptions for the asymptotic normality of Px, p;t . If G is a finite measure and F has uniformly continuous density then it is not hard to see that (7.4.8) - (7.4.11), with h(Yo) equivalent to the components of Yo, are all implied by (7.4.7(b)) . Consider the following assumptions for a general G:
(7.4.36) For each 1 ::; j ::; p, as a function of s E lR,
(7.4.37) (7.4.38)
I
EIX l
!2I1Y ollf (x
+ sIlYolI)dG(x)
is continuous at 0.
[Ill E{IIYolI[j(x + n- l/ (u'Y o + t81IYolI)) 2
- f(x (7.4.39)
_j
+ n- l / 2u'Y o)]}2dG(x)dt
For every u E
Iln-l
t
= 0(1) , V8> 0, u E JRP .
lRP ,
Xr_jIIYi-lllf(x + n- l/2u'Y i _ lWdG(x)
i=l
= Op(l),
1::; j
::; p .
An argument similar to the one used in verifying the Claim 5.5.1 shows that (5.5.44(a)), (7.4.36) and (7.4.37) imply (7.4.8) while (5.5.44(b)), (7.4.38) and (7.4.39) imply (7.4.9) and (7.4.11) with h as in (7.4.24). In particular if G(x) == x, then (5.5.44) , (7.4.36) and f continuous imply all of the above conditions , (5.5.45) and (5.6.12). This is seen with 0 the help of a version of Lemma 9.1.6. Remark 7.4.3 Asymptotic relative efficiency of Px and p;t . Since their asymptotic variances are the same , we shall carry out the discussion in terms of Px only, as the same applies to p;t under the additional assumption of the symmetry of F and G. Consider the case p = 1. Let (J2 = V ar( e) and Pis denote the least square estimator of Pl . Then it is well known that under (7.4.7b) , n l / 2 (Pls pd -7d N(O , 1 - pi) . See, e.g., Anderson (1971) . Also note that in this case (EYoy~)-l = (1 - pi)/ (J2. Hence the asymptotic relative efficiency e of Px, relative to Pis , obtained by taking the ratio of the inverses of their asymptotic variances, is
(7.4.40)
e
') = (J 2/ T.2 = e (Px' , Pis
332
7. Autoregression
Note that e > 1 means Px is asymptotically more efficient than Pis . It follows that Px is to be preferred to Pis for the heavy tailed error d.f.'s F. Also note that if G(x) == x then 7 2 = 1/12[J j2(x)dxj2 and e = 12O' 2 [J j2(x)dxj2 . If G is degenerate at 0, then y2 = 1/4(j2(0)] and e = 40'2 j2(0) . These expressions are well known in connection with the Wilcoxon and median rank estimators of the slope parameters in linear regression models . For example if F is N(O , 1) then the first expr ession is 3/1r while the second is 2/1r. See Lehmann (1975) for some bounds on these 0 expressions. Similar conclusions remain valid for p > 1. Remark 7 .4.4 Least Absolute Deviation Estimator. As mentioned earlier,
if we choose g(x) == x and G to be degenerate at 0 then p~ is the LAD estimator , v.i.z., n
p
(7.4.41)
Piad :=
arqm iru.
2: [2: X j=1
2 i- j
sign(Xi
-
t'Y i -
1) ]
.
i=l
See also (7.4.6). Because of its importance we shall now summarize sufficient conditions under which it is asymptotically normally distributed. Of course we could use th e stronger conditions (7.4.36) - (7.4.39) but they do not use the given inform ation about G. Clearly , (7.4.7(b)) implies (7.4.7(a)) in the present case. Moreover , in this case the L.R .S. of (7.4.8) becomes
which tends to 0 by th e D.C.T., (7.4.7(b)) and th e continuity of F, 1 ~ j p. Now consider (7.4.9). Assume the following (7.4.42)
F has a density
I,
continuous and positive at O.
Recall from (7.3.12) that under (7.3.8), (7.3.9) and (7.4.7(b)) , (7.4.43)
n-
1
/
2 max{IIY
i
- t1I; 1 ~ i
~ n} = op(l) .
Th e r.v.'s involved in the L.R .S. of (7.4.9) in the present case are n
n- 1
[2: xt, {F(ni=l
1 2 / u'Y i _ 1
+ n- 1 / 2 81IY i _ 1 ID
~
7.5. AUTOREGRESSION QUANTILES & RANK SCORES
333
which, in view of (7.4.42) , can be bounded above by n
I: Xt)I Yi -11If(1]niW ,
1 46 [n2
(7.4.44)
i=l
where {1]n;} are
LV . 'S,
with
Hence, by the stationarity and the ergodicity of the pro cess {X;}, (7.4.7(b)) , (7.4.42) and (7.4.43) imply that the r.v.'s in (7.4.44) converge to
This verifies (7.4.9) in the present case. The condition (7.4.10) is verified similarly. Also note that here (5.5.44) is implied by (7.4.42) and (5.5.45) is trivially satisfied as J F(l - F)dG :S 1/4 in the pres ent case. We summarize the above discussion in
Corollary 7.4.2 Assume that the autoregressive model (1.3.8) - (1.3.10) holds. In addition, assume that the error d.f. F has finite second moment, F(O) = 1/2 and satisfies (7.4.42) . Then ,
where
7.5
Ptad
is defined at (7·4·41).
Autoregression Quantiles and Rank Scores
The regression quantiles of Koenker and Basett (1978) (KB) are now accepted as an appropriate extension of the one sample quantiles in the one sample location models to the multiple linear regression model (1.1.1) . In this section we shall discuss an extension of these regression quantiles to th e stationary ergodic linear autoregressive tim e series of order p as specified at (7.3.8) - (7.3.10). We shall also assume that the error d.f. F is continuous.
7. Autoregression
334 Let Z~
;=
(1, y~), and for an 0 ::; a ::; 1, t E IRP +1 ,
?/Jo:(u) ;= auI(u > 0) - (I-a)uI(u::; 0), n
Qo:(t)
;=
L ?/Jo:(X
i -
Z:_l t) ,
i=l
n
So:(t)
;=
n- 1
LY
i - 1 {I(X i -
Z:_l t ::; 0) - a} .
i=l
The extension of the one sample order statistics to the linear AR(p) model (7.3.8) - (7.3 .10) is given by the autoregression quantiles defined as a minmuzer (7.5.1)
We also need to define (7.5 .2)
Note that p( .5) and Pmd(.5) are both equla to the LAD (least absolute deviation) estimator, which provides the extension of the one sample median to the above model. Let 1~ ; = (1" " ,Ihxn be an n-dimensional vector of L's , 1r n be a subset of size p+ 1 of the set of integers {I , 2, . .. , n} , X~l := (Xl , . . . , X n ), X 1r n be the vector of Xi , i E 1r n , H n be the n x (p + 1) matrix with rows Z~ _l ; i = 1,· ·· ,n, and H 1r n be the (p + 1) x (p + 1) matrix with rows Z~_l ; i E 1r n · Now recall that the above model is casual and invertible satisfying a relation like (7.3.11) . This and the continuity of F implies that the rows of H n are linearly independent as are its columns, w.p.I. Hence, the various inverses below exist w.p.I. Now, let
and consider the following linear programming probl em . (7.5 .3)
. . . a I'nr + rmrnrrnze
+ (1 -
a )1'nr - , w.r.t .
(t ,r,r + -) ,
subject to X; - Hnt = r+ - r- , over all (t ,r+ ,r-) E IRP+ l
X
(O,oo)n
X
(Il.co)" .
7.5.1. Autoregression quantiles
335
Note that p(o:) E Bn(o:) . Moreover, the set Bn(o:) is the convex hull of one or more basic solutions of the form (7.5.4)
This is proved in the same fashion as in KB. A closely related entity is the so called autoregression rank scores defined as follows. Consider the following dual of the above linear programming problem. Maximize
(7.5.5)
X~a,
w.r.t . a , subject to
X~a = (1 - o:)X~In ,
a E [0, It.
By the linear programming theory the optimal solution a n (0:) = (anI (0:) , .. . , ann (0:) )' of this problem can be computed in t erm s of p(o:) as follows: If
p(o:) =
H;;?a)XJr( a) ,
th en , for i
for som e (p+ l.j-dimensional subset 71"(0:) of {I, ... ,n },
tt 71"(0:),
(7.5.6)
1,
Xi
0,
Xi
> Z; _IP(o:) , < Z;_IP(O:),
and, for i E 71"(0:) , Ilni(O:) is the solution of th e p + 1 linear equations (7.5.7)
L
Zj-Illnj(O:)
j EJrn(a) n
(1 - 0:)
L j =l
n
Zj-I -
L
Zj_1I
(X
j
> ZJ -IP(o:)) .
j =l
The cont inuity of F impli es that the autoregression rank scores an (0:) are unique for all 0 < 0: < 1, w.p.1. The process an E [o,I]n has piecewise linear paths in [C(O,I)]n and an(O) = In = In - a n(I) . It is invariant in th e sense that an (0:) based on the vect or X ; + H n t is th e same as the an (0:) bas ed on X n , for all t E lRp+l , 0 < 0: < 1. One can use th e computational algorithm of Koenker and d 'Odrey (1987, 1993) to compute these ent it ies. In the next two subsections we shall discuss th e asymptotic distributions of Pn(O:) and an(o:)·
7.5.1
Autoregression quantiles
In this sub-sect ion we shall show how th e results of section 7.2 can be used to obtain th e limiting distribution of Pn(0:) . All th e need ed results are
7. Autoregression
336
given in t he following lemma . It s statement needs the additional notation:
p (a) :=p+F- 1 (a )el , 1
q(a) := f(F- (a )),
0
el := (I ,O, ·· · ,O)' ,
< a < 1; n
~n := n - 1 H~ Hn
= n- 1 LZi-1Z:-1 ,
~
= plimn~n.
i= 1
By t he Er godi c Theorem , 2: exists and is positi ve definite . In t his subsec t ion, for any pro cess Zn( s , a) , t he stateme nt Zn(s , a) = 0;( 1) mea ns t hat for every 0 op(I) .
< a:S 1/2 ,0 < b < 00, suP{ IZn(a) l; II s ll
:s b, a :S a :s 1 -
a} =
Lemma 7.5.1 Suppose the assumptions made at (7. 3. 8) - (7.3.10) hold. If, in additi on (7. 2.10) holds, then, f or every 0 < a 1/2,0 < b < 00 ,
:s
(7.5.8) Moreov er,
(7.5.9)
(7.5.10)
n 1 / 2( Pmd(a) - p (a )) = - {q ( a )~ 71} - lnl / 2S,, (p (a )) + 0;( 1), n 1/ 2( Pmd(a) - jJ(a)) = 0;( 1).
If, (7.2.10) is strengthened to (F1 ) and (F2 ) , then , fo r ever y 0 < b < 00 ,
where the supremum is taken over (o ; s ) E [0,1] x {s E IRP+ l; II s ll
:s b}.
A sketch of the proof. The claims (7.5.8) a nd (7.5.11) follow from Theorem 7.2.1 an d Remark 7.2.1 in an obvio us fashion : apply these resul ts once with h == 1 and p times, jth time with h(Yi-d == Xi-j . In view of the Er godi c Theorem all conditions of T heorem 7.2.1 are a pri ori satisfied , in view of t he cur rent assumptions. The proof of (7.5.9) is similar to that of Theor em 5.5.3. It amo unts to first showing t hat (7.5.12) and t hen using t he result (7.5.8) to concl ude t he claim . But t he proof of (7.5.12) is similar to th at of Lemma 5.5.4, and hence no det ails are given.
7.5.1. Autoregression quantiles
337
To prove (7.5.10), we shall first show that for every 0 < a :S 1/2, (7.5.13) To th at effect, let wn(a )
:=
2::
Z~_l {I (X i - Z~_lP(a) :S 0) - a } H;:(a)
i (l 1T n (a)
+
2::
Z~ _lI(Xi-Z ~_lP( a)=O)H ;n\a) '
i(l1Tn (a )
where 1rn (a ) is as in (7.5.4). Using sgn(x) = 1 - 2I(x :S 0) have th e following inequalities w.p.I . For all 0 < a < 1, (a - 1)lp
+ I( x = 0)
we
< wn(a) < alp .
Note that from (7.5.4) we have I((X; -Z~_lP(a) = 0) Thus we obtain
= 0, for all if/. 1rn (a ).
n
[2:: Z~_l {I(X Z~_lP(a) :S 0) - 2:: Z~_l {I (X i -
a}
i= l
i -
Z~_ljJ(a) :S 0) -
a }]H ; :(a)
iE 1T n (a)
=
w~ ( a ).
Again , by (7.5.4), I (X i - Z ~ _lP ( a ) :S 0) = 1, i E 1rn (a ), 0 < a < I " w.p.I . Hence, w.p.L , VO < a < 1, n 1 / 2S a(p(a))
2::
= n - 1/ 2
Z ~_l (1 - a)
+ n -l /2H~n (a ) wn(a ),
i E 1T n (a )
so tha t
in view of the square integrability of X o and the stationarity of the process. This completes the proof of (7.5.13). Hence we obtain (7.5.14)
sup a ~a ~l-a
inf Iln 1/ 2 S a(s)1I = op(I) . s
This and (7.5.13) essentially th en show t hat sup
n1/ 21ISa(p (a )) - Sa (Pmd)(a ))11 = op(I) ,
a ~ a ~l - a
which together with (7.5.8) proves the claim (7.5.10). Th e following corollary is immediat e.
o
7. Autoregression
338
Corollary 7.5.1 Under the assumptions (7.2.10) and (7.3.8) - (7.3.10),
(7.5.15)
n 1/ 2(p(a) - p(a)) = -{q(a)1: n } -l nl /2S o (p (a )) + 0;(1).
Moreover, for every 0 < al < .. . < ak < 1, the asymptotic joint distribution of the vector n 1/ 2[(p( a r) - p(ar)) , · · · , (P(ak) - p(ak))] is (p + 1) x k normal distribution with the mean matrix bfO and the covariance matrix A ffi 1:- 1 , where
and where ffi denotes the Kronecker matrix product.
7.5.2
Autoregression rank scores
Now we shall discuss the asymptotic behaviour of the autoregression rank scores defined at (7.5.6). To that effect we need to introduce some more notation. Let q be a positive integer, {k n i j ; 1 ~ j ~ q} be ;:i-l := (J {Yo ,co ,El, · · · ,ci-I} measurable and independent of Ci , 1 ~ i ~ n . Let k n i := (k n i 1 , . . . , kn iq )' and K denote the matrix whose i t h row is k n i , 1 ~ i ~ n. Define the processes n
Uk(a) := n- 1
L kn;{&ni(a) -
(1 - an,
i ==1 n
Uk(a)
:=
n- 1
L kn;{I(Ci > F-
1
(a ) - (1 - an ,
a~
a ~ 1.
i==l
Let
We are now ready to state Lemma 7 .5.2 In addition to the model assumptions (7.3.8) - (7.3.10) , suppose the following two conditions hold. For some positive definite matrix
rqx q, = r + op(I) . max IIk = op(I) . l ::;i::;n n ili
(7.5.16)
n- 1 K ' K
(7.5.17)
n- 1 / 2
Then, for every
(7.5.18)
a< a
~
1/2,
7.5.2. Autoregression rank scores
339
Conse quen tly,
(7.5.19) Proof. From (7.5.6) , we obtain t hat V1 ani (a)
=
I (ci
~
i ~ n, 0
< a < 1,
n-I/2 z~_l an(a))
> F -1 (a ) +
+ ani (a )I(X i = Z~_ IP ( a ) ) ,
which in t urn yields the following identi ty: ani(a) - (1 - a) I (ci
> F - 1(a ) - (1 -
a)
-{I(ci ~ F - 1(a) + n- I /2 z ~_ l an (a ) ) - Iie, ~ F-1( a))} + ani( a)I(X i = Z:_lp(a) ),
for all 1
~
i
~
n, 0 < a < 1, w.p.1. T his and (7.5.4) yield
n l / 2 Uk(a)
n l / 2 U k(a) - lCna n( a)q(a)
-[n-I/2~kni{I(€i < F-1(a) + n - I/ 2z :_ la n (a )) -I(ci
+n - I / 2
L
~ F- 1(a )) } -lCnan(a )q(a )]
k niani (a )I(X i = Z:_ I P(a ))
i E 7rn (o )
n l / 2 U k(a) - lCna n (a )q(a ) - R 1(a)
+ R 2(a ),
say .
Now, by t he C-S inequality and by (7.5.16), IllCn/l = Op( I) . App ly Remark 7.2.1 to r ni == k nij and ot her ent it ies as in t he previous section to conclude t hat sup{/I R 1(a ) II; O ~ a ~ I} = op(I) . Also , not e t hat from the results of the previous sect ion we have II sUPa::oo::o l- a /Ia n( a )11 = Op(I ). Use this and (7.5.17) to obtain sup{IIR 1 (a) ll; a ~ a ~ 1- a} = op(I ), t here by complet ing D. th e proof of (7.5.18) . T he rest is obvious . Corollary 7.5.2 Under the assumptions of Lem m a 7.5.2, the auto regression quantile and autoregression rank score processes are asymptotically in dependent. Moreov er, for every k
2: 1, an d f or every 0 < a l < ... < ak ,
n l / 2( U k(ad , ' " , Uk (a d )
=}
N( O, B) ,
B := B EB p limnn -I[ K~ - lCn :En H~)[ K~ -lC n:E nH ~ ]' ,
340
7. Autoregression
where B := ((ai /\ aj - aiaj)h:::;i,j9'
Proof. Let si(a) := I(ci > F- 1(a)) - (1 - a), 1 ~ i ~ n, s(a) := (Sl (a), ·· · , sn (a ))' . The leading r.v .'s in the right hand sides of (7.5.15) and (7.5.19) are equal to -I:;;-ln-1/2H~s(a)jq(a),
n-1/2[K~ -1CnI:nH~ls(a),
respectively. By the stationarity, ergodicity of the underlying process, Lemma 9.1.3 in the Appendix, and by the Cramer-Wold device, it follows that for each a E (0,1), the asymptotic joint distribution of an(a) and n 1/ 2Uk(a) is (p + 1 + q)-dimensional normal with the mean vector 0 and the covariance matrix
where 'On := [a(1 - a)jq2(a))~-1 '0 22 := plimnn-1[K~ -1CnI:nH~]'[K~ -1CnI:nH~) '0 12 := [a(1 - a)jq(a)) plimnn-1I:;;-lH~[K~-1CnI:nH~)'.
But, by definition, , w.p.I ,
n -1I:n
vo
> 1,
1H'[K' -1C I: H')'=I:- 11C' -I:- 11C' =0 nn nnn n n n n
This proves the claim of independence for each a. The result is proved 0 similarly for any finite dimensional joint distribution. Note: The above results were first obtained in Koul and Saleh (1995), using numerous facts available in linear regression from the works of Koenker and Bassett (1978) and Grutenbrunner and lureekova (1992), and of course the AUL result given in Theorem 7.2.1 .
7.6
Goodness-of-fit Testing for F
Once again consider the AR(p) model given by (7.3.8), (7.3.9) and let Fo be a known d .f.. Consider the problem of testing H o : F = F o. One of the common tests of Hi, is based on the Kolmogorov - Smirnov statistic D n := n 1 / 2 sup IFn(x, p) - Fo(x)l· x
7.7. AUTOREGRESSIVE MODEL FITTING
341
From Corollary 7.2.1 one readily has the following : If Fo has finite second moment and a uniformly continuous density fo , fo > 0 a .e.; p satisfies (7.3.20) under Fo, then , under H o,
o; =
sup IB(Fo(X))
+ n 1 / 2 (p - p)'n- 1
t
Yi-do( x) 1+ op(1) .
In addition , if EYo = 0 = Eel , then Ir; -+d sup{ IB(t) 1,0 ::; t ::; 1}, thereby rendering D n asymptotically distribution free. Next, consider, H 0 1 : F = N(p , a 2 ) , p E IR, a 2 > O. In other words , H 0 1 states t hat the AR(p) process is generated by som e normal err ors. Let {tn, an and Pn be est imat ors of p , a , and p , respectively. Define n
Fn(x)
:=
n- 1
L I(X i ::; x a n + {tn + P~ Y
i - 1 ),
x E IR,
i=l A
D II
._
.-
n
1/ 2
sup IFn(x) -
(x)l, A
.
_
- N(O , 1) d.f..
x
Corollary 7.2.1 can be readily modifi ed in a routine fashion to yield that if
then
ti; :=
sup IB(( x))
+ n 1 / 2 {({tn -p) + (an -
a)}a - 1 <j> (x )! + op(1) ,
x
where <j> is the density of . Thus the asymptotic null distribution of D; is similar to its an alogue in the one sample location-scale mod el: the estimation of p has no effect on the large sample .null distribution of ii; Clearly, simil ar conclusions can be applied to other goodness-of-fit t ests. In particular we leave it as an exercise for an interested reader to investigate the large sample behaviour of the goodness - of - fit tests based on £ 2 distances, analogous to the results obtained in Section 6.3. Lemma 6.3.1 and the results of the pr evious sect ion ar e found useful here . 0
7.7 7.7.1
Autoregressive Model Fitting Introduction
In this section we shall cons ider the problem of fitting a given parametric autoregressive model of order 1 to a real valued st ationary ergodic Markovian time series X i, i = 0, ±1 , ±2, " ' . Much of the development her e is
342
7. Autoregression
parallel to that of Section 6.6 above . We shall thus be brief on motivation and details here. Let 'lj; be a non decreasing real valued function such that EI 'lj;(X I r) I < 00, for each rElit Define the 'lj;-autoregressive function m t/J by the requirement that (7.7.1)
E['lj;(X I
Observe that , if 'lj;(x) 'lj;(x)
-
mt/J(Xo))IXo)
= 0,
a.s.
== x , then mt/J = u, and if
== 'lj;o: (x ) := I(x > 0) - (1 -
0'), for an
°<
0'
< 1,
then m t/J(x) == mo:(x) , the oth quantile of the conditional distribution of X l , given X o = x . The choice of 'lj; is up to the practitioner. If the desire is to have a goodness-of-fit proc edure that is less sensitive to outliers in th e innovations Xi - mt/J(Xi - I ) , then one may choose a bounded 'lj; . In the sequel m w is assumed to exist uniquely. The process of interest here is Mn ,t/J(x) := n- I/ 2
n
L 'lj; (X i -
mt/J(Xi-d) I(X i -
1
~ x) , x E [-00 ,00].
i=l
Note th at Mn,t/J is an an alogue of Sn,t/J with p = 1, fI(x) == x , of Section 6.6. Writ e u; Mn,I, for mt/J, Mn ,t/J when 'lj; (x ) == x, respectively. Tests of goodness-of-fit for fitting a model to m t/J will be based on the process Mn ,t/J . We shall also assume throughout that th e d.f. G of X o is continuous and (7.7.2) Under (7.7.1), (7.7.2), and und er some additional assumptions th at involve some moments and conditional innovation density, Theorem 7.7.1 below, which in turn follows from the Theorem 2.2.6, gives the weak convergence of Mn ,t/J to a cont inuous mean zero Gaussian pro cess M t/J with the covariance function J(t/J( x , y) = E 'lj;2(X I
-
mt/J(Xo)) I(Xo ::; x /\ y) ,
Arguing as for (6.6.3), M t/J admits a repr esentation (7.7.3)
M t/J(x)
= B(T~(X)) ,
in distribution,
x , Y E lit
7.7.1. Introduction to AR model fitting
343
where B is a standard Brownian motion on the positive real line. Note that the continuity of the d.f, G implies that of T'IjJ and hence that of B(TJ). The representation (7.7.3), Theorem 7.7.1 and the continuous mapping theorem yield sup IMn ,'IjJ(x)I ===>
sup
= T'IjJ(OO)
IB(t)1
O::;t~T~(OO)
xEIR
sup IB(t)l ,
in law.
0~t9
Thus, to test the simple hypothesis fI o : m'IjJ = mo , where mo is a known function proceed as follows. Estimate (under m'IjJ = mo) the variance (x) by
TJ
T~,'IjJ(X) := n- 1
n
L
'ljJ2(X i - mo(X i-d)I(Xi- 1 :; x),
x E JR,
i=1
and replace m'IjJ by mo in the definition of Mn ,'IjJ . Write s;,,'IjJ for T~ ,'IjJ(OO) . Then, for example, the Kolmogorov-Smirnov (K-S) test based on Mn ,'IjJ of the given asymptotic level would reject the hypothesis fI o if
exceeds an appropriate critical value obtained from the boundary crossing probabilities of a Brownian motion on the unit interval which are readily available. More generally, the asymptotic level of any test based on a continuous function of S~,~Mn ,'IjJ((T~ ,1/,)-I) can be obtained from the distribution of the corresponding function of B on [0,1], where (T~,'IjJ)-1 (t) := inf{x E JR : T~ ,'IjJ(X) ;::: t}, t ;::: 0. For example, the asymptotic level of the test based on the Cramer - von Mises statistic
1
is obtained from the distribution of fo B 2 dH , where H is a d.f, on [0, I) . Now, let M be as in Section 6.6 and consider the problem of testing the goodness-of-fit hypothesis H o : m'IjJ(x) = m(x , (Jo),
for some (Jo E
e, x
E I,
where I is now a compact subset of JR. Let (In be an consistent estimator of (Jo under Hi, based on {Xi , 0:; i :; n} . Define, for an -00:S x :S 00, Mn,'IjJ(x) = n- 1! 2
n
L 'IjJ(Xi i=1
m(Xi-l, (In))I(X i - 1 :; x ).
7. Autoregression
344
The process M n,1jJ is a weighted empirical process, where the weights, at X i - l are now given by the 1jJ-residuals 1jJ(Xi - m(X i - l , On)). Tests for H o can be based on an appropriately scaled discrepancy of this process. For example, an analogue of the K-S test would reject H o in favor of HI if sup{(J"~,~IMn,1jJ(x)1 : x E ~} is too large, where (J"; ,1jJ := l n - L~=11jJ2(Xi - m(Xi-I ,On)). These tests, however, are not generally asymptotically distribution free. In the next sub-section we shall show that under the same conditions on M as in Section 6.6.2, under H o, the weak limit of i; M n,1jJ is B(G) , so that the asymptotic null distribution of various tests based on it will be known . Here t; is an analogue of the transformation Tn of (6.6.25) . Computational formulas for computing some of these tests is also given in the same section. Tests based on various discrepancies of iWn ,1jJ are consistent here also under the condition (6.6.4) with oX properly defined : Just change X , Y to X o, Xl , respectively.
7.7.2
Transform
r;
of M n ,1j;
This section first discusses the asymptotic behavior of the processes introduced in th e previous section under the simple and composite hypotheses. Then a transformation T and its estimate i; are given so that the processes T M n,1jJ and T nMn,1jJ have the same weak limit with a known distribution. Let P(X I
e,
-
m1jJ(Xo)
Xi - m1jJ(Xi-d,
:s x IXo = y), i = 0, ±1 , ±2 ,' "
x, Y
E~,
We are ready to state our first result . Theorem 7.7.1 Assume that (1.1.1) and (7.7.2) holds . Then all finite dimensional distributions of M n,1jJ converge weakly to those of a centered continuous Gaussian process M 1jJ with the covariance function K 1jJ ' (1). Suppose, in addition, that for some T/ > 0, f> > 0, (7.7.4)
< 00 ,
E 1jJ4(cd
(c)
E{'ljJ2(C2)1jJ2(cdIXII}I+c5
and that the family of d.f. 's {Fy, y E that are uniformly bounded: (7.7 .5)
(b)
E 1jJ4(CI)IXoll+'7
(a)
~}
sup fy(x) x ,y
< 00,
< 00,
have Lebesgue densities {fy , y E
< 00 .
~}
7.7.2. Transform T n Mn ,1/J
345
Then
Mn,t/J ===> M t/J' in the space D[-oo, 00) .
(7.7.6)
(II) . Instead of (7.7.4) and (7.7.5) , suppose that u: is bounded and th e family of d.f. 's {Fy, y E JR} have Lebesgue densities {fy , y E JR} satisfying
/ [E{ fl~6(x -
(7.7 .7) for some 8
m t/J(Xo )) } ] th dx <
00,
> 0. Then also (7.7.6) holds.
Proof. Part (1) follows from Theorem 2.2.4 while part (II) follows from
Theorem 2.2.7 upon choosing 0, E'lj;4(IH) (cd and EX;(lH) are finite .
Moreover , in th is situation th e conditional distributions do not dep end on y , so that (7.7.5) amounts to assuming that th e density of CI is bounded. In th e case of bounded 'lj;, EIXIIl+ 6 < 00 , for some 8 > 0, impl ies (7.7.4) . Now consider th e assumption (7.7.7) . Note th at the stat iona ry distribution G has Lebesgue density g( x) == Efxo (x - mt/J(Xo )). This fact together with (7.7.5) implies th at th e left hand side of (7.7.7) is bounded o • from the above by a constant C := [suPX,y fy( x)) Tn' tim es /
(Efxo (x - m t/J(Xo ))] l~o dx = /
gth (x)dx.
Thus, (7.7.7) is implied by assuming /
gl~o (x)dx < 00 .
Alt ernately, suppose m t/J is bounded , and that fy(x) :::; f(x) , for all x, y E JR, where f is a bounded and unimodal Lebesgue density on JR. Then also the left hand side of (7.7.7) is finit e. One thus sees that in the particular case of th e i.i.d. homosc edastic err ors, (7.7.7) is satisfied for eit her all bounded error den sitie s and for all st ationary densities that have an exponential tailor for all bounded unimodal error densities in the
7. Autoregression
346
case of bounded m,p. Summarizing, we see that (7.7.4), (7.7.5) and (7.7.7) are fulfilled in many models under standard assumptions on the relevant densities and moments. Perhaps the differences between Theorem 6.6.1 and the above theorem are worth pointing out. In the former no additional moment conditions, beyond the finite second moment of the ¢-innovation, were needed nor did it require the the error density to be bounded or to satisfy any thing like (7.7.7). Next, we need to study the asymptotic null behaviour of Mn,,p. To that effect, the following additional regularity conditions on the underlying entities will be needed. To begin with, the regularity conditions for asymptotic expansion of IV.Jn, ,p are stated without assuming X i - 1 to be independent of Ci , i 2: 1. All probability statements in these assumptions are understood to be made under Ho. Unlike in the regression setup, the d.f. of X o here in general depends on 0 0 but this dependence is not exhibited for th e sake of convenience. We make the following assumptions. The estimator On satisfies n
(7.7.8)
n 1/ 2 (On - ( 0) = n- 1/ 2 :L>t>(X i - 1,Xi ,00)
+ op(l)
i=1
for some q-vector valued function ¢ such that lE{¢(Xo , Xl, ( 0) !X o} = 0 and ¢(Oo) := E¢(XO ,Xl ,Oo)¢'(XO, Xl, ( 0) exists and is positive definite. (F). The family of d.f.'s {Fy , y E JR} has Lebesgue densities {fy, y E JR} that are equicontinuous: For every 0' > 0 there exists a 5 > 0 such that sup
Ify(x) - fy(z)1 ::;
0' .
yE IR ,l x-zl <e5
Let, for 1 ::; j ::; q, x E JR,
IErilj (Xo, (0) ~(X1 - m(Xo, Oo))I(Xo ::; x ),
Illj(X , ( 0 )
tj(x, ( 0 )
=
lEmj (Xo, ( 0)
M(x,Oo)
(rn, (x , ( 0 )"
t(x, ( 0 )
rr, (x, ( 0 )"
Jl
" , " ,
x; d¢ I(Xo ::; x),
Illq(X , ( 0 ) ) ' ,
I'q(x, ( 0 ) ) ' ,
Note that by (6.6.14) and ('1Jd or by (6.6.14) , ('1J 2 ) and (F) these entities are well-defined.
7.7.2. Transform T n Mn ,1/J
347
We are now ready to formulate an asymptotic expansion of £1n ,,,,, which is crucial for the subsequent results and the transformation Tn. Recall the conditions ('lid and ('lI 2 ) from Section 6.6.2. Theorem 7.7.2 Assume that (7.7.1), (7.7.8) and H o hold. About the model M, assume that (6.6.13) and (6.6.14) hold with Xi , X replaced by X i-I, X o, respectively.
(A). If, in addition (WI) holds, th en n
(7.7.9)
l£1n,,,,(x) - Mn,,,,(x)
+ M'(x , ( 0 ) n- I / 2 I>t>(Xi -
l,
Xi, ( 0 )
I
i=1
=u p (l ). (B). Assume, in addition, that (iJ1 2 ) and (F) hold , and that either lElXo 11+0 < 00, for som e <5 > a and (7.7.5) holds or (7.7.7) holds. Then the conclusion (7.7.9) with M replaced by r continues to hold . We note that th e Remark 6.6.1 applies to the autoregressive setup also . The following corollary is an imm ediate consequence of the above theorem and Theorems 7.7.1. We shall state it for th e smoot h 'l/J- case only. The same holds in th e non-smooth 'l/J. Note that under H o, e, == Xi m(X i - l , ( 0 ) . Corollary 7.7.1 Und er the assumptions of Theorems 7.7.1 and 7.7.2{A) ,
£1n,,p
=}
£1""
in the space
D[-oo ,00],
where £1,p is a centered continuous Gaussian process with the covariance function K~(x,y)
K ",(x , y)
+ M'( x , Oo)1>(Oo)M(y, ( 0 )
-M'(x , Oo)lE{ I(Xo :::; y) 'l/J(EI) 1>(Xo , Xl, Oo)}
- M' (y , Oo)lE{ I(Xo :::; x )'l/J(EI )1>(XO , X I, ( 0 ) }
.
The above complicated looking covariance function can be further simplified if we choose On to be related to the function 'I/J in the following
7. Autoregression
348 fashion . Recall from t he prev ious sect ion th at (J~ (x ) an d let , for x E IRP,
1',p(x)
= x),
.-
E[~ (Ed IXo
.-
/ f x (x ) '1/1 (dx) ,
= IE['I/12(edIXo = z ]
for smooth '1/1, for non-smooth '1/1.
From now onwards we shall assume t hat The erro rs e, are i.i.d. F , Ei ind epend ent of
(7.7.10)
X i -I ,
for eac h i = 0, ±1 ,· · · , and F sat isfies Fl and F2 . Then it readil y follows th at
=
(J~(x)
(J~ , a positive constant inx , a. s.,
1',p , a posi tive cons tant in x, a. s .,
1',p (x )
an d t hat On sat isfies (7.7.8) with (7.7.11)
o
for x , y E JR, where :Eo := E ril(Xo, Oo)ril'(X o , ( 0 ), so that ¢>(Oo) = T :E I , wit h T := (J~/ I'~ . Then direct calcu lations show t hat the above covar iance function simp lifies to J(~ (:r , y) = E'I/12(ed [G(x 1\ y) - v'( x) :E01v(y)),
v (:r )
= E ril(Xo, ( 0 ) I (X o :s x) ,
x , y E JR.
Unde r (7.7.10), a set of sufficient cond itio ns on th e mod el M is given in Sect ion 8.2 below und er which a class of M-estimators of 0 0 corres pond ing to a given '1/1 defined by t he relation n
On,,p
:=
argmin t lln- I / 2
L
ril(Xi -
I,
t)'I/1(X i
-
m( X i -
I ,
t ))11
i= 1
satisfies (7.7.11). See, Theorem 8.2.1 and Cor ollar y 8.2.1 below. Throughout t he rest of t he section we shall ass ume that (7.7.10) holds. To simplify t he expos itio n fur th er write rnf -) = rn f, ( 0 ) . Set
A (x )
=/
ril(y)ril' (y) I (y 2': x) G(dy) ,
x E JR.
Assume t hat (7.7.12)
A (x ) is nonsi ngular for some Xo <
00 .
7.7.2. Transform T n Mn ,1/J
349
This and the nonnegative definiteness of A(x) - A(xo) implies that A(x) is non-singular for all x :S xo. Write A-1(X) for (A(X))-l , and define, for x :S xo,
Tf( x)
= f( x) -
ril'( s)A- 1(s) [!ril(z) I( z
(
i;
~ s)
f(dz)]G(ds) .
It is clear that an analogue of the Lemma 6.6.2 holds here also as do the derivations following this lemm a with obvious modifications. The next result is analogous to Theorem 6.6.3.
Theorem 7.7.3 (A) . As sum e, in addition to the assumptions of Theorem 7.7.2(A), that (7. 7.10) and (7.7.12) hold. Then
sup !T iWn,,p(X) - TMn ,,p(x) I = op(l) .
X-:;'X Q
If in addition, (7.7.1) , (7.7.4) and (7. 7.5) hold, then TMn,,p
=}
TM,p and TMn ,,p
=}
TM,pinD[-oo , xo] .
(E). The above claims continu e to hold under the assumptions of Theorem 7.7.1(E) , (7.7.10) and (7.7.12). We now describ e the analog of Tn of Section 6.6. Let, for x E JR,
Gn( x)
.-
n- 1
n
L I(X;-l :S x ) ;=1
An estimator of T is defined, for x :S xo, to be
Tnf( x)
=
f( x) -
i
Xoo
ril'(y ,On)A ~l(y) x
[!
ril(z ,On) I( z
~ y)
f(dz)] Gn(dy) .
The next result is the most useful result of this section. It proves the consistency of TnMn ,,p for T Mn,,p under the same additional smoothness condition on m as in Section 6.6. Theorem 7.7.4 (A). Suppose, in addition to the assumptions of Theorem 7.7.3(A), (6.6.26) holds and that (7.7.4) with 'l/J (c1), 'l/J (c2) replaced by Ilril(X o, ( 0 )11'l/J (cd, \Iril(X 1 , ( 0 )11'l/J (c2), respectively, holds. Then, (7.7.13)
7. Autoregression
350 and consequently, -1
(7.7.14)
-
A
(Tn ,,,, TnMn ,,,,(')
===}
BoG ,
in D[-oo ,xo].
(B) . The same continues to hold under (6.6.26) and the assumptions of Theorem 7.7.2(B). Remark 7.7.2 By (7.7.12), Al := inf{aIA(xo)a ; a E IRq, lIall = I} > 0 and A(x) is positive definite for all x S xo, Hence,
and (6.6.14) implies
(7.7.15) This fact is used in the proofs repeatedly. Now, let a < b be given real numbers and suppose one is int erested in testing the hypothesis
H : m",(x)
= m(x, eo),
for all x E [a , b] and for some
eo
E e.
Assume the support of G is IR, and A(b) is positive definite . Then, A(x) is non-singular for all x S b, cont inuous on [a, b] and A-I (x) is continuous on [a , b] and
Ellril/(Xo)A -1 (Xo)II/(a < X o S b) <
00.
Thus, under the conditions of the Theorem 7.7.4, (T~,~ TnMn,,,,(') in D[a , b] and we obtain -1
A
A
(Tn ,,,,{TnMn,,,,(') - T; M n,,,, (a)}
===}
B(G( ·)) - B(G(a)) ,
===}
BoG,
in D[a ,b].
The stationarity of the increments of the Brownian motion then readily implies that
Hence, any test of H based on D n is ADF . Proofs of the last three theorems is given at th e end of this section.
7.7.3. Computation of TnMn ,'I/J in some examples
7.7.3
351
Some examples
In this sub-sections we discuss some examples of non-linear time series to which the above results may be applied. This section is some what different from its analogous Section 6.6.3 primarily because one can have non-linearity in autoregressive modeling from the way the lag variables appear in the model. Again , it is useful for comput ational purposes to rewrite TnMn,,p as follows: For convenience , let
Then for for all x :::; xo , (7.7.16)
n
n - 1/ 2
I: [I(X
i - 1 :::;
x)
i= l
_n- 1
n
I: m~(Xj_1)A~1(Xj_dmn(Xi_1) j =l
Now, let 91, . . . , 9q be known real-valued G-square int egrable functions on IE. and consider the class of mod els M with m( x ,O) = 91 (x)B1 +
. . . + 9q(x )Bq.
Then (6.6.13) and (6.6.26) ar e trivially satisfied with m(x ,B) == (91(X), . . . ,9q(X))' and m(x , B) == 0 == J(l(x ,B). A major difference between the regr ession setup and th e autoregressive setup is th at this mod el includes a larg e class of th e first ord er autoregressive mod els. Besides including the first ord er linear autoregressive (AR(l)) model where q = 1, 91 (x) == x , this class also includes some nonlinear autoregressive models . For example the choice of q = 2, 91(X) = X, 92(X) = 2 xe- x gives an exponent ial-amplit ude dependent AR (l) (EXPAR(l)) model of Ozaki and Oda (1978) or t he choice of p = 1, q = 4, 91(X) = I(x :::; 0), 92(X) = xI (x :::; 0), 93(X) = I(x > 0) , 94(X) = xI (x > 0) gives th e self excitin g threshold AR(l) mod el m( x ,O) = (B 1 + B2 x )I (x :::; 0)
+ (B 3 + B4 x )I( x > 0) .
7. Autoregression
352
For more on these and several other non-linear AR(l) models see Tong (1990). In the following discussion the assumption (7.7.10) is in action. In the linear AR(l) model ril(x, B) x and A(x) A(x) IEXJ I(Xo ~ x) is positive for all real x , uniformly continuous and decreasing on ~, and thus trivially satisfies (7.7.12). A uniformly a.s. consistent estimator of A is
=
An(x)
=
=
=
n
l
n- L
xLI I(X k - l
~ x) .
k=l
Thus a test of the hypothesis that the first order autoregressive mean function is linear AR(l) on the interval (-00, xo] can be based on sup ITnMn,I(x)I/{£1n,IG~/2(xo)},
x::O;xo
where
n
n-
l 2 /
L (Xi - Xi-IBn) [I(X i- l :S X) i=l
_
-1
n
-.2-~
j=l
X j - l X i - l I(X j _ l :S X i - l 1\ X) ] 2 ( X ) n -1 ",n L..k=l X k _ l I X k - l ~ j-l
n
£1;,I
n- l L(Xi - X i_ lB n)2. i=l
Similarly, a test of the hypothesis that the first order autoregressive median function is linear AR(l) can be based on sup ITnMn,.5(X)11 {£1n, .5G~/2(xo)} ,
x::O;xo
where
n 2
n- l / L
{I(X i - Xi-IBn> 0) - .5} [I(X i -
i=l _
-1
n
and
t j=l
Xj n- l
l
Xi-
l
I(X j - l :S X i -
l
l 1\
= n- l
L {I(X i - Xi-IBn> 0) - .5}2. i=l
X) ]
L~=l XLI I(X k - l ~ Xj-d
n
£1; ,.5
:S X)
7.7.3. Computation of Tn Mn ,1/J in some examples
353
By Theorem 7.7.4, both of these tests are ADF as long as the estimator On is the least square (LS) estimator in the former test and the least absolute deviation (LAD) estimator in the latter. For the former test we additionally require l&i(lH) < 00, for some 0 > 0, while for the latter test I&r < 00 and f being uniformly continuous and positive suffice. In the EXPAR(l) model, ri1(x, ( 0 ) == (x, xe- X 2 ) 1 and A(x) is the 2 x 2 symmetric matrix
From Theorem 4.3 of Tong (1990: p 128), if I&t < 00, f is absolutely continuous and positive on JR then the above EXPAR(l) process is stationary, ergodic, the corresponding stationary d.f. G is strictly increasing on JR, and EX6 < 00. Moreover, one can directly verify that EXJ < 00 implies A(x) is nonsingular for every real x and A -1 and A are continuous on JR. The matrix n
An(x)
= n- 1 L
I(X i -
1
~ x)
i=l
provides a uniformly a.s. consistent estimator of A(x). Thus one may use supx~xo ITnM"n,I(x)I/{O"n,IGn(So)} to test the hypothesis that the autoregressive mean function is given by an EXPAR(l) function on an interval (-00, xo) . Similarly, one can use the test statistic sup ITn Mn,.5(X)I/{O"n,.5G;j2(xo)}
X~XO
to test the hypothesis that the autoregressive median function is given by an EXPAR(l) function . In both cases An is as above and one should now use the general formula (7.7.16) to compute these statistics. Again, from Theorem 7.7.4 it readily follows that the asymptotic levels of both of these tests can be computed from the distribution of sUPO~u~l IB(u)l, provided the estimator On is taken to be, respe ctively, the LS and the LAD. Again one needs the (4 + o)th moment assumption for the former test and the uniform continuity of f for the latter test. The relevant asymptotics of the LS estimator and a class of M-estimators with bounded 'ljJ in a class of non-linear time series models is given in Tjestheim (1986) and Koul (1996), respectively. In particular these papers include the above EXPAR(l) model.
7. Autoregression
354
7.7.4
Proofs of some results of Section 7.7.2
Many pro ofs are similar to th ose of Section 6.6.4. For exa mple th e ana logue of (6.6.31) holds here also with X i replaced by X i - 1 and with i.i.d . replaced by assuming th at t he r.v.'s {(~i , X i)} are stationa ry and ergodic. In many arguments just replace th e LLN 's by t he Ergodic Theorem and th e classical CLT by th e cent ra l limit t heorem for mar tingales as given in Lemm a A3 of th e App endix. So many details are not given or are shorte ned. The Remark 6.6.3 applies here also without any cha nge . T he proof of par t (A) of Theorem 7.7.2 is exactly similar to t ha t of part (A) of Theorem 6.6.2 while th at of part (B) is some what different . We give th e det ails for t his par t only. Proof part (B) of Theorem 7.7.2 . Put, for 1 :::; i :::; n, t E ~q,
dn,i(t )
'-
m (X i -
1,
()o + n -1 /Zt ) - m(X i -
1 2 / (2o
+ t5llm(X i _ 1 , ()o)ll),
1,
I'n,i
.-
n-
f-tn(Xi - 1, t , a)
.-
lE[l/J (ci - dn,i(t ) + aI'",;) I Xi- I]'
Define, for a, x E
~
and t E
0:
()o) ;
> 0, £5 > 0;
~q ,
n
.-
n- 1/ Z I )l/J (Ci - dn,i(t) + al'n,i ) - f-tn (X i - 1, t , a) i=l - ¢(Ci)] I(X i - 1 :::; x) .
Write Dn(x ,t) and f-tn (Xi-1 ,t) for Dn(x ,t ,O) and f-t n(Xi- 1,t ,0), respect ively. Note t hat the summa nds in D n (x, t , a) form mean zero bounded martin gale differences, for each x, t and a . Thus
Va r (Dn( x , t , a»
< lE[l/J(Cl - dn,l (t) + al'n,l ) - f-tn(XO, t , a) - ¢ (CI W :::; lE[¢(Cl - dn,l(t) + al'n,d - ¢ (Cl )]2 --+ 0, by ass umption (6.6.13) and (lItz ). Upon an applicatio n of Th eorem 2.2.5 with Zn,i = ¢ (ci - dn,i(t ) + al'n,i) - f-tn(Xi - l , t , a) - ¢ (ci ) we readil y obt ain t hat (7.7.17)
sup IDn(x , t, a )1 = op(l ), xE IR
7.7.4. Proofs
355
The assumption (C) of Theorem 2.2.5 with these {Zn ,d and implied by (W2) while (7.7.7) impli es (2.2.74) here. We need to prove that for every b < 00, (7.7.18)
sup
7
2
== 0 is
IDn(x , t )1 = op(I ).
xE R,lI t llSb
To that effect let
c; := and for an
{
Iisil
An := {
sup Idn,i(t )1 S; n- 1/ 2 (0' + bllriJ.(Xi-dID , 1 S; i II tli ::; b
s; n} ,
S; b, let
sup Idn,i (t ) - dn,i(s)1 S; I n,i, 1 S; is; n } n IItll::;b,lIt- sll::;J
<
By assumption (6.6.13), th ere is an N th at \:I b < 00 and \:I Iisil S; b, (7.7.19)
00,
c;
dep ending only on
0' ,
such
\:In > N.
Now, by th e mon otonicity of 'ljJ one obtain s t ha t on A n , for each fixed
Iisil
S; b and \:I Iltll S; b with
li t - sll
S; 6,
IDn(x , t )1
< IDn(x, s , 1)1 + IDn(x , s , -1 )1 n
+In- 1/ 2 L [fln(.'Y i- 1, s , 1) - fln(.Xi- 1, s , -1 )J I (X i- 1 S; x) l· i= l
By (7.7.17), th e first two terms converge to zero uniform ly in x, in prob ability, while the last te rm is bounded above by n
['XO
12 n- / ~ 1-
IFX i _ 1 (y
+ dn,i( S) + I n,i )
00
-FXi_1
(y
+ dn,i(S)
- ' n,i )I'ljJ (dy ).
Observe that for every IIsll S; b, on An , Idn,i(s)I+,n,i S; an , for all l S; i S; n , where an := maX1
(7.7.20)
n-
1 2 /
L In ,i [ i= l
sup
Ify (x ) - f y(z )1
YEIR,l x- zl ::;a n
+2
i:
!Xi -I
(y) 'ljJ (dy ) ] .
356
7. Autoregression
Now, by the ET, (6.6.14) and (F) , n
n- 1 / 2
n
L I'n,i = n- L (2a + <5l1m(X 1
i=1
and
n-
1 2 /
t i: I'n,i
-+ 2a
i - 1, ( 0
)11) = op(l),
i=1
i:
!Xi-l (y) 1j;(dy)
E!xo(y) 1j;(dy)
+ <5
i:
Ellm(Xo)lI!xo(Y) 1j;(dy) .
Observe that , by (6.6.14), the functions
are Lebesgue integrable on JR. By (F) , they are also uniformly continuous and hence bounded on JR so that J~oo qj(Y) 1j;(dy) < 00 for j = 0,1. We thus obtain that the bound in (7.7.20) converges in probability to
4a
i:
qo(y) ~)(dy) + 2<5
i:
ql(y) 1j;(dy)
which can be made less than <5 by the choice of a . This together with (7.7.17) applied with a = 0 and the compactness of N b proves (7.7.18) . Next , by (7.7.1) and Fubini 's theorem, we have n
n- 1 / 2
L I-tn(X
i - 1,
t) I(X i- 1
:::;
x)
i=1
n
n- 1/ 2
L
[l-tn(Xi- 1 , t) -l-tn(Xi- 1 ,
0)] I(Xi-
1 :::;
i =1
n
_n- 1 / 2
L
I(X i -
1 :::;
x)
i =1
n
_n- 1
L
i:
m'(Xi -
1 , ( 0)
tI(X i -
1 :::;
x)
i= 1
x
fxi-l (y) 1j;(dy)
+ opel)
-Eril'(Xo,Oo)I(Xo:::; x)
i:
!xo(y) 1j;(dy)t
+ op(l),
x)
7.7.4. Proofs
357
uniformly in x E lR and Iltll ::; b. In the above, the last equality follows from (6.6.31) while the one before that follows from the assumptions (6.6.13), ('l'2) and (F). This together with (7.7.18), (7.7.19) and the assumption (7.7.8) proves (7.7.9) and hence the part (B) of Theorem 7.7.2. 0 The details of the proofs of the remaining two theorems are similar to their analogues in the regression setup and will not be given here . They also appear in Koul and Stute (1999) .
8 Nonlinear Autoregression
8.1
Introduction
Th e decade of 1990's has seen an exponent ial growth in th e applications of nonlinear autoregressive mod els (AR) to economics, financ e and other sciences. Tong (1990) illustrates th e usefulness of homo scedastic AR mod els in a large class of applied examples from physical sciences while Gourieroux (1997) contains severa l exa mples from economics and finan ce where th e ARCH (auto regressive conditiona l het eroscedastic) models of En gle (1982) and its various generalizations are found useful. Most of the exist ing litera ture has focused on developin g classical inference pr ocedures in these models. The th eor eti cal developm ent of th e an alogues of th e est imators discussed in the pr evious sections that are known to be robust against outliers in th e innovat ions in linear AR models has relatively lagged behind. In th is chapter we shall discu ss the asymptotic distributions of th e an alogues of some of th e procedures of the previous chapte rs for a class of nonlinear AR and ARCH t ime series models in two main sections. The main point of t he chapter is to exhibit the significance of th e weak convergence approach in prov iding a unified methodology for est ablishing th ese resu lts for non-smooth und erlying score functions 'l/J, sp and L in th ese dynamic models. Th e main techni cal tool used are T heorems 2.2.4 and 2.2.5. We sha ll first focus on nonlin ear AR mod els. The main reason for doin g thi s is t hat th e motivation for vari ous est ima t ors is simple and t he det ails of th e proofs are relatively tr an spar ent . H. L. Koul, Weighted Empirical Processes in Dynamic Nonlinear Models © Springer-Verlag New York, Inc 2002
8.2. AR MODELS
8.2
359
AR Models
Let p, q, be positive integers, Yo := (X o, X-I, . .. , X 1 - P ) ' be an observable random vector, n be an open subset of IRq, and Jl be real valued function from IRP x n. In a nonlinear AR(p) model of interest here, the observations {X;} are such that for some 0' = (81, (h, ·" , 8q ) E n, (8.2.1)
Ci
=
Xi - Jl(Y i- 1 , 0),
i = 0, ±l , . . . ,
are i.i.d. LV .'S , and independent of Yo . We do not need to have stationary and ergodic time series for the results of this chapter to hold. To proceed further we need to make the following additional model smoothness assumption about the functions p: There exists a function /J, from IRP x n to IRq, measurable in the first p coordinates, such that for every E > 0, k < 00, sEn, (8.2 .2)
supn 1/ 2 IJl(Y i_1, t) - Jl(Y i- 1 ,S) - (t - S)'/J,i(Y i-1,S)1 = op(l),
where the supremum is taken over 1 ::; i ::; n , n 1/ 211t - sll ::; k. Note that the differentiability of Jl in 0 alone need not imply this assumption. To see th is consider the simple case q = 1 = p, Jl(y,8) = 82y . Then the left hand side of (8.2.2) is bounded above by a constant multiple of n - 1 / 2 max, !Yi-l l , which may not tend to zero in probability unless some additional conditions are satisfied. Now we are ready to define analogues of M- and R- and m .d. estimators for O. For the sake of brev ity we shall often write Jli (t), Jli for Jl(Y i- l , t) , Jl(Y i- 1 , 0) and /J,i(S), /J,i for /J,(Yi -l ' s) , jJ,(Yi-I' 0) , respectively. The basic scores needed ar e as follows. Let ci( t) := Xi - Jl(Y i- 1 , t) , and R it denote the rank of Ci(t ) among {Cj(t) ; 1::; j ::; n} , 1::; i ::; n , t E n . Also let ,¢, tp be as in Section 7.3, L be a d .f. on [0,1] and define n
(8.2.3)
M(t)
.-
n - 1/ 2
L /J,i(t)'¢(ci(t)) , i=1
S(t)
.- n - I/ 2
n
L /J,i(t)tp(Rit/(n + 1)) , i=1 n
Z(u , t)
.-
n- 1 / 2
L /J,i(t)I(R it ::; nu), i=1 n
Z(u, t)
.-
Z (u , t) - n - 1 / 2
L /J,i(t) u, i=1
O::;u::;l ,
8. Nonlinear Autoregression
360
tEn . Also, define (8.2.4)
OM
:=
OR := argmin t IIS(t )1I 2,
argmin t IIM (t )1I2,
e.: := argmin IlZ (t )11
2
t
.
Note that OM, OR are the extensions of M- and R- estimators of Section 7.3 and Omd is an analogue of the m.d. estimator (5.2.18) appropriate for the model here . Thus, for example, the choice of'ljJ(x) == sign(x),
s.:
8. 2.1
Main R esults in A R m odels
The basic weighted empirical process that plays the role of W h of (7.1.6) here is as follows: (8.2.5)
di(t)
.-
Pi(t) - Pi((}) ,
1 ::::; i ::::; n,
n
W(y , t)
.-
n- 1
L jLi(t)I(Xi - pi(t) ::::; y) i=1 n
=
n- 1
L jLi(t)I(ci ::::; y + di(t)), i=1 n
v(y ,t)
:L jLi(t))F(y + di(t)) ,
.-
n- I
.-
n 1 / 2 [W
i=1
W (y , t)
(y , () + n -
1/ 2 t
) - v(y, () + n- 1 / 2 t )],
361
8.2.1 Main results in AR models
for y E IR, t E IRq . The components of the vector of processes Wand Ware different from W h and Wh of (7.2.1) or any other previous process because now the weights also involve the parameter. In linear models , the weights iti(t) == Yi-1, thereby making them free of t . The dependence of weights on the parameter creates additional technical difficulty. Some of the assumptions stated below are needed to overcome this new difficulty with minimal constraints on the model p . In particular we avoid requiring the second order differentiability assumption on this function . These assumptions are as follows. (8.2.6)
There exists a positive definite matrix, :E possibly n
depending on
e. such that n- 1 2:: itiit~ = :E + Op(l). ;==1
(8.2 .7) n
(8.2.8)
n-
1
2:: Elliti((} + n-
1 2s) /
- it i ((})11 2 = 0(1),
s E IRq.
;==1
n
(8.2.9)
n- 1 / 2
2:: Ilit
i ( (}
+ n- 1 / 2s) - it i((})11 = Op(l),
s E IRq .
i= l
(8.2.10)
V t > 0, :3 a 15 > 0, and an N < 00,:1 V 0 < b < 00,
IIsll :S b, n > N,
p(
n
sup II t - sll
n- 1 / 2
2:: lIiti((} + n-
1 2 / t
) - iti((} + n- 1 / 2s)1I :S
t)
i= l
21-L The first two assumptions are similar to (7.2.3)-(7 .2.5) of Theorem 7.2.1, while the last three assumptions are needed to achieve the required equicontinuity of the W proc esses with respect to the parameter in the weights. Clearly in the case of linear models , the last three assumptions are vacuously satisfied. The analogue of Theorem 7.2.1 and Remark 7.2.1 is given by the following lemma.
Lemma 8.2.1 Suppose the assumptions (8.2.1) , (8.2.2), (8.2.6) - (8.2.10) hold. Then , for every continuity point y of F, and for every 0 < b < 00 , (8.2.11)
sup IW(y , t) - W(y , 0)1 IItll:;b
= op(l) .
If, in addition, the error d.f. satisfies (7.2.10), then for every 0 <
362
8. Nonlinear Autoregression
k, b <
00,
(8.2.12)
IW(y, t) - W(y, 0)1 = op(l).
sup lyI9,lI tll:Sb
Finally, if, in addition, the error d.f. satisfies (F1) and (F2), then for every 0
00 ,
(8.2.13)
IW(y, t) - W(y, 0)1 = u p (l ),
(8.2.14)
n 1 / 2 [W(y, t) - W(y , 0)]- t'''£f(y) n
= n- 1 / Z L)it;(O + n- 1 / Z t ) - it;(O)]F(y)
+ U p (l ),
;=1
where u p (l ) is a sequence of stochastic processes that converges to zero, uniformly over the set y E JR,
Iltil :s b,
in probability.
The proof of this lemma will be given at the end of this subsection. We now pro ceed to give several results about the estimators defined in the previous section . As will be seen the above lemma plays th e same role in their proofs as did Theorem 7.2.1 in the linear AR model. Recall the definition of the class of score functions 'l' from (4.2.3). Let 'l' F :=
{wE 'l' ; ! 'ljJ dF = o],
A :=! fd'ljJ .
The first result of this section gives th e AUL of the M-scores.
Theorem 8.2 .1 In addition to the assumptions (8.2.1), (8.2.2), (8.2.6) (8.2.10) , suppose that the error d.f. F satisfies (F1) and (F2). Th en, for every 0 < b < 00 , sup ,pE'iJF, lIt ll:S b
I!M(O + n-
1 Z / t
This follows from (8.2.14), the fact that int egration by parts that shows that
) - M(O) - "£t
All = op(l) .
J Fdsb = 'ljJ(00), for all 'ljJ E 'l' , and
n
M(t) - M(s)
=
n- 1 / 2 l)it;(t) - it;(s)]'ljJ(oo)
-!
;=1
n 1 / Z[W(y , t) - W(y , s)]'ljJ(dy) ,
s , t E JRq .
This th eorem is useful in obtaining the limiting distribution of OM provided n 1 / z liOM- Oil = Op(l) . Using th e methodology of sections 5.4
363
8.2.1 Main results in AR models
and 5.5, one such sufficient condit ion that guarantees this is the following: (8.2.15)
e'M(9 + n- 1/ 2re) is monotonic in r E IR,
v e E IRq,
IIell = 1, n 2 1, a.s. Corollary 8.2.1 In addition to the assumptions of Theorem 8.2.1 and (8.2.15), assume that A > O. Then, for each 'lj; E IJt F, (8.2.16)
n 1 / 2 (9 M
-
n 1 / 2 (9 M
- 9)
9)
= {A~} -IM(9) + op(l), --7d
N (0, v ('lj;, F) ~-1) ,
where v( 'lj; , F) is as in (7.3.4).
The following corollary gives the analogue of th e above corollary for the LAD estimator under weaker conditions on the error d.f, Its proof uses (8.2 .12).
Corollary 8.2.2 Assume (8.2.1) and (8.2.2) - (8.2.10) hold. In addition, if (8.2.15) with 'lj; (x ) sgn(x ) holds and if the error d.f. F has positive and continuous den sity in an open n eighborhood of 0, then nl / 2(9Iad - 9) =} N(o , ~-1 /4j2(0)) .
=
To state ana logous results about the other two classes of esti mators we need to int roduce some more notation. Let
n
Z(u) := n- 1 / 2 I)it i - j't)[I(F(c;) :S u) - u],
0:S u :S 1,
i= 1
n
S := n- 1 / 2 I)iti - j't)[
1 1
K(t) :=
IIZ(u) + r t q(u)11 2 L(du),
Theorem 8.2 .2 Suppos e the assumptions (8.2.1), (8.2.2), (8.2.7) (8.2.10), (F1) and (F2) hold. In addition, suppos e that fo r some positiv e definit e q x q matrix r , possibly depending on 9, n
(8.2.17)
n- 1 2)iti - j't)(it i - j't)' i= 1
= r + op(l) .
8. Nonlin ear Autoregression
364 Th en, f or every 0
< b < 00 ,
(8.2.18)
IIZ(U,() + n-
(8.2.19)
sup
IIS(O+ n -
sup
1 2 / t)
1 2 / t)
- Z(u) -
- n 1 / 2 Ji,ip -
r t q(u)11= op (l ),
S -r e
(8.2.20)
sup LE
,IIt ll9
!q() +
f
qd
= op(l) ,
n - 1 / 2 t ) - K( t) 1 = op(l) .
where the supremum in (8.2. 18) is taken over all 0 in (8.2.19) over , Iltll ::; b.
:'S u :'S 1, IItli :'S band
The claim (8.2.20) follows from (8.2 .18) trivially while (8.2.19) follows from (8.2.18) and th e integration by par ts operation t hat gives S et ) == -rJL
f
Z (u( n+1 n ) , t )
'V t E JRq, w .p.I.
Th e proof of (8.2.18) will be indicated at th e end of t his sectio n. Th e next two corollaries give t he asympt ot ic distri buti on of t he R- and rn.d- esti mators. Corollary 8.2.3 In addition to the assu mpti ons of Th eorem 8.2.2, assum e that
(8.2.21)
e'S (O + n - 1 / 2re) is monotonic in r E JR, 'Ve E IRq , Ilell = 1, n
(8.2.22)
ip
2: 1, a.s.
= O.
Th en, f or ever y
,
n 1 / 2(OR - ())
(8.2.23)
= {Tn-Is + op(l) , , :=
n 1/ 2(OR - ()) -----+d N
(0, T(
1 1
qdip
r- 1 ) ,
Corollary 8.2.4 In addition to the assumpti ons of Th eorem 8.2.2, assume
that f or some 0 <
1
eE L 2 ([0, 1), L ),
1
(8.2.24)
e' Z(u , 0 + n -
'V eE JRq , Ilell
1 2 / re
= 1, n
)f (u )L (du ) is monotonic in r E IR,
2: 1, a.s.
8.2.1 Main results in AR models
365
Th en, for every 'P E cI>, (8.2.25)
n 1 / 2 (iJ m d
n 1 / 2 (iJ R where
-
-
9) 9)
= bl r} - 1
----+ d
1 1
Z(u) q(u) L(du)
N (0, 0"5 r- 1 )
+ op(l) ,
,
0"6 is as in {5.6.21} and {5.6.15}.
Remark 8.2.1 On the assumptions {8.2.6} - {8.2.9}. These assumptions are state d for a general time series satisfying (8.2.1) and (8.2.2). If t he und erlying time series and /1 is such that th e ser ies is stationary and ergodic, t hen the following hold : Condit ions (8.2.6) and (8.2.7) are impli ed by (8.2.26) while (8.2.8) is equivalent to (8.2.27) and t he assumption (8.2.9) is implied by (8.2.28) where E denotes t he expectation under t he stationary distribution of Y o, which typically will depend on t he true parameter 9. Remark 8.2.2 On the condi tions {8.2.15}, {8.2.21} and {8.2.24}. These conditions may be replaced by any ot her conditions that will gua ra ntee the t ight ness of t he resp ective standard ized estimators . However , t hese assumptions are sa t isfied in all t hose time series models where t he para meter vect or 9 in th e fun cti on /1 appears in a linear fashion. Consider th e mod el (8.2 .29)
/1(Y , t)
= t'g(y) ,
where g is a q-vect or of fun cti ons gl, ' .. , gq, each from JRP to JR. In this case jJ, (y , t ) == g (y ) and n
e' M(9
+ n - 1 / 2r e) = n - 1/ 2 L
e' g(Yi-d 'ljJ(Xi - re' g( Yi - d)
i=1
which, in view of 'ljJ being nondecreasing, is clearly seen to be monotonic decreas ing in r , t hereby veri fying (8.2.15).
8. Nonlin ear Autoregression
366 Simil arl y, one obt ains n
e'S (B + n- 1 / 2 re ) = n - 1 / 2
L e'g(Yi- d cp(R ire/ (n + 1)), i= 1
10
1
e' Z (u , B + n - 1 / 2 re) £(u) L (du ) n
=_ n -
1 2 /
L e'(g (Y
i - 1) -
g )cpo(R ire/n ),
i= 1
u
where CPo(u ) == fo £ dL. Now use Lemma 7.3.1 of Section 7.3.2 above and the non decreasing na ture of cp, CPo t o verify (8.2.21) and (8.2.24) in the prese nt case . Not e t hat for the stationary and ergodic mod els of the typ e (8.2.29), the ass um ptio ns (8.2.2), (8.2.6) and (8.2.7) are impli ed by t he squa re int egrability of Ilg(Y o) II while (8.2.8) - (8.2.10) are vacuously satisfied . In t he next section we give some examples of t ime series satisfying (8.2.29). These series are linear in paramet er and nonlinear in t he lag variables. Tong (1990) gives num erous examples of such models a nd t heir pr act ical import an ce. The above resul t s ar e t hus ap plicable to such mod els. 0
R emark 8.2.3 Efficiency . Obser ve t hat t he sca le factors v('IjJ, F ), T(cp , F ) and 0"0 that appear in t he asy mptot ic variances above also appea r in the asy mptotic varia nces of t heir analogues in linear regr ession and aut oregressive models. Thus t he asymptot ic rela ti ve efficiency st atements t hat hold for t hese linear mod els continue to hold in t he pr esent nonlin ear time series set up . 0 R emark 8.2.4 M.D. Estim ation. In t he case the err ors in (8.2.1) are symmet rically distribut ed , th e analogues of t he m .d . est imators (7.4.3) in the cur rent setup ar e defined as follows: Let iti,j (t ) denot e the j t h component of the vect or iti (t) , 1 :S i :S n , 1 :S j :S p , and define
L J[n- L iti,j (t ){I(Xi :S x+ JLi(t )) n
p
1 2 /
j= 1
i= 1
- I( - X i < X .-
-
JLi(t )) }f dG(x ),
argmin{I{ +( t ); t E lRP },
wher e G E DI (lR). Assuming the symmetry of t he error density and of G , and using t he t echniques of Ch ap ters 5 and 7, it is possible to give a
367
8.2.1 Main results in AR models
set of minimal sufficient conditions under which n 1 / 2 (O+ - 0) will converge weakly to N (O, ~ -l v (F, G )) , where v( F, G ); = Var{J~ oo f dG }/(J p dG)2 . Details are left out as an exercise for an inte rest ed reader . 0
Remark 8.2 .5 T he above estimators will ty pically be not robust aga inst outliers in t he erro rs whenever {itil are un bounded functions of {Yi- d · One way to robusti fy these est imators is to replace t he weight s iti(t) in their defining scores by a vector of bounded fun ctions of iti(t ), just as was done in t he previous chapter when defining GM- and GR- est imators. The asymptotic t heory of such procedures can also be est ablished using t he met hodology developed here. 0 Next , we give some results about sequential weight ed residua l empirical processes. These results are found useful for testi ng some change point hypot heses pertaining to t he err or distribution in t hese models. Accordingly, let {h n ;} be an array of 1'. V . 's, h ni being past measur able and independent of Ci , for each i . Define, for y E IR, t E IRq, u E [0,1 ], [nul
T h(y , t , u) := n -
1 2 /
L h n;I(c i ::; y + di( t)) , i= l
[nul
'h (y , t , u ) := n - 1 / 2
L b-« [I (ci < y + di (t )) -
F (y
+ di (t ))] .
i= l
Theorem 8.2 .3 In addition to th e assum ptio ns (8.2.1) , (8.2.2), (PI ), (P2 ), (2.2.52) , (8.2.7), assume th at {h nd sat isfy the following. n
(8.2.30) i= l
n
(8.2.31)
L Eh~i[/1i (O + n-
1
/
2t ) - /1i(OW = 0 (1),
t E IRq .
i= l
Th en, for every 0
(8.2.32)
< b < 00 ,
sup Ith (y , O +n- 1 / 2t ,u ) - t h(y ,O , u )1 = op(l ), Y ,t ,u
(8.2.33)
sup ITh(y , 0
+ n - 1 / 2t, u ) -
Th(y , 0 , u)
Y ,t ,u
[nuJ
_ n- /2 1
L hniit~ t f (y )1 = op(l ), i= l
368
8. Nonlinear Autoregression
where the supremum is taken over Y E JR, IItll ~ b, 0 ~ u ~ 1. Now consider th e residu al sequent ial empirica l pr ocess [n ul
Fn(y , t , u)
n-
:=
1
L ii x, - J.L i(t ) ~ y) , i=l
[n uJ
L
F (y + diet)), i=l ._ T- 1 (y , t , u) .n 1/ 2 [Fn(y, t , u) - v (y , t , u) ]. v(y , t , u)
:=
n- 1
Note t hat Fn is t he Ti, with h n i == 1 and T1 is equa l to Th with n-« == 1. Write Fn(y , t) , T1 (y , t ) etc . for Fn(y , t , 1), T1 (y, t , 1). The following t wo corollaries trivially follow from Lemm a 8.2 .1 and Theorem 8.2.3, resp ectively.
Corollary 8.2 .5 Under (8.2.1), (8.2.2), (FI), (F2), (8.2. 7), we obtain sup IT1(y, 0 + n - I / 2t ) - T1 (y ,0)! = ope l).
(8.2.34)
Y,t
If, in addition , n
(8.2.35)
n-
I
L lIiL ll = Opel ), i
i= 1
then, for every 0 < b < 00 , (8.2.36)
sup In 1 / 2[Fn (y , 0
+ n- 1 / 2 t ) -
Fn(y , 0 )] - t'lL'f (y )1= opel ),
y, t
where the supremum is taken over y E JR,
IItll ~
b.
Corollary 8 .2 .6 A ssum e (8.2.1) , (8.2.2), (FI ), and (8.2.7) , hold. In additi on suppose n
(8.2.37)
n-
1
L EiliLil1
2
= 0 (1),
i= 1 n
L E[J.Li(O + n-
(8.2.38)
1 2 / t
) - J.L i(OW
= 0(1 ).
i=l Then, for every 0 < b < 00, (8.2.39)
sup
\T1 (y , 0 + n - I / 2 t , u ) - T1 (y , 0 ,u )1=
ope l) .
Y,t ,u
(8.2.40)
sup In 1 / 2 [Fn (Y, 0+ n- 1 / 2 t , u ) - Fn(y , 0 , u )]
Y,t ,u
[nuJ
_ n - 1 LiL~t f (y) 1 = opel ), i=l
8.2.1 Main results in AR models
369
where th e suprem um is tak en over y E JR, Ilt ll :::; b, 0 :::; u :::; 1.
Proof of Lemma 8.2.1 . We shall first prove (8.2.13). Let
. (t) -= JLi . «(J JLni
- 1/ t 2 ), +n n
W * (y, t ) := n -
1 2 /
L it ni (t )[/(ci :::; y) - F (y)], i=l
n
U (y, t)
1 2
:= n- /
L[itni (t ) - it ni (O )) [/(ci :::; y) - F (y)). i=l
Then one can rewrite
W(y , t) - W (y, 0) == [W(y, t) - W *(y , t)) Thus, it suffices to pr ove that for every 0 (8.2.41)
< b < 00 ,
IIW (y, t ) - W * (y, t )11
sup yE IR.ll t l! :Sb
(8.2.42)
IIU(y, t )1I
sup YE R , lI tll ~b
+ U(y , t) .
= op(l) ,
= op(l) .
We proceed to prove (8.2.41). This will follow from T heore m 2.2.4 in a similar fashi on as in (7.2.8) , once we verify t he condit ions of t hat t heorem , which we shall do now. Fix a 0
00 .
In t he sequel, t he indi ces in t he SUPy,i,t vary from
y E JR, 1 :::; i :::; n , IItll :::; b. Let F ni := a-field {Yo, c l , ' " ,ci- d , 1 :::; i :::; n .
Let J.1nij , Wj , Wj , etc. denote t he i h coordinates of it ni , W , W , etc. Note th at if in (2.2.26) we t ake (8.2.43)
t he n Vh(y), Vh'(y), resp ectively, are equal to Wj(y , s) , Wj (y, 0), for all y E JR, s E JRq . Now, by (8.2.2) , Va > 0, 3nl ,:3 'lin > nl , (8.2.44)
p( s~p !J.1i «(J + n t ,t
1 2 / t)
- J.1i «(J) - n - 1 / 2 t it i l :::; ban- 1 / 2 )
2 1 -a . Hence from (8.2.7) , we readily obtain (8.2.45)
sup Idni( t) 1= op (l ). i ,t
8. Nonlinear Autoregression
370 This verifies (2.2.28) for t he <5 ni of (8.2.43). Next , by (8.2.6) and (8.2.8) ,
where (Jjj is t he lh diagon al element of t he mat rix ~ , so t hat (2.2.34) is verified for t he b-« of (8.2.43) for every s E jRq , 1 ~ j ~ q. Finall y, becau se for every s E jRq , 1 ~ j ~ q,
+n- 1!2 max -" '1 ' ,. 1'""J n - 'I' mf" II' n'}(8) -
p'}I 5
( n -,
~ {Pn'} (s] - P'} }' )' /' ,
(2.2.27) is implied by (8.2.7) and (8.2.8) for t he (2.2.32) read ily implies t hat (8.2.4 6)
tu«
sUPIIW(y , s) - W *(y, s )11 == op(l) ,
of (8.2.43) . Hence,
li s II ~ b.
y EIR
°
To complete t he proof of (8.2.4 1), it suffices to show that V
0' > 0, 3 <5 >
and no , '3 Vn > no , and for each IIsll ~ b,
(8.2.47)
p(
sup
II D (y , t ) - D (y, s)1I
> 0') ~
0',
y EIR,IIt - sll
where D (y, t ) == W (y, t ) - W *(y, t ). For the sa ke of br evity, let O'i(Y, t ) :== I (ci ~ Y + dni(t )) - F (y + dni(t )) and write D;(Y) for Di(Y, 0) . Then one can rewrite n
D(y , t) - D(y, s)
=
n- 1!2
L [itni(t) -
it ni(S )] [Di(y, t) - Di(Y)]
i= l
n
+n - 1! 2
L it ni(S)[Di(Y, t ) -
Di(Y, s)]
i= l
D 1 (y ,s ,t ) +D 2 (y ,s,t ),
say
To prove (8.2.46) it t hus suffices t o prove analogous resul t for D 1 and D 2 . But , (8.2.10) readily imp lies this for D 1 as 100i(Y, t ) - Di(y)1 ~ 1, uniform ly in i , Y, t .
8.2.1 Main results in AR models
371
We proceed to prove it for D 2 . Writ e f.1nij(S) = J-t~ij(S) - f.1;;ij(S). Let denote 2j where f.1nij(S) is replaced by f.1~ij(S) . It suffices to prove the analog of (8.2.46) for 1 :S j :S q. Now, fix an 1 :S j :S q, a > 0, Iisil :S b, and a 8 > 0. Let ~ni .1 2 n- / (ollit i ll + 2ba) and
Dt
D
Dt,
sup
An := {
II tIl9 ,lIt-sll <
Idni(t) - dni(s)1 :S ~ni , 1:S i :S
n}.
From (8.2.44), it follows that (8.2.48)
Next, let , for y , a E JR, n
VT(y ,s,a)
n- / 2 I>j~ij(s))[I(ci :S y 1
:=
+ dni(s) + a~ni)
i=l
Let (8.2.49)
By definition, 0ni are Fn i- measurable, for every a E JR, 1 :S i :S n . Moreover, by (8.2 .7) and (8.2.45) , (8.2 .50)
maxlOnil ,
Th e rest of the argument being the same as for (8.2.46) , one more application of (2.2.34) with these 8ni and the other entities as in (8.2.43), yields t hat (8.2.51)
sup IVT(y , s , a) - VT(Y , s , 0)1 = op(l),
\j a E
JR.
y
Now, use the monotonicity of the indicator function and the d.f, F to obtain that , on An , for IItll :S b, lit - sll < 8,
Dt <
IVT(Y ,s , 1) - VT(y,s ,O)1
+\n-
1
/
2
+ IVT(y ,s, -1) -
VT(y,s ,O)1
n
L jL~ij(s))[F(y + dni(s) + a~ni) i= l
-F(y
+ dni(s) + a~ni)]I .
8. Nonlinear Autoregression
372
By (F I) , th e last term in t his bound is no lar ger t ha n n
2l1flloon- 1 L lit;ij{s)l{ol ltLill + 2bo) i=l which , in view of (8.2.6) and (8.2.8), can be made to be smaller t ha n a constant multiple of 0, by the choice of 0, wit h arbitraril y lar ge probabili ty, for all sufficientl y lar ge n. This tog et her with (8.2.48) and (8.2.51) completes t he proof of an an alogue of (8.2.46) for D 2 , and hence of th e claim (8.2.41). Proof of (8.2.42) . Fix a 1 :::; j :::; q and let h ni == itnij (t ) - itnij (O ). Write h ni = -: - h;'i so th at t he j th component of U is writ te n as Uj = Uj- , with
ut -
n
Ul(y, t)
1 2
:= n - /
L
h;iOi(Y)'
i=l From (8.2.8), we obt ain , (8.2.52)
v"aT
(Ul(Y , t )) n
n- 1
I: E( h;Y F (y )(l -
F (y )
i=l n
< n - 1 I: E(h;Y =o( 1), VyElR. i=l Next , fix an 0 > 0 and let - 00 = Yo < Yl < . .. < Yr = 00 be a par ti tion of lR such that F(Yk) - F(Yk-l) :::; 0 , k = 0, 1, . . . , T . Then , once aga in using th e monotonicity of the indicator function and F , we obtain ,
< -
max IUJ±(Yk> t)1 + o:n-
O
LJ
- -
This, (8.2.52) , (8.2.9), and the arbitrariness of sup IIU(y, t )11 y
n 1 2 / '"
= op(l) ,
0:
Ihni\.
i=l
impli es
V IItll
:::; b.
To finish th e proof of (8.2.42) , we need to pr ove an ana logue of (8.2.47) for the U (y, t )-process. Bu t thi s is implied by (8.2.10), becau se IOi(y) 1 :::; 1, and becau se n
U (y, t ) - U (y , s ) = n- 1/ 2 L[tLni(t ) - tLni(s )]O:i(Y)'
i=l
8.2.1 Main results in AR models
373
This completes the proof of (8.2.42) , and hence of (8.2.13), while that of (8.2.14) follows from (8.2.13) and the Taylor expansion of F. The proof of (8.2 .12) is similar and simpler while that of (8.2.11) follows from the Chebbychev inequality. The proof of Lemma 8.2.1 is terminated. 0 Proof of Theorem 8.2.3 . The proof of this theorem is facilitated by the following two lemmas. Recall the definitions of bni from (8.2.49) . Lemma 8.2 .2 Under (Fl) and {8.2.2} - {8.2.8}, for some K < for every a > 0, n
limsupP( supn- l / 2 n
x ,y
L IhndF(Y + bni) i=l
F(x
+ bni)]1
::;
00
and
Ka) = 1,
for every a E 1R,llsll ::; b, where the supremum is taken over the set {x , y E IR; IF(y) - F(x)1 ::; n- l / 2 a}. Proof. Let Un := max, Ibnil, Tn := {If(y)- f( x)l ; IF(y)--F( x)1 ::; n- l / 2a}, and wn := {If(z)-f(v)l ; Iz- vl::; Un} . From (Fl) , (F2) and (8.2.50) , Tn = 0(1), W n = op(I). Also, from (8.2.2) - (8.2.8) , we obtain n- l / 2 L:~=l Ibnil = Op(I) . The lemma follows from these facts and the inequality n
l 2
n- /
L
Ihni [F(y
+ bni) -
F( x
+ bni)]1
i=l n
l 2 n- /
::;
L
IhnillF(y) - F( x)1 + n- l / 2
i=l
n
L
Ihnibnil{Tn + 2wn} . 0
i=l
Lemma 8 .2 .3 Let F be a continuous strictly increasing d.f. on IR" e, be i.i.d. F , a > 0, n 21 , N := [nl / 2 / a] and {Yj} be the partition oflR such that F(Yj) = jlN, 1 ::; j ::; N , Yo = -00 , YN+l = 00 . Then, under {2.2.52}, [nul
(8.2.53)
suI? In- 1/ 2 U,J
L h;i {I(ci ::; Yj+l) i=l
ti«, ::; Yj) - liN} I = op(I) ,
where the supremum is taken over 0 ::; U ::; 1, 0 ::; j ::; N + 1. Proof. Let i
Vi,j
:=
h;i{ I(c i ::; Yj+l) -
u«, < Yj) -
liN},
Si ,j :=
L Vk,j. k=l
8. Nonlinear Autoregression
374
Clearl y, for each 0 ::; j ::; N + 1, {Si,i ' F ni , 1 ::; i ;::: n,} is a mean zero martingale array. By the inequ ality (9.1.4) , for some C < 00 ,
P(l~~n ISi,il > a ) ::; a - ES~ ,i ' 4
n
2
n
ES~,i ::; C { E [ L E (V;~i I Fn,i- 1 ) ] + L EV;~i }' i=l
i= l
But, because -: ::; Ihn ;/ , for all i, n
n
L EV;;i ::; L i=l i=l
Eh~i '
E(Vi,i2Ir.r n,t-. 1) <_ h2ni N-1 <_n - 1/2 h ni2 _a_ l_a' n
?
E [ LE(V;~iIFn,i-dr
::;
i=l
C~ a)
2
1 ::; i ::; n ,
n
L Eh~i '
\f0 ::; j
< N + 1.
i=l
Ren ee, for som e constant Co: , dep ending on ly on C and a , n
P (L .H.S.(8.2.53) > a) ::; Co: n- 2 N L
Eh~i = Q (n -
1/ 2
)
= 0(1),
i= l
Now we pro ceed to prove (8 .2.32) . Write n-« = h7;i - h;;i' By the triangle inequ ali ty, it suffices to prove (8.2.32) with h ni repl aced by h~i ' Let V±(y , t , u ) denote t he difference inside th e absolute valu e on the L.R .S. of (8 .2.32) with h ni replaced by h ~i ' Now let Oni(t ) == dni(t ) + a ~ni , and as before we will conti nue writing Oni for Oni(S ), Define [nul
U±(a, y , t , u)
:=
n-
1/ 2
L
i: [n Si ::; y + Oni(t) ) - I( ci < y)
i =l
- F(y
+ Oni(t)) + F(Y) ]
We have, by (F ), (2 .2.52) , and th e DCT , for every fixed y E IR, t E IRq , and 0 ::; u ::; 1, n
Var( V ±(y , t , u )) ::; n- l L
Eh;'ilF(y
+ dni(t )) -
F( y )1 = 0(1).
i= l
Now, fix an a > 0, 11511 ::; b and a 0 > O. Let An be as in t he proof of (8.2 .41) . Arguin g as in t here , we obtain t hat on t he set An, \f Iltll ::;
375
8.2.1 Main results in AR models
sll < 6, and for all Y E JR, 0 :S u :S 1,
b, lit -
V±(y, t, u)
< sup IU±(l,Y, s, u)1 + sup IU±(-1, y, s, u)1 y ,u
y ,u
[nuJ
+ supn- 1/ 2 L h;i[F(y + dni(s) + .6. ni ) - F(y + dni(s) - .6. ni )). y ,u
i=l
But , by (Fl), (2.2.52) , and (8.2.30), the last term in this bound is bounded above by n
n
C(6n- 1 L
IIhni it i li + 2bn- 1 L
i=l
= Op(a) ,
Ihnila)
i=l
by the choice of 6. Thus to complete the proof of (8.2.32), it suffices to show that sup IU±(a,Y, s, u)1 = op(l),
(8.2.54)
a E JR,
IIsll :S b.
y,u
Let Nand {Yj} be as in the proof of Lemma 8.2.3. Then we obtain sup IU±(a,Y, s , u)1 y ,u
<
12
n
2 sup IU±(a, v,» .u)1 + n- / L j,u
Ihnd max[F(YHd - F(Yj))
i=l
J
n
+ maxn- 1/ 2 L J
Ihnil /F(YH1 + 6ni) - F(Yj + 6ni )!
i=l
[nul
+supln-1 /2 Lh;i {I(ci:S YHd - I(ci:S Yj) -l/ N
}1
i=l
J ,U
The second term is Op(a) by the definition of yj's and (2.2.52) , while the last two terms are op(l) by Lemmas 8.2.2 and 8.2.3. Using the fact that for each a, Y, s, n 1/ 2U±(a,y,s,iln) is a martingale in i , and arguing as in the proof of Lemma 8.2.3, we obtain
P (suplu±(a,Yj,s,u)1 > a) J ,U
< N maxP ( sup IU±(a, vs .s , iln)1 > a) l~i ~n
J
< Nn-
2
mrx E{ n 1/
2U±(a,
Yj , s , 1)
r
= O(n- 1/ 2 ) .
8. Nonlin ear Autoregression
376
This and t he arbit rariness of ex completes th e proof of (8.2.39) . The claim (8.2.33) follows from (8.2.32) in a routine fashion , thereby ending th e proof of Theorem 8.2.3. 0
8.2.2
Examples of AR models
We sha ll first discuss th e problem of testing for a cha nge in th e err or distri bution of t he mod el (8.2 .1), (8.2.2). Example 8.2.1 Testing for a change in the error d.f. Let F l , F 2 be two different distribution functions, not necessarily known, and F, :P F 2 . Consider th e problem of testing the cha nge point hypothesis
Hi, : th e err ors
Cl , . . . Cn
HI : c l ,' "
are i.i.d. F l
, Cj
in (8.2.1) are i.i.d ., against , C j+ l , ' "
.e«
are i.i.d . F 2 , for some
1:::; j < n. That is, we are inte rested in tes ti ng t he hyp oth esis th at t he ti me series (8.2.1) is generated by i.i.d. err ors, versus t he alte rnatives th at for some 1 :::; j < n , th e first j and th e last n - j observatio ns are generated from possibly two different error distributions. To describe a test for t his problem , let 0 be estima tors of fJ based on X i , 1- » : i:::; n . Let d n := n l / 2 (0 - fJ ). Assum e th at (8.2.55)
und er
n.:
Also, let Fnu , Fn(l-u ), denote residu al empirical processes based on th e first [nul residu als X i -Pi(O); 1 :::; i :::; [un], and th e last n - [nul residua ls X i - Pi(O); [nul + 1 :::; i :::; n, u E [0,1], where [x] denotes the greates t int eger less th an or equa l to th e real number x . The Kolmogorov-Smirnov type tes t of th is hypothesis is based on th e pro cess
[nul ( 1- --;:;[nul) n 1/2 { Fnu(Y) ~ 6.. n(y , u) ._ .- --;:;- Fn(l- U)(Y) , A
}
where Y E JR, u E [0,1] . For th e sake of brevity write here W( y , u) for W( y , fJ , u ) of Coroll ar y 8.2.5. Also, let th e common err or d.f. be denot ed by F and its density by f . All th e needed assumpt ions for t he validity of (8.2.39) and (8.2.40) are assumed to be in action.
8.2.2 Examples of AR models
377
From (8.2.40) we obtain that under H o,
=
"'n(Y , u)
(1- u) [W(Y, u)
+ n-' ~,,: d., fry)]
- u [W(y , 1) - W(y , u)
+ n- 1
t
jL~ d., f(y)]
i= l+ [n u J
+u p (l ). Now, suppose additiona lly that for some random vector m , n
(8.2.56)
n-
1
L jLi = m + op(l). i= 1
Then one readily obtains [nuJ
sup !n- 1 O:S; U:5)
L jLi -
u ml = op(l) .
i= 1
Hence, und er H o, and und er th e above appropriate regular ity conditions,
D.n(y, u) = [W(y , u) - uW(y ,
1)] + u ( l ). p
Thus, it follows, say from Bickel and Wichura (1971) , th at under H o , (8.2.57)
sup
lD. n(y , u)1 ---+d sup
1D.(t, u )l,
O:S;t ,u :S;1
y EIR ,O:S;u:S;l
where D. is a zero mean continuous Gaussian pr ocess on [0, 1F with
E{D.(s , u) , D.(t , v)}
= [s t\ t -
st][u t\ v - uv].
Consequently, th e test based on sup] ID. n (y , u) I; y E JR, O :s: u :s: I} is asymptotically distribution free for testing H o versus HI . We end thi s example by noting that th e condition (8.2.56) is typically satisfi ed if th e process is stationary and ergodic and the summands involved here have finite expe ct ations as will be typic ally th e case in th e following few exa mples. Example 8.2.2 SETAR(2;1,1) model. If in (8.2.1) we take (8.2.58)
q = 2, p = 1, p(y ,O)
= B1yI(y > 0) + B2yI(y :s: 0) ,
then it becomes the SETAR(2;1 ,1) [self-exciting threshold] model of Tong (1990; p130) . Note that here Y i - l == Xi -I . Let y+
= max{O,y} ,
y-
= min{O,y} ,
Wi == (XLI ' X i-=-I)' ·
378
8. Nonlinear A utoregression
Tong (1990) contains some sufficient conditi ons for t he stationarity and ergodicity of t he SETAR(2; l, l) process. For exa mple t his holds if t he error density f is positive every where, Elel < 00 , and fh < 1, ()2 < 1, ()1() 2 < 1. Moreover , if additionally E e 2 < 00, th en EXJ < 00. Hence, by th e Ergodic Theorem , in this mod el t he assumpti ons (8.2.2), (8.2.6) to (8.2.10) are satisfied with jL;(t ) == W i, (8.2.59)
We emphas ize th e fact tha t all expectations here depend on t he par am eter O.
Note this model is also an example of th e sub- mode l (8.2.29) with g(y ) == (y+ , y -)' . From t he discussion in Remark 8.2.2, it follows t hat t he cond it ions (8.2.15), (8.2.21) an d (8.2.24) are also satisfied here. We t hus obtain th e following Corollary 8 .2 .7 In addition to (8.2.1) and (8.2.58}, assume the Jollowing. (8.2.60)
Th e error d.f. F has a uniJorm ly continuous everywhere posit ive density
f and E e 2 < 00, Ee
= O.
Then n 1/2 (OM
n 1/ 2
n where
~
1 2 /
- 0)
(OR-
~d
N( O, ~ - 1 v( 'ljJ , F)) ,
0 ) ~d N( O, r - T(
(Orn d - 0) ~ d N( O, r-1 a5),
V 'ljJ E 'lJ ,
V ip E cI> ,
V L E cI> ,
an d T are as in (8.2.59) .
Now consider t he problem of testing the goodness-oj-fit hypoth esis H o : F = Fo , against t he alte rnative F f:. Fo , where Fo is a known d.f. havin g a uniforml y cont inuous everyw here posit ive density J. Let 0 be any est imator satisfying under H o.
(8.2.61) Let D n
:=
n 1 / 2 SUPy IFn(Y, 0) - Fo(y) l. From (8.2.36) we readily obtain
D n = sup IW (y, 0) y
+ n 1 / 2 (0 -
O)'m Jo (y )1
+ op( l) ,
8.2.2 Examples of AR models
379
where W(y ,6) is as in (8.2.34) - the standardized empirical of the i.i.d. '- (+ r.v. ,s {e,-} - an d m ./-Lo' /-Lo-)' . Compare this finding with that in Remark 7.2.3 pertaining to the linear AR model. In linear AR(p) models with zero mean errors, the analogous D n statistic satisfies D n = SUPy IW(y, 6)1 + op(l), thereby rendering the tests based on D n ADF . But in the current case, even if Ee = 0, the vector m =j:. 0 and hence the tests based on D n are not ADF. Note that SETAR models are piecewise linear, a very simple departure from the usual linearity, yet the above mentioned property fails . Next, consider the testing problem of Example 8.2.1 for this model. Under the conditions of Corollary 8.2.7 on the common error d.f. under Hi, and under (8.2.55), all needed conditions for the validity of (8.2.57) are trivially satisfied, and hence any test based on .6. n is asymptotically d.f. for testing the hypothesis of no change in the error d .f. Example 8.2 .3 EXPAR model . Let
q = 3, p = 1,
n := (-1,1)
x lR x (0,00) and let
/-L(y ,6) = {01 + 02 exP( -03y2)}y ,
6 E n.
Then (8.2.1) becomes an example of an amplitude-dependent exponential autoregressive model of oreler 1 (EXPAR(l)) . From Tong (1990), one obtains that under (8.2.60) , this times series is stationary and ergodic, and EXJ < 00. Because xkexp( -O'x2) is a smooth function of 0' with all derivatives bounded in x , for all k ~ 0, (8.2.2) - (8.2.10) , (8.2.10) , (8.2.37) and (8.2.37) are readily seen to hold with
n- 1
n
L iJ,;(6) = EiJ,1 (6) + op(l) . i=l
The analogue of the D n statistic asymptotically behaves here like
Dn
= sup IW(y, 6) + d~EiJ,1 (6) fo(y)1 + op(l) . yE IR
Now, if F o is such that the stationary distribution is symmetric around zero , then EiJ,1 (6) = 0 and here also, like in the linear AR(l) model with zero mean errors, the test based on D n is asymptotically distribution free.
8. Nonlin ear Autoregression
380
Similarly th e conclusions of Example 8.2.1 are also valid here in connections with t he change point testing problem , assuming of course among other things , that an estimator iJ satisfying (8.2.55) exists here. But not e that if fh = 0, th en 03 is not identifiable. However , in many applications one takes 03 to be a known number . In th at case we again have a mod el of the typ e (8.2.29), and hence all th e limit results about M-, R-, and m.d. estimators ar e valid und er t he assumption t hat th e error d.f, sa tisfy (8.2.60 ) with :E = E H H ' , r = :E - VI V I ' , where
=[
H VI
Xo XJ exp( - 03X J )
= E [
XJ exp( -03X J ) ] XJ exp(-203X J )
o X ] Xoexp( -03 X J)
We end this example by mentioning that t he above th eory is seen easily to hold for the general EXPAR(m) mod el given by p
f.l(y , O)
;=
I ) O:j + ,Bj exp(-8XL j )]Xi - j , j =I
where now 0
= (0:1, ' " , O:p,,BI,' "
, ,Bp, 6)' E (- 1, l )P x RP x (0, 00) .
R e m a rk 8. 2. 6 An Extension. Analogues of t he most of th e abo ve result s are valid in mor e genera l AR mod els with possibly a nonlin ear covariate
effects present . Let Zni , 1 :S i :S n, be another set of r x 1 rand om vectors deno ting a covariate vector and e be a known function from IRP x IRr x e to the real line and consider th e mod el
where, starting with Y o, Y n ,i - I ; = (Xn ,i-I,' " , X n,i- p)', 1 :S i:S n . Moreover , Y n,i-I , Zni and eni , are assumed to be mut ually independent for each 1 :S i :S n. Koul (1996) developed a general asymptotic t heory analogous to th e abov e discussion in th ese mode ls. In fact t he above discussion is an ad aptation of the resu lts in th is paper to AR models without trend . 0
8.3
ARCH Models
In this section we sha ll discuss ana logues of some of the results of th e pr evious sections for some auto regressive condit iona lly heteroscedastic (ARCH) mod els.
8.3.1 ARCH model & some definitions
8.3.1
381
ARCH Models and Some Definitions
As before, let {Xi, i ~ 1 - p} be an observable time series , and Y i - 1 . (X i - 1 ,Xi - 2 , ' " , X i - p )' , i = 1,2, · · · . Let 11 j , j = 1,2 , be open subsets of IRq, IRr, respectively, where p, q, r, ar e known positive int eger. Set 11 := 11 1 x 11 2 , m = q + r . Let f1 and (J be known functions , resp ectively, from IRP x 11 1 to IR, and IRP x 11 2 to IR+ := (0,00), both measurable in the first p coordinates. In the ARCH mod els of interest one observes a process {Xi ,i ~ 1- p} such that for some 0 E 11 1 , {3 E 11 2 , (8.3.1)
Xi
= f1(Y i - 1 , 0) + (J(Y i - 1 , {3) ei ,
i
~
1,
where th e err ors [s. , i ~ I} ar e independent of Yo, and st andard i.i.d. F r .v.'s. Just as in th e previous sect ion , th e focus of this chapter is to show how th e weak convergence results of certain basic randomly weighted empirical proc esses can be used to obtain the asymptotic distributions of various estima tors of 0 in a unified fashion . To proc eed further we need to make the following basic mod el smoothness assumptions about th e functions f1 and (J , ana logous to (8.2.2) . There exist functions it and a , respectively, from IRP x 11 1 to IRq and IRP x 11 2 to IRr, both measur able in the first p coordinate s, such that for every k < 00, , (8.3.2)
(8.3.3)
where the supremum is taken over 1 :S i :S n , n 1 / 211t - 811 :S k . To define analogues of M- and m.d.- estimators of 0 , we need to introdu ce the following scores. Write t := (t~ , t~)' E 11 := 11 1 x 11 2, Let
and let R i t denote the rank of ci(t) among {Cj(t) ; 1 :S j :S n} , 1 :S i :S n .
382
8. Nonlinear Autoregression
Define, for t E
(21
x
(22 ,
M(t)
._
n- 1/2
t
Z(u; t)
._
n- 1 / 2
t
(8.3.4)
iJ,(Yi-1, t1) ¢(ci(t)) , .t=l £7(Y i - 1, t2) iJ,(Yi - 1 , t 1 ) tin; :S nu), .t=l £7(Y i - 1 , t2)
Z(u ; t)
.-
Z(u; t) - n- 1 / 2
K(t)
.-
1
t
iJ,(Yi - 1 , td u, 0 :S u :S 1, .t=1 £7(Y i - 1 , t 2)
1
IIZ(u; tl , t 2 )112 L(du),
where ¢ and L are as in (8.2.3) . Some times we shall write M(t 1 , t 2 ) etc . for M(t) , etc. Note that the above scores are the analogues of the scores defined at (8.2 .3). Now let /3 be a preliminary n 1 / 2 - consistent estimator of {3 . Based on /3, analogues of M- and m.d .- estimators of a are defined, akin to (8.2.4), by the relation
0: := argmintl EfllIIM(tl,/3)II , O:md := argmintlEfllK(tl ,/3) .
(8.3.5)
This is motivated by noting that (8.3 .1) is equivalent to
which in turn can be approximated by
X;j£7(Yi _ l , /3) : : : : /-l(Y i -
1,
a)/£7(Y i -
1 , /3)
+ Ci ·
This can be thought as a nonlinear AR model with homoscedastic errors and hence th e above definitions. A way to obtain a preliminary n 1/2 - consistent estimator of {3 is to proceed as follows. First, estimate a in (8.3.1) by a preliminary consistent estimator O:p which only considers the nonlinear autoregressive structure of (8.3.1) but does not take into account the heteroscedasticity of the model. Next , use O:p to construct an estimator /3 of the parameter {3. More precisely, let K, be a nondecreasing real valued function on lit Define, for t E IRm
,
n
.-
n- 1 / 2
L i=1
iJ,(Yi -
1 , t 1)
(Xi - /-l(Y
i- 1,
td) ,
8.3.2 Main results in ARCH model
383
A preliminary least squares estimator of a is defined by the relation (8.3.6) Its consistency is assured because under (8.3.1) , E[ll(a)]
= O.
Next, let K, be such that E{CIK,(cd} = 1. This condition is satisfied, for example, when either K, is the identity function because Ec 2 = 1, or when K, is the score function for location of the maximum likelihood estimator at the error distribution F . Since E[M s (a ,,8)] = 0, an M-estimator of,8 is defined by the relation (8.3.7) In the next section we establish the asymptotic distributions of all of the above estimators.
8.3.2
Main Results in ARCH models
To begin with we shall state some additional assumptions needed to obtain the limiting distribution of these estimators. For the sake of brevity, write for tIE IRq , t2 E IRr , 1 :S: i :S: n ,
P,(Yi -
l ,
a
+ n- I / 2 td
(J(Y i -
I,,8)
I,,8 + dY i - I , ,8)
(J(Y i -
n- I / 2 t 2 )
. ( )._ JL. (Y i-I, a + n -1 /2t I ) JLni t l . (Y, (:I) , (J
, - I, IJ
. .( ) ._ U(Yi _I,,8+n-I /2t2) t 2 .(Y,,-I,IJ (:I) , (J
Un>
Uni(t 2 ) (Jni(t2) . Note that Uni(O) = U(Y i - I,,8)/(J(Yi - I ,,8), (Jni(O) := 1. In the sequel, iLi ' P,i , Ui, r, will stand for iLni(O), P,i(O) , Uni(O), rni(O), respectively, as they also do not depend on n . Also, let itni,j and iti,j , respectively, denote the lh co-ordinate of iLni and iLi, 1 :S: j :S: p. All expectations and probabilities below depend on (J := (a', ,8')" but this dependence is not exhibited for the sake of convenience. We now state additional assumptions. (8.3.8)
There exist positive definite matrices A, matrix I', all possibly depending on n
n- l
L i=l
(J,
t , :1\1:, and a
such that
n
iLiiL;
= A + op(l) ,
n- l
L i=l
UiU;
= t + op(l) ,
8. Nonlinear Autoregression
384 n
n- 1 L Ji(Yi-1, 0:)jL(Yi -
1 , 0:)'
=
M + op(I),
i= l
n
n- 1 L Jiio-~
= r + op(I) .
i=l
(8.3.10)
n . (t) 4 n- 1 LE(J.Lni,j 1 ) = 0(1) , VI S:} S:P, t E ]Rm . i=l ani(tz) max n- 1 / z (IIJiill + 110-;11) = op(I) .
(8.3.11)
z 1 n- L E{ llJini(td - Jiill
(8.3.9)
lS'Sn
n
i=l
o-;lI Z } = 0(1),
+IIo-ni(tz) -
t E ]RID.
n
(8.3.12)
n-
1 Z /
L
{IIJini(td - Jiill
i=l
o-ill}
+1!o-ni(t Z) (8.3.13)
For everyt E
]Rm ,
= Op(1),
t E ]RID .
1 S:} S: p ,
n1 /ZE[n-1 t{Jini,j(td}Z i=l ani(tZ) x {!J.lni,j(t1) - J.li ,jl + lani(tz) (8.3.14)
V E > 0, :3 a 8 > 0, and an N
< 00,
-11}f = 0(1).
3 V
°<
b
< 00,
V lisII S: b, Vn > N,
p(
n- 1/ Z t
sup IIt-sll <6
II Jini(td
i=l ani(tZ)
- Jini(sd ani(sz)
II < E) > 1 - E. -
-
Many of the above assumptions are th e analogues of the assumptions (8.2.6) - (8.2.10) needed for the AR models . Attention should be paid to th e difference between the Jini here and th e one appearing in the previous section due to th e presence of the conditional standard heteroscedasticity in the ARCH model. A relatively easily verifiable sufficient condition for (8.3.13) is th e following: For every 1 S: } S: P, t E ]Rm , (8.3.15)
n- 1/ Z t
i=l
E [{ itni'i(~t;) an, Z
r
{1J.lni,j (td - J.li ,jIZ
+!ani(tz) -
liZ}] = 0(1).
Note also that if the und erlying process is stationary and ergodic, then un-
8.3.2 Main results in ARCH model
385
der appropriate moment conditions, (8.3.8) - (8.3.10), are a priori satisfied. The first two theorems below give the asymptotic behavior of the preliminary estimators Q p and i3 of (8.3.6) and (8.3.7) , respectively. To state the first result we need to introduce n
tc;
:=
n- 1 / 2
L jJ,(Y
i- l ,
a){ (i(Y
i-
1,
f3) - 1
}€i'
i=1
Theorem 8.3.1 Suppose that the model assumptions {8.3.1}, {8.3.2}, and {8.3.3} hold and {8.3.8} holds. In addition, suppose the following holds: There exist a real matrix-valued function M on jRP x n1 such that V k < 00 , SI
E
n1 , n
(8.3.17) ~>arEIIM(Yi-I' a)11
= 0(1), IIn- 1 L
-
M(Y i -
1,
a) €ill
where the supremum in {8.3.16} is over 1 ::; i ::; n, n 1/211tl Then , for every
= op(l),
i=1
°<
b<
sIll::; k .
00,
The proof of this theorem is routine. As an immediate consequenc e we have the following corollary. Corollary 8.3.1 In addition to the assumptions of Theorem 8.3.1, assume that (8.3.18)
(b)
II1l nll = Op(l) .
Then,
The additional random vector 1l n coming into picture is identically zero when a == 1. Under additional smoothness assumptions on the function u, we can use any other preliminary n 1/ 2-consistent estimator of a . For example, let ¢> be a nondecreasing score function on jR such that E{¢>(c€)} = 0, for every c > 0. This is satisfied for example when ¢> is skew-symmetric and
386
8. Nonlinear A utoregression
s is symm et rically distribut ed . In this case, a preliminary estimator of a can be defined as n
a :=
argm int EOllin-I /2
L jt(Y i- l , t )> (X i -
J.L(Y i - 1 , t ))II·
i= 1
The next t heorem gives a similar lineari ty resul t about th e scores M s. Its proo f uses usual Taylor expansion and hence is not given here.
Theorem 8.3.2 Suppose that the assumptions (8.3.1), (8.3.2), and (8.3.3) hold. In addition , suppose the following hold. Th e fun ction K, is norulecreasing, twice different iable and satisfies: (i)
J
XK, (x )F (dx ) = 1,
(ii)
J
x 2 1k,(x )lF (dx)
< 00,
(iii) the second derivat ive of K, is bounded. Th ere exist a matrix-valued fun cti ons k < 00 , 82 E 11 2,
R
on
x l"h , such that for ever y
jRP
n
(8.3.20)
~>~ E II R(Yi - l , j3)1I
= 0(1), IIn- 1 LR(Y i - 1 , a ) c i li = op(1).
-
~l
where the sup remum in (8.3.19) is over 1 :S i :S n , n1/211 t2 - 8211 :S k . Then, for every 0 < b < 00, sup IIMs( O + n -
II t ll9
[J + [J +
1
/
2t) - M s(O )
J +J
+
K,(x) F(dx)
XK, (x )F (dx )
x k,(x ) F (dX)] x
2k,(x
r
l
) F (dX)]
t1
I:t211 = op(1).
Consequent ly, we have t he following corollary.
Corollary 8.3.2 In additi on to the assumptions of Theorem 8.3.2, assume that J K, (x)F(dx) = 0 = J x k,(x )F(dx) and that
Iln1/ 2 (,8 -
(8.3.21)
13)11 = Op(1).
Th en,
[J
XK,(x) F( dx)
+
J
x 2k,(x) F (dX)] n 1 / 2 (,8 . -1
=:E
-
13 )
M s(O ) + op(l) .
8.3.2 Main results in ARCH model
387
Note that the asymptotic distribution of 13 does not depend on the preliminary estimator op used in defining 13. Also, the conditions (i)-(iii) and those of the above corollary involving", are satisfied by ",(x) == x, because Ec 2 = 1. Again, if the underlying process is stationary and ergodic then (8.3.17) and (8.3.20) will be typically satisfied under appropriate moment conditions. Now, we address the problem of obtaining the limiting distributions of the estimators defined at (8.3.5) . The first ingredient needed is the AUL property of the M score and the ULAQ of the score K . The following lemma is basic to proving these results when the underlying functions 'IjJ and L are not smooth. Its role here is similar to that of its analogue given by the Lemma 8.2.1 in the AR models . Let, for t = (t~ , t~)' , t 1 E IRq , t2 E IRr , m = q + r , and x E IR,
~ J1,ni(td -1) .(t) I(c . < x + x (IJnt·(t?) ~
W(x , t) := n- 1 / 2 L
i=l
t
IJn t
-
2
+(J.lni(td - J.li)) , v(x , t) := n- 1 / 2
. (t) J.Lni(t 1) F(x + X(IJni (t 2) - 1)
Ln i=l
IJnt
2
+(J.lni(t 1 )
J.li))'
-
W(x , t) := W(x, t) - v( x , t) , W*(x , t) := n- 1 / 2
t
i =l
J1,ni(td [I(c i IJni(t 2)
~ x) -
F(x)] .
The basic result needed is given in the following lemma whose proof appears later. Lemma 8.3.1 Suppose the assumptions (8.3.1)- (8.3.3}, (8.3.8}-(8.3.14) hold and that the error d.f. F satisfies (2.2 .49), (2.2. 50} and {2.2.51}. Then, [or every 0 < b < 00,
= u p (l ). W(x, 0)11 = u p (l ).
(8.3.22)
IIW(x , t) - W*(x , t)1I
(8.3.23)
IIW(x, t) -
(8.3.24)
IIW(X,t) - W(x , 0) - n- 1 / 2
t
i= l
{j(x)A t 1
+ x j (x )r' t 2 }
[itn i(t d - J1,i] IJn i(t 2 )
F(X)II = u (l ), p
388
8. Nonlinear Autoregression
where u p (l ) is a sequence of stochastic processes in x , t, converging to zero , uniformly over the set {x E JR,
litII ::; b} , in probability.
The claim (8.3.24) follows from (8.3.23) and the assumptions (8.3.8) (8.3.14), and the assumption that F satisfies (2.2.49), (2.2.50) and (2.2.51) . Note that the assumptions (8.3.8)-(8 .3.14) ensure that for every 0 < b < 00 ,
The proofs of these two claims are routine and left out for an interested reader. The proofs of (8.3.22) and (8.3.23) appear in the last section as a consequence of Theorem 2.2.5. The next result gives the AUL result for M-scores. Theorem 8.3.3 Und er th e assumption (8.3.1)- (8.3.3) , (2.2.49) , (2.2. 50} , (2.2.51) with G replac ed by F , and (8.3.8}-(8 .3.14), for every 0 < b < 00 , and fo r ever y bounded nondecreasing 'ljJ with J 'ljJdF = 0, sup IIM(9 + n- 1 / 2 t ) - M(9)
Iltll:::;b
- ( I fd'ljJAt l
+
1
X f( X)d'ljJ(X)rt 2) II =op(l).
This theorem follows from the Lemma 8.3.1 in the sam e way as Th eorem 8.2.1 from Lemma 8.2.1, using the relation I[W(x , t) - W(x , 0)] d'ljJ(x)
== n- 1 / 2
t
i=l
ni (t [itO'ni(t2) 1
) -
iti] 'ljJ (00 ) - [M(9
+ n- 1/ 2t)
- M(9)] .
Next, we have the following immediate corollary. Corollary 8.3.3 In addition to the assumptions of Theorem 8.3.3, assume that J fd'ljJ > 0, (8.3.21) holds, and that (8.3.25)
389
8.3.2 Main results in ARCH model Then ,
!
(8.3.26)
fd'ljJn1/ 2(&. - 0)
_A- 1 [M(O)
=
+ rn1/2(~ -
{3)
!
Xf(X)d'ljJ(X)] +op(l) .
From this corollary it is apparent that the asymptotic distribution of &. depends on the preliminary estimator of the scale parameter in gen-
eral. However, if either f xf(x)d'ljJ(x) = 0 or if r = 0 , then the second term in the right hand side of (8.3.26) disappears and the preliminary estimation of the scale parameter has no effect on the asymptotic distribution of the estimation of o . Also, in this case , the asymptotic distribution of &. is the same as that of an M-estimator of 0 for the model Xii o-(Yi-1 , (3) = P,(Yi- 1 , 0'.) /o-(Y i- 1, (3) +ci with {3 known. We summarize this in the following Corollary 8.3.4 In addition to the assumptions of Corollary 8.3.3, suppose either f xf (x)d'ljJ (x) = 0, or T = O. Then,
where v('ljJ , F) is as in (1.3.4). A sufficient condition for f xf (x)d'ljJ (x) = 0 is that f is symmetric and 'ljJ skew symmetric, i.e., f( -x) = f( x) , 'ljJ (-x) = - 'ljJ(x) , for every x E JR. To give the analogous results for the process K and the corresponding minimum distance estimator &.md based on ranks, we need to introduce n
2
L
Z(u)
.-
n- 1 /
q(u)
.-
f(F- 1(u)) ,
[iti
-lL] {I(G(c i) :S u) -
u} ,
lL :=
i=1
n
n-
1
L iti ' i=1
s(u):= F- 1(u)f(F-1(u)),
u E [0,1].
Theorem 8.3.4 In addition to the assumptions (8.3.1) - (8.3 .3) , (2.2.49), (2.2.50), (2.2.51), and (8.3.8) - (8.3.14), suppose that for some positive definite matrix D(0) , n
n-
1
L i=1
[iti
-lL] [iti - Ii]
I
= D(O)
+ op(l) .
8. Nonlinear Autoregression
390 Then, for every
°<
b
1I~~~b Iqt) -
< 00 ,
JIIZ(u) +
2
{q(u)A(O) t 1
+ S(U)r(O) t 2 }11 L(du)!
= op(l). Moreover, if {8.3.21} holds and if
n 1 / 2 (a m d
-
Iln 1 / 2 (a m d
qdL
fol
n 1 / 2 (a m d wh ere
0"5
-1
s(u)dL(u) -
a)1I
= Op(l) , th en
a)
-(J A) [J
Additionally, if either
-
a)
= 0,
-td
Z(u)q(u)L(du)
or if T = 0, then
N (0 ,0"5 A - I D A - I ) ,
is as in {5.6.21} and {5.6.15}.
This theorem follows from Lemm a 8.3.1 in a similar fashion as do Theorem 8.2.2 and Coroll ary 8.2 .3 from Lemma 8.2.1. Note that f symmetric around 1 zero and L(u) -L(l - u) implie s that fo s(u)dL(u) = 0. Next, we sh all state an an alogue of th e Theorem 8.2.2 for sequential weighted emp irical proc esses suitabl e here. Accordingly, let h ni be as before and independent of Ci , 1 ::; i ::; n . Define, for at' = (t~ , t;), t l E IRq, t2 E
=
IRT , S(x , t, u) := n- I / 2
[nul
L
hnJ(Ci ::; X + X(O"ni (t2 ) - 1)
;=1
+(fl.ni(td - fl.i)) , [nul
fl.( X, t , u) := n- I / 2
L
hniF(X + X(O"ni(t2) - 1)
i= 1
+(fl.ni(t l) - fl.i)) ,
S(x , t , u) := S( x , t , u) - fl.( X, t , u),
X E IR, u E [0 ,1].
The next result is th e analogue of Th eorem 8.2.3 suitable for th e curre nt ARCH mod els.
391
8.3.2 Main results in ARCH model
Theorem 8.3.5 Suppose the assumptions (8.3 .1) - (8.3.3}, (2.2.49), (2.2. 50}, (2.2.51) with G replaced by F, (2.2.52}, and (8.3.10) hold. In addition, suppose the following hold. n
(8.3.27)
n- 1
L:Eh;'i (11iti1l
2
+ 110-;11 2 )
= 0(1) .
i=1
Then, for every
(8.3.29)
°<
b<
00 ,
sup IS(x , t , u) - S(x , 0 , u)1 = op(I), x ,t, u
(8.3.30)
sup !S(X, t , u) - S(x, 0, u) x ,t,u
~~
_n - 1 ~ h ni {0-~t2xf(x) +
it~td(x)} I = op(I) ,
°
where the supremum is taken over x E IR,lItll :S b, :S u :S 1.
The proof of this theorem is similar to that of Theorem 8.2.3. No details will be given . Because of the importance of the residual empirical proc esses, we give an AUL result for it obtainable from th e above theorem . Accordingly, let 0 , !3 be any n 1 / 2-consistent estimators of a , (3, and let Fn(x , u), Fn(x , u) denote, respectively, the sequential empiricals of the residuals Ei := (Xi /1(Yi-l ,O))/IY(Yi-I ,!3) , and the errors Ei, 1 :S i:S n , i.e., for x E IR,O :S
u:s 1,
[nul
Fn(x ,u)
.-
Fn(x,u)
:=
n-1L:I(Xi:SXIY(Yi-I,!3)+/1(Yi-I ,O)), i=1 [nul
n - 1L:I(Xi:SxIY(Yi-I ,(3)+/1(Yi- 1, a )) . i =1
Then upon specializing th e above theorem to the case when h ni == 1, we obtain the following corollary. In its statement the assumption about time series being stationary and ergodic is made for the sake of transparency of the statement. Corollary 8.3.5 Suppose the assumptions (8.3.1) , (8.3.2) and (8.3.3) hold and that the underlying time series is stationary and ergodic. In addition,
392
8. Nonlinear Autoregression
suppose the error d.f. F has a positive bounded density f such that f (F- 1 ) is uniformly continuous on [0,1) and satisfies (F3); Ilitlll, II(hll are square integrable; and
Then, sup xE IR ,O:S;u:S;l
In
1 2 /
[Fn(x, u) - Fn(x , u)] - u{
n
1 2 / (&
-
0)' E(itl)f(x)
+n 1/ 2(j3 -{3)' E(c71)Xf (X)}
1= op(l),
This corollary may be used to obtain the limiting distributions of some tests of fit or for some tests of a change point in th e errors of an ARCH mod el in a fashion similar to AR models . We now begin to give proofs of some of the above results. Recall Theorem 2.2.5.This theorem is not enough to cover th e cases where th e weights b-: and th e disturbances Tni, <5 ni are functions of certain parameters and where one desires to obtain various approximations uniformly in these param eters, as is th e case in Lemma 8.3.1. The next result thus gives the need ed ext ension of thi s theorem to cover these cases. Accordingly, let (ni be as in (2.2.48), with F denoting its d.f., and let lni' Vni , Uni be measurable functions from JRffi to JR such that for every t E JRffi , (lnj (t) , Vnj (t) , Unj (t)), 1 ~ j ~ i, are independent of (ni, for each 1 ~ i ~ n . Let ti; En stand for the underlying probability and expectation. Let, for x E JR, t E JRffi , n
(8.3.31)
V( x ,t) ;= n - 1 / 2
L
lni(t)I((ni ~X+XVni(t)+Uni(t)),
i =l
n
Ji» , t)
;=
n-
1 2 /
L
lni(t) F( x
+ XVni(t ) + uni(t)),
i= l
U( X, t)
;=
V(x , t) - J"(x, t) , n
U*(x , t)
1 2
:= n- /
L lni(t) [I((ni ~ x ) -
F( x)] .
i=l
To state the needed result we need the following assumptions. For some positive random pro cess €(t) and for each t E JRffi , (8.3.32)
8.3.2 Main results in ARCH model
393
max n-1 /Zllni(t)1 = op(l) , l:::;, :::;n max {IVni(t)1 + IUni(t)l} = op(l), l:::;,:::;n
(8.3 .33) (8.3 .34)
t,1;;;(t){lun;(t ll + Ivn;(tnr
n' /2En [n-'
(8.3 .35)
~ 0(1),
n
(8.3.36)
n-
1 Z /
L
Ilni(t)1 [Ivni(t)1 + IU ni(t)1]
= Op(l).
i=l
V E > 0, 3 <5
(8.3.37)
Vllsll
~ b,
r; (n-
1 Z /
+ (8.3 .38)
> 0, and an n1 Vn > nl, Ilni(s)1 {
sup
II t-sll<<5 <5
b,Vn
IVni(t) - vni(s)1
IUni(t) - uni(s)l}
> 0, and an nz <
~ E)
00 ,3,
> 1-
E.
VO < b < 00,
> nz ,
r; ( sup n- 1/ Z IIt-sll <<5
sup
< b < 00 ,
IIt-sll <<5
i= l
VE > 0, 3 a
Vilsil ~
t
3 V0
t
Ilni(t) - lni(s)1
i=l
~ E)
>
1-
E.
The following lemma gives the needed result. Lemma 8.3.2 Under the above setup and under the assumptions {8.3.2}, {8.3:3}, {2.2.49}, {2.2.50}, {2.2.51} with G replaced by F , and {8.3.32} {8.3.38}, for every 0 < b < 00 , (8.3.39)
sup
IU(x, t) - U*(x, t)1
xEIR ,lItll9
=
op(l) .
Proof. The proof uses Theorem 2.2 .5. Fix a 0 < b < in (2.2.48) , we take
00.
Observe that if
(8.3.40)
then, Un( x) and U~(x) are equal to U(x, t) and U*(x , t) , respectively, for all x E ~, t E IRm . Clearly the assumption (8.3.32)-(8.3.36) imply (2.2.52)(2.2.55) for each fixed t . Note that in this application one takes
A n1 .-
0' -
An i
0'-field{(n1 ,' " ,(ni-1 i(lnj(t),V nj(t),Unj(t)) ,1 ~ j ~i} ,
.-
field{(ln1(t) ,Vn1(t),Un1(t))} ,
8. Nonlinear Autoregression
394 for 2 SiS n . Hence, (2.2.57) implies that (8.3.41)
sup IU(x , t) - U*(x, t)1 = op(l) ,
t E IRm .
x EIR
To complete the proof of (8.3.39), in view of the compactness of the ball IItll S b, it suffices to show that '<:I E > 0, :3 a 6" > 0, and an no < 00, 3 '<:I liS II S b, (8.3.42)
p (
IV(x , t) - V(x , s)1
sup x EIR ,lIt - sIl9
~ E) S E,
n
> no,
where V := U- U*. Details that follow are similar to the proof of Lemma 8.2.1. For convenience , write for 1 SiS n , t E IRm , x E IR,
13i(x, t)
:=
I(ci S x
+ XVni(t ) + Uni(t)) -F(x
13i(:r ) .-
+ XVn i(t) + 'Uni(t)) ,
I(ci S x) - F(x) .
Th en n
ui«. t)
= n-
1 2 /
L Ini(t)13i(x, t),
n
U*(x , t)
= n-
1 2 /
i =1
L Ini(t)13i(x) , i=1
and V( X, t) - V(x , s) n
n- 1 / 2
L
[lni(t) -lni(s)) [13i(x , t) - 13i(x ))
i=1
n
+ n- 1 / 2
L
lni(s) [13i(x, t) - 13i(x , s))
i=1
It thus suffices to prove the analog of (8.3.42) for VI, '0 2 . Consider VI first. Note that because 13i (x , t) - 13i (x ) are uniformly bounded by 1, we obtain n
IVdx, s , t)1 S
n- 1 / 2
L
11ni(t) -lni(s)
I·
i= 1
This and th e assumption (8.3.38) th en readily verifies (8.3.42) for VI .
8.3.2 Main result s in ARCH model Now consider V 2 . For a 8
395
> 0, and S fixed , let
sup
IVni(t ) - vni(s)l,
sup
IUni(t ) - uni(s)l,
II t - sll
{n-
1 !2
t
1
s: is: n ,
11ni(s )1 [dni(s) + Cni(S)]
°
By (8.3.37), for an e > 0, t here exists a 8
> and an
n1
s: €} .
< 00, such t hat
(8.3.43) Next , writ e lni = l~i - l;;i and V 2 = to V 2 with i: replaced by l; i' Let
vt -
V :; , where
vt
correspond
Dt (x , s , a) n
'-
n- 1! 2
L
l;i(S) [I (ci ~ x
+ X{Vni(S) + adni( s)}
i= 1
+Uni(S) + aCni(s ))
- F( x + X{Vni(S) + adni(s) } +Uni(S) + aCni(s) ) ], Arguing as for (8.3.41), verify t hat hni == l; i(s ), Tni == vni( s) +adni(s), 8ni == Uni (S) + aCni(s) satisfy th e conditions of Theorem 2.2.5. Hence, one mor e ap plication of (2.2.57) yields that for each S E JRm , a E JR, (8.3.44)
sup \Dt(x , s , a) - Dt (x , s , 0)1 = op(l) . x ER
Now, sup pose x > 0. Then , again using monotonicity of th e indi cator function and G , we obtain that on En, for all lit - sil < 8, t , s E JRm ,
Ivt (x,s , t)1 < IDt (x ,s, 1) - Dt (x ,s,O)1 + IDt (x ,s, -1 ) - Dt(x ,s ,O)1 n
+ n - 1! 2
L
l; i(s )
[F( x + X(Vni( S) + dni (s ))
i= l
+Uni(S) + Cni (S))
- F( x + X(Vni( S) -
dni(s ))
+Uni(S) - Cni(S)) ] .
396
8. Nonlinear Autoregression
Again , und er the conditions (2.2.49}-(2.2.51) and (8.3 .37) , there exist s a 8 > 0 such that the last term in this upper bound is Op(€} , while the first two terms are op(l} , by (8.3.44). Thi s completes the proof of (8.3.42) for D 2 in the case x > O. The proof is similar in the case x ::; O. Hence the proof of (8.3.39) is comp lete . 0 Proof of Lem ma 8.3.1. First, consider (8.3.22) . Let m = q + rand writ e t = (t~ , t~)' , t 1 E IRq, t2 E IRr and let W j , W; etc. denot e the jth coordina te of W , W · , etc. Take (8.3.45) (8.3.46)
in U to see that now U equals Wj . Thus (8.3.22) will follow from (8.3.39) once we verify th e condit ions of Lemma 8.3.2 for th e quantities in (8.3.45) for each 1 ::; j ::; q. To t hat effect , note t hat by (8.3.2) and (8.3.3), 'V € > 0, :3 N, such that "In > N , (8.3.47)
p(max{sup IJlni (t
1} -
Jli -
t ,tl
sup IOni(t 2} - 1 't ,t 2
n-l/2t~itil ,
n-l/2t~ui l } ::; bm-
1
/
2) 2: 1 - e,
where, here and in th e sequel, i , t 1 , t 2 in the supremum var y over th e range 1 ::; i ::; n, lI tl l ::; b, t 1 E IRq , t 2 E IRr , unless specified otherwise. From (8.3.47) and th e assumption (8.3.1O) we obtain that (8.3.48)
T his verifies (8.3 .34) for t he follows from (8.3.9). Next , let
Vni, Uni
of (8.3.45) . The cond ition (8.3.32)
=
n- 1 / 2(8I1itill + 2b€} ,
bn := max o-« :
Cn i
-
n- 1 / 2(8 1Iudl + 2b€} ,
Cn
Zni
-
bn-
Wni
=
bn- 1 / 2(lI it i li
bn i
1
/
2(llu dl + e),
+ e),
l ~ i ~n
:= max
Cn i ;
max
Zn i;
l ~i ~n
Zn := Wn
l ~ i ~n
:= max
l ~i ~n
Wn i .
8.3.2 Main results in ARCH model
397
Note that by (8.3.10), bn = op(l) = Zn = :=
en
Wn.
Now let , for an Iisil ~ b,
sup IJlni (td - Jlni (sdl II t, - s,ll
{
~
b-« ,
sup IO"ni (t2) - O"ni (s2)1 ~ Cni, II t2- s2II
I-Ldl ~ Wni ,
sup IO"ni (t 2 ) II t 211 :S b
11
-
~
Zni, 1
~i~
n; Zn
~
1/2} .
From (8.3.47) and (8.3.10) , t here exists an N 1 such that (8.3.49)
m ax ,
litni,j((t)d l ~ ( 1 - Zn)- 1 max IJlni ' ,j ()I t1 , O"ni
t2
Moreover , for all 1
~
'
j
~
1 ~ j ~ q.
q, t 1 E ffi.q ,
These facts together with (8.3.10) and (8.3.11) imply that (8.3.33) is satisfied by the $\ell_{ni}$ of (8.3.45). Using the definitions of (8.3.45), one sees that on $\mathcal C_n$,
\begin{align*} n^{-1/2}\sum_{i=1}^n |\ell_{ni}(s)|\,\big\{|v_{ni}(s)|+|u_{ni}(s)|\big\} &\le 2\,n^{-1/2}\sum_{i=1}^n \frac{|\dot\mu_{ni,j}(s_1)|}{\sigma_{ni}(s_2)}\,[z_{ni}+w_{ni}] \\ &\le 2b\,(1-z_n)^{-1}\,n^{-1}\sum_{i=1}^n |\dot\mu_{ni,j}(s_1)|\,\big[2\epsilon+\|\dot\mu_i\|+\|\dot\sigma_i\|\big] \\ &= O_p(1), \end{align*}
for each $1\le j\le q$, $\|s\|\le b$, by (8.3.8) and (8.3.11). This verifies (8.3.36) for the entities given at (8.3.45). Also, (8.3.35) follows directly from the assumption (8.3.13). Finally, again on $\mathcal C_n$,
\begin{align*} n^{-1/2}\sum_{i=1}^n |\ell_{ni}(s)|\,\Big\{ & \sup_{\|t-s\|<\delta}|v_{ni}(t)-v_{ni}(s)| + \sup_{\|t-s\|<\delta}|u_{ni}(t)-u_{ni}(s)|\Big\} \\ &\le 2\,n^{-1/2}\sum_{i=1}^n \frac{|\dot\mu_{ni,j}(s_1)|}{\sigma_{ni}(s_2)}\Big\{\sup_{\|t_1-s_1\|<\delta}|\mu_{ni}(t_1)-\mu_{ni}(s_1)| + \sup_{\|t_2-s_2\|<\delta}|\sigma_{ni}(t_2)-\sigma_{ni}(s_2)|\Big\} \\ &\le 2(1-z_n)^{-1}\,n^{-1/2}\sum_{i=1}^n |\dot\mu_{ni,j}(s_1)|\,[d_{ni}+c_{ni}] \\ &\le 2(1-z_n)^{-1}\,n^{-1}\sum_{i=1}^n |\dot\mu_{ni,j}(s_1)|\,\big\{\delta\big[\|\dot\mu_i\|+\|\dot\sigma_i\|\big]+4b\epsilon\big\}. \end{align*}
In view of (8.3.8), this bound can be seen to be $O_p(\delta)+O_p(\epsilon)$, thereby verifying the condition (8.3.37) for the entities at (8.3.45), whereas (8.3.38) is implied by (8.3.14). This completes the proof of (8.3.22). To prove (8.3.23), write
\[ W(x,t)-W(x,0) = W(x,t)-W^*(x,t) + U(x,t), \]
\[ U(x,t) := W^*(x,t)-W(x,0) = n^{-1/2}\sum_{i=1}^n \Big[\frac{\dot\mu_{ni}(t_1)}{\sigma_{ni}(t_2)}-\frac{\dot\mu_i}{\sigma_i}\Big]\,\beta_i(x). \]
Thus it suffices to prove
\[ (8.3.50)\qquad \sup_{x\in\mathbb R,\ \|t\|\le b}\|U(x,t)\| = o_p(1). \]
Fix a $1\le j\le q$, $t\in\mathbb R^m$, and now let
\[ h_{ni} := \frac{\dot\mu_{ni,j}(t_1)}{\sigma_{ni}(t_2)} - \frac{\dot\mu_{i,j}}{\sigma_i}. \]
Again write $h_{ni} = h_{ni}^+ - h_{ni}^-$, so that the $j$th component of $U$ can be rewritten as $U_j = U_j^+ - U_j^-$, where
\[ U_j^{\pm}(x,t) := n^{-1/2}\sum_{i=1}^n h_{ni}^{\pm}\,\beta_i(x). \]
Because $h_{ni}$ is independent of $\varepsilon_i$, $1\le i\le n$, and past measurable, using a conditioning argument and (8.3.11), one obtains that
\[ (9pt) (8.3.51)\qquad \mathrm{Var}\big\{U_j^{\pm}(x,t)\big\} \le n^{-1}\sum_{i=1}^n E\{h_{ni}^{\pm}\}^2 = o(1). \]
Next, fix an $\epsilon>0$ and let $-\infty = x_0 < x_1 < \cdots < x_r = \infty$ be a partition of $\mathbb R$ such that $G(x_k)-G(x_{k-1})\le \epsilon$, $k=1,\dots,r$. Then, again using the monotonicity of the indicator and $G$, we have
\[ \sup_{x\in\mathbb R}|U_j^{\pm}(x,t)| \le 2\max_{1\le k\le r}|U_j^{\pm}(x_k,t)| + 2\epsilon\, n^{-1/2}\sum_{i=1}^n |h_{ni}|. \]
This, (8.3.51), (8.3.12), and the arbitrariness of $\epsilon$ imply
\[ \sup_{x\in\mathbb R}\|U(x,t)\| = o_p(1),\qquad \forall\, t\in\mathbb R^m. \]
To finish the proof of (8.3.50), we need to prove that an analog of (8.3.42) holds for the $U(x,t)$-processes. But this is implied by (8.3.14). This completes the proof of (8.3.23), and hence of Lemma 8.3.1. $\Box$

8.3.3 Examples of ARCH models
We shall now discuss some details for verifying the general conditions of the previous section in three examples.

Example 8.3.1 (ARCH MODEL). In the ARCH model of Engle (1982), one observes $\{Z_i,\ 1-p\le i\le n\}$ such that for some $\alpha_0>0$, $\alpha_j\ge 0$, $j=1,\dots,p$,
\[ (8.3.52)\qquad Z_i = \big(\alpha_0+\alpha_1Z_{i-1}^2+\cdots+\alpha_pZ_{i-p}^2\big)^{1/2}\,\eta_i,\qquad i\ge 1, \]
where the $\eta_i$ are i.i.d. with mean zero and variance 1. Squaring both sides and writing $\varepsilon_i := \eta_i^2-1$, $X_i = Z_i^2$, $Y_{i-1} = [X_{i-1},\dots,X_{i-p}]' = [Z_{i-1}^2,\dots,Z_{i-p}^2]'$, $W_{i-1}' = [1, Y_{i-1}']$, and $\alpha = [\alpha_0,\dots,\alpha_p]'$, the above model can be rewritten as
\[ (8.3.53)\qquad X_i = \alpha'W_{i-1} + \alpha'W_{i-1}\,\varepsilon_i,\qquad i\ge 1. \]
This is an example of the model (8.3.2) with $\alpha = \beta$, $q = p+1$, $r = p+1$, $\mu(y,\alpha) = \alpha'w = \sigma(y,\alpha)$, $w' = (1, y')$, $y\in[0,\infty)^p$. Here, $F$ denotes the d.f. of the error $\varepsilon := \eta^2-1$, assumed to satisfy (2.2.49), (2.2.50) and (2.2.51), and to have $E\varepsilon^2 = 1$. Assume that the process $\{Z_i;\ 1-p\le i\}$ is stationary and ergodic and $EZ_0^4<\infty$. In the case $p=1$, a sufficient condition for this to hold is
$-E\{\log(\alpha_1\eta_1^2)\} > 0$; cf. Nelson (1990). For $p\ge 1$, Bougerol and Picard (1992, Theorem 1.3) give some sufficient conditions based on the Lyapunov exponent. We shall now show that, under no additional assumptions, the model (8.3.53) satisfies all the assumptions of the previous section except (8.3.9) and (8.3.13). These assumptions also hold provided, additionally, either $\alpha_j>0$ for all $j=1,\dots,p$, or $EZ_0^8<\infty$.

The assumptions (8.3.2) and (8.3.3) are readily seen to hold here with $\dot\mu_i = W_{i-1}/\alpha'W_{i-1} = \dot\sigma_i$, $1\le i\le n$. Observe that $E(Z_0^4)<\infty$ and $\alpha_0>0$, $\alpha_k\ge 0$, $k=1,2,\dots,p$, imply that $E\|\dot\mu_1\|^2<\infty$, which in turn, together with the stationarity of the process, implies that $n^{-1/2}\max_{1\le i\le n}\|\dot\mu_i\| = o_p(1) = n^{-1/2}\max_{1\le i\le n}\|\dot\sigma_i\|$, thereby ensuring the satisfaction of (8.3.10), and that of (8.3.8) with
\[ M = E\,W_0W_0',\qquad A = E\Big[\frac{W_0W_0'}{(\alpha'W_0)^2}\Big] =: \Sigma = \Gamma. \]
Since here $\theta = (\alpha',\alpha')'$, the above expectations depend only on $\alpha$. The same considerations and the linearity of $\mu$ and $\sigma$ in the parameters enable one to guarantee the satisfaction of the conditions (8.3.11), (8.3.12), (8.3.16), (8.3.17), (8.3.19) and (8.3.20) with $\ddot M = 0$.
Next, observe that
\[ (8.3.54)\qquad \frac{\dot\mu_{ni,j}(t_1)}{\sigma_{ni}(t_2)} = \frac{W_{i-1,j}}{(\alpha+n^{-1/2}t_2)'W_{i-1}},\qquad 1\le j\le q;\ 1\le i\le n, \]
where $W_{i-1,1} = 1$ and $W_{i-1,j} = X_{i-j+1}$ for $2\le j\le q$. From this, (8.3.14) is readily seen to hold, again using the D.C.T. and $EZ_0^4<\infty$.
The condition (8.3.13) for $j=1$ is seen to hold trivially in the current case. To check it for $2\le j\le q = p+1$, we consider separately the case when $\alpha_{j-1}>0$ for all $j=2,\dots,q$ and the case when $\alpha_{j-1}=0$ for some $j=2,\dots,q$. In the first case we have the following fact: $\forall\, a, b\in\mathbb R^{p+1}$, $n\ge 1$,
\[ (8.3.55)\qquad w\mapsto a'w/(\alpha+n^{-1/2}b)'w,\quad w\in[0,\infty)^{p+1},\ \text{is bounded}. \]
Use this fact and (8.3.54) to bound the left hand side of (8.3.13), for some $k\in(0,\infty)$ possibly depending on $t$, for each $j=2,\dots,p+1$. This bound, together with the stationarity of the $\{Z_i\}$'s and $EZ_0^4<\infty$, readily implies that (8.3.13) holds in the first case. In the second case a similar argument and $EZ_0^8<\infty$ yield the satisfaction of (8.3.13). Finally, the condition (8.3.9) is verified similarly, using (8.3.54) and (8.3.55). Note that the condition $EZ_0^8<\infty$ is needed in verifying (8.3.13) and (8.3.9) only in the case when some $\alpha_j = 0$, $j=1,\dots,p$.

Since here $\alpha = \beta$, for estimation in this model we use just a two-step procedure, i.e., we use $\hat\alpha_p$ instead of $\hat\beta$ to define the final $\hat\alpha$. Since in this case $\hat\alpha_p$ has an explicit expression (it is the least squares estimator), it is easy to see that condition (8.3.18)(a) is guaranteed. Because $\mathcal H_n$ is a sum of square integrable martingale differences of stationary and ergodic r.v.'s, $\mathcal H_n$ converges weakly to a normal r.v., and hence (8.3.18)(b) is a priori satisfied here. Moreover, because of the linearity of $\mu$ in $\alpha$, the condition (8.3.25) is seen to be satisfied as in Section 5.5. Therefore, from Corollary 8.3.4, if $\int xf(x)\,d\psi(x) = 0$, then $n^{1/2}(\hat\alpha-\alpha)\to_d N(0,\Sigma(\alpha))$, where
\[ \Sigma(\alpha) := \Big(E\Big[\frac{W_0W_0'}{(\alpha'W_0)^2}\Big]\Big)^{-1}\,v(\psi,F). \]
Now recall from Weiss (1986) that, under the stationarity of the $\{X_i\}$'s and the finite fourth moment assumption on the i.i.d. errors $\varepsilon_i$, the asymptotic distribution of the widely used quasi maximum likelihood estimator $\hat\alpha_{qmle}$ is as follows:
\[ n^{1/2}(\hat\alpha_{qmle}-\alpha) \to_d N(0,\Sigma_{qmle}),\qquad \Sigma_{qmle} := \Big(E\Big[\frac{W_0W_0'}{(\alpha'W_0)^2}\Big]\Big)^{-1}\mathrm{Var}(\varepsilon). \]
Thus it follows that the asymptotic relative efficiency of an M-estimator $\hat\alpha$, relative to the widely used quasi maximum likelihood estimator in Engle's ARCH model, is exactly the same as that of the M-estimator relative to the widely used least squares estimator in the one sample location model or in the linear regression model.

Fitting an error d.f. Consider the model (8.3.52). This model is a special case of the general model (8.3.1) with
\[ \mu(y,\alpha)\equiv 0,\qquad \sigma(y,\alpha)\equiv (\alpha'w)^{1/2},\qquad w' = (1, y'). \]
Let now $F$ denote the common d.f. of the $\eta_i$, having mean zero and unit variance, and let $F_0$ be a known d.f. of a standard r.v. Consider the problem of testing $H_0: F = F_0$ against the alternative that $H_0$ is not true. Let $\hat\alpha$ be an $n^{1/2}$-consistent estimator of $\alpha$ and let $\hat F_n$ denote the empirical d.f. of the residuals $\hat\eta_i := Z_i/(\hat\alpha'W_{i-1})^{1/2}$, $1\le i\le n$. A natural test of $H_0$ is to reject it if $D_n := \sup_x n^{1/2}|\hat F_n(x)-F_0(x)|$ is large. The following corollary gives the asymptotic behavior of $\hat F_n$. It is obtained from Theorem 8.3.5 upon taking $h_{ni}\equiv 1$, $\mu_i\equiv 0$, $\dot\mu_i\equiv 0$ in there. See also Remark 2.2.5.
Corollary 8.3.6 Suppose the process given by the model (8.3.52) is stationary, $E\eta^2<\infty$, and the d.f. $F$ satisfies (F1), (F2) and (F3). Moreover, suppose $\hat\alpha$ is any estimator of $\alpha$ with $n^{1/2}\|\hat\alpha-\alpha\| = O_p(1)$. Then
\[ \sup_{x\in\mathbb R}\Big| n^{1/2}\big[\hat F_n(x)-F(x)\big] - n^{-1}\sum_{i=1}^n \frac{W_{i-1}'}{2\,\alpha'W_{i-1}}\, n^{1/2}(\hat\alpha-\alpha)\, x f(x)\Big| = o_p(1). \]
This result is then useful in assessing the limiting behavior of $D_n$ under $H_0$ and under any alternatives satisfying the assumed conditions. The conclusions here are thus similar to those in Section 6.4. For example, tests based on $\hat F_n$ will not be ADF in general.
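For illustration, $D_n$ can be computed from the studentized residuals as in the following sketch (assuming, only for concreteness, standard normal $\eta_i$ so that $F_0 = \Phi$; the function name is ours). Since $\hat\alpha$ is estimated, the limiting null distribution of $D_n$ is not the classical Kolmogorov one, in line with the non-ADF remark above.

```python
import numpy as np
from scipy.stats import norm

def arch_ks_statistic(z, alpha_hat):
    """D_n = sup_x n^{1/2} |F_n_hat(x) - F0(x)|, with F0 the N(0,1) d.f."""
    x = z ** 2
    W = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    resid = np.sort(z[1:] / np.sqrt(W @ alpha_hat))   # eta_hat_i, sorted
    n = resid.size
    F0 = norm.cdf(resid)
    # sup |F_n - F0| is attained at the jump points of the empirical d.f.
    d_plus = (np.arange(1, n + 1) / n - F0).max()
    d_minus = (F0 - np.arange(0, n) / n).max()
    return np.sqrt(n) * max(d_plus, d_minus)
```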
Example 8.3.2 (AR MODEL WITH ARCH ERRORS). Consider the first order autoregressive model with heteroscedastic errors, where the conditional variance of the $i$th observation depends linearly on the past, as follows:
\[ (8.3.56)\qquad X_i = \alpha X_{i-1} + (\beta_0+\beta_1X_{i-1}^2)^{1/2}\,\varepsilon_i,\qquad i\ge 1, \]
where $\alpha\in\mathbb R$, $\beta = (\beta_0,\beta_1)'$ with $\beta_0>0$ and $\beta_1\ge 0$, and $Z_{i-1} = (1, X_{i-1}^2)'$. Here, now, $F$ denotes the d.f. of the error $\varepsilon$, assumed to satisfy (F1), (F2) and (F3). This is an example of the model (8.3.1) with $p = 1$, $q = p$, $r = p+1$, $Y_{i-1}\equiv X_{i-1}$, $\mu(y,\alpha) = \alpha y$, and $\sigma(y,\beta) = (\beta'z)^{1/2}$, $z' = (1, y^2)$, $y\in\mathbb R$. Throughout the discussion of this example we assume $EX_0^4<\infty$, which in turn guarantees that $E\varepsilon^4<\infty$.

An additional assumption on the parameters, under which this model is stationary and ergodic, is as follows:
\[ (8.3.57)\qquad |\alpha| + (\beta_0\vee\beta_1)^{1/2}\,E|\varepsilon| < 1. \]
This follows with the help of Lemma 3.1 of Härdle and Tsybakov (1997, p 227), upon taking $C_1 = |\alpha|$ and $C_2 = (\beta_0\vee\beta_1)^{1/2} = \sup\{(\beta_0+\beta_1x^2)^{1/2}/(1+|x|);\ x\in\mathbb R\}$ in there, to conclude that under (8.3.57) the process $\{X_i;\ i\ge 0\}$ of (8.3.56) is stationary and ergodic.
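A minimal simulation sketch of (8.3.56), with arbitrary parameter values chosen to satisfy (8.3.57) for standard normal errors (for which $E|\varepsilon| = \sqrt{2/\pi}\approx 0.8$); the helper name is ours:

```python
import numpy as np

def simulate_ar1_arch(n, a=0.3, b0=0.5, b1=0.2, burn=500, seed=None):
    """Simulate X_i = a*X_{i-1} + (b0 + b1*X_{i-1}^2)^{1/2} * eps_i, eps ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + burn)
    eps = rng.standard_normal(n + burn)
    for i in range(1, n + burn):
        x[i] = a * x[i - 1] + np.sqrt(b0 + b1 * x[i - 1] ** 2) * eps[i]
    return x[burn:]

# check (8.3.57): |a| + (b0 v b1)^{1/2} * E|eps| < 1 for these choices
print(abs(0.3) + np.sqrt(max(0.5, 0.2)) * np.sqrt(2 / np.pi))   # ~0.86 < 1
```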
The assumptions (8.3.2) and (8.3.3) are readily seen to hold here. Note that here $\dot\mu_i \equiv X_{i-1}/(\beta_0+\beta_1X_{i-1}^2)^{1/2}$. Use the boundedness of the function $x\mapsto x/(\beta_0+\beta_1x^2)^{1/2}$ on $[0,\infty)$ when $\beta_1>0$, together with $EX_0^4<\infty$ and the stationarity and ergodicity of $\{X_i\}$, to verify (8.3.8), (8.3.9), and (8.3.10) here, with $M = E[X_0^2]$ and
\[ A = E\Big[\frac{X_0^2}{\beta'Z_0}\Big],\qquad \Sigma = E\Big[\frac{Z_0Z_0'}{4(\beta'Z_0)^2}\Big],\qquad \Gamma = E\Big[\frac{X_0Z_0'}{2(\beta'Z_0)^{3/2}}\Big]. \]
To verify (8.3.11), note that $\dot\mu_{ni}(t_1)-\dot\mu_i\equiv 0$. Hence, by the stationarity of the process, the left hand side of (8.3.11) here equals
\[ E\Big[\frac{X_0^2}{\beta'Z_0}\Big(\Big\{\frac{\beta'Z_0}{(\beta+n^{-1/2}t_2)'Z_0}\Big\}^{1/2}-1\Big)^2\Big]. \]
But clearly the sequence of r.v.'s $\big[\{\beta'Z_0/(\beta+n^{-1/2}t_2)'Z_0\}^{1/2}-1\big]$ is bounded and tends to $0$, a.s. These facts together with the D.C.T. imply (8.3.11) in this example.

Next, to verify (8.3.12), note that the derivative of the function $s\mapsto \{x/(x+s)\}^{1/2}$ at $s=0$ is $-1/(2x)$. Now, rewrite the left hand side of (8.3.12) as
\begin{align*} \frac{1}{2\sqrt n}&\sum_{i=1}^n \frac{\|Z_{i-1}\|}{\beta'Z_{i-1}}\Big|\Big\{\frac{\beta'Z_{i-1}}{(\beta+n^{-1/2}t_2)'Z_{i-1}}\Big\}^{1/2}-1\Big| \\ &\le \frac{1}{2\sqrt n}\sum_{i=1}^n \frac{\|Z_{i-1}\|}{\beta'Z_{i-1}}\Big|\Big\{\frac{\beta'Z_{i-1}}{(\beta+n^{-1/2}t_2)'Z_{i-1}}\Big\}^{1/2}-1+\frac{n^{-1/2}t_2'Z_{i-1}}{2\beta'Z_{i-1}}\Big| + \frac{1}{4n}\sum_{i=1}^n \frac{\|Z_{i-1}\|^2}{(\beta'Z_{i-1})^2}\,\|t_2\|. \end{align*}
Because $E\|Z_0\|^2<\infty$, we have $\max_{1\le i\le n}|n^{-1/2}Z_{i-1}'t_2| = o_p(1)$. This and the stationarity imply that the first term tends to zero in probability. The Ergodic Theorem implies that the r.v.'s in the second term converge in probability to $E[\|Z_0\|^2/4(\beta'Z_0)^2]$, thereby verifying (8.3.12) here.

To verify (8.3.13) we shall use (8.3.15). The stationarity implies that in this example the left hand side of (8.3.15) is equal to
\[ (8.3.58)\qquad n^{-1/2}E\Big[\frac{X_0^2\,(t_1X_0)^2}{\{(\beta+n^{-1/2}t_2)'Z_0\}^2\,\beta'Z_0}\Big] + n^{1/2}E\Big[\frac{X_0^2}{\{(\beta+n^{-1/2}t_2)'Z_0\}^2}\Big(\Big\{\frac{(\beta+n^{-1/2}t_2)'Z_0}{\beta'Z_0}\Big\}^{1/2}-1\Big)^2\Big]. \]
If $\beta_1>0$, then the expectation in the first term of (8.3.58) stays bounded, as in this case the integrands are bounded uniformly in $n$. The same remains true under the additional assumption $EX_0^8<\infty$ when $\beta_1=0$. In either case this shows that the first term in (8.3.58) is $O(n^{-1/2})$. To handle the second term, apply the mean value theorem to the function $s\mapsto \{(x+s)/x\}^{1/2}$ around $s=0$, to obtain that for some $0<\xi<1$ the second term in (8.3.58) equals
\[ n^{-1/2}E\Big[\frac{X_0^2\,(t_2'Z_0)^2}{4\{(\beta+n^{-1/2}t_2)'Z_0\}^2\,\beta'Z_0\,(\beta'Z_0+\xi n^{-1/2}t_2'Z_0)}\Big] = O(n^{-1/2}), \]
by arguing as for the first term of (8.3.58). Therefore (8.3.58), and hence the left hand side of (8.3.15), is $O(n^{-1/2})$, thereby verifying this condition. Because $\dot\mu(y,t_1)\equiv y$, a constant in $t_1$, we see that the condition (8.3.14) readily holds. Conditions (8.3.16) and (8.3.19) are satisfied with $\ddot M\equiv 0$ and $R(Y_{i-1},s)\equiv -Z_{i-1}Z_{i-1}'(2Z_{i-1}'s)^{-2}$.

Observe that here $\hat\alpha_p = \sum_{i=1}^n X_iX_{i-1}\big/\sum_{i=1}^n X_{i-1}^2$. It is easy to see that in this example $n^{1/2}(\hat\alpha_p-\alpha) \Longrightarrow N_1(0, a^2\gamma^2)$, where $a^2 = E\{X_0^2(\beta_0+\beta_1X_0^2)\}/\{E(X_0^2)\}^2$ and $\gamma^2 := \mathrm{Var}(\varepsilon)$, thereby guaranteeing the satisfaction of (8.3.18). The condition (8.3.25) is implied here by the monotonicity of the score function $\mathcal M(t_1,t_2)$ in $t_1$, for every $t_2$ fixed. Therefore, to summarize, we obtain that if either $\beta_1>0$ and $EX_0^4<\infty$, or $\beta_1=0$ and $EX_0^8<\infty$, and if either $\int xf(x)\,d\psi(x) = 0$ or $\Gamma(\theta) = 0$, then $n^{1/2}(\hat\alpha-\alpha)\to_d N_1(0, \tau^2(\theta)\,v(\psi,F))$, where
\[ \tau^2(\theta) := \frac{1}{E\big[X_0^2/(\beta_0+\beta_1X_0^2)\big]}. \]
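On a simulated path the preliminary estimator is one line; a sketch reusing the hypothetical simulator introduced above:

```python
import numpy as np

def alpha_p(x):
    """Preliminary LSE: alpha_hat_p = sum X_i X_{i-1} / sum X_{i-1}^2."""
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

x = simulate_ar1_arch(10000, seed=2)
print(alpha_p(x))   # close to the true a = 0.3 for large n
```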
Again, it follows that the asymptotic relative efficiency of the M-estimator corresponding to the score function $\psi$, relative to the least squares estimator, in the above ARCH model is the same as in the one sample location model or in the linear regression and autoregressive models.

Example 8.3.3 (THRESHOLD ARCH MODEL). Consider the $p$th order autoregressive model with self-exciting threshold heteroscedastic errors, where the conditional standard deviation of the $i$th observation is piecewise linear in the past, as follows: For $i\ge 1$,
\[ X_i = \alpha'Y_{i-1} + \big\{\beta_1X_{i-1}I(X_{i-1}>0) - \beta_2X_{i-1}I(X_{i-1}\le 0) + \cdots + \beta_{2p-1}X_{i-p}I(X_{i-p}>0) - \beta_{2p}X_{i-p}I(X_{i-p}\le 0)\big\}\,\varepsilon_i, \]
where all the $\beta_j$'s are positive. For details on the applications and many probabilistic properties of this model, and for conditions ensuring stationarity and ergodicity, see Rabemananjara and Zakoïan (1993). For a discussion of the difficulties associated with the asymptotics of robust estimation in this model, see Rabemananjara and Zakoïan (1993, p 38). This model is again an example of the model (8.3.1) with $q = p$, $r = 2p$, $\mu(y,\alpha) = \alpha'y$, and
\[ \sigma(y,\beta) = \sum_{j=1}^p \beta_{2j-1}\,y_j\,I(y_j\ge 0) + \sum_{j=1}^p \beta_{2j}\,(-y_j)\,I(y_j<0), \]
for $y\in\mathbb R^p$, $\beta\in(0,\infty)^{2p}$. We shall now verify the assumptions (8.3.2), (8.3.3), and (8.3.8)-(8.3.19) in this model. Define
\[ W_{i-1} := \big[X_{i-1}I(X_{i-1}>0),\ -X_{i-1}I(X_{i-1}\le 0),\ \dots,\ X_{i-p}I(X_{i-p}>0),\ -X_{i-p}I(X_{i-p}\le 0)\big]', \]
so that $\sigma(Y_{i-1},\beta) = \beta'W_{i-1}$; a small computational sketch of $\sigma(y,\beta)$ follows below. The assumptions (8.3.2) and (8.3.3) are trivially satisfied.
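The following sketch evaluates $\sigma(y,\beta)$ directly from its definition (the helper name and the numerical values are ours):

```python
import numpy as np

def sigma_tarch(y, beta):
    """sigma(y, beta) = sum_j [beta_{2j-1} y_j I(y_j >= 0) + beta_{2j} (-y_j) I(y_j < 0)].

    y    : lagged observations, length p
    beta : positive coefficients, length 2p, ordered (beta_1, ..., beta_{2p})
    """
    y = np.asarray(y, dtype=float)
    pos = np.maximum(y, 0.0)     # y_j I(y_j >= 0)
    neg = np.maximum(-y, 0.0)    # (-y_j) I(y_j < 0)
    return beta[0::2] @ pos + beta[1::2] @ neg

print(sigma_tarch([1.0, -2.0], [0.4, 0.6, 0.2, 0.3]))   # 0.4*1 + 0.3*2 = 1.0
```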
Assuming the $\{X_i\}$'s are stationary and ergodic, (8.3.8) is satisfied with
\[ A(\theta) = E\Big[\frac{Y_0Y_0'}{(\beta'W_0)^2}\Big],\quad \Sigma(\theta) = E\Big[\frac{W_0W_0'}{(\beta'W_0)^2}\Big],\quad M(\theta) = E[Y_0Y_0'],\quad \Gamma(\theta) = E\Big[\frac{Y_0W_0'}{(\beta'W_0)^2}\Big]. \]
Since the functions $x\mapsto x/\big(\beta_{2j-1}xI(x\ge 0)-\beta_{2j}xI(x<0)\big)$ are bounded, the $\dot\mu_{ij}$ are bounded in this case, uniformly in $i, j$. Moreover, (8.3.10) is seen to hold by the stationarity and the finite fourth moment assumption. Conditions (8.3.11), (8.3.12), and (8.3.14)-(8.3.16) are satisfied since the functions $\mu$ and $\sigma$ are linear in the parameters, and (8.3.13) is seen to hold as in Example 8.3.2. Finally, (8.3.20) and (8.3.19) are seen to be satisfied as well.
Therefore, from Corollary 8.3.4, if $f(-x) = f(x)$ and $\psi(-x) = -\psi(x)$ for every $x\in\mathbb R$, then $n^{1/2}(\hat\alpha-\alpha)\to_d N(0,\Sigma(\theta)\,v(\psi,F))$. Again, a relative efficiency statement similar to the one in the previous two examples holds here also. We end this section by mentioning that similar asymptotic normality and efficiency comparison statements can be deduced in all these examples from Theorem 8.3.4, pertaining to the minimum distance estimators $\hat\alpha_{md}$.

Note: The results in this chapter are based on the works of Koul (1996) and Koul and Mukherjee (2001). A special case of Corollary 8.3.6 is also obtained by Boldin (1998).
Bollerslev, Chou and Kroner (1992), Bera and Higgins (1993), Shephard (1996), and the books by Taylor (1986) and Gourieroux (1997), among others, discuss numerous aspects of ARCH models. In particular, when $\mu\equiv 0$, the asymptotic distribution of the quasi-maximum likelihood estimator of $\beta$ appears in Weiss (1986), and many probabilistic properties of the model are investigated in Nelson (1990). Adaptive estimation for linear regression models with Engle's ARCH errors was discussed by Linton (1993). Robust L-estimation of the heteroscedastic parameter $\beta$, based on a preliminary estimator of $\alpha$ in a special case of the above model, is discussed in Koenker and Zhou (1996).
9 Appendix

9.1 Appendix

We include here some results relevant to the weak convergence of processes in $\mathcal D[0,1]$ and $\mathcal C[0,1]$, for the sake of easy reference and without proofs. Our source is the book by Billingsley (1968) (B) on Convergence of Probability Measures. To begin with, let $\xi_1,\dots,\xi_m$ be r.v.'s, not necessarily independent, and define
\[ S_k := \sum_{j=1}^k \xi_j,\quad 1\le k\le m;\qquad M_m := \max_{1\le k\le m}|S_k|. \]
The following lemma is obtained by combining (12.5), (12.10) and Theorem 12.1 from pp. 87-89 of (B).

Lemma 9.1.1 Suppose there exist nonnegative numbers $u_1, u_2,\dots,u_m$, a $\gamma\ge 0$ and an $\alpha>0$ such that
\[ E\big\{|S_k-S_j|^{\gamma}\,|S_j-S_i|^{\gamma}\big\} \le \Big(\sum_{r=i+1}^k u_r\Big)^{2\alpha},\qquad 0\le i\le j\le k\le m. \]
Then, $\forall\,\lambda>0$,
\[ P(M_m\ge \lambda) \le K_{\gamma,\alpha}\,\lambda^{-\gamma}\Big(\sum_{r=1}^m u_r\Big)^{2\alpha} + P\big(|S_m|\ge \lambda/2\big), \]
where $K_{\gamma,\alpha}$ is a constant depending only on $\gamma$ and $\alpha$.
The following inequality is given as Corollary 8.3 in (B).
Lemma 9.1.2 Let $\{\zeta(t),\ 0\le t\le 1\}$ be a stochastic process on some probability space. Let $\delta>0$ and let $0 = t_0 < t_1 < \cdots < t_r = 1$, with $t_i-t_{i-1}\ge \delta$ for $2\le i\le r-1$, be a partition of $[0,1]$. Then, $\forall\,\epsilon>0$ and $\forall\, 0<\delta\le 1$,
\[ P\Big(\sup_{|t-s|<\delta}|\zeta(t)-\zeta(s)|\ge 3\epsilon\Big) \le \sum_{i=1}^r P\Big(\sup_{t_{i-1}\le t\le t_i}|\zeta(t)-\zeta(t_{i-1})|\ge \epsilon\Big). \]
Definition: A sequence of stochastic processes $\{\zeta_n\}$ in $\mathcal D[0,1]$ is said to converge weakly to a stochastic process $\zeta\in\mathcal C[0,1]$ if every finite dimensional distribution of $\{\zeta_n\}$ converges weakly to that of $\zeta$ and if $\{\zeta_n\}$ is tight with respect to the uniform metric.

The following theorem gives sufficient conditions for the weak convergence of a sequence of stochastic processes in $\mathcal D[0,1]$ to a limit in $\mathcal C[0,1]$. It is essentially Theorem 15.5, p. 127, of (B).

Theorem 9.1.1 Let $\{\zeta_n(t),\ 0\le t\le 1\}$ be a sequence of stochastic processes in $\mathcal D[0,1]$. Suppose that $|\zeta_n(0)| = O_p(1)$ and that $\forall\,\epsilon>0$,
\[ \lim_{\eta\to 0}\limsup_n P\Big(\sup_{|s-t|<\eta}|\zeta_n(s)-\zeta_n(t)|\ge \epsilon\Big) = 0. \]
Then the sequence $\{\zeta_n(t),\ 0\le t\le 1\}$ is tight, and if $\zeta$ is the weak limit of a subsequence $\{\zeta_{n'}(t),\ 0\le t\le 1\}$, then $P(\zeta\in\mathcal C[0,1]) = 1$.
The following theorem gives sufficient conditions for the weak convergence of a sequence of stochastic processes in $\mathcal C[0,1]$ to a limit in $\mathcal C[0,1]$. It is essentially Theorem 12.3, p. 95, of (B).

Theorem 9.1.2 Let $\{\zeta_n(t),\ 0\le t\le 1\}$ be a sequence of stochastic processes in $\mathcal C[0,1]$. Suppose that $|\zeta_n(0)| = O_p(1)$ and that there exist a $\gamma\ge 0$, an $\alpha>1$, and a nondecreasing continuous function $F$ on $[0,1]$ such that
\[ P\big(|\zeta_n(t)-\zeta_n(s)|\ge \lambda\big) \le \lambda^{-\gamma}\,|F(t)-F(s)|^{\alpha} \]
holds for all $s, t$ in $[0,1]$ and for all $\lambda>0$. Then the sequence $\{\zeta_n(t),\ 0\le t\le 1\}$ is tight, and if $\zeta$ is the weak limit of a subsequence $\{\zeta_{m_n}(t),\ 0\le t\le 1\}$, then $P(\zeta\in\mathcal C[0,1]) = 1$.
Next, we state a central limit theorem for martingale arrays. Let $(\Omega,\mathcal F,P)$ be a probability space; let $\{\mathcal F_{n,i},\ 1\le i\le n\}$ be an array of sub $\sigma$-fields such that $\mathcal F_{n,i}\subset\mathcal F_{n,i+1}$, $1\le i\le n$; let $X_{ni}$ be $\mathcal F_{n,i}$ measurable r.v.'s with $EX_{ni}^2<\infty$, $E(X_{ni}\,|\,\mathcal F_{n,i-1}) = 0$, $1\le i\le n$; and let $S_{nj} = \sum_{i\le j}X_{ni}$, $1\le j\le n$. Then $\{S_{ni},\mathcal F_{n,i};\ 1\le i\le n,\ n\ge 1\}$ is called a zero-mean square-integrable martingale array with differences $\{X_{ni};\ 1\le i\le n,\ n\ge 1\}$.
The central limit theorem we find useful is Corollary 3.1 of Hall and Heyde (1980), which we state here for easy reference.

Lemma 9.1.3 Let $\{S_{ni},\mathcal F_{n,i};\ 1\le i\le n,\ n\ge 1\}$ be a zero-mean square-integrable martingale array with differences $\{X_{ni}\}$ satisfying the following conditions:
\[ (9.1.1)\qquad \forall\,\epsilon>0,\quad \sum_{i=1}^n E\big[X_{ni}^2\,I(|X_{ni}|>\epsilon)\,\big|\,\mathcal F_{n,i-1}\big] = o_p(1); \]
\[ (9.1.2)\qquad \sum_{i=1}^n E\big[X_{ni}^2\,\big|\,\mathcal F_{n,i-1}\big] \to \text{a r.v. } \eta^2, \text{ in probability}; \]
\[ (9.1.3)\qquad \mathcal F_{n,i}\subset\mathcal F_{n+1,i},\quad 1\le i\le n,\ n\ge 1. \]
Then $S_{nn}$ converges in distribution to a r.v. whose characteristic function at $t$ is $E\exp(-\eta^2t^2/2)$, $t\in\mathbb R$. $\Box$

The following inequality on the tail probability of a sum of martingale differences is obtained by combining the Doob and Rosenthal inequalities; cf. Hall and Heyde (1980, Corollary 2.1 and Theorem 2.12). Suppose $M_j = \sum_{i=1}^j D_i$ is a sum of martingale differences with respect to the underlying increasing filtration $\{\mathcal V_i\}$, and $E|D_i|^p<\infty$, $1\le i\le n$, for some $p\ge 2$. Then there exists a constant $C = C(p)$ such that for any $\epsilon>0$,
\[ (9.1.4)\qquad P\Big(\max_{1\le j\le n}|M_j|\ge \epsilon\Big) \le C\,\epsilon^{-p}\Big[E\Big(\sum_{i=1}^n E\big(D_i^2\,\big|\,\mathcal V_{i-1}\big)\Big)^{p/2} + \sum_{i=1}^n E|D_i|^p\Big]. \]
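A quick Monte Carlo illustration of Lemma 9.1.3 (our own toy array, not from the text): with $X_{ni} = n^{-1/2}\varepsilon_i\varepsilon_{i-1}$ for i.i.d. standard normal $\varepsilon_i$, the conditions (9.1.1)-(9.1.3) hold with $\eta^2 = 1$, so $S_{nn}$ should be approximately $N(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2000, 5000
snn = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n + 1)
    # martingale differences X_ni = n^{-1/2} * e_i * e_{i-1}
    snn[r] = (e[1:] * e[:-1]).sum() / np.sqrt(n)

print(snn.mean(), snn.var())   # approximately 0 and 1
```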
Next, we state and prove three lemmas of general interest. The first lemma is due to Scheffé (1947), while the second has its origin in Theorem II.4.2.1 of Hájek and Šidák (op. cit.). The third lemma is the same as Theorem V.1.3.1 of Hájek and Šidák (op. cit.). All these results are reproduced here for the sake of completeness.
Lemma 9.1.4 Let $(\Omega,\mathcal A,\nu)$ be a $\sigma$-finite measure space. Let $e$, $e_n$, $n\ge 1$, be a sequence of probability densities w.r.t. $\nu$ such that $e_n\to e$, a.e. $\nu$. Then
\[ \int |e_n-e|\,d\nu \longrightarrow 0. \]
Proof. Let $\delta_n := e_n-e$, $\delta_n^+ := \max(\delta_n,0)$, $\delta_n^- := \max(-\delta_n,0)$. By assumption, $\delta_n^-\to 0$, a.e. $\nu$. Moreover, $\delta_n^-\le e$. Thus, by the DCT, $\int\delta_n^-\,d\nu\to 0$. This in turn, along with the fact that $\int\delta_n\,d\nu = 0$, implies that $\int\delta_n^+\,d\nu\to 0$. The claim now follows from these facts and the relation $\int|e_n-e|\,d\nu = \int\delta_n^-\,d\nu + \int\delta_n^+\,d\nu$. $\Box$
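A small numerical illustration of Lemma 9.1.4 (our arbitrary choice: $e_n$ the $N(1/n,1)$ density, $e$ the $N(0,1)$ density, both w.r.t. Lebesgue measure):

```python
import numpy as np
from scipy.stats import norm

grid = np.linspace(-10.0, 10.0, 200001)
dx = grid[1] - grid[0]
e = norm.pdf(grid)
for n in (1, 10, 100):
    en = norm.pdf(grid, loc=1.0 / n)         # N(1/n, 1) density
    print(n, np.abs(en - e).sum() * dx)      # L1 distance; tends to 0
```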
Lemma 9.1.5 Let $(\Omega,\mathcal A,\nu)$ be a $\sigma$-finite measure space. Let $\{g_n\}$, $g$ be a sequence of measurable functions such that
\[ (9.1.5)\qquad g_n\to g, \text{ a.e. } \nu, \]
\[ (9.1.6)\qquad \limsup_n \int |g_n|\,d\nu \le \int |g|\,d\nu < \infty. \]
Then, for any measurable function $\varphi$ with $0\le\varphi\le 1$,
\[ (9.1.7)\qquad \int \varphi\,g_n\,d\nu \longrightarrow \int \varphi\,g\,d\nu. \]

Proof. For any function $h$, let $h^+ = \max(0,h)$, $h^- = \max(-h,0)$. By (9.1.5), $g_n^{\pm}\to g^{\pm}$, a.e. Hence, by Fatou, because $0\le g_n^{\pm}$,
\[ (9.1.8)\qquad \liminf_n \int g_n^{\pm}\,d\nu \ge \int g^{\pm}\,d\nu. \]
But (9.1.8) is compatible with (9.1.6) only if
\[ (9.1.9)\qquad \limsup_n \int g_n^{\pm}\,d\nu \le \int g^{\pm}\,d\nu. \]
For, otherwise, suppose that $\limsup_n\int g_n^+\,d\nu > \int g^+\,d\nu$. Then,
\begin{align*} \limsup_n \int |g_n|\,d\nu &= \limsup_n\Big(\int g_n^+\,d\nu + \int g_n^-\,d\nu\Big) \\ &\ge \limsup_n\int g_n^+\,d\nu + \liminf_n\int g_n^-\,d\nu \\ &> \int g^+\,d\nu + \int g^-\,d\nu = \int |g|\,d\nu,\qquad \text{by (9.1.8)}, \end{align*}
which violates (9.1.6). The other case can be handled similarly. This proves (9.1.9). But (9.1.9) together with (9.1.8) imply that $\int g_n^{\pm}\,d\nu\to\int g^{\pm}\,d\nu$, which in turn implies that $\int g_n\,d\nu\to\int g\,d\nu$. Thus we have proved that (9.1.5) and (9.1.6) imply
\[ (9.1.10)\qquad \int g_n^{\pm}\,d\nu \to \int g^{\pm}\,d\nu, \]
\[ (9.1.11)\qquad \int g_n\,d\nu \to \int g\,d\nu. \]
Again, by the Fatou Lemma applied to $g_n^{\pm}\varphi$ and $g_n^{\pm}(1-\varphi)$, we obtain that
\[ (9.1.12)\qquad \liminf_n \int \varphi\,g_n^{\pm}\,d\nu \ge \int \varphi\,g^{\pm}\,d\nu, \]
\[ (9.1.13)\qquad \liminf_n \int (1-\varphi)\,g_n^{\pm}\,d\nu \ge \int (1-\varphi)\,g^{\pm}\,d\nu. \]
But, in view of (9.1.10), (9.1.13) is equivalent to
\[ \limsup_n \int \varphi\,g_n^{\pm}\,d\nu \le \int \varphi\,g^{\pm}\,d\nu, \]
which together with (9.1.12) completes the proof of (9.1.7). $\Box$
Lemma 9.1.6 Let $(\Omega,\mathcal A,\nu)$ be a $\sigma$-finite measure space. Let $\{g_n\}$, $g$ be a sequence of square integrable functions such that (9.1.5) holds and that
\[ (9.1.14)\qquad \limsup_n \int g_n^2\,d\nu \le \int g^2\,d\nu. \]
Then,
\[ \int (g_n-g)^2\,d\nu \longrightarrow 0. \]

Proof. The Fatou Lemma and (9.1.14) imply
\[ (9.1.15)\qquad \lim_n \int g_n^2\,d\nu = \int g^2\,d\nu. \]
By the Cauchy-Schwarz inequality,
\[ (9.1.16)\qquad \int |g_ng|\,d\nu \le \Big(\int g_n^2\,d\nu \int g^2\,d\nu\Big)^{1/2}, \]
so that
\[ \limsup_n \int |g_ng|\,d\nu \le \int g^2\,d\nu < \infty. \]
Hence, by Lemma 9.1.5,
\[ \lim_n \int g_ng\,d\nu = \int g^2\,d\nu. \]
The lemma now follows from this, (9.1.15), and some algebra. $\Box$