Lecture Notes in Statistics Edited by P. Bickel, P. Diggle, S. Fienberg, U. Gather, I. Olkin, S. Zeger
190
Jérôme Dedecker Paul Doukhan Gabriel Lang José Rafael León R. Sana Louhichi Clémentine Prieur
Weak Dependence: With Examples and Applications
Jérôme Dedecker
Laboratoire de Statistique Théorique et Appliquée, Université Paris 6, 175 rue du Chevaleret, 75013 Paris, France
[email protected]

Paul Doukhan
ENSAE, Laboratoire de Statistique du CREST (Paris Tech), 3 avenue Pierre Larousse, 92240 Malakoff, France
and SAMOS-MATISSE (Statistique Appliquée et Modélisation Stochastique), Centre d'Economie de la Sorbonne, Université Paris 1-Panthéon-Sorbonne, CNRS, 90 rue de Tolbiac, 75634 Paris Cedex 13, France
[email protected]

Gabriel Lang
Equipe MORSE, UMR MIA518, INRA Agro Paris Tech, F-75005 Paris, France
[email protected]

José Rafael León R.
Escuela de Matemática, Universidad Central de Venezuela, Ap. 47197 Los Chaguaramos, Caracas 1041-A, Venezuela
[email protected]

Sana Louhichi
Laboratoire de Mathématiques, Université Paris-Sud, Bât. 425, 91405 Orsay Cedex, France
[email protected]

Clémentine Prieur
Génie Mathématique et Modélisation, Laboratoire de Statistique et Probabilités, INSA Toulouse, 135 avenue de Rangueil, 31077 Toulouse Cedex 4, France
[email protected]

ISBN 978-0-387-69951-6        e-ISBN 978-0-387-69952-3

Library of Congress Control Number: 2007929938

© 2007 Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.

springer.com
Preface

Time series and random fields are main topics in modern statistical techniques; they are essential for applications where randomness plays an important role. Indeed, physical constraints mean that serious modelling cannot be done using only independent sequences. This is a real problem, because asymptotic properties are not always known in this case. The present work is devoted to providing a framework for the most commonly used time series.

In order to validate the main statistics, one needs rigorous limit theorems. In probability theory, the asymptotic behavior of partial sums may or may not be analogous to that of independent sequences; this book is concerned with the first case. Very sharp results have been proved for mixing processes, a notion introduced by Murray Rosenblatt [166]. Extensive discussions of this topic may be found in his Dependence in Probability and Statistics (a monograph published by Birkhäuser in 1986) and in Doukhan (1994) [61], and the sharpest results may be found in Rio (2000) [161]. However, a counterexample of a really simple non-mixing process was exhibited by Andrews (1984) [2]. The notion of weak dependence discussed here takes real account of the available models, which are discussed extensively. Our idea is that robustness of the limit theorems with respect to the model should be taken into account. In real applications, nobody can assert, for example, the existence of a density for the inputs of a given model, while such assumptions are always needed when dealing with mixing concepts. Our main task here is not only to provide the reader with the sharpest possible results but, as statisticians, to work within the largest possible framework. Doukhan and Louhichi (1999) [67] introduced a wide dependence framework that turns out to apply to the models used most often. Their simple way of weakening the independence property is mainly adapted to work with stationary sequences.
We thus discuss examples of weakly dependent models, limit theory for such sequences, and applications. The notions are mainly divided into the two following classes:

• The first class is that of "causal" dependence. In this case, the conditions may also be expressed in terms of conditional expectations, and thus the powerful martingale theory tools apply, such as Gordin's [97] device that allowed Dedecker and Doukhan (2003) [43] to derive a sharp Donsker principle.

• The second class is that of noncausal processes, such as two-sided linear processes, for which specific techniques need to be developed. Moment inequalities are a main tool in this context.
In order to make this book useful to practitioners, we also develop some applications in the fields of statistics, stochastic algorithms, resampling, and econometrics. We also think it useful to present here the notation for the concepts of weak dependence. Our aim was to make the book simple to read, and thus the mathematical level needed has been kept as low as possible. The book may be used in different ways:

• First, this is a mathematical textbook aimed at fixing the notions in the area discussed. We do not intend to cover all the topics, but the book may be considered an introduction to weak dependence.

• Second, our main objective in this monograph is to propose models and tools for practitioners; hence the sections devoted to examples are really extensive.

• Finally, some of the applications already developed are also quoted for completeness.

A preliminary version of this joint book on weak dependence concepts was used for a graduate course given by Paul Doukhan at the Escuela Latino Americana de Matemática in Merida (Venezuela) in September 2004; that course was especially useful for the preparation of our manuscript. The different contributors and authors of the present monograph participated in developing it jointly. We also want to thank the various coauthors of (published or not yet published) papers on the subject, namely Patrick Ango Nzé (Lille 3), Jean-Marc Bardet (Université Paris 1), Odile Brandière (Orsay), Alain Latour (Grenoble), Hélène Madre (Grenoble), Michael Neumann (Iena), Nicolas Ragache (INSEE), Mathieu Rosenbaum (Marne la Vallée), Gilles Teyssière (Göteborg), Lionel Truquet (Université Paris 1), Pablo Winant (ENS Lyon), Olivier Wintenberger (Université Paris 1), and Bernard Ycart (Grenoble). Even though not all of their work appears in these notes, it was really helpful for their conception.
We also want to thank the various referees who provided us with helpful comments either for this monograph or for papers submitted for publication and related to weak dependence. We now give some clarification concerning the origin of this notion of weak dependence. The seminal paper [67] was in fact submitted in 1996 and was part
of the PhD dissertation of Sana Louhichi in 1998. The main tool developed in this work was combinatorial moment inequalities; analogous moment inequalities are also given in Bakhtin and Bulinski (1997) [8]. Another close definition of weak dependence was provided in a preprint by Bickel and Bühlmann (1995) [17], anterior to [67] and also published in 1999 [18]. However, those authors aimed to work with the bootstrap; see Chapter 13 and Section 2.2 in [6]. The approach of Wu (2005) [188], detailed in Remark 3.1 and based on L2-conditions for causal Bernoulli shifts, also yields interesting and sharp results.

This monograph is essentially built in four parts.

Definitions and models. In the first chapter, we make precise some issues and tools for investigating dependence: this is a motivational chapter. The second chapter formally introduces the notion of weak dependence. Models are then presented in a long third chapter; indeed, in our mind, the richness of examples is at the core of the weak dependence properties.

Tools. Tools are given in two chapters (Chapters 4 and 5), concerned respectively with noncausal and causal properties. They are first used in the text for proving the forthcoming limit theorems, but they are essential for any type of further application. Two main tools may be found: moment bounds and coupling arguments. We also present specific tightness criteria adapted to working out empirical limit theorems.

Limit theorems. Laws of large numbers (and some applications), central limit theorems, invariance principles, laws of the iterated logarithm, and empirical central limit theorems are useful limit theorems in probability. They are precisely stated and worked out in Chapters 6-10.

Applications. The end of the monograph is dedicated to applications. We first present in Chapter 11 the properties of the standard nonparametric techniques. After this, we consider some issues of spectral estimation in Chapter 12.
Finally, Chapter 13 is devoted to some miscellaneous applications to econometrics, the bootstrap, and subsampling techniques. After the table of contents, a short list of notations allows rapid access to the main weak dependence coefficients and other useful notation.
Jérôme Dedecker, Paul Doukhan, Gabriel Lang, José R. León, Sana Louhichi, and Clémentine Prieur
Contents

Preface v

List of notations xiii

1 Introduction 1
  1.1 From independence to dependence 1
  1.2 Mixing 4
  1.3 Mixingales and near epoch dependence 5
  1.4 Association 6
  1.5 Nonmixing models 8

2 Weak dependence 9
  2.1 Function spaces 10
  2.2 Weak dependence 11
    2.2.1 η, κ, λ and ζ-coefficients 12
    2.2.2 θ and τ-coefficients 14
    2.2.3 α̃, β̃ and φ̃-coefficients 16
    2.2.4 Projective measure of dependence 19

3 Models 21
  3.1 Bernoulli shifts 21
    3.1.1 Volterra processes 22
    3.1.2 Noncausal shifts with independent inputs 24
    3.1.3 Noncausal shifts with dependent inputs 25
    3.1.4 Causal shifts with independent inputs 31
    3.1.5 Causal shifts with dependent inputs 32
  3.2 Markov sequences 33
    3.2.1 Contracting Markov chain 35
    3.2.2 Nonlinear AR(d) models 36
    3.2.3 ARCH-type processes 36
    3.2.4 Branching type models 37
  3.3 Dynamical systems 38
  3.4 Vector valued LARCH(∞) processes 42
    3.4.1 Chaotic expansion of LARCH(∞) models 43
    3.4.2 Bilinear models 48
  3.5 ζ-dependent models 53
    3.5.1 Associated processes 53
    3.5.2 Gaussian processes 55
    3.5.3 Interacting particle systems 56
  3.6 Other models 59
    3.6.1 Random AR models 59
    3.6.2 Integer valued models 61
    3.6.3 Random fields 62
    3.6.4 Continuous time 65

4 Tools for non causal cases 67
  4.1 Indicators of weakly dependent processes 67
  4.2 Low order moments inequalities 69
    4.2.1 Variances 69
    4.2.2 A (2 + δ)-order moment bound 70
  4.3 Combinatorial moment inequalities 73
    4.3.1 Marcinkiewicz-Zygmund type inequalities 77
    4.3.2 Rosenthal type inequalities 79
    4.3.3 A first exponential inequality 82
  4.4 Cumulants 84
    4.4.1 General properties of cumulants 84
    4.4.2 A second exponential inequality 93
    4.4.3 From weak dependence to the exponential bound 96
  4.5 Tightness criteria 98

5 Tools for causal cases 103
  5.1 Comparison results 103
  5.2 Covariance inequalities 110
    5.2.1 A covariance inequality for γ1 110
    5.2.2 A covariance inequality for β̃ and φ̃ 111
  5.3 Coupling 114
    5.3.1 A coupling result for real valued random variables 115
    5.3.2 Coupling in higher dimension 116
  5.4 Exponential and moment inequalities 119
    5.4.1 Bennett-type inequality 120
    5.4.2 Burkholder's inequalities 123
    5.4.3 Rosenthal inequalities using Rio techniques 125
    5.4.4 Rosenthal inequalities for τ1-dependent sequences 130
    5.4.5 Rosenthal inequalities under projective conditions 131
  5.5 Maximal inequalities 132

6 Applications of SLLN 135
  6.1 Stochastic algorithms with non causal dependent input 135
    6.1.1 Weakly dependent noise 137
    6.1.2 γ1-dependent noise 140
  6.2 Examples of application 142
    6.2.1 Robbins-Monro algorithm 142
    6.2.2 Kiefer-Wolfowitz algorithm 143
  6.3 Weighted dependent triangular arrays 143
  6.4 Linear regression 145

7 Central limit theorem 153
  7.1 Non causal case: stationary sequences 153
  7.2 Lindeberg method 155
    7.2.1 Proof of the main results 158
    7.2.2 Rates of convergence 161
  7.3 Non causal random fields 163
  7.4 Conditional central limit theorem (causal) 173
    7.4.1 Definitions and preliminary lemmas 174
    7.4.2 Invariance of the conditional variance 176
    7.4.3 End of the proof 178
  7.5 Applications 182
    7.5.1 Stable convergence 182
    7.5.2 Sufficient conditions for stationary sequences 184
    7.5.3 γ-dependent sequences 189
    7.5.4 α̃ and φ̃-dependent sequences 192
    7.5.5 Sufficient conditions for triangular arrays 194

8 Donsker principles 199
  8.1 Non causal stationary sequences 199
  8.2 Non causal random fields 200
    8.2.1 Moment inequality 201
    8.2.2 Finite dimensional convergence 202
    8.2.3 Tightness 205
  8.3 Conditional (causal) invariance principle 205
    8.3.1 Preliminaries 206
    8.3.2 Finite dimensional convergence 207
    8.3.3 Relative compactness 208
  8.4 Applications 209
    8.4.1 Sufficient conditions for stationary sequences 209
    8.4.2 Sufficient conditions for triangular arrays 212

9 Law of the iterated logarithm (LIL) 213
  9.1 Bounded LIL under a non causal condition 213
  9.2 Causal strong invariance principle 214

10 The empirical process 223
  10.1 A simple condition for the tightness 224
  10.2 η-dependent sequences 225
  10.3 α̃, β̃ and φ̃-dependent sequences 231
  10.4 θ and τ-dependent sequences 233
  10.5 Empirical copula processes 234
  10.6 Random fields 236

11 Functional estimation 247
  11.1 Some non-parametric problems 247
  11.2 Kernel regression estimates 248
    11.2.1 Second order and CLT results 249
    11.2.2 Almost sure convergence properties 252
  11.3 MISE for β̃-dependent sequences 254
  11.4 General kernels 260

12 Spectral estimation 265
  12.1 Spectral densities 265
  12.2 Periodogram 269
    12.2.1 Whittle estimation 274
  12.3 Spectral density estimation 275
    12.3.1 Second order estimate 277
    12.3.2 Dependence coefficients 279

13 Econometric applications and resampling 283
  13.1 Econometrics 283
    13.1.1 Unit root tests 284
    13.1.2 Parametric problems 285
    13.1.3 A semi-parametric estimation problem 285
  13.2 Bootstrap 287
    13.2.1 Block bootstrap 288
    13.2.2 Bootstrapping GMM estimators 288
    13.2.3 Conditional bootstrap 290
    13.2.4 Sieve bootstrap 290
  13.3 Limit variance estimates 292
    13.3.1 Moments, cumulants and weak dependence 293
    13.3.2 Estimation of the limit variance 295
    13.3.3 Law of the large numbers 297
    13.3.4 Central limit theorem 299
    13.3.5 A non centered variant 302

Bibliography 305

Index 317
List of notations

We recall here the main specific or unusual notation used throughout this monograph. As usual, #A denotes the cardinality of a finite set A, and N, Z, R, C are the standard sets of calculus. For clarity and homogeneity, we note in this list:

• a ≲ b or a = O(b), for a, b ≥ 0, means that a ≤ Cb for some constant C > 0,
• a ≪ b or a = o(b), for a, b ≥ 0, means that a/b → 0,
• a ∧ b, a ∨ b are the minimum and the maximum of the numbers a, b ≥ 0,
• M, U, V, A, B are σ-algebras, and (Ω, A, P) is a probability space,
• X, Y, Z, . . . denote random variables (usually ξ, ζ are inputs),
• L^p(E, E, m) are the classes of measurable functions f with ‖f‖_p = (∫_E |f(x)|^p dm(x))^{1/p} < ∞,
• F is a cumulative distribution function, Φ is the standard normal cumulative distribution function, and φ = Φ′ is its density,
• n is a time (space) delay, r, s ∈ N are "past-future" parameters, p is a moment order, and F, G are function spaces.
α(U, V)         p. 4, § 1.2, eqn. (1.2.1)      mixing coefficient
α̃(M, X)         p. 16, § 2.2.3, def. 2.5-1     dependence coefficient
α̃_r(n)          p. 19, § 2.2.3, def. 2.2.15    dependence sequence
BV, BV_1        p. 11, § 2.1                   bounded variation spaces
β(U, V)         p. 4, § 1.2, eqn. (1.2.4)      mixing coefficient
β̃(M, X)         p. 16, § 2.2.3, def. 2.5-2     dependence coefficient
β̃_r(n)          p. 19, § 2.2.3, def. 2.2.15    dependence sequence
C_{n,r}         p. 73, § 4.3, def. 4.1         moment coefficient sequence
c_{X,r}(n)      p. 87, § 4.4.1, eqn. (4.4.7)   vector moment sequence
c̄_{X,r}(n)      p. 87, § 4.4.1, eqn. (4.4.8)   maximum moment sequence
γ_p(M, X)       p. 19, § 2.2.4, def. 2.2.16    dependence coefficient
γ_p(n)          p. 19, § 2.2.4, def. 2.2.16    dependence sequence
δ_n             p. 25, § 3.1.2, def. 3.1.10    Bernoulli shift coefficient
ε(X, Y)         p. 11, § 2.2, eqn. (2.2.1)     general coefficient
ε(n)            pp. 4-11, §§ 1.1-2.2, eqn. (1.1.3)   coefficient sequence
η(n)            p. 12, § 2.2.1, eqn. (2.2.3)   dependence sequence
f^{-1}          p. 18, § 2.2.3, eqn. (2.2.14)  inverse of a monotonic f
F, G, Ψ         p. 11, § 2.2, eqn. (2.2.1)     weak dependence function classes
φ(U, V)         p. 4, § 1.2, eqn. (1.2.3)      mixing coefficient
φ̃(M, X)         p. 16, § 2.2.3, def. 2.5-3     dependence coefficient
φ̃_r(n)          p. 19, § 2.2.3, def. 2.2.15    dependence sequence
I               p. 67, § 4.1                   class of indicators
κ_p(n)          p. 88, § 4.4.1, def. 4.4.9     cumulant sequence
κ(n)            p. 12, § 2.2.1, eqn. (2.2.5)   dependence sequence
L               p. 38, § 3.3                   Perron-Frobenius operator
L^p-NED         p. 6, § 1.3, def. 1.2          NED notion
Λ(δ), Λ^{(1)}(δ)  p. 10, § 2.1                 Lipschitz spaces
λ(n)            p. 12, § 2.2.1, eqn. (2.2.4)   dependence sequence
μ_X(n)          p. 5, § 1.2, eqn. (1.2.5)      μ-mixing sequence
μ_{X,r,s}(n)    p. 5, § 1.2, eqn. (1.2.6)      μ-mixing field
ψ(n)            p. 5, § 1.3                    L^p-mixingale coefficient
Q_X(·)          p. 74, § 4.3, eqn. (4.3.3)     quantile function
ρ(U, V)         p. 4, § 1.2, eqn. (1.2.2)      mixing coefficient
θ(n)            p. 14, § 2.2.2, eqn. (2.2.7)   dependence sequence
θ_p(M, X)       p. 15, § 2.2.2, eqn. (2.2.8)   dependence coefficient
θ_{p,r}(n)      p. 15, § 2.2.2, eqn. (2.2.9)   dependence sequence
τ_p(M, X)       p. 16, § 2.2.2, eqn. (2.2.12)  coupling coefficient
τ_{p,r}(n)      p. 16, § 2.2.2                 coupling sequence
ζ(n)            p. 12, § 2.2.1, eqn. (2.2.6)   dependence sequence
ω_{p,n}         p. 22, § 3.1, eqn. (3.1.3)     shift increment coefficient
Chapter 1
Introduction

This chapter aims to justify some of our choices and to provide basic background on competing notions such as those linked to mixing conditions. In our view, mixing notions relate not to time series themselves but to σ-algebras; they are consequently better adapted to areas like finance, where the history (that is, the σ-algebra generated by the past) is of considerable importance. With the most elementary ideas in view, Doukhan and Louhichi (1999) [67] introduced the better-adapted weak dependence condition developed in this monograph. This definition makes explicit the asymptotic independence between 'past' and 'future'; this means that the 'past' is progressively forgotten. In terms of the initial time series, 'past' and 'future' are elementary events given through finite dimensional marginals. Roughly speaking, for convenient functions f and g, we shall assume that Cov(f('past'), g('future')) is small when the distance between the 'past' and the 'future' is sufficiently large. Such inequalities are significant only if the distance between the indices of the initial time series in the 'past' and 'future' terms grows to infinity. The convergence is not assumed to hold uniformly in the dimension of the 'past' or 'future' involved. Another direction for describing the asymptotic behavior of certain time series is based on projective methods; it will be proved that this is coherent with the previous items.
This chapter first provides general considerations on independence; we then define classical mixing coefficients, mixingales, and association, and conclude with simple counterexamples.
1.1
From independence to dependence
We recall here some very basic facts concerning the independence of random variables. Let P, F be random variables defined on the same probability space (Ω, A, P) and taking values in measurable spaces (E_P, E_P) and (E_F, E_F). Independence of the random variables P, F writes

    P(P ∈ A, F ∈ B) = P(P ∈ A) P(F ∈ B),   ∀(A, B) ∈ E_P × E_F.   (1.1.1)

Extending this identity by linearity to stepwise constant functions and using limits yields a formulation of this relation better adapted for applications:

    Cov(f(P), g(F)) = 0,   ∀(f, g) ∈ L^∞(E_P, E_P) × L^∞(E_F, E_F),

where, for instance, L^∞(E_P, E_P) denotes the subspace of ℓ^∞(E_P, R) (the space of bounded real valued functions) consisting of the measurable and bounded functions f : (E_P, E_P) → (R, B_R). If the spaces E_P, E_F are topological spaces endowed with their Borel σ-algebras (the σ-algebras generated by the open sets), then it is sufficient to state

    Cov(f(P), g(F)) = 0,   ∀(f, g) ∈ P × F,   (1.1.2)
where P, F are dense subsets of the spaces of continuous functions E_P → R and E_F → R. In order to equip both spaces with a simple topology, it will be convenient to assume that E_P and E_F are locally compact topological spaces; density in the space of continuous functions then refers to uniform convergence on compact subsets of E_P, E_F. From a general principle of economy, we should always look for the smallest possible classes P, F. The most intuitive (necessary) condition for independence is governed by the idea of orthogonality. With this idea in mind, we now develop some simple examples for which, however,

    Orthogonality ⇒ Independence.
• Bernoulli trials. If E_P = E_F = {0, 1} are both two-point spaces, the random variables follow Bernoulli distributions, and independence follows from the simple orthogonality relation Cov(P, F) = 0. Indeed, from the standard bilinearity properties of the covariance, we also have Cov(1 − P, F) = 0, Cov(P, 1 − F) = 0, and Cov(1 − P, 1 − F) = 0, which means that eqn. (1.1.1) holds for (A, B) = ({a}, {b}) with, respectively, (a, b) = (1, 1), (0, 1), (1, 0), and (0, 0). The drawback of this example is that the σ-algebras generated by P, F are very poor, so it will not fit the forthcoming case of an 'important' past and future.

• Gaussian vectors. If now E_P = R^p and E_F = R^q, and if the vector Z = (P, F) ∈ R^{p+q} is Gaussian, then its distribution depends only on its second order properties*; hence coordinate functions are enough to determine independence. More precisely, two random vectors P = (P_1, . . . , P_p) and F = (F_1, . . . , F_q) are independent if and only if Cov(P_i, F_j) = 0 for any 1 ≤ i ≤ p and any 1 ≤ j ≤ q. This very simple characterization of independence clearly does not extend to arbitrary random variables. For example, let X, Y ∈ R be independent and symmetric random variables such that P(X = 0) = 0. If we set P = X and F = sign(X) Y, then Cov(P, F) = E(|X| Y) − EX · EF = E|X| · EY − EX · EF = 0, because X and Y are centered random variables, although P and F are not independent as soon as the support of Y's distribution contains more than two points. A simple example of uncorrelated, individually standard Gaussian random variables that are not independent is provided† by the couple (P, F) = (N, RN), where N ∼ N(0, 1) and R is a Rademacher random variable (that is, P(R = 1) = P(R = −1) = 1/2) independent of N.

• Associated vectors (cf. § 1.4). Again assume that E_P = R^p and E_F = R^q. The random vector X = (P, F) ∈ R^{p+q} = R^d is called associated if Cov(h(X), k(X)) ≥ 0 for any couple of measurable functions h, k : R^d → R such that E(h²(X) + k²(X)) < ∞ and the partial functions x_j → h(x_1, . . . , x_d) and x_j → k(x_1, . . . , x_d) are nondecreasing for any choice of the remaining coordinates x_1, . . . , x_{j−1}, x_{j+1}, . . . , x_d ∈ R and any 1 ≤ j ≤ d. An essential property of such vectors is that here too, orthogonality implies independence; this will be developed in a forthcoming chapter. An example of associated vectors is that of independent coordinates. Even if this looks like an uninteresting case in our dependent setting, it leads to much more involved examples of associated random vectors through monotonic functions.

The class of such coordinatewise increasing functions is a cone of L²(X), the class of functions h with Eh²(X) < ∞; hence the set of associated distributions looks like a (very thin) cone in the space of distributions on R^d. The same idea applies to Gaussian distributions, which even form a finite dimensional family within the large set of laws on R^d.

* This only means that Z's distribution depends only on the expressions EZ_i and EZ_iZ_j for 1 ≤ i, j ≤ d if Z = (Z_1, . . . , Z_d).
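The uncorrelated-but-dependent pair (P, F) = (N, RN) above is easy to check by simulation. The following minimal Python sketch (the simulation size and seed are our own illustrative choices, not from the text) confirms that the empirical covariance is negligible while |P| = |F| holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# N ~ N(0,1) and an independent Rademacher sign R.
N = rng.standard_normal(n)
R = rng.choice([-1.0, 1.0], size=n)
P, F = N, R * N            # both marginals are standard Gaussian

# Uncorrelated: Cov(P, F) = E(R) E(N^2) = 0.
cov_PF = np.cov(P, F)[0, 1]
print(cov_PF)              # small, of order 1/sqrt(n)

# ... yet strongly dependent: |P| = |F| almost surely.
max_gap = np.max(np.abs(np.abs(P) - np.abs(F)))
print(max_gap)             # exactly 0.0
```

Note that nonlinear test functions do reveal the dependence here: for instance Cov(P², F²) = Var(N²) = 2 > 0, which is exactly why classes P, F richer than coordinate functions are needed outside the Gaussian-vector case.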
Consider now a time series X = (X_n)_{n∈Z} with values in a locally compact topological space E (typically E = R^d); we may take one variable P of the past and one variable F of the future: P = (X_{i_1}, . . . , X_{i_u}),
F = (Xj1 , . . . , Xjv ),
† In this case both variables are indeed centered with Normal distributions and E{N (RN )} = ER · EN 2 = 0 while |N | = |RN | is not independent of itself since it is not a.s. a constant.
for i_1 ≤ i_2 ≤ · · · ≤ i_u < j_1 ≤ j_2 ≤ · · · ≤ j_v, u, v ∈ N* = {1, 2, . . .}. Independence of the time series X thus writes as the independence of P and F for any choice of i_1 ≤ i_2 ≤ · · · ≤ i_u < j_1 ≤ j_2 ≤ · · · ≤ j_v. Independence of the time series up to a lag m, also called m-dependence, is characterized as independence of P and F whenever i_u + m ≤ j_1. Finally, asymptotic independence of the past and the future will be given through coefficients of the type

    ε(r) = sup_{d(P,F) ≥ r} sup_{(f,g) ∈ F × G} |Cov(f(P), g(F))|,   (1.1.3)

where d(P, F) = j_1 − i_u. The only problem with the previous definition is that the corresponding dependence coefficient should also be indexed by suitable multiindices (i_1, i_2, . . . , i_u) and (j_1, j_2, . . . , j_v). This definition will be completed in Chapter 2 by considering classes P_u ⊂ P and F_v ⊂ F, with suprema taken as well over ordered multi-indices (i_1, i_2, . . . , i_u) and (j_1, j_2, . . . , j_v) such that j_1 − i_u ≥ r.
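To make the decay in (1.1.3) concrete, here is a small simulation sketch; the AR(1) model, the single test function f = g = sin, and all parameters are illustrative assumptions of ours, not prescribed by the text. For X_t = aX_{t−1} + ξ_t, the covariance gap with P = X_0, F = X_r decays roughly like a^r:

```python
import numpy as np

rng = np.random.default_rng(1)
a, n = 0.5, 500_000

# Simulate a stationary AR(1) path X_t = a X_{t-1} + xi_t.
xi = rng.standard_normal(n)
X = np.empty(n)
X[0] = xi[0] / np.sqrt(1 - a ** 2)   # draw X_0 from the stationary law
for t in range(1, n):
    X[t] = a * X[t - 1] + xi[t]

def cov_gap(r, f=np.sin):
    """Empirical |Cov(f(X_0), f(X_r))| for one bounded Lipschitz f."""
    u, v = f(X[:-r]), f(X[r:])
    return abs(np.mean(u * v) - np.mean(u) * np.mean(v))

for r in (1, 2, 4, 8):
    print(r, cov_gap(r))   # shrinks geometrically as r grows
```

A full coefficient ε(r) would take a supremum over whole classes F, G and over multi-indices; the single pair above only illustrates the covariance-decay phenomenon being quantified.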
1.2
Mixing
Mixing conditions, as introduced by Rosenblatt (1956) [166], are weak dependence conditions expressed in terms of the σ-algebras generated by a random sequence. In order to define such conditions, we first introduce coefficients relative to two sub-σ-algebras U, V ⊂ A of an abstract probability space (Ω, A, P):

    α(U, V) = sup_{U ∈ U, V ∈ V} |P(U ∩ V) − P(U)P(V)|,   (1.2.1)

    ρ(U, V) = sup_{u ∈ L²(U), v ∈ L²(V)} |Corr(u, v)|,   (1.2.2)

    φ(U, V) = sup_{U ∈ U, V ∈ V, P(U) > 0} |P(U ∩ V)/P(U) − P(V)|,   (1.2.3)

    β(U, V) = (1/2) sup Σ_{i=1}^{I} Σ_{j=1}^{J} |P(U_i ∩ V_j) − P(U_i)P(V_j)|,   (1.2.4)

where in the definition of β the supremum is taken over all I, J ≥ 1 and all measurable partitions (U_i)_{1≤i≤I}, (V_j)_{1≤j≤J} of Ω. The above coefficients are, respectively, Rosenblatt (1956) [166]'s strong mixing coefficient α(U, V), Wolkonski and Rozanov (1959) [187]'s absolute regularity coefficient β(U, V), Kolmogorov and Rozanov (1960) [112]'s maximal correlation coefficient ρ(U, V), and Ibragimov (1962) [110]'s uniform mixing coefficient φ(U, V). A more comprehensible formulation for β is written in terms of a total variation norm:

    β(U, V) = ‖P_{U⊗V} − P_U ⊗ P_V‖_{TV};
here PU , PV denote the restrictions of P to σ−fields U, V and PU ⊗V is a law on the product σ−fields defined of rectangles by PU ⊗V (U, V ) = P(U, V ). In case U, V are generated by random variables U, V this may be written β(U, V) = P(U,V ) − PU ⊗ PV T V as the total variation norm of distributions of (U, V ) and (U, V ) for some independent copy V of V. The Markov frame is however adapted to prove β-mixing since this condition holds under positive recurrence. In fact any coefficient μ such that μ(U, V) ∈ [0, +∞] is well defined and such that independence of U, V implies μ(U, V) = 0 may be considered as a mixing coefficient. Once a mixing coefficient has been chosen, the corresponding mixing condition is defined for random processes (Xt )t∈Z and for random fields (Xt )t∈Zd : μX (r) = sup c(σ(Xt , t ≤ i), σ(Xt , t ≥ i + r)) (1.2.5) i∈Z
and the random process is called μ-mixing in case μ_X(r) → 0 as r → ∞. Taking μ = α, β, φ or ρ thus yields the coefficient sequences α_X(r), β_X(r), φ_X(r) or ρ_X(r); many other coefficients may also be introduced. For the more difficult case of random fields, one needs a more intricate definition. The one we propose depends on two additional integers: the random field (X_t)_{t∈Z^d} is μ-mixing in case, for any u, v ∈ N*, μ_{X,u,v}(r) → 0 as r → ∞, where now

μ_{X,u,v}(r) = sup_{#A=u, #B=v, d(A,B)≥r} μ(σ(X_t, t ∈ A), σ(X_t, t ∈ B)),  (1.2.6)

the supremum being taken over finite subsets A, B ⊂ Z^d with cardinalities u and v at distance at least r (where a metric has been fixed on Z^d). The following relations hold:

φ-mixing ⇒ ρ-mixing ⇒ α-mixing,  φ-mixing ⇒ β-mixing ⇒ α-mixing,

and no reverse implication holds in general. Examples for such conditions to hold are investigated in Doukhan (1994) [61], and Rio (2000) [161] provides up-to-date results in this setting. We only note here that these conditions are usually difficult to check.
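These coefficients become computable when U and V are generated by finitely-valued random variables: β is attained on the finest partitions (the atoms), while α and φ require a supremum over all events, i.e. all unions of atoms. A minimal numerical sketch (the helper name `mixing_coefficients` and the probability tables are ours, for illustration only):

```python
import itertools

def mixing_coefficients(P):
    """alpha, beta, phi between the sigma-algebras generated by two discrete
    random variables U, V with joint probability table P[i][j]."""
    I, J = len(P), len(P[0])
    pu = [sum(row) for row in P]                              # marginal of U
    pv = [sum(P[i][j] for i in range(I)) for j in range(J)]   # marginal of V
    # beta: attained on the finest partitions, i.e. on the atoms themselves
    beta = 0.5 * sum(abs(P[i][j] - pu[i] * pv[j])
                     for i in range(I) for j in range(J))
    # alpha and phi: supremum over all events = all unions of atoms
    events = lambda n: itertools.chain.from_iterable(
        itertools.combinations(range(n), m) for m in range(1, n + 1))
    alpha = phi = 0.0
    for rows in events(I):
        pU = sum(pu[i] for i in rows)
        for cols in events(J):
            pV = sum(pv[j] for j in cols)
            pUV = sum(P[i][j] for i in rows for j in cols)
            alpha = max(alpha, abs(pUV - pU * pV))
            if pU > 0:
                phi = max(phi, abs(pUV / pU - pV))
    return alpha, beta, phi

a, b, f = mixing_coefficients([[0.4, 0.1], [0.1, 0.4]])
# for this table alpha = 0.15, beta = 0.3 and phi = 0.3; all three vanish
# for an independent table such as [[0.25, 0.25], [0.25, 0.25]]
```

On such tables one can also check the classical comparisons 2α ≤ β ≤ φ quoted above.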
1.3
Mixingales and Near Epoch Dependence
Definition 1.1 (McLeish (1975) [129], Andrews (1988) [3]). Let p ≥ 1 and let (F_n)_{n∈Z} be an increasing sequence of σ-algebras. The sequence (X_n, F_n)_{n∈Z} is called an L^p-mixingale if there exist nonnegative sequences (c_n)_{n∈Z} and (ψ(n))_{n∈N} such that ψ(n) → 0 as n → ∞ and, for all integers n ∈ Z and k ≥ 0,

‖X_n − E(X_n | F_{n+k})‖_p ≤ c_n ψ(k + 1),  (1.3.1)
‖E(X_n | F_{n−k})‖_p ≤ c_n ψ(k).  (1.3.2)
This property of fading memory is easier to handle than the martingale condition. A more general concept is near epoch dependence (NED) on a mixing process. Its definition can be traced back to Billingsley (1968) [20], who considered functions of φ-mixing processes.

Definition 1.2 (Pötscher and Prucha (1991) [152]). Let p ≥ 1. We consider a μ-mixing process (V_n)_{n∈Z} (with μ_V(r) defined as in eqn. (1.2.5)). For any integers i ≤ j, set F_i^j = σ(V_i, ..., V_j). The sequence (X_n)_{n∈Z} is called an L^p-NED process on the μ-mixing process (V_n)_{n∈Z} if there exist nonnegative sequences (c_n)_{n∈Z} and (ψ(n))_{n∈N} such that ψ(n) → 0 as n → ∞ and, for all integers n ∈ Z and k ≥ 0,

‖X_n − E(X_n | F_{n−k}^{n+k})‖_p ≤ c_n ψ(k).

This approach is developed in detail in Pötscher and Prucha (1991) [152]. Functions of MA(∞) processes can be handled using the NED concept; for instance, limit theorems can be deduced for sums of such functions of MA(∞) processes. The previous definitions express the fact that a k-period projection (ahead in the first case, both ahead and backwards in the second definition) converges to the unconditional mean. They are known to be satisfied by a wide class of models. For example, martingale difference sequences are L¹-mixingales, as are linear processes with martingale difference innovations.
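As an elementary illustration of Definition 1.1, consider the stationary AR(1) process X_n = aX_{n−1} + ξ_n with |a| < 1 and i.i.d. centered innovations: by causality, E(X_n | F_{n−k}) = a^k X_{n−k}, so that (1.3.2) holds with c_n = ‖X_0‖_2 and ψ(k) = |a|^k. A numerical sketch under these assumptions (the AR(1) example and all names here are ours, not from the text):

```python
import math
import random

# Stationary AR(1): X_n = a X_{n-1} + xi_n with xi_n i.i.d. N(0, 1).
# Causality gives E(X_n | F_{n-k}) = a^k X_{n-k} exactly, hence the
# L^2-mixingale bound (1.3.2) with c_n = ||X_0||_2 and psi(k) = |a|^k.
random.seed(0)
a, n_paths, burn = 0.7, 5000, 200

def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

X = []
for _ in range(n_paths):
    x = 0.0
    for _ in range(burn):                 # forget the arbitrary start value
        x = a * x + random.gauss(0.0, 1.0)
    X.append(x)

c = l2_norm(X)                            # estimates ||X_0||_2 = 1/sqrt(1-a^2)
for k in (1, 3, 6):
    proj = [a ** k * x for x in X]        # the exact conditional expectation
    assert l2_norm(proj) <= c * abs(a) ** k + 1e-12
```

Here c should be close to the stationary standard deviation 1/√(1 − a²) ≈ 1.4, and the geometric decay ψ(k) = |a|^k makes the fading-memory property explicit.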
1.4
Association
The notion of association was introduced independently by Esary, Proschan and Walkup (1967) [85] and by Fortuin, Kasteleyn and Ginibre (1971) [87]. The motivations of these authors were radically different: the former were working in reliability theory, the latter in statistical mechanics, where their condition is known as the FKG inequality.

Definition 1.3. The sequence (X_t)_{t∈Z} is associated if, for all coordinatewise increasing real-valued functions h and k,

Cov(h(X_t, t ∈ A), k(X_t, t ∈ B)) ≥ 0

for all finite disjoint subsets A and B of Z, provided moreover that E[h(X_t, t ∈ A)² + k(X_t, t ∈ B)²] < ∞.
This extends the positive correlation assumption in order to model the notion that two stochastic processes have a tendency to evolve in a similar way. The definition is deeper than simple positivity of the correlations. Besides the evident fact that it does not assume that variances exist, one can easily construct orthogonal (hence positively correlated) sequences that do not have the association property. An important feature of association is that, for an associated sequence, uncorrelatedness implies independence (Newman, 1984 [136]). Let for instance (ξ_k) and (η_k) be two independent i.i.d. N(0, 1) sequences. Then the sequence (X_n)_{n∈Z} defined by X_k = ξ_k(η_k − η_{k−1}) is uncorrelated but not independent; hence it is not an associated sequence. Heredity of association only holds under monotonic transformations. This unpleasant restriction will disappear under the assumption of weak dependence. The following property of associated sequences was a guideline for the forthcoming definition of weak dependence. Association does not imply any mixing assumption‡. The forthcoming inequality (1.4.1) also contains the idea that weakly correlated associated sequences are 'weakly dependent'. The following result provides a quantitative idea of the gap between association and independence:

Theorem 1.1 (Newman, 1984 [136]). For a pair of measurable numeric functions (f, g) defined on A ⊂ R^k, we write f ≪ g if both functions g + f and g − f are nondecreasing with respect to each argument. Let now X be any associated random vector with range in A. Then

(f_i ≪ g_i for i = 1, 2) ⇒ |Cov(f₁(X), f₂(X))| ≤ Cov(g₁(X), g₂(X)).

This theorem follows simply from several applications of the definition of association to the coordinatewise nondecreasing functions g_i − f_i and g_i + f_i. By an easy application of the above inequalities one can check that
|Cov(f(X), g(Y))| ≤ Σ_{i=1}^k Σ_{j=1}^l ‖∂f/∂x_i‖_∞ ‖∂g/∂y_j‖_∞ Cov(X_i, Y_j),  (1.4.1)

for R^k- or R^l-valued associated random vectors X and Y and C¹ functions f and g with bounded partial derivatives. For this, it suffices to note that f ≪ f₁ if one makes use of Theorem 1.1 with f₁(x₁, ..., x_p) = Σ_{i=1}^p ‖∂f/∂x_i‖_∞ x_i.
Denote by ℜ(z) the real part of the complex number z. Theorem 1.1 can be extended to complex-valued functions, up to a factor 2 in inequality (1.4.1). Indeed, we can now set f ≪ g if, for any real number ω, the mapping t = (t₁, ..., t_k) ↦ ℜ(g(t) + e^{iω(t₁+···+t_k)} f(t)) is nondecreasing with respect to each argument. Also, for any real numbers t₁, ..., t_k,

|E e^{i(t₁X₁+···+t_kX_k)} − E e^{it₁X₁} ··· E e^{it_kX_k}| ≤ 2 Σ_{i=1}^k Σ_{j=1}^k |t_i||t_j| Cov(X_i, X_j).

‡ E.g. Gaussian processes with nonnegative covariances are associated, while it is well known that this condition does not imply mixing; see [61], page 62.
On the opposite side, negatively associated sequences of random variables are defined through a relation similar to the aforementioned covariance inequality, except that the sense of the inequality is reversed. This reversal breaks the seemingly parallel definitions of positively and negatively associated sequences.
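The counterexample quoted above is easy to check numerically: for independent i.i.d. N(0, 1) sequences (ξ_k) and (η_k), the sequence X_k = ξ_k(η_k − η_{k−1}) is centered and uncorrelated, yet X_k² and X_{k+1}² are positively correlated (a direct Gaussian moment computation gives Cov(X_k², X_{k+1}²) = 2), so the X_k are not independent and hence, by Newman's result, cannot be associated. A simulation sketch (all names are ours):

```python
import random

# Newman's counterexample: X_k = xi_k * (eta_k - eta_{k-1}) with (xi_k) and
# (eta_k) independent i.i.d. N(0,1) sequences. Consecutive terms are
# uncorrelated, but their squares are not, so the sequence is dependent
# and therefore cannot be associated (uncorrelated + associated => independent).
random.seed(1)
n = 200_000
xi = [random.gauss(0, 1) for _ in range(n + 1)]
eta = [random.gauss(0, 1) for _ in range(n + 1)]
X = [xi[k] * (eta[k] - eta[k - 1]) for k in range(1, n + 1)]

mean = sum(X) / n
cov1 = sum(X[k] * X[k + 1] for k in range(n - 1)) / (n - 1) - mean * mean
S = [x * x for x in X]
m2 = sum(S) / n
cov2 = sum(S[k] * S[k + 1] for k in range(n - 1)) / (n - 1) - m2 * m2

assert abs(cov1) < 0.05   # consecutive terms are uncorrelated
assert cov2 > 1.0         # but their squares are correlated (true value: 2)
```

The same simulation shows why positivity of correlations alone says nothing here: all the lag-one covariances vanish while the dependence is carried by second-order functionals.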
1.5
Nonmixing models
Finite moving averages X_n = H(ξ_n, ξ_{n−1}, ..., ξ_{n−m}) are trivially m-dependent. However this does not remain true as m → ∞. For example, the Bernoulli shift X_n = H(ξ_n, ξ_{n−1}, ...), with H(x) = Σ_{k=0}^∞ 2^{−(k+1)} x_k and (ξ_k) i.i.d. b(1/2), is not mixing; this shift is an example of a Markovian, nonmixing sequence.
Indeed, its stationary representation writes X_n = Σ_{k=0}^∞ 2^{−k−1} ξ_{n−k}. Here ξ_{n−k} is the (k+1)-th digit in the binary expansion of the uniformly distributed number X_n = 0.ξ_n ξ_{n−1} ··· ∈ [0, 1]. Hence X_0 (the fractional part of 2^n X_n) is a deterministic function of X_n, which is the main argument to derive that such models are not mixing ([61], page 77, counterexample 2, or [2]). More precisely, as X_0 is a deterministic function of X_n, the event A = (X_0 ≤ 1/2) belongs both to the σ-algebra of the past σ(X_t, t ≤ 0) and to the σ-algebra of the future σ(X_t, t ≥ n); hence, with the notation of § 1.2,

α(n) ≥ |P(A ∩ A) − P(A)P(A)| = 1/2 − 1/4 = 1/4.
The same arguments apply to the model, described before, of an autoregressive process whose innovations take p distinct values. The difference of two independent processes of this type, or ((−1)^n X_n)_n, provides examples of nonassociated and nonmixing processes. Assume now, more generally, that ξ_j ∼ b(s) follows a Bernoulli distribution with parameter 0 < s < 1. Concentration properties then hold: e.g. X_n is uniform if s = 1/2, and it has a Cantor marginal distribution if s = 1/3. Many more stationary models may in fact be proved to be nonmixing; e.g. for the integer-valued models (3.6.2) it is simple to prove that X_t = 0 ⇒ X_0 = 0 and P(X_0 = 0) ∈ ]0, 1[. With stationarity this easily excludes strong mixing for such a model since, setting p := P(X_0 = 0),

α(n) ≥ P((X_0 = 0) ∩ (X_n = 0)) − P(X_0 = 0)P(X_n = 0) = p(1 − p) > 0.
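The nonmixing mechanism of the Bernoulli shift above can be watched directly: the binary digits of X_n are the innovations, so the whole past can be read off any single future observation. A truncated simulation sketch (the names and the truncation depth are ours):

```python
import random

# Bernoulli shift X_n = sum_{k>=0} 2^{-k-1} xi_{n-k} with xi_j i.i.d. b(1/2):
# X_n = 0.xi_n xi_{n-1}... in binary, so dropping the first n binary digits
# of X_n recovers X_0. The event A = (X_0 <= 1/2) = (xi_0 = 0) thus lies in
# sigma(X_0) and in sigma(X_n), giving alpha(n) >= P(A) - P(A)^2 = 1/4.
random.seed(2)
depth, n = 50, 20                  # series truncated after `depth` digits

def shift(xi, t):
    return sum(2.0 ** -(k + 1) * xi[t - k] for k in range(depth))

inA = 0
for _ in range(2000):
    xi = {j: random.randint(0, 1) for j in range(-depth, n + 1)}
    x0, xn = shift(xi, 0), shift(xi, n)
    # X_0 equals the fractional part of 2^n X_n (up to the truncation error)
    assert abs((2 ** n * xn) % 1.0 - x0) < 1e-6
    inA += x0 <= 0.5
pA = inA / 2000                    # close to P(A) = 1/2, so pA*(1-pA) ~ 1/4
```

The per-trial assertion is exactly the "past is a function of the future" property that rules out mixing, while pA·(1 − pA) numerically reproduces the lower bound 1/4 for α(n).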
Chapter 2
Weak dependence

Many authors have used one of the two following types of dependence: on the one hand, mixing properties, introduced by Rosenblatt (1956) [166]; on the other hand, martingale approximations or mixingales, following the works of Gordin (1969, 1973) [97], [98] and McLeish (1974, 1975) [127], [129]. Concerning strongly mixing sequences, very deep and elegant results have been established: for recent works, we mention the books of Rio (2000) [161] and Bradley (2002) [30]. However, many classes of time series do not satisfy any mixing condition, as noted e.g. in Eberlein and Taqqu (1986) [83] or Doukhan (1994) [61]. Conversely, most of these time series enter the scope of mixingales, but limit theorems and moment inequalities are more difficult to obtain in this general setting. Between those directions, Bickel and Bühlmann (1999) [18] and, simultaneously, Doukhan and Louhichi (1999) [67] introduced a new idea of weak dependence. Their notion of weak dependence makes explicit the asymptotic independence between 'past' and 'future'; this means that the 'past' is progressively forgotten. In terms of the initial time series, 'past' and 'future' are elementary events given through finite dimensional marginals. Roughly speaking, for convenient functions f and g, we shall assume that Cov(f('past'), g('future')) is small when the distance between the 'past' and the 'future' is sufficiently large. Such inequalities are significant only if the distance between the indices of the initial time series in the 'past' and 'future' terms grows to infinity. The convergence is not assumed to hold uniformly in the dimension of the 'past' or 'future' involved. The main advantage is that this kind of dependence covers lots of pertinent examples and can be used in various situations: empirical central limit theorems are proved in Doukhan and Louhichi (1999) [67] and Borovkova, Burton and
Dehling (2001) [25], while applications to the bootstrap are given by Bickel and Bühlmann (1999) [18] and Ango Nzé et al. (2002) [6], and to functional estimation by Coulon-Prieur and Doukhan (2000) [40]. In this chapter, a first section introduces the function spaces necessary to define the various dependence coefficients of the second section. These are classified in separate subsections. We shall first consider noncausal coefficients and then their causal counterparts; in both cases the underlying spaces are Lipschitz spaces. A further case, associated with bounded variation spaces, is provided in the following subsection. Projective measures of dependence are included in the last subsection.
2.1
Function spaces
In this section, we give the definitions of some function spaces used in this book.

• Let m be any measure on a measurable space (Ω, A). For any p ≥ 1, we denote by L^p(m) the space of measurable functions f from Ω to R such that

‖f‖_{p,m} = (∫ |f(x)|^p m(dx))^{1/p} < ∞,
‖f‖_{∞,m} = inf{M > 0 : m(|f| > M) = 0} < ∞, for p = ∞.

For simplicity, when no confusion can arise, we shall write L^p and ‖·‖_p instead of L^p(m) and ‖·‖_{p,m}. Let X be a Polish space and δ be some metric on X (X need not be Polish with respect to δ).

• Let Λ(δ) be the set of Lipschitz functions from X to R with respect to the distance δ. For f ∈ Λ(δ), denote by Lip(f) the Lipschitz constant of f. Let Λ^(1)(δ) = {f ∈ Λ(δ) : Lip(f) ≤ 1}.

• Let (Ω, A, P) be a probability space. Let X be a Polish space and δ be a distance on X. For any p ∈ [1, ∞], we say that a random variable X with values in X is L^p-integrable if, for some x₀ in X, the real-valued random variable δ(X, x₀) belongs to L^p(P).

Another type of function class will be used in this chapter: the class of functions with bounded variation on the real line. To be complete, we recall:
Definition 2.1. A σ-finite signed measure is the difference of two positive σ-finite measures, one of them at least being finite. We say that a function h from R to R is σ-BV if there exists a σ-finite signed measure dh such that h(x) = h(0) + dh([0, x[) if x ≥ 0 and h(x) = h(0) − dh([x, 0[) if x ≤ 0 (h is left continuous). The function h is BV if the signed measure dh is finite.

Recall also the Hahn–Jordan decomposition: for any σ-finite signed measure μ, there is a set D such that

μ⁺(A) = μ(A ∩ D) ≥ 0,  −μ⁻(A) = μ(A \ D) ≤ 0.

The measures μ⁺ and μ⁻ are mutually singular, at least one of them is finite, and μ = μ⁺ − μ⁻. The measure |μ| = μ⁺ + μ⁻ is called the total variation measure of μ, and the total variation of μ is ‖μ‖ = |μ|(R). Now we are in a position to introduce

• BV₁, the space of BV functions h : R → R such that ‖dh‖ ≤ 1.
2.2
Weak dependence
Let (Ω, A, P) be a probability space and let X be a Polish space. Let

F = ∪_{u∈N*} F_u and G = ∪_{u∈N*} G_u,

where F_u and G_u are two classes of functions from X^u to R.

Definition 2.2. Let X and Y be two random variables with values in X^u and X^v respectively. If Ψ is some function from F × G to R₊, define the (F, G, Ψ)-dependence coefficient ε(X, Y) by

ε(X, Y) = sup_{f∈F_u, g∈G_v} |Cov(f(X), g(Y))| / Ψ(f, g).  (2.2.1)

Let (X_n)_{n∈Z} be a sequence of X-valued random variables. Let Γ(u, v, k) be the set of (i, j) in Z^u × Z^v such that i₁ < ··· < i_u ≤ i_u + k ≤ j₁ < ··· < j_v. The dependence coefficient ε(k) is defined by

ε(k) = sup_{u,v} sup_{(i,j)∈Γ(u,v,k)} ε((X_{i₁}, ..., X_{i_u}), (X_{j₁}, ..., X_{j_v})).

The sequence (X_n)_{n∈Z} is (F, G, Ψ)-dependent if the sequence (ε(k))_{k∈N} tends to zero. If F = G we simply speak of (F, Ψ)-dependence.

Remark 2.1. Definition 2.2 above easily extends to general metric sets of indices T equipped with a distance δ (e.g. T = Z^d yields the case of random fields). The set Γ(u, v, k) is then the set of (i, j) in T^u × T^v such that k = min{δ(i_ℓ, j_m) : 1 ≤ ℓ ≤ u, 1 ≤ m ≤ v}.
2.2.1
η, κ, λ and ζ-coefficients
In this section, we focus on the case where F_u = G_u. If f belongs to F_u, we set d_f = u. First, F_u is the set of bounded functions from X^u to R which are Lipschitz with respect to the distance δ₁ on X^u defined by

δ₁(x, y) = Σ_{i=1}^u δ(x_i, y_i).  (2.2.2)

In that case:

• the coefficient η corresponds to

Ψ(f, g) = d_f ‖g‖_∞ Lip(f) + d_g ‖f‖_∞ Lip(g),  (2.2.3)

• the coefficient λ corresponds to

Ψ(f, g) = d_f ‖g‖_∞ Lip(f) + d_g ‖f‖_∞ Lip(g) + d_f d_g Lip(f) Lip(g).  (2.2.4)

To define the coefficients κ and ζ, we consider for F_u the wider set of functions from X^u to R which are Lipschitz with respect to the distance δ₁ on X^u but which are not necessarily bounded. In that case we assume that the variables X_i are L¹-integrable.

• the coefficient κ corresponds to

Ψ(f, g) = d_f d_g Lip(f) Lip(g),  (2.2.5)

• the coefficient ζ corresponds to

Ψ(f, g) = min(d_f, d_g) Lip(f) Lip(g).  (2.2.6)
These coefficients have some hereditary properties. For example, let h : X → R be a Lipschitz function with respect to δ; then, if the sequence (X_n)_{n∈Z} is η-, κ-, λ- or ζ-weakly dependent, the same is true for the sequence (h(X_n))_{n∈Z}. One can also obtain hereditary properties for functions which are not Lipschitz on the whole space X, as shown by Proposition 2.1 below, in the special case where X = R^k equipped with the distance δ(x, y) = max_{1≤i≤k} |x_i − y_i|.

Proposition 2.1 (Bardet, Doukhan, León, 2006 [11]). Let (X_n)_{n∈Z} be a sequence of R^k-valued random variables. Let p > 1. We assume that there exists some constant C > 0 such that max_{1≤i≤k} ‖X_i‖_p ≤ C. Let h be a function from R^k to R such that h(0) = 0 and, for x, y ∈ R^k, there exist a ∈ [1, p[ and c > 0 such that

|h(x) − h(y)| ≤ c|x − y|(|x|^{a−1} + |y|^{a−1}).

We define the sequence (Y_n)_{n∈Z} by Y_n = h(X_n). Then,
• if (X_n)_{n∈Z} is η-weakly dependent, then so is (Y_n)_{n∈Z}, and η_Y(n) = O(η(n)^{(p−a)/(p−1)});

• if (X_n)_{n∈Z} is λ-weakly dependent, then so is (Y_n)_{n∈Z}, and λ_Y(n) = O(λ(n)^{(p−a)/(p+a−2)}).

Remark 2.2. The function h(x) = x² satisfies the previous assumptions with a = 2. The condition is satisfied by polynomials of degree a.

Proof of Proposition 2.1. Let f and g be two real functions in F_u and F_v respectively. Denote x^(M) = (x ∧ M) ∨ (−M) for x ∈ R; for x = (x₁, ..., x_k) ∈ R^k, we analogously denote x^(M) = (x₁^(M), ..., x_k^(M)). Assume that (i, j) belongs to the set Γ(u, v, r) defined in Definition 2.2. Define X_i = (X_{i₁}, ..., X_{i_u}) and X_j = (X_{j₁}, ..., X_{j_v}). We then define functions F : R^{uk} → R and G : R^{vk} → R through the relations:

F(X_i) = f(h(X_{i₁}), ..., h(X_{i_u})),  F^(M)(X_i) = f(h(X_{i₁}^(M)), ..., h(X_{i_u}^(M))),
G(X_j) = g(h(X_{j₁}), ..., h(X_{j_v})),  G^(M)(X_j) = g(h(X_{j₁}^(M)), ..., h(X_{j_v}^(M))).

Then:

|Cov(F(X_i), G(X_j))| ≤ |Cov(F(X_i), G(X_j) − G^(M)(X_j))| + |Cov(F(X_i), G^(M)(X_j))|
≤ 2‖f‖_∞ E|G(X_j) − G^(M)(X_j)| + 2‖g‖_∞ E|F(X_i) − F^(M)(X_i)| + |Cov(F^(M)(X_i), G^(M)(X_j))|.

But we also have, from the assumptions on h and the Markov inequality,

E|G(X_j) − G^(M)(X_j)| ≤ Lip(g) Σ_{l=1}^v E|h(X_{j_l}) − h(X_{j_l}^(M))|
≤ 2c Lip(g) Σ_{l=1}^v E(|X_{j_l}|^a 1_{|X_{j_l}|>M})
≤ 2c v Lip(g) C^p M^{a−p}.

The same bound holds for F. Moreover, the functions F^(M) : R^{uk} → R and G^(M) : R^{vk} → R satisfy Lip(F^(M)) ≤ 2cM^{a−1} Lip(f) and Lip(G^(M)) ≤ 2cM^{a−1} Lip(g), together with ‖F^(M)‖_∞ ≤ ‖f‖_∞ and ‖G^(M)‖_∞ ≤ ‖g‖_∞. Thus, from the definition of the weak dependence of X and the choice of (i, j), we obtain respectively, if M ≥ 1,

|Cov(F^(M)(X_i), G^(M)(X_j))| ≤ 2c(u Lip(f)‖g‖_∞ + v Lip(g)‖f‖_∞) M^{a−1} η(r),
|Cov(F^(M)(X_i), G^(M)(X_j))| ≤ 2c(d_f Lip(f)‖g‖_∞ + d_g Lip(g)‖f‖_∞) M^{a−1} λ(r) + 4c² d_f d_g Lip(f) Lip(g) M^{2a−2} λ(r).

Finally, we obtain respectively, if M ≥ 1:

|Cov(F(X_i), G(X_j))| ≤ 2c(u Lip(f)‖g‖_∞ + v Lip(g)‖f‖_∞) M^{a−1} η(r) + 2C^p M^{a−p},
|Cov(F(X_i), G(X_j))| ≤ c(u Lip(f) + v Lip(g) + uv Lip(f) Lip(g)) (M^{2a−2} λ(r) + M^{a−p}).

Choosing M = η(r)^{1/(1−p)} and M = λ(r)^{−1/(p+a−2)} respectively, we obtain the result.

In the definition of the coefficients η, κ, λ and ζ, we assume some regularity conditions on F_u = G_u. In the case where the sequence (X_n)_{n∈Z} is a process adapted to some increasing filtration (M_i)_{i∈Z}, it is often more suitable to work without assuming any regularity condition on F_u. In that case G_u is some space of regular functions and F_u ≠ G_u. This last case is called the causal case. In the situations where both F_u and G_u are spaces of regular functions, we say that we are in the noncausal case.
2.2.2
θ and τ -coefficients
Let F_u be the class of bounded functions from X^u to R, and let G_u be the class of functions from X^u to R which are Lipschitz with respect to the distance δ₁ defined by (2.2.2). We assume that the variables X_i are L¹-integrable.

• The coefficient θ corresponds to

Ψ(f, g) = d_g ‖f‖_∞ Lip(g).  (2.2.7)
The coefficient θ has some hereditary properties. For example, Proposition 2.2 below gives hereditary properties similar to those given for the coefficients η and λ in Proposition 2.1.

Proposition 2.2. Let (X_n)_{n∈Z} be a sequence of R^k-valued random variables. We define the sequence (Y_n)_{n∈Z} by Y_n = h(X_n). The assumptions on (X_n)_{n∈Z} and on h are the same as in Proposition 2.1. Then,
• if (X_n)_{n∈Z} is θ-weakly dependent, then so is (Y_n)_{n∈Z}, and θ_Y(n) = O(θ(n)^{(p−a)/(p−1)}).

The proof of Proposition 2.2 follows the same lines as the proof of Proposition 2.1 and therefore is not detailed. We shall see that the coefficient θ defined above belongs to a more general class of dependence coefficients, defined through conditional expectations with respect to the filtration σ(X_j, j ≤ i).

Definition 2.3. Let (Ω, A, P) be a probability space, and let M be a σ-algebra of A. Let X be a Polish space and δ a distance on X. For any L^p-integrable random variable X (see § 2.1) with values in X, we define

θ_p(M, X) = sup{‖E(g(X)|M) − E(g(X))‖_p : g ∈ Λ^(1)(δ)}.  (2.2.8)

Let (X_i)_{i∈Z} be a sequence of L^p-integrable X-valued random variables, and let (M_i)_{i∈Z} be a sequence of σ-algebras of A. On X^l, we consider the distance δ₁ defined by (2.2.2). The sequence of coefficients θ_{p,r}(k) is then defined by

θ_{p,r}(k) = max_{1≤l≤r} sup_{(i,j)∈Γ(1,l,k)} (1/l) θ_p(M_i, (X_{j₁}, ..., X_{j_l})).  (2.2.9)

When it is not clearly specified, we shall always take M_i = σ(X_k, k ≤ i). The two preceding definitions are coherent, as proved below.

Proposition 2.3. Let (X_i)_{i∈Z} be a sequence of L¹-integrable X-valued random variables, and let M_i = σ(X_j, j ≤ i). With the definition of θ(k) from § 2.2.2 and with Definition 2.3, we have the equality

θ(k) = θ_{1,∞}(k).  (2.2.10)
Proof of Proposition 2.3. The fact that θ(k) ≤ θ_{1,∞}(k) is clear since, for any f in F_u, g in G_v, and any (i, j) ∈ Γ(u, v, k),

|Cov( f(X_{i₁}, ..., X_{i_u})/‖f‖_∞ , g(X_{j₁}, ..., X_{j_v})/(v Lip(g)) )|
≤ ‖E( g(X_{j₁}, ..., X_{j_v})/(v Lip(g)) | M_{i_u}) − E( g(X_{j₁}, ..., X_{j_v})/(v Lip(g)) )‖₁ ≤ θ_{1,∞}(k).

To prove the converse inequality, we first notice that

θ₁(M_i, (X_{j₁}, ..., X_{j_v})) = lim_{k→−∞} θ₁(M_{k,i}, (X_{j₁}, ..., X_{j_v})),  (2.2.11)
where M_{k,i} = σ(X_j, k ≤ j ≤ i). Now, letting

f(X_k, ..., X_i) = sign( E(g(X_{j₁}, ..., X_{j_v}) | M_{k,i}) − E(g(X_{j₁}, ..., X_{j_v})) ),

we have, for (i, j) in Γ(1, v, k) and g in Λ^(1)(δ₁),

‖E(g(X_{j₁}, ..., X_{j_v}) | M_{k,i}) − E(g(X_{j₁}, ..., X_{j_v}))‖₁ = Cov(f(X_k, ..., X_i), g(X_{j₁}, ..., X_{j_v})) ≤ vθ(k).

We infer that (1/v) θ₁(M_{k,i}, (X_{j₁}, ..., X_{j_v})) ≤ θ(k), and we conclude from (2.2.11) that θ_{1,∞}(k) ≤ θ(k). The proof is complete.

Having in view the coupling arguments of § 5.3, we now define a variation of the coefficient (2.2.8) in which the order of ‖·‖_p and the supremum are exchanged. This is the same step as passing from α-mixing to β-mixing, which is known to ensure nice coupling arguments (see Berbee, 1979 [16]).

Definition 2.4. Let (Ω, A, P) be a probability space, and M a σ-algebra of A. Let X be a Polish space and δ a distance on X. For any L^p-integrable (see § 2.1) X-valued random variable X, we define the coefficient τ_p by

τ_p(M, X) = ‖ sup_{g∈Λ^(1)(δ)} | ∫ g(x) P_{X|M}(dx) − ∫ g(x) P_X(dx) | ‖_p,  (2.2.12)

where P_X is the distribution of X and P_{X|M} is a conditional distribution of X given M. We clearly have

θ_p(M, X) ≤ τ_p(M, X).  (2.2.13)

Let (X_i)_{i∈Z} be a sequence of L^p-integrable X-valued random variables. The coefficients τ_{p,r}(k) are defined from τ_p as in (2.2.9).
2.2.3
α̃, β̃ and φ̃-coefficients
In the case where X = (R^d)^r, we introduce some new coefficients based on indicators of quadrants. Recall that if x and y are two elements of R^d, then x ≤ y if and only if x_i ≤ y_i for any 1 ≤ i ≤ d.

Definition 2.5. Let X = (X₁, ..., X_r) be an (R^d)^r-valued random variable and M a σ-algebra of A. For t_i in R^d and x in R^d, let g_{t_i,i}(x) = 1_{x≤t_i} − P(X_i ≤ t_i). Keeping the same notations as in Definition 2.4, define for t = (t₁, ..., t_r) in (R^d)^r,

L_{X|M}(t) = ∫ ∏_{i=1}^r g_{t_i,i}(x_i) P_{X|M}(dx)  and  L_X(t) = E ∏_{i=1}^r g_{t_i,i}(X_i).

Define now the coefficients
1. α̃(M, X) = sup_{t∈(R^d)^r} ‖L_{X|M}(t) − L_X(t)‖₁,

2. β̃(M, X) = ‖ sup_{t∈(R^d)^r} |L_{X|M}(t) − L_X(t)| ‖₁,

3. φ̃(M, X) = sup_{t∈(R^d)^r} ‖L_{X|M}(t) − L_X(t)‖_∞.
Remark 2.3. Note that if r = 1, d = 1 and δ(x, y) = |x − y| then, with the above notation,

τ₁(M, X) = ∫ ‖L_{X|M}(t)‖₁ dt.

The proof of this equality follows the same lines as the proof of the coupling property of τ₁ (see Chapter 5, proof of Lemma 5.2). In the definition of the coefficients θ and τ, we have used the class of functions Λ^(1)(δ). In the case where d = 1, we can define the coefficients α̃(M, X), β̃(M, X) and φ̃(M, X) with the help of bounded variation functions. This is the purpose of the following lemma.

Lemma 2.1. Let (Ω, A, P) be a probability space, X = (X₁, ..., X_r) an R^r-valued random variable and M a σ-algebra of A. If f is a function in BV₁, let f^(i)(x) = f(x) − E(f(X_i)). The following relations hold:

1. α̃(M, X) = sup_{f₁,...,f_r∈BV₁} ‖E(∏_{i=1}^r f_i^(i)(X_i) | M) − E(∏_{i=1}^r f_i^(i)(X_i))‖₁,

2. β̃(M, X) = ‖ sup_{f₁,...,f_r∈BV₁} | ∫ ∏_{i=1}^r f_i^(i)(x_i) (P_{X|M} − P_X)(dx) | ‖₁,

3. φ̃(M, X) = sup_{f₁,...,f_r∈BV₁} ‖E(∏_{i=1}^r f_i^(i)(X_i) | M) − E(∏_{i=1}^r f_i^(i)(X_i))‖_∞.
Remark 2.4. For r = 1 and d = 1, the coefficient α̃(M, X) was introduced by Rio (2000, equation 1.10c, [161]) and used by Peligrad (2002) [140], while τ₁(M, X) was introduced by Dedecker and Prieur (2004a) [45]. Let α(M, σ(X)), β(M, σ(X)) and φ(M, σ(X)) be the usual mixing coefficients, defined respectively by Rosenblatt (1956) [166], Wolkonski and Rozanov (1959) [187] and Ibragimov (1962) [110]. Starting from Definition 2.5 one can easily prove that

α̃(M, X) ≤ 2α(M, σ(X)),  β̃(M, X) ≤ β(M, σ(X)),  φ̃(M, X) ≤ φ(M, σ(X)).
Proof of Lemma 2.1. Let f_i be a function in BV₁. Assume without loss of generality that f_i(−∞) = 0. Then

f_i^(i)(x) = −∫ (1_{x≤t} − P(X_i ≤ t)) df_i(t).

Hence,

∫ ∏_{i=1}^r f_i^(i)(x_i) P_{X|M}(dx) = (−1)^r ∫ ( ∫ ∏_{i=1}^r g_{t_i,i}(x_i) P_{X|M}(dx) ) ∏_{i=1}^r df_i(t_i),

and the same is true with P_X in place of P_{X|M}. From these identities and the fact that |df_i|(R) ≤ 1, we infer that

sup_{f₁,...,f_r∈BV₁} | ∫ ∏_{i=1}^r f_i^(i)(x_i) P_{X|M}(dx) − ∫ ∏_{i=1}^r f_i^(i)(x_i) P_X(dx) | ≤ sup_{t∈R^r} |L_{X|M}(t) − L_X(t)|.
The converse inequality follows by noting that x ↦ 1_{x≤t} belongs to BV₁. The following proposition gives the hereditary properties of these coefficients.

Proposition 2.4. Let (Ω, A, P) be a probability space, X an R^r-valued random variable and M a σ-algebra of A. Let g₁, ..., g_r be any nondecreasing functions, and let g(X) = (g₁(X₁), ..., g_r(X_r)). We have the inequalities α̃(M, g(X)) ≤ α̃(M, X), β̃(M, g(X)) ≤ β̃(M, X) and φ̃(M, g(X)) ≤ φ̃(M, X). In particular, if F_i is the distribution function of X_i, we have α̃(M, F(X)) = α̃(M, X), β̃(M, F(X)) = β̃(M, X) and φ̃(M, F(X)) = φ̃(M, X).

Notations 2.1. For any distribution function F, we define the generalized inverse as

F⁻¹(x) = inf{t ∈ R : F(t) ≥ x}.  (2.2.14)

For any nonincreasing càdlàg function f : R → R we analogously define the generalized inverse f⁻¹(u) = inf{t : f(t) ≤ u}.

Proof of Proposition 2.4. The fact that α̃(M, g(X)) ≤ α̃(M, X) is immediate from the definition. We infer that α̃(M, F(X)) ≤ α̃(M, X). Applying the first result once more, we obtain that α̃(M, F⁻¹(F(X))) ≤ α̃(M, F(X)). To conclude, it suffices to note that F⁻¹ ∘ F(X) = X almost surely, so that α̃(M, X) ≤ α̃(M, F(X)). Of course, the same arguments apply to β̃(M, X) and φ̃(M, X).

We now define the coefficients α̃_r(k), β̃_r(k) and φ̃_r(k) for a sequence of σ-algebras and a sequence of R^d-valued random variables.
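The generalized inverse (2.2.14) is the usual quantile function, and the almost-sure identity F⁻¹ ∘ F(X) = X used in the proof above can be observed directly on the empirical distribution function of an atomless sample. A small sketch (the helper names are ours):

```python
import bisect
import random

# Empirical distribution function of a sample and its generalized inverse
# F^{-1}(u) = inf{t : F(t) >= u}, as in (2.2.14).
random.seed(3)
sample = sorted(random.gauss(0.0, 1.0) for _ in range(1000))
n = len(sample)

def F(t):
    return bisect.bisect_right(sample, t) / n

def F_inv(u):
    # smallest sample point t with F(t) >= u
    for i, t in enumerate(sample):
        if (i + 1) / n >= u:
            return t
    return sample[-1]

# F^{-1}(F(x)) = x for every sample point (the sample is a.s. atomless)
for x in random.sample(sample, 50):
    assert F_inv(F(x)) == x
```

The same quantile mechanism underlies the invariance α̃(M, F(X)) = α̃(M, X) of Proposition 2.4: monotone transformations neither create nor destroy the quadrant events on which the coefficients are built.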
Definition 2.6. Let (Ω, A, P) be a probability space. Let (X_i)_{i∈Z} be a sequence of R^d-valued random variables, and let (M_i)_{i∈Z} be a sequence of σ-algebras of A. For r ∈ N* and k ≥ 0, define

α̃_r(k) = max_{1≤l≤r} sup_{(i,j)∈Γ(1,l,k)} α̃(M_i, (X_{j₁}, ..., X_{j_l})).  (2.2.15)
2.2.4
Projective measure of dependence
Sometimes it is not necessary to introduce a supremum over a class of functions; we can work with the following simple projective measure of dependence.

Definition 2.7. Let (Ω, A, P) be a probability space, and M a σ-algebra of A. Let p ∈ [1, ∞]. For any L^p-integrable real-valued random variable X, define

γ_p(M, X) = ‖E(X|M) − E(X)‖_p.  (2.2.16)

Let (X_i)_{i∈Z} be a sequence of L^p-integrable real-valued random variables, and let (M_i)_{i∈Z} be a sequence of σ-algebras of A. The sequence of coefficients γ_p(k) is then defined by

γ_p(k) = sup_{i∈Z} γ_p(M_i, X_{i+k}).  (2.2.17)
When it is not clearly specified, we shall always take M_i = σ(X_k, k ≤ i).

Remark 2.5. These coefficients are defined in Gordin (1969) [97] for p ≥ 2 and in Gordin (1973) [98] for p = 1. McLeish (1975a) [128] and (1975b) [129] uses these coefficients in order to derive various limit theorems. Let us notice that

γ_p(M, X) ≤ θ_p(M, X).  (2.2.18)
Chapter 3
Models

The chapter is organized as follows: we first introduce Bernoulli shifts, a very broad class of models that contains the major part of the processes derived from a stationary sequence. As an example, we define the class of Volterra processes, which are multipolynomial transformations of the stationary sequence. We discuss the dependence properties of Bernoulli shifts according to whether the underlying stationary sequence is dependent or independent. When the innovation sequence is independent, we distinguish between causal and noncausal processes. After these general properties, we focus on Markov models and some of their extensions, as well as on dynamical systems, which may be studied as Markov chains up to a time reversal. After this we shall consider LARCH(∞) models, which are built by blending the definitions of Volterra series and Markov processes, and which provide an attractive class of nonlinear and non-Markovian time series. To conclude, we consider associated processes and we review some other types of stationary processes or random fields which satisfy some weak dependence condition.
3.1
Bernoulli shifts
Definition 3.1. Let H : R^Z → R be a measurable function. Let (ξ_n)_{n∈Z} be a strictly stationary sequence of real-valued random variables. A Bernoulli shift with innovation process (ξ_n)_{n∈Z} is defined as

X_n = H((ξ_{n−i})_{i∈Z}),  n ∈ Z.  (3.1.1)

This sequence is strictly stationary. Remark that the expression (3.1.1) is certainly not always clearly defined; as H is a function depending on an infinite number of arguments, it is generally given in the form of a series, which is usually only defined in some L^p space. In order to
define (3.1.1) in a general setting, we denote, for any subset J ⊂ Z,

H((ξ_{−j})_{j∈J}) = H((ξ_{−j} 1_{j∈J})_{j∈Z}).

For finite subsets J this expression is generally well defined and simple to handle. In order to define such models in L^m we may assume

Σ_{n=1}^∞ ω_{m,n} < ∞,  (3.1.2)

where, for some m ≥ 1,

ω_{m,n}^m = E|H((ξ_{−j})_{|j|≤n}) − H((ξ_{−j})_{|j|<n})|^m.  (3.1.3)

This condition indeed proves that the sequence H((ξ_{−j})_{|j|≤n}) has the Cauchy property and thus converges in the Banach space L^m of the classes of random variables with a finite moment of order m. In fact the strict definition of the function H as an element of the space L^m(R^Z, B(R^Z), μ) is the following. Denote by μ the distribution of the process ξ = (ξ_t)_{t∈Z}; the measure μ is a probability distribution on the measurable space (R^Z, B(R^Z)). If, as before, we assume that ξ is stationary, that S ⊂ R is the support of the distribution of ξ₀, and that 𝒮 ⊂ R^Z is the support of the distribution of the sequence ξ, then the random variable H defines a function over 𝒮 ⊃ S^(Z), where S^(Z) is the set of sequences with values 0 except for finitely many indices. Now, given a function defined over R^(Z), the previous condition (3.1.2) ensures that such a function may be extended to a function H ∈ L^m(R^Z, B(R^Z), μ).

Dependence properties. No mixing properties have been derived for such models, except in the simple case of m-dependent Bernoulli shifts, i.e. when H depends only on a finite number of variables.
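For the binary shift of Section 1.5, H(x) = Σ_{k≥0} 2^{−k−1} x_k, the truncation coefficients of (3.1.3) can be evaluated in closed form: adding the n-th innovation changes H by exactly 2^{−n−1} ξ_n, so ω_{1,n} = 2^{−n−1} E|ξ₀| = 2^{−n−2} for ξ₀ ∼ b(1/2), and the series (3.1.2) converges geometrically. A Monte Carlo check of this decay (this causal, one-sided variant of the truncation is our own simplification of the two-sided condition above):

```python
import random

# Causal Bernoulli shift H(x) = sum_{k>=0} 2^{-k-1} x_k with xi_k i.i.d.
# b(1/2). The truncation differences of (3.1.3) satisfy
# omega_{1,n} = E|H((xi_k)_{k<=n}) - H((xi_k)_{k<n})| = 2^{-(n+2)},
# so the series (3.1.2) converges and X_n is well defined in L^1.
random.seed(4)

def H(xi, n):                       # truncated shift, digits 0..n
    return sum(2.0 ** -(k + 1) * xi[k] for k in range(n + 1))

trials = 4000
for n in (3, 6, 9):
    xi_samples = [[random.randint(0, 1) for _ in range(n + 1)]
                  for _ in range(trials)]
    omega = sum(abs(H(xi, n) - H(xi, n - 1)) for xi in xi_samples) / trials
    # geometric decay: omega_{1,n} = 2^{-(n+2)}
    assert abs(omega - 2.0 ** -(n + 2)) < 2.0 ** -(n + 2)
```

The same scheme applies verbatim to any Bernoulli shift whose coefficients are summable: only the decay rate of ω_{m,n} changes.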
3.1.1
Volterra processes
The simplest case of an infinitely dependent Bernoulli shift is the infinite moving average process with independent innovations:

X_t = Σ_{i=−∞}^{∞} a_i ξ_{t−i}.  (3.1.4)

This simple case is generalised by Volterra processes, defined with the use of polynomials of the innovation process. A Volterra process is a stationary process
defined through a convergent Volterra expansion Xt
= v0 +
∞
Vk;t ,
where
(3.1.5)
k=1
Vk;t
=
ak;i1 ,...,ik ξt−i1 · · · ξt−ik ,
(3.1.6)
i1 <···
and v0 denotes a constant and (ak;i1 ,...,ik )(i1 ,...,ik )∈Zk are real numbers for each k ≥ 1. Let p ≥ 1, then this expression converges in Lp , provided that E|ξ0 |p < ∞ and the weights satisfy ∞
p
|ak;i1 ,...,ik | < ∞.
k=1 i1 <···
If the sequences ak;i1 ,...,ik = 0 as i1 < 0 then the process is causal in the sense that Xt is measurable with respect to σ{ξi , i ≤ t}. In this case t may be seen as the usual time σ{ξi , i ≤ t} denotes the history at epoch t. Assume now that p = 2, E ξ0 = 0 and E ξi2 = 1, then the k−th order homogeneous chaotic processes Vk;t are pairwise orthogonal, and it is thus enough to prove the existence of such homogeneous processes (3.1.6) in L2 in order to obtain the existence of the more general Volterra processes (3.1.5). Normal convergence of Vk;t follows clearly from the convergence of the series defining its variance Γ2k Γ2k = a2i1 ,...,ik−1 ,ik < ∞. 0≤i1 <···
For the general infinite-order Volterra series (3.1.5), the corresponding variance follows by orthogonality:
$$\Gamma^2 = \sum_{k=1}^{\infty} \Gamma_k^2 < \infty.$$
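As a quick numerical check of the orthogonality of the homogeneous chaoses, the following sketch simulates a Volterra process truncated to finitely many indices; the weights and the truncation level are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncated second-order Volterra expansion with illustrative summable
# weights a_{1;i} = 2^{-(i+1)} and a_{2;i1,i2} = 3^{-(i1+i2+2)} for i1 < i2.
M = 20
a1 = 2.0 ** -(np.arange(M) + 1)
i1, i2 = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
a2 = np.where(i1 < i2, 3.0 ** -(i1 + i2 + 2.0), 0.0)

n = 200_000
xi = rng.standard_normal(n + M)                       # i.i.d. N(0,1) innovations
P = np.column_stack([xi[M - i: M - i + n] for i in range(M)])  # P[t, i] = xi_{t-i}

V1 = P @ a1                                           # first-order chaos
V2 = np.einsum("ti,ij,tj->t", P, a2, P)               # second-order chaos (i1 < i2)
X = 1.0 + V1 + V2                                     # v0 = 1

gamma1_sq = np.sum(a1 ** 2)                           # Gamma_1^2 = sum_i a_{1;i}^2
gamma2_sq = np.sum(a2 ** 2)                           # Gamma_2^2 = sum_{i1<i2} a^2
print(np.cov(V1, V2)[0, 1])                           # ~ 0: the chaoses are orthogonal
print(X.var(), gamma1_sq + gamma2_sq)                 # Gamma^2 = Gamma_1^2 + Gamma_2^2
```

The empirical covariance of the two chaoses is close to zero and the total variance matches $\Gamma_1^2 + \Gamma_2^2$, as predicted by the orthogonality argument.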
The formula defining Volterra processes can be generalized to expansions
$$X_t = v_0 + \sum_{k=1}^{\infty} V_{k;t}, \qquad\text{where}\qquad V_{k;t} = \sum_{(i_1,\dots,i_k)\in\mathbb{Z}^k} a_{k;i_1,\dots,i_k}\, \xi_{t-i_1}\cdots\xi_{t-i_k}, \qquad (3.1.7)$$
where $v_0$ denotes a constant and $(a_{k;i_1,\dots,i_k})_{(i_1,\dots,i_k)\in\mathbb{Z}^k}$ are real numbers for each $k \ge 1$. The major difference with the preceding definition is that the indices in the product are no longer required to be all different. Let $p \ge 1$; then the series converges in $L^p$ provided that the weights satisfy
$$\sum_{k=1}^{\infty} E|\xi_0|^{pk} \sum_{i_1 \le \cdots \le i_k} |a_{k;i_1,\dots,i_k}|^p < \infty. \qquad (3.1.8)$$
The drawback of this generalization is the loss of the orthogonality and moment properties derived for the models (3.1.5). It is however possible to rewrite the process as a sum of orthogonal terms by relaxing the condition of identical distribution and independence of the innovation process. Consider more general Volterra processes defined, under the same condition (3.1.8), by
$$V_{k;t} = \sum_{j_1 < \cdots < j_k} a^{(k)}_{j_1,\dots,j_k}\, \xi^{(k,1)}_{t-j_1} \cdots \xi^{(k,k)}_{t-j_k}. \qquad (3.1.9)$$
For a fixed $k > 0$, the sequences $(\xi^{(k,l)}_t)_{t\in\mathbb{Z}}$ are i.i.d. and mutually orthogonal for $l \le k$. Clearly, the models (3.1.5) have this form, but it is also interesting to see that the models (3.1.7) may be written as sums of such models. Consider an expansion (3.1.7); we may assume without loss of generality that $j_1 \le \cdots \le j_k$ and that $E\xi_0 = 0$; we replace each power of an innovation variable by its decomposition on the Appell polynomials of the distribution of $\xi_0$. For example, the squares are replaced by
$$\xi_{t-i}^2 = (\xi_{t-i}^2 - \sigma^2) + \sigma^2 = A_2(\xi_{t-i}) + \sigma^2.$$
For higher-order polynomials, recall that Appell polynomials (see e.g. Doukhan, 2002 [62]) are defined as $A_k(\xi_{t-i}) = \xi_{t-i}^k + \cdots$ in such a way that $EA_k(\xi_t)P(\xi_t) = 0$ if the degree of the polynomial $P$ is less than $k$. Replacing all the powers with the help of such Appell polynomials leads to a decomposition (3.1.9) in orthogonal terms.

Dependence properties. Note that such models may have no weak dependence properties, as in the case of simple moving averages; see Doukhan, Oppenheim and Taqqu (2003) [72] for a thorough survey of strongly dependent Volterra processes. No mixing property has been derived in the general case. The degenerate case of $m$-dependence, when $V_t$ depends only on the $\xi_{t-i}$ for $i = 1$ to $m$ (so that only a finite number of coefficients in each series are nonzero), satisfies all of the mixing properties. The mixing properties of causal linear processes, corresponding to the term $V_{1;t}$ with $a_{i_1} = 0$ when $i_1 < 0$, were derived under the strong additional assumption that the distribution of $\xi_0$ admits a density which is itself an absolutely continuous function; see Doukhan (1994) [61] for references; in that monograph the proof of mixing for non-causal linear processes is not complete.
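For a standard Gaussian innovation the Appell polynomials coincide with the probabilists' Hermite polynomials, so the orthogonality property $EA_k(\xi)P(\xi) = 0$ for $\deg P < k$ can be checked numerically; the sketch below assumes $\xi_0 \sim \mathcal{N}(0,1)$, which is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.standard_normal(500_000)
sigma2 = 1.0

# For a standard Gaussian, A_2 is the Hermite polynomial H_2:
# A_2(x) = x^2 - sigma^2.
A2 = xi ** 2 - sigma2

# E[A_2(xi) P(xi)] = 0 for every polynomial P of degree < 2:
print(np.mean(A2 * 1.0))   # degree 0: close to 0
print(np.mean(A2 * xi))    # degree 1: close to 0 (E xi^3 = 0)
```

Thus $\xi^2 = A_2(\xi) + \sigma^2$ splits the square into a centered part, orthogonal to all lower-degree polynomials, plus a constant, which is exactly the mechanism used in the decomposition above.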
3.1.2
Noncausal shifts with independent inputs
Assume here that the shift is well defined and that the sequence of innovations (ξt ) is i.i.d.
3.1. BERNOULLI SHIFTS
25
Dependence properties. In order to prove weak dependence properties, once the existence and measurability of the function $H$ are ensured, it is sufficient to assume that the sequence $(\delta_r)_{r\in\mathbb{N}}$ defined by
$$E\big|H(\xi_{t-j},\ j\in\mathbb{Z}) - H\big(\xi_{t-j}\mathbf{1}_{|j|\le r},\ j\in\mathbb{Z}\big)\big| = \delta_r \qquad (3.1.10)$$
converges to 0 as $r$ tends to infinity. Note that a simple bound for $\delta_r$ is $\delta_r \le \sum_{i>r}\omega_{1,i}$. The following elementary lemma is easily proved:

Lemma 3.1. Bernoulli shifts are $\eta$-weakly dependent with $\eta(r) \le 2\delta_{[r/2]}$.

Proof. Let $X_n^{(s)} = H\big((\xi_{n-i}\mathbf{1}_{|i|\le s})\big)$. Clearly, the two sequences $(X_n^{(s)})_{n\le i}$ and $(X_n^{(s)})_{n\ge i+r}$ are independent if $r > 2s$. Now consider $\mathrm{Cov}(f, g)$ for the functions $f = f(X_{i_1},\dots,X_{i_u})$ and $g = g(X_{j_1},\dots,X_{j_v})$, where $f$ and $g$ are bounded and $f, g \in \Lambda^{(1)}(|\cdot|_1)$, with $|\cdot|_1$ defined by (2.2.2). Let $i_1 \le \cdots \le i_u$ and $j_1 \le \cdots \le j_v$ be such that $j_1 - i_u > 2s$. From the previous remark, $f^{(s)} = f(X_{i_1}^{(s)},\dots,X_{i_u}^{(s)})$ and $g^{(s)} = g(X_{j_1}^{(s)},\dots,X_{j_v}^{(s)})$ are independent, and consequently
$$|\mathrm{Cov}(f, g)| \le \big|\mathrm{Cov}(f - f^{(s)}, g)\big| + \big|\mathrm{Cov}(f^{(s)}, g - g^{(s)})\big| \le 2\|g\|_\infty E|f - f^{(s)}| + 2\|f\|_\infty E|g - g^{(s)}|$$
$$\le 2\|g\|_\infty \mathrm{Lip}\,f \sum_{t=1}^{u} E\big|X_{i_t} - X_{i_t}^{(s)}\big| + 2\|f\|_\infty \mathrm{Lip}\,g \sum_{t=1}^{v} E\big|X_{j_t} - X_{j_t}^{(s)}\big| \le 2\big(u\|g\|_\infty \mathrm{Lip}\,f + v\|f\|_\infty \mathrm{Lip}\,g\big)\delta_s.$$
The sequence $(\delta_k)_k$ is related to the modulus of uniform continuity of $H$. Under the regularity condition
$$|H(u_i,\ i\in\mathbb{Z}) - H(v_i,\ i\in\mathbb{Z})| \le \sum_{i\in\mathbb{Z}} a_i |u_i - v_i|^b,$$
for some $0 < b \le 1$ and some non-negative constants $(a_i)_{i\in\mathbb{Z}}$, and if the sequence $(\xi_i)_{i\in\mathbb{Z}}$ has a finite $b$-th order moment, then
$$\delta_k \le \sum_{|i|>k} a_i\, E|\xi_i|^b.$$
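The bound $\delta_k \le \sum_{|i|>k} a_i E|\xi_i|^b$ can be illustrated by a Monte Carlo computation; the linear shift $H(u) = \sum_i a_i u_i$ with weights $a_i = 2^{-|i|}$ and $b = 1$ used below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative linear Bernoulli shift H(u) = sum_i a_i u_i with a_i = 2^{-|i|};
# the regularity condition holds with b = 1.
M = 30                                      # outer truncation, much larger than k
idx = np.arange(-M, M + 1)
a = 2.0 ** -np.abs(idx)

n = 50_000
xi = rng.standard_normal((n, 2 * M + 1))    # column j holds the innovation at lag idx[j]
H_full = xi @ a

results = {}
for k in (5, 10):
    mask = (np.abs(idx) <= k).astype(float)
    delta_k = np.mean(np.abs(H_full - (xi * mask) @ a))       # Monte Carlo delta_k
    bound = np.sum(a[np.abs(idx) > k]) * np.mean(np.abs(xi))  # sum_{|i|>k} a_i E|xi|
    results[k] = (delta_k, bound)
    print(k, delta_k, bound)                                  # delta_k <= bound
```

The simulated $\delta_k$ decreases geometrically in $k$ and stays below the theoretical bound, as the inequality predicts.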
Recall here that processes can be η-weakly dependent and nonmixing, see § 1.5.
3.1.3
Noncausal shifts with dependent inputs
The condition of independent inputs $\xi$ may be relaxed. For instance, in equation (3.1.4), assume instead of independence that the sequence $(\xi_n)_{n\in\mathbb{Z}}$ is $\eta_\xi$-weakly dependent; then the process $(X_n)_{n\in\mathbb{Z}}$ is $\eta$-weakly dependent with $\eta(r) \le \eta_\xi(r/2) + \delta_{r/2}$. Such a heredity property of weak dependence is unknown under mixing. A general statement of this property is provided below in Lemma 3.3. Let us now denote by $(\xi_i)_{i\in\mathbb{Z}}$ a weakly dependent innovation process. The coefficient $\lambda$ proves very useful for studying Bernoulli shifts $X_n = H(\xi_{n-j},\ j\in\mathbb{Z})$ with a weakly dependent innovation process $(\xi_i)_i$, via the forthcoming lemma (see Doukhan and Wintenberger, 2005 [77]). Let $H: \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ be a measurable function and $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$. In order to define $X_n$, we assume that $H$ satisfies: for each $s\in\mathbb{Z}$, if $x, y \in \mathbb{R}^{\mathbb{Z}}$ satisfy $x_i = y_i$ for each index $i \ne s$, then
$$|H(x) - H(y)| \le b_s\Big(\sup_{i\ne s}|x_i|^l \vee 1\Big)|x_s - y_s|, \qquad (3.1.11)$$
where the supremum may equivalently be taken over the sequence $z$ defined by $z_s = 0$ and $z_i = x_i = y_i$ for each $i \ne s$. This assumption is stronger than in the case of independent innovations (see equation (3.1.10)). The following lemma proves the existence of such models:

Lemma 3.2. Let $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ be a Bernoulli shift such that $H: \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ satisfies condition (3.1.11) with $l \ge 0$ and some sequence $b_s \ge 0$ such that $\sum_s |s|\, b_s < \infty$. Assume that $E|\xi_0|^{m'} < \infty$ with $lm + 1 < m'$ for some $m' > 2$. Then $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ is a strongly stationary process, well defined in $L^m$.

The existence of example (3.1.4) was stated without proof; we now describe more involved examples of Bernoulli shifts with dependent innovations:

Example 3.1 (Volterra models with dependent inputs). Consider
$$H(x) = \sum_{k=0}^{K} \sum_{j_1,\dots,j_k} a^{(k)}_{j_1,\dots,j_k}\, x_{j_1}\cdots x_{j_k};$$
then, if $x, y$ are as in equation (3.1.11),
$$H(x) - H(y) = \sum_{1\le u\le k\le K}\ \sum_{j_1,\dots,j_{u-1},\,j_{u+1},\dots,j_k} a^{(k)}_{j_1,\dots,j_{u-1},s,j_{u+1},\dots,j_k}\, x_{j_1}\cdots x_{j_{u-1}}(x_s - y_s)\,x_{j_{u+1}}\cdots x_{j_k}.$$
From the triangle inequality we derive that suitable constants in condition (3.1.11) may be chosen as $l = K - 1$ and
$$b_s = \sum_{k=1}^{K}\ \sum^{(k,s)} \big|a^{(k)}_{j_1,\dots,j_k}\big|,$$
where $\sum^{(k,s)}$ stands for the sum over all indices $(j_1,\dots,j_k) \in \mathbb{Z}^k$ such that one of the indices takes the value $s$.
Example 3.2 (Uniform Lipschitz Bernoulli shifts). Assume that condition (3.1.11) holds with $l = 0$; then the previous result still holds. An example of such a situation is the case of LARCH($\infty$) non-causal processes with bounded ($m' = +\infty$) and dependent stationary innovations.

Proof of Lemma 3.2. We first prove the existence of Bernoulli shifts with dependent innovations in $L^1$. The same proof leads to the existence in $L^m$ for all $m \ge 1$ such that $lm + 1 \le m'$. Here we set $\xi^{(s)} = (\xi_{-i}\mathbf{1}_{|i|<s})_{i\in\mathbb{Z}}$ and $\xi^{(s)}_+ = (\xi_{-i}\mathbf{1}_{-s\le i<s})_{i\in\mathbb{Z}}$. With (3.1.11) we obtain
$$|H(\xi^{(1)}) - H(0)| \le b_0|\xi_0|,$$
$$|H(\xi^{(s+1)}) - H(\xi^{(s)}_+)| \le b_s\Big(\sup_{-s\le i<s}|\xi_{-i}|^l \vee 1\Big)|\xi_{-s}|,$$
$$|H(\xi^{(s)}_+) - H(\xi^{(s)})| \le b_{-s}\Big(\sup_{|i|<s}|\xi_{-i}|^l \vee 1\Big)|\xi_{s}|.$$
The Hölder inequality then yields
$$E\big|H(\xi^{(1)}) - H(0)\big| + \sum_{s=1}^{\infty}\Big(E\big|H(\xi^{(s+1)}) - H(\xi^{(s)}_+)\big| + E\big|H(\xi^{(s)}_+) - H(\xi^{(s)})\big|\Big) \le \sum_{i\in\mathbb{Z}} 2|i|b_i\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big). \qquad (3.1.12)$$
Hence the assumption $l + 1 \le m'$ together with $\sum_{i\in\mathbb{Z}}|i|b_i < \infty$ imply that the variable $H(\xi)$ is well defined. The same argument proves that the process $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ is a well-defined process in $L^1$ and that it is strongly stationary. We can extend this result to $L^m$ for all $m \ge 1$ such that $lm + 1 \le m'$.

Dependence properties. Such models are proved to exhibit either $\lambda$- or $\eta$-weak dependence properties, as described below.

Lemma 3.3. Assume that the conditions of Lemma 3.2 are satisfied.
• If the innovation process $(\xi_i)_{i\in\mathbb{Z}}$ is $\lambda$-weakly dependent (with coefficients $\lambda_\xi(r)$), then $(X_n)$ is $\lambda$-weakly dependent with
$$\lambda(k) = c \inf_{r\le[k/2]}\left\{\sum_{i\ge r}|i|b_i \ \vee\ (2r+1)^2\,\lambda_\xi(k-2r)^{\frac{m'-1-l}{m'-1+l}}\right\}.$$
• If the innovation process $(\xi_i)_{i\in\mathbb{Z}}$ is $\eta$-weakly dependent (with coefficients $\eta_\xi(r)$), then $(X_n)$ is $\eta$-weakly dependent and there exists a constant $c > 0$ such that
$$\eta(k) = c \inf_{r\le[k/2]}\left\{\sum_{i\ge r}|i|b_i \ \vee\ (2r+1)^{1+\frac{l}{m'-1}}\,\eta_\xi(k-2r)^{\frac{m'-2}{m'-1}}\right\}.$$
Because Bernoulli shifts of $\kappa$-weakly dependent innovations are neither $\kappa$- nor $\eta$-weakly dependent, the case of $\kappa$-dependent innovations is here included in that of $\lambda$-dependent inputs. The proof of Lemma 3.3 will be given below. If the weak dependence coefficients of $\xi$ are regularly decreasing, it is easy to make explicit the decay of the weak dependence coefficients of $X$:

Proposition 3.1. Here $\lambda > 0$ and $\eta > 0$ are constants which can differ in each case.
• If $b_i = O(i^{-b})$ for some $b > 2$ and $\lambda_\xi(i) = O(i^{-\lambda})$, resp. $\eta_\xi(i) = O(i^{-\eta})$ (as $i\uparrow\infty$), then a simple calculation optimizing both terms proves that $\lambda(k) = O\big(k^{-\lambda(1-\frac{2}{b})\frac{m'-1-l}{m'-1+l}}\big)$, resp. $\eta(k) = O\big(k^{-\eta\frac{(b-2)(m'-2)}{(b-1)(m'-1)-l}}\big)$. Note that in the case $m' = \infty$ this exponent is arbitrarily close to $\lambda$ for large values of $b > 0$ and takes all possible values between 0 and $\lambda$.
• If $b_i = O(e^{-ib})$ for some $b > 0$ and $\lambda_\xi(i) = O(e^{-i\lambda})$, resp. $\eta_\xi(i) = O(e^{-i\eta})$ (as $i\uparrow\infty$), we have $\lambda(k) = O\big(k^2\, e^{-\lambda k\frac{b(m'-1-l)}{b(m'-1+l)+2\lambda(m'-1-l)}}\big)$, resp. $\eta(k) = O\big(k^{1+\frac{l}{m'-1}}\, e^{-\eta k\frac{b(m'-2)}{b(m'-1)+2\eta(m'-2)}}\big)$. The geometric decays of both $(b_i)_i$ and the coefficients of the innovations thus ensure the geometric decay of the weak dependence coefficients of the Bernoulli shift.
• If the Bernoulli shift has coefficients with geometric decay, say $b_i = O(e^{-ib})$, and $\lambda_\xi(i) = O(i^{-\lambda})$, resp. $\eta_\xi(i) = O(i^{-\eta})$ (as $i\uparrow\infty$), we find $\lambda(k) = O\big((\log k)^2\, k^{-\lambda\frac{m'-1-l}{m'-1+l}}\big)$, resp. $\eta(k) = O\big((\log k)^{1+\frac{l}{m'-1}}\, k^{-\eta\frac{m'-2}{m'-1}}\big)$. If $m' = \infty$ this means that we only lose at most a factor $\log^2 k$ with respect to the dependence coefficients of the input series $(\xi_i)_i$.

Proof of Lemma 3.3. We exhibit Lipschitz functions by using a convenient truncation. Write $\overline{\xi} = \xi \vee (-T) \wedge T$ for a truncation level $T$ which will be fixed later. As in the proof of Lemma 3.1, we denote $X_n^{(r)} = H\big((\xi_{n-i}\mathbf{1}_{|i|\le r})\big)$ and $\overline{X}_n^{(r)} = H\big((\overline{\xi}_{n-i}\mathbf{1}_{|i|\le r})\big)$. Furthermore, for any $k\ge0$ and any $(u+v)$-tuples such that $s_1 < \cdots < s_u \le s_u + k \le t_1 < \cdots < t_v$, set $X_s = (X_{s_1},\dots,X_{s_u})$, $X_t = (X_{t_1},\dots,X_{t_v})$ and $\overline{X}_s^{(r)} = (\overline{X}_{s_1}^{(r)},\dots,\overline{X}_{s_u}^{(r)})$, $\overline{X}_t^{(r)} = (\overline{X}_{t_1}^{(r)},\dots,\overline{X}_{t_v}^{(r)})$. Then we have, for all $f, g$ satisfying $\|f\|_\infty, \|g\|_\infty \le 1$ and $\mathrm{Lip}\,f + \mathrm{Lip}\,g < \infty$:
$$|\mathrm{Cov}(f(X_s), g(X_t))| \le \big|\mathrm{Cov}\big(f(X_s) - f(\overline{X}_s^{(r)}),\ g(X_t)\big)\big| \qquad (3.1.13)$$
$$\qquad + \big|\mathrm{Cov}\big(f(\overline{X}_s^{(r)}),\ g(X_t) - g(\overline{X}_t^{(r)})\big)\big| \qquad (3.1.14)$$
$$\qquad + \big|\mathrm{Cov}\big(f(\overline{X}_s^{(r)}),\ g(\overline{X}_t^{(r)})\big)\big|. \qquad (3.1.15)$$
Using that $\|g\|_\infty \le 1$, the term (3.1.13) in the sum is bounded by
$$2\,\mathrm{Lip}\,f \sum_{i=1}^{u} E\big|X_{s_i} - \overline{X}_{s_i}^{(r)}\big| \le 2u\,\mathrm{Lip}\,f\Big(\max_{1\le i\le u} E\big|X_{s_i} - X_{s_i}^{(r)}\big| + \max_{1\le i\le u} E\big|X_{s_i}^{(r)} - \overline{X}_{s_i}^{(r)}\big|\Big).$$
With the same arguments as in the proof of the existence of $H(\xi^{(\infty)})$ (see equation (3.1.12)), the first term on the right-hand side is bounded by $\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big)\sum_{i\ge r} 2|i|b_i$. Notice now that if $x, y$ are sequences with $x_i = y_i = 0$ for $|i| \ge r$, then a repeated application of the previous inequality (3.1.11) yields
$$|H(x) - H(y)| \le L\big(\|x\|_\infty^l \vee \|y\|_\infty^l \vee 1\big)\|x - y\|_1, \qquad (3.1.16)$$
where $L = \sum_{i\in\mathbb{Z}}|i|b_i < \infty$. The second term is bounded by using (3.1.16):
$$E\big|X_{s_i}^{(r)} - \overline{X}_{s_i}^{(r)}\big| \le L\,E\Big(\max_{-r\le i\le r}\big(|\overline{\xi}_i|^l \vee 1\big)\sum_{-r\le j\le r}|\xi_j|\mathbf{1}_{|\xi_j|\ge T}\Big) \le L(2r+1)^2\, E\Big(\max_{-r\le i,j\le r}|\overline{\xi}_i|^l\,|\xi_j|\mathbf{1}_{|\xi_j|\ge T}\Big) \le L(2r+1)^2\,\|\xi_0\|_{m'}^{m'}\,T^{l+1-m'}.$$
The term (3.1.14) is bounded analogously. We write (3.1.15) as
$$\mathrm{Cov}\Big(F^{(r)}\big(\overline{\xi}_{s_i+j},\ 1\le i\le u,\ |j|\le r\big),\ G^{(r)}\big(\overline{\xi}_{t_i+j},\ 1\le i\le v,\ |j|\le r\big)\Big),$$
where $F^{(r)}: \mathbb{R}^{u(2r+1)}\to\mathbb{R}$ and $G^{(r)}: \mathbb{R}^{v(2r+1)}\to\mathbb{R}$. If $r \le [k/2]$, assume that $\overline{\xi}$ is $\eta$-weakly dependent (resp. $\lambda$-weakly dependent) to bound this covariance by
$$\psi\big(\mathrm{Lip}\,F^{(r)},\ \mathrm{Lip}\,G^{(r)},\ u(2r+1),\ v(2r+1)\big)\,\epsilon_{k-2r},$$
where $\psi(u,v,a,b) = ua + vb$ and $\epsilon_i = \eta_i$ (resp. $\psi(u,v,a,b) = ua + vb + uvab$ and $\epsilon_i = \lambda_i$). Let $x = (x_1,\dots,x_u)$ and $y = (y_1,\dots,y_u)$, where $x_i, y_i \in \mathbb{R}^{2r+1}$; a bound for $\mathrm{Lip}\,F^{(r)}$ is obtained as the supremum over sequences $x, y$ of
$$\frac{\big|f\big(H(x_{s_i+l},\ 1\le i\le u,\ |l|\le r)\big) - f\big(H(y_{s_i+l},\ 1\le i\le u,\ |l|\le r)\big)\big|}{\sum_{j=1}^{u}\|x_j - y_j\|_1}.$$
Using (3.1.16), we have
$$|F^{(r)}(x) - F^{(r)}(y)| \le \mathrm{Lip}\,f\ L\sum_{i=1}^{u}\big(\|x_{s_i}\|_\infty^l \vee \|y_{s_i}\|_\infty^l \vee 1\big)\|x_{s_i} - y_{s_i}\|_1 \le \mathrm{Lip}\,f\ L\,T^l \sum_{i=1}^{u}\sum_{-r\le l\le r}|x_{s_i+l} - y_{s_i+l}|.$$
Hence $\mathrm{Lip}\,F^{(r)} \le \mathrm{Lip}\,f\cdot L\cdot T^l$ and, similarly, $\mathrm{Lip}\,G^{(r)} \le \mathrm{Lip}\,g\cdot L\cdot T^l$.
• Under $\eta$-weak dependence, we bound the covariance as
$$|\mathrm{Cov}(f(X_s), g(X_t))| \le (u\,\mathrm{Lip}\,f + v\,\mathrm{Lip}\,g)\Big(4\sum_{i\ge r}|i|b_i\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big) + (2r+1)L\big(2(2r+1)\|\xi_0\|_{m'}^{m'}T^{l+1-m'} + T^l\,\eta_\xi(k-2r)\big)\Big).$$
We then fix the truncation $T^{m'-1} = 2(2r+1)\|\xi_0\|_{m'}^{m'}/\eta_\xi(k-2r)$ to obtain the result of Lemma 3.3 in the $\eta$-weakly dependent case.
• Under $\lambda$-weak dependence, we obtain
$$|\mathrm{Cov}(f(X_s), g(X_t))| \le (u\,\mathrm{Lip}\,f + v\,\mathrm{Lip}\,g + uv\,\mathrm{Lip}\,f\,\mathrm{Lip}\,g)\Big(4\sum_{i\ge r}|i|b_i\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big) + (2r+1)L\big(2(2r+1)T^{l+1-m'}\|\xi_0\|_{m'}^{m'} + T^l\,\lambda_\xi(k-2r)\big) \vee (2r+1)^2L^2T^{2l}\,\lambda_\xi(k-2r)\Big).$$
With the truncation such that $T^{l+m'-1} = 2\|\xi_0\|_{m'}^{m'}/\big(L\,\lambda_\xi(k-2r)\big)$, we obtain the result of Lemma 3.3 in the $\lambda$-weakly dependent case.
3.1.4
Causal shifts with independent inputs
Let (ξi )i∈Z be a stationary sequence of random variables with values in a measurable space X . Assume that there exists a function H defined on a subset of X N , with values in R and such that H(ξ0 , ξ−1 , ξ−2 , . . .) is defined almost surely. The stationary sequence (Xn )n∈Z defined by Xn = H(ξn , ξn−1 , ξn−2 , . . .)
(3.1.17)
is called a causal function of $(\xi_i)_{i\in\mathbb{Z}}$. In this section, we assume that $(\xi_i)_{i\in\mathbb{Z}}$ is i.i.d. In this causal case, another way to define a coupling coefficient is to consider a non-increasing sequence $(\tilde\delta_{p,n})_{n\ge0}$ ($p$ may be infinite) such that
$$\tilde\delta_{p,n} \ge \big\|X_n - \tilde{X}_n\big\|_p, \qquad (3.1.18)$$
where $\tilde{X}_t = H(\tilde\xi_t, \tilde\xi_{t-1}, \tilde\xi_{t-2}, \dots)$, with $\tilde\xi_n = \xi_n$ if $n > 0$ and $\tilde\xi_n = \xi'_n$ for $n \le 0$, for an independent copy $(\xi'_t)_{t\in\mathbb{Z}}$ of $(\xi_t)_{t\in\mathbb{Z}}$. Here $\tilde{X}_t$ has the same distribution as $X_t$ and is independent of $\mathcal{M}_0 = \sigma(X_i,\ i\le0)$.

Dependence properties. In this section, we shall use the results of Chapter 5 to give upper bounds for the coefficients $\theta_{p,\infty}(n)$, $\tau_{p,\infty}(n)$, $\tilde\alpha_k(n)$, $\tilde\beta_k(n)$ and $\tilde\phi_k(n)$. More precisely, we have:

1. $\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le \tilde\delta_{p,n}$.

2. Assume that $X_0$ has a continuous distribution function with modulus of continuity $w$. Let $g_p(y) = y(w(y))^{1/p}$. For any $1 \le p < \infty$, we have
$$\tilde\alpha_k(n) \le \tilde\beta_k(n) \le 2k\left(\frac{\tilde\delta_{p,n}}{g_p^{-1}(\tilde\delta_{p,n})}\right)^p.$$
In particular, if $X_0$ has a density bounded by $K$, we obtain the inequality $\tilde\beta_k(n) \le 2k\big(K\tilde\delta_{p,n}\big)^{p/(p+1)}$.

3. Assume that $X_0$ has a continuous distribution function, with modulus of uniform continuity $w$. Then $\tilde\alpha_k(n) \le \tilde\beta_k(n) \le \tilde\phi_k(n) \le k\,w(\tilde\delta_{\infty,n})$.

4. For $\tilde\phi_k(n)$ it is sometimes interesting to use the coefficient
$$\delta'_{p,n} = \Big\|E\big(|X_n - \tilde{X}_n|^p\ \big|\ \mathcal{M}_0\big)^{1/p}\Big\|_\infty.$$
With the same notations as in point 2,
$$\tilde\phi_k(n) \le 2k\left(\frac{\delta'_{p,n}}{g_p^{-1}(\delta'_{p,n})}\right)^p.$$
Point 1 can be proved by using Lemma 5.3. Points 2, 3 and 4 can be proved as point 3 of Lemma 5.1, by using Proposition 5.1 and the Markov inequality.

Application (causal linear processes). In that case $X_n = \sum_{j\ge0} a_j\xi_{n-j}$. One can take $\tilde\delta_{p,n} = 2\|\xi_0\|_p \sum_{j\ge n}|a_j|$. For $p = 2$ and $E\xi_0 = 0$ one may set
$$\tilde\delta_{2,n} = 2\Big(E\xi_0^2 \sum_{j\ge n} a_j^2\Big)^{1/2}.$$
For instance, if $a_i = 2^{-i-1}$ and $\xi_0 \sim \mathcal{B}(1/2)$, then $\tilde\delta_{\infty,i} \le 2^{-i}$. Since $X_0$ is uniformly distributed over $[0,1]$, we have $\tilde\phi_1(i) \le 2^{-i}$. Recall that this sequence is not strongly mixing (see Section 1.5).

Remark 3.1. Interpreting causal Bernoulli shifts as physical systems, written $X_t = g(\dots, \epsilon_{t-1}, \epsilon_t)$, Wu (2005) [188] introduces physical dependence coefficients quantifying the dependence of the outputs $(X_t)$ on the inputs $(\epsilon_t)$. He considers the nonlinear system theory coefficient
$$\bar\delta_t = \big\|g(\dots, \epsilon_{-1}, \epsilon_0, \epsilon_1, \dots, \epsilon_t) - g(\dots, \epsilon_{-1}, \epsilon'_0, \epsilon_1, \dots, \epsilon_t)\big\|_2,$$
where $\epsilon'_0$ is an independent copy of $\epsilon_0$. This provides a sharp framework for the study of the CLT for random processes and sheds new light on a variety of problems, including estimation of linear models with dependent errors in Wu (2006) [191], nonparametric inference for time series in Wu (2005) [192], representations of sample quantiles (Wu 2005 [189]) and spectral estimation (Wu 2005 [190]), among others. This specific $L^2$-formulation is rather adapted to the CLT, and it is not directly possible to compare it with $\tau$-dependence because coupling is given here with only one element in the past. Justification of the Bernoulli shift representation follows from Ornstein (1973) [138].
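The Bernoulli(1/2) example above can be checked by direct simulation; the truncation of the series to 60 binary digits below is an implementation assumption (the tail it drops is smaller than $2^{-60}$):

```python
import numpy as np

rng = np.random.default_rng(3)

# X_n = sum_{i>=0} 2^{-i-1} xi_{n-i} with xi ~ Bernoulli(1/2): X_n is the
# number 0.xi_n xi_{n-1} ... in binary, hence (almost) uniform on [0, 1].
N, n = 200_000, 10
xi = rng.integers(0, 2, size=(N, 60))             # column i holds xi at lag i
w = 2.0 ** -(np.arange(60) + 1)
X = xi @ w
print(X.mean(), X.var())                          # close to 1/2 and 1/12 (uniform law)

# Coupling: replace the innovations at lags >= n by an independent copy;
# the two outputs differ by at most sum_{i>=n} 2^{-i-1} = 2^{-n}.
xi_cpy = xi.copy()
xi_cpy[:, n:] = rng.integers(0, 2, size=(N, 60 - n))
X_cpl = xi_cpy @ w
print(np.max(np.abs(X - X_cpl)), 2.0 ** -n)       # sup-distance <= 2^{-n}
```

The maximal coupling distance observed over all samples never exceeds $2^{-n}$, which is exactly the bound $\tilde\delta_{\infty,n} \le 2^{-n}$ used in the text.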
3.1.5
Causal shifts with dependent inputs
In this section, the innovations are not required to be i.i.d., but the method introduced in the preceding section still works. More precisely, assume that there exists a stationary sequence $(\xi'_i)_{i\in\mathbb{Z}}$ distributed as $(\xi_i)_{i\in\mathbb{Z}}$ and independent of $(\xi_i)_{i\le0}$. Define $\tilde{X}_n = H(\xi'_n, \xi'_{n-1}, \xi'_{n-2}, \dots)$. Clearly $\tilde{X}_n$ is independent of $\mathcal{M}_0 = \sigma(X_i,\ i\le0)$ and distributed as $X_n$. Hence one can apply the result of Lemma 5.3: if $(\tilde\delta_{p,n})_{n\ge0}$ is a non-increasing sequence satisfying (3.1.18), then the upper bounds 1, 2, 3 and 4 of the preceding section hold. In particular, these results apply to the case where the sequence $(\xi_i)_{i\in\mathbb{Z}}$ is $\beta$-mixing. According to Theorem 4.4.7 in Berbee (1979) [16], if $\Omega$ is rich enough, there exists $(\xi'_i)_{i\in\mathbb{Z}}$ distributed as $(\xi_i)_{i\in\mathbb{Z}}$ and independent of $(\xi_i)_{i\le0}$ such that
$$P\big(\xi'_i \ne \xi_i \text{ for some } i\ge k\big) = \beta\big(\sigma(\xi_i,\ i\le0),\ \sigma(\xi_i,\ i\ge k)\big).$$
Application (causal linear processes). In that case $X_n = \sum_{j\ge0} a_j\xi_{n-j}$. For any $p\ge1$, we can take the non-increasing sequence $(\tilde\delta_{p,n})_{n\ge0}$ such that
$$\tilde\delta_{p,n} \ge \|\xi_0 - \xi'_0\|_p\sum_{j\ge n}|a_j| + \sum_{j=0}^{n-1}|a_j|\,\|\xi_{n-j} - \xi'_{n-j}\|_p \ge \sum_{j\ge0}|a_j|\,\|\xi_{n-j} - \xi'_{n-j}\|_p.$$
From Proposition 2.3 in Merlevède and Peligrad (2002) [130], one can take
$$\tilde\delta_{p,n} \ge \|\xi_0 - \xi'_0\|_p\sum_{j\ge n}|a_j| + \sum_{j=0}^{n-1}|a_j|\left(2p\int_0^{\beta(\sigma(\xi_k,\,k\le0),\,\sigma(\xi_k,\,k\ge n-j))} Q_{\xi_0}^p(u)\,du\right)^{1/p},$$
where $Q_{\xi_0}$ is the generalized inverse of the tail function $x\mapsto P(|\xi_0|>x)$ (see Lemma 5.1 for the precise definition).
3.2
Markov sequences
Let $(X_n)_{n\ge1-d}$ be a sequence of random variables with values in a Banach space $(\mathbb{B}, \|\cdot\|)$. Assume that $X_n$ satisfies the recurrence equation
$$X_n = F(X_{n-1},\dots,X_{n-d};\ \xi_n), \qquad (3.2.1)$$
where $F$ is a measurable function with values in $\mathbb{B}$, the sequence $(\xi_n)_{n>0}$ is i.i.d., and $(\xi_n)_{n>0}$ is independent of $(X_0,\dots,X_{1-d})$. Note that if $X_n$ satisfies (3.2.1), then the random variable $Y_n = (X_n,\dots,X_{n-d+1})$ defines a Markov chain such that $Y_n = M(Y_{n-1};\xi_n)$ with $M(x_1,\dots,x_d;\xi) = (F(x_1,\dots,x_d;\xi),\ x_1,\dots,x_{d-1})$.

Dependence properties. Assume that $(X_n)_{n\ge1-d}$ is a stationary solution to (3.2.1). As previously, let $Y_0 = (X_0,\dots,X_{1-d})$, and let $\tilde{Y}_0 = (\tilde{X}_0,\dots,\tilde{X}_{1-d})$ be an independent vector with the same law as $Y_0$ (that is, a distribution invariant by $M$). Then let $\tilde{X}_n = F(\tilde{X}_{n-1},\dots,\tilde{X}_{n-d};\xi_n)$. Clearly, for $n>0$, $\tilde{X}_n$ is distributed as $X_n$ and independent of $\mathcal{M}_0 = \sigma(X_i,\ 1-d\le i\le0)$. As in the preceding sections, let $(\tilde\delta_{p,n})_{n\ge0}$ ($p$ may be infinite) be a non-increasing sequence such that
$$\tilde\delta_{p,n} \ge \big(E\|X_n - \tilde{X}_n\|^p\big)^{1/p}. \qquad (3.2.2)$$
Applying Lemma 5.3, we infer that $\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le \tilde\delta_{p,n}$.
For the coefficients $\tilde\alpha_k(n)$, $\tilde\beta_k(n)$ and $\tilde\phi_k(n)$ in the case $\mathbb{B} = \mathbb{R}$, the upper bounds 2, 3 and 4 of Section 3.1.4 also hold under the same conditions on the distribution function of $X_0$. Assume that, for some $p\ge1$, the function $F$ satisfies
$$\big(E\|F(x;\xi_1) - F(y;\xi_1)\|^p\big)^{1/p} \le \sum_{i=1}^{d} a_i\|x_i - y_i\|, \qquad \sum_{i=1}^{d} a_i < 1. \qquad (3.2.3)$$
Then one can prove that the coefficient $\tau_{p,\infty}(n)$ decreases at an exponential rate. Indeed, we have
$$\tilde\delta_{p,n} \le \sum_{i=1}^{d} a_i\,\tilde\delta_{p,n-i}.$$
For two vectors $x, y$ in $\mathbb{R}^d$, we write $x\le y$ if $x_i\le y_i$ for any $1\le i\le d$. Using this notation, we have
$$(\tilde\delta_{p,n},\dots,\tilde\delta_{p,n-d+1})^t \le A\,(\tilde\delta_{p,n-1},\dots,\tilde\delta_{p,n-d})^t,$$
with the matrix $A$ equal to
$$A = \begin{pmatrix}
a_1 & a_2 & \cdots & a_{d-1} & a_d \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.$$
Iterating this inequality, we obtain
$$(\tilde\delta_{p,n},\dots,\tilde\delta_{p,n-d+1})^t \le A^n\,(\tilde\delta_{p,0},\dots,\tilde\delta_{p,1-d})^t.$$
Since $\sum_{i=1}^{d} a_i < 1$, the matrix $A$ has a spectral radius strictly smaller than 1. Hence there exist $C > 0$ and $\rho$ in $[0,1[$ such that $\tilde\delta_{p,n} \le C\rho^n$. Consequently $\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le C\rho^n$. If $\mathbb{B} = \mathbb{R}$ and the condition (3.2.3) holds for $p = 1$, and if the distribution function $F_X$ of $X_0$ is such that $|F_X(x) - F_X(y)| \le K|x-y|^\gamma$ for some $\gamma$ in $]0,1]$, then we have the upper bound
$$\tilde\alpha_k(n) \le \tilde\beta_k(n) \le 2k\,K^{1/(\gamma+1)}C^{\gamma/(\gamma+1)}\rho^{n\gamma/(\gamma+1)}.$$
If the condition (3.2.3) holds for $p = \infty$, then $\tilde\alpha_k(n) \le \tilde\beta_k(n) \le \tilde\phi_k(n) \le kKC^\gamma\rho^{n\gamma}$. We give below some examples of the general situation described in this section.
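The spectral-radius argument can be checked numerically; the coefficients $a_i$ below are illustrative values with $a_1 + a_2 + a_3 < 1$:

```python
import numpy as np

# Illustrative contraction coefficients a_1, ..., a_d with sum < 1.
a = np.array([0.5, 0.2, 0.1])
d = len(a)

# Companion matrix of the recursion delta_n <= sum_i a_i delta_{n-i}.
A = np.zeros((d, d))
A[0, :] = a
A[np.arange(1, d), np.arange(d - 1)] = 1.0    # subdiagonal of ones

rho = np.max(np.abs(np.linalg.eigvals(A)))
print(rho)                                     # spectral radius < 1

# Iterating the recursion shows the coupling bound decaying like rho^n.
delta = [1.0] * d
for _ in range(50):
    delta.append(float(np.dot(a, delta[::-1][:d])))
print(delta[-1])
```

Since $a_1 + a_2 + a_3 = 0.8 < 1$, the spectral radius is below 1 and the iterated bound $\tilde\delta_{p,n}$ collapses geometrically, as claimed.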
3.2.1
Contracting Markov chain.
For simplicity, let $\mathbb{B} = \mathbb{R}^d$, and let $\|\cdot\|$ be a norm on $\mathbb{R}^d$. Let $F$ be an $\mathbb{R}^d$-valued function and consider the recurrence equation
$$X_n = F(X_{n-1};\xi_n). \qquad (3.2.4)$$
Assume that $A_p = E\|F(0;\xi_1)\|^p < \infty$ and
$$E\|F(x;\xi_1) - F(y;\xi_1)\|^p \le a^p\|x - y\|^p, \qquad (3.2.5)$$
for some $a<1$ and $p\ge1$. Duflo (1996) [81] proves that condition (3.2.5) implies that the Markov chain $(X_i)_{i\in\mathbb{N}}$ has a stationary law $\mu$ with a finite moment of order $p$. In the sequel, we suppose that $\mu$ is the distribution of $X_0$ (i.e. the Markov chain is stationary). Bougerol (1993) [27] and Diaconis and Freedman (1999) [60] provide a wide variety of examples of stable Markov chains; see also Ango-Nzé and Doukhan (2002) [7].

Dependence properties. Mixing properties may be derived for Markov chains (see the previous references and Mokkadem (1990) [132]), but this property always needs an additional regularity assumption on the innovations: namely, the innovations must have some absolutely continuous component. By contrast, no assumption on the distribution of $\xi_0$ is necessary to obtain a geometric decay of the coefficient $\tau_{p,\infty}(n)$. More precisely, arguing as in the previous section, if $\tilde{X}_0$ is independent of $X_0$ and distributed as $X_0$, one has the upper bounds
$$\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le \|\tilde{X}_0 - X_0\|_p\, a^n.$$
In the same way, if each component of $X_0$ has a distribution function which is Hölder, then the coefficients $\tilde\alpha_k(n)$ and $\tilde\beta_k(n)$ decrease geometrically (see Lemma 5.1). Let us show now that contractive Markov chains can be represented as Bernoulli shifts in the general situation where $X_t$ and $\zeta_t$ take values in the Euclidean spaces $\mathbb{R}^d$ and $\mathbb{R}^D$, respectively, $d, D\ge1$, with $\|\cdot\|$ denoting indifferently a norm on $\mathbb{R}^D$ or on $\mathbb{R}^d$. Any homogeneous Markov chain $X_t$ may also be represented as the solution of a recurrence equation
$$X_n = F(X_{n-1}, \xi_n)$$
(3.2.6)
where F (u, z) is a measurable function and (ξn )n>0 is an i.i.d. sequence independent of X0 , see e.g. Kallenberg (1997, Proposition 7.6) [111]. Proposition 3.2 (Stable Markov chains as Bernoulli shifts). The stationary iterative models (3.2.6) are Bernoulli shifts (3.1.17) if condition (3.2.5) holds.
Proof. We denote by $\mu$ the distribution of $\zeta = (\zeta_n)_{n\in\mathbb{N}}$ on $(\mathbb{R}^D)^{\mathbb{N}}$. We define the space $L^p(\mu, \mathbb{R}^d)$ of $\mathbb{R}^d$-valued Bernoulli shift functionals $H$ such that $H(\zeta_0, \zeta_1, \dots) \in L^p$. We shall prove, as in Doukhan and Truquet (2006) [76], that the operator $\Phi: H\mapsto K$ of $L^p(\mu,\mathbb{R}^d)$, with $K(\zeta_0,\zeta_1,\dots) = F(H(\zeta_1,\zeta_2,\dots), \zeta_0)$, satisfies the contraction principle; the Picard fixed point theorem then allows us to conclude. We first mention that condition (3.2.5) implies $\|F(x,\zeta_0)\|_p \le A_p^{1/p} + a\|x\|$; hence, with independence of the sequence $\zeta$, this yields $\|K\|_p \le A_p^{1/p} + a\|H\|_p$; thus $\Phi(L^p(\mu,\mathbb{R}^d)) \subset L^p(\mu,\mathbb{R}^d)$. Now, for $H, H' \in L^p(\mu,\mathbb{R}^d)$, we derive with analogous arguments that $\|\Phi(H) - \Phi(H')\|_p \le a\|H - H'\|_p$.

Remark 3.2. It is also possible to derive the Bernoulli shift representation through a recursive iteration of the autoregressive formula.
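The contraction behind Proposition 3.2 also implies that the chain forgets its initial condition geometrically: two copies driven by the same innovations from different starting points merge at rate $a^n$. The map $F$ below is an illustrative example satisfying (3.2.5) with $a = 0.7$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative contracting recursion F(x, xi) = 0.7*sin(x) + xi,
# so that |F(x, xi) - F(y, xi)| <= 0.7 |x - y|.
def F(x, xi):
    return 0.7 * np.sin(x) + xi

n = 40
xi = rng.standard_normal(n)
x, y = 10.0, -10.0                    # two different initial conditions
gap = []
for t in range(n):
    x, y = F(x, xi[t]), F(y, xi[t])   # same innovations: a coupling
    gap.append(abs(x - y))
print(gap[-1], 0.7 ** n * 20.0)       # |X_n - Y_n| <= a^n |X_0 - Y_0|
```

This is the numerical counterpart of the bound $\tau_{p,\infty}(n) \le \|\tilde X_0 - X_0\|_p\, a^n$ stated above.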
3.2.2
Nonlinear AR(d) models
For simplicity, let $\mathbb{B} = \mathbb{R}$. Autoregressive models of order $d$ are models such that
$$X_n = r(X_{n-1},\dots,X_{n-d}) + \xi_n. \qquad (3.2.7)$$
In such a case, the function $F$ is given by $F(u_1,\dots,u_d;\xi) = r(u_1,\dots,u_d) + \xi$. Assume that $E|\xi_1|^p < \infty$ and that
$$|r(u_1,\dots,u_d) - r(v_1,\dots,v_d)| \le \sum_{i=1}^{d} a_i|u_i - v_i|$$
for some $a_1,\dots,a_d\ge0$ such that $a = \sum_{i=1}^{d} a_i < 1$. Then the condition (3.2.3) holds, and we infer from Section 3.2 that the coefficients $\tau_{p,\infty}(n)$ decrease exponentially fast.
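A short simulation illustrates this exponential forgetting for a nonlinear AR(2); the function $r$ below is an illustrative example satisfying the Lipschitz condition with $a_1 + a_2 = 0.8 < 1$:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative nonlinear AR(2): r(u1, u2) = 0.5*tanh(u1) + 0.3*cos(u2) is
# Lipschitz with a_1 = 0.5, a_2 = 0.3 and a_1 + a_2 < 1.
def r(u1, u2):
    return 0.5 * np.tanh(u1) + 0.3 * np.cos(u2)

n = 150
xi = rng.standard_normal(n)
x = [5.0, 5.0]       # chain 1: initial values
y = [-5.0, -5.0]     # chain 2: different start, same innovations (a coupling)
for t in range(n):
    x.append(r(x[-1], x[-2]) + xi[t])
    y.append(r(y[-1], y[-2]) + xi[t])

gap = abs(x[-1] - y[-1])
print(gap)           # exponentially small after n steps
```

The coupling gap obeys the recursion $\delta_n \le 0.5\,\delta_{n-1} + 0.3\,\delta_{n-2}$, whose companion matrix has spectral radius below 1, so the gap is numerically negligible after 150 steps.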
3.2.3
ARCH-type processes
For simplicity, let B = R. Let F (u, z) = A(u) + B(u)z
(3.2.8)
for suitable Lipschitz functions $A(u), B(u)$, $u\in\mathbb{R}$. The corresponding iterative model (3.2.6) satisfies (3.2.5) if $a = \mathrm{Lip}(A) + \|\xi_1\|_p\,\mathrm{Lip}(B) < 1$.
If $p = 2$ and $E(\xi_n) = 0$, it is sufficient to assume that
$$a = \sqrt{(\mathrm{Lip}(A))^2 + E(\xi_1^2)\,(\mathrm{Lip}(B))^2} < 1.$$
Some examples of iterative Markov processes given by functions of type (3.2.8) are:
• nonlinear AR(1) processes (case $B \equiv 1$);
• stochastic volatility models (case $A \equiv 0$);
• classical ARCH(1) models (case $A(u) = \alpha u$, $B(u) = \sqrt{\beta + \gamma u^2}$, $\alpha, \beta, \gamma \ge 0$).
In the last example, the inequality (3.2.5) holds for $p = 2$ with $a^2 = \alpha^2 + \gamma\,E\xi_0^2$. A general description of these models can be found in Section 3.4.2.
3.2.4
Branching type models
Here $\mathbb{B} = \mathbb{R}$ and $\xi_n$ is $\mathbb{R}^D$-valued. Let $\xi_n = \big(\xi_n^{(1)},\dots,\xi_n^{(D)}\big)$. Let now $A_1,\dots,A_D$ be Lipschitz functions from $\mathbb{R}$ to $\mathbb{R}$, and let
$$F\big(u, z^{(1)},\dots,z^{(D)}\big) = \sum_{j=1}^{D} A_j(u)\,z^{(j)},$$
for $(z^{(1)},\dots,z^{(D)}) \in \mathbb{R}^D$. For such functions $F$, if $E\big(\xi_1^{(i)}\xi_1^{(j)}\big) = 0$ for $i\ne j$, the relation (3.2.5) holds with $p = 2$ if
$$a^2 = \sum_{j=1}^{D} \big(\mathrm{Lip}(A_j)\big)^2\, E\big(\xi_0^{(j)}\big)^2 < 1.$$
Some examples of this situation are:
• If $D = 2$, $\xi_1^{(1)} \sim b(\bar{p})$ is a Bernoulli variable independent of a centered variable $\xi_1^{(2)} \in L^2$, and $A_1(u) = u$, $A_2(u) = 1$, then the previous relations hold if $\bar{p} < 1$.
• If $D = 3$, $\xi_1^{(1)} = 1 - \xi_1^{(2)} \sim b(p)$ is independent of a centered variable $\xi_1^{(3)} \in L^2$; then one obtains the usual threshold models if $A_3 \equiv 1$. This only means that $X_n = F_n(X_{n-1}) + \xi_n^{(3)}$, where $(F_n)$ is an i.i.d. sequence, independent of the sequence $(\xi_n^{(3)})_{n\ge1}$, such that $F_n = A_1$ with probability $p$ and $F_n = A_2$ otherwise. The condition (3.2.5) with $p = 2$ writes here
$$a^2 = p\,\big(\mathrm{Lip}(A_1)\big)^2 + (1-p)\,\big(\mathrm{Lip}(A_2)\big)^2 < 1.$$
3.3
Dynamical systems
Let $I = [0,1]$, let $T$ be a map from $I$ to $I$ and define $X_i = T^i$. If $\mu$ is invariant by $T$, the sequence $(X_i)_{i\ge0}$ of random variables from $(I,\mu)$ to $I$ is strictly stationary. Denote by $\|\cdot\|_{1,\lambda}$ the $L^1$-norm with respect to the Lebesgue measure $\lambda$ on $I$, and by $\|\nu\| = |\nu|(I)$ the total variation of $\nu$.

Covariance inequalities. In many interesting cases, one can prove that, for any $BV$ function $h$ and any $k$ in $L^1(I,\mu)$,
$$|\mathrm{Cov}(h(X_0), k(X_n))| \le a_n\|k(X_n)\|_1\big(\|h\|_{1,\lambda} + \|dh\|\big), \qquad (3.3.1)$$
for some non-increasing sequence $a_n$ tending to zero as $n$ tends to infinity. Note that if (3.3.1) holds, then
$$|\mathrm{Cov}(h(X_0), k(X_n))| = |\mathrm{Cov}(h(X_0) - h(0), k(X_n))| \le a_n\|k(X_n)\|_1\big(\|h - h(0)\|_{1,\lambda} + \|dh\|\big).$$
Since $\|h - h(0)\|_{1,\lambda} \le \|dh\|$, we obtain
$$|\mathrm{Cov}(h(X_0), k(X_n))| \le 2a_n\|k(X_n)\|_1\|dh\|. \qquad (3.3.2)$$
The associated Markov chain. Define the operator $L$ from $L^1(I,\lambda)$ to $L^1(I,\lambda)$ via the equality
$$\int_0^1 L(h)(x)\,k(x)\,\lambda(dx) = \int_0^1 h(x)\,(k\circ T)(x)\,\lambda(dx),$$
where $h\in L^1(I,\lambda)$ and $k\in L^\infty(I,\lambda)$. The operator $L$ is called the Perron–Frobenius operator of $T$. Assume that $\mu$ is absolutely continuous with respect to the Lebesgue measure, with density $f_\mu$. Let $I^*$ be the support of $\mu$ (that is, $(I^*)^c$ is the largest open set in $I$ such that $\mu((I^*)^c) = 0$) and choose a version of $f_\mu$ such that $f_\mu > 0$ on $I^*$ and $f_\mu = 0$ on $(I^*)^c$. Note that one can always choose $L$ such that $L(f_\mu h)(x) = L(f_\mu h)(x)\mathbf{1}_{f_\mu(x)>0}$. Define a Markov kernel associated to $T$ by
$$K(h)(x) = \frac{L(f_\mu h)(x)}{f_\mu(x)}\mathbf{1}_{f_\mu(x)>0} + \mu(h)\mathbf{1}_{f_\mu(x)=0}. \qquad (3.3.3)$$
It is easy to check (see for instance Barbour et al. (2000) [9]) that $(X_0, X_1, \dots, X_n)$ has the same distribution as $(Y_n, Y_{n-1}, \dots, Y_0)$, where $(Y_i)_{i\ge0}$ is a stationary Markov chain with invariant distribution $\mu$ and transition kernel $K$.

Spectral gap. In many interesting cases, the spectral analysis of $L$ in the Banach space of $BV$ functions equipped with the norm $\|h\|_v = \|dh\| + \|h\|_{1,\lambda}$
can be done by using the theorem of Ionescu-Tulcea and Marinescu (see Lasota and Yorke (1974) [115]): assume that 1 is a simple eigenvalue of $L$ and that the rest of the spectrum is contained in a closed disk of radius strictly smaller than one. Then there exists a unique $T$-invariant absolutely continuous probability $\mu$ whose density $f_\mu$ is $BV$, and
$$L^n(h) = \lambda(h)f_\mu + \Psi^n(h) \qquad (3.3.4)$$
with $\Psi(f_\mu) = 0$ and $\|\Psi^n(h)\|_v \le D\rho^n\|h\|_v$, for some $0\le\rho<1$ and $D>0$. Assume moreover that
$$\Big\|\frac{1}{f_\mu}\mathbf{1}_{f_\mu>0}\Big\|_v = \gamma < \infty. \qquad (3.3.5)$$
Starting from (3.3.3), we have
$$K^n(h) = \mu(h) + \frac{\Psi^n(hf_\mu)}{f_\mu}\mathbf{1}_{f_\mu>0}.$$
Let $\|\cdot\|_{\infty,\lambda}$ be the essential supremum with respect to $\lambda$. Taking $C_1 = 2D\gamma(\|df_\mu\| + 1)$, we obtain
$$\|K^n(h) - \mu(h)\|_{\infty,\lambda} \le C_1\rho^n\|h\|_v.$$
This estimate implies (3.3.1) with $a_n = C_1\rho^n$. Indeed,
$$|\mathrm{Cov}(h(X_0), k(X_n))| = |\mathrm{Cov}(h(Y_n), k(Y_0))| \le \big\|k(Y_0)\big(E(h(Y_n)|\sigma(Y_0)) - E(h(Y_n))\big)\big\|_1 \le \|k(Y_0)\|_1\,\|K^n(h) - \mu(h)\|_{\infty,\lambda} \le C_1\rho^n\|k(Y_0)\|_1\big(\|dh\| + \|h\|_{1,\lambda}\big).$$
Moreover, we also have
$$\|dK^n(h)\| = \|dK^n(h - h(0))\| \le 2\gamma\|\Psi^n(f_\mu(h - h(0)))\|_v \le 8D\rho^n\gamma(1 + \|df_\mu\|)\|dh\|. \qquad (3.3.6)$$
Dependence properties. If (3.3.2) holds, the upper bound $\tilde\phi(\sigma(X_n), X_0) \le 2a_n$ follows from the following lemma.

Lemma 3.4. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $X$ a real-valued random variable and $\mathcal{M}$ a $\sigma$-algebra of $\mathcal{A}$. We have the equality
$$\tilde\phi(\mathcal{M}, X) = \sup\big\{|\mathrm{Cov}(Y, h(X))| : Y \text{ is } \mathcal{M}\text{-measurable},\ \|Y\|_1\le1 \text{ and } h\in BV_1\big\}.$$
Proof of Lemma 3.4. Write first $|\mathrm{Cov}(Y, h(X))| = |E(Y(E(h(X)|\mathcal{M}) - E(h(X))))|$. For any positive $\epsilon$, there exists $A_\epsilon$ in $\mathcal{M}$ such that $P(A_\epsilon) > 0$ and, for any $\omega$ in $A_\epsilon$,
$$|E(h(X)|\mathcal{M})(\omega) - E(h(X))| > \|E(h(X)|\mathcal{M}) - E(h(X))\|_\infty - \epsilon.$$
Define the random variable
$$Y_\epsilon := \frac{\mathbf{1}_{A_\epsilon}}{P(A_\epsilon)}\,\mathrm{sign}\big(E(h(X)|\mathcal{M}) - E(h(X))\big).$$
$Y_\epsilon$ is $\mathcal{M}$-measurable, $E|Y_\epsilon| = 1$ and
$$|\mathrm{Cov}(Y_\epsilon, h(X))| \ge \|E(h(X)|\mathcal{M}) - E(h(X))\|_\infty - \epsilon.$$
Since this holds for any positive $\epsilon$, we infer from Definition 2.5 that
$$\tilde\phi(\mathcal{M}, X) \le \sup\big\{|\mathrm{Cov}(Y, h(X))| : Y \text{ is } \mathcal{M}\text{-measurable},\ \|Y\|_1\le1 \text{ and } h\in BV_1\big\}.$$
The converse inequality follows straightforwardly from Definition 2.5.

Now, if (3.3.4) and (3.3.5) hold, we have: for any $n \ge i_l > \cdots > i_1 \ge 0$,
$$\tilde\phi\big(\sigma(X_k,\ k\ge n),\ X_{n-i_1},\dots,X_{n-i_l}\big) \le C(l)\rho^{i_1},$$
for some positive constant $C(l)$. This is a consequence of the following lemma, by using the upper bound (3.3.6).

Lemma 3.5. Let $(Y_i)_{i\ge0}$ be a real-valued Markov chain with transition kernel $K$. Assume that there exists a constant $C$ such that, for any $BV$ function $f$ and any $n>0$,
$$\|dK^n(f)\| \le C\|df\|. \qquad (3.3.7)$$
Then, for any $i_l > \cdots > i_1 \ge 0$,
$$\tilde\phi\big(\sigma(Y_k),\ Y_{k+i_1},\dots,Y_{k+i_l}\big) \le (1 + C + \cdots + C^{l-1})\,\tilde\phi\big(\sigma(Y_k),\ Y_{k+i_1}\big).$$
Consequently, if (3.3.4) and (3.3.5) hold, the coefficients $\tilde\phi_k(i)$ of the associated Markov chain $(Y_i)_{i\ge0}$ satisfy, for any $k>0$, $\tilde\phi_k(i) \le C(k)\rho^i$.

Proof of Lemma 3.5. We only give the proof for two points $i_1 = i$ and $i_2 = j$, the general case being similar. Let $f_k(x) = f(x) - E(f(Y_k))$. We have, almost surely,
$$E\big(f_{k+i}(Y_{k+i})g_{k+j}(Y_{k+j})\,|\,Y_k\big) - E\big(f_{k+i}(Y_{k+i})g_{k+j}(Y_{k+j})\big) = E\big(f_{k+i}(Y_{k+i})(K^{j-i}(g))_{k+i}(Y_{k+i})\,|\,Y_k\big) - E\big(f_{k+i}(Y_{k+i})(K^{j-i}(g))_{k+i}(Y_{k+i})\big).$$
Let $f$ and $g$ be two functions in $BV_1$. It is easy to see that
$$\big\|d\big((K^{j-i}(g))_{k+i}\, f_{k+i}\big)\big\| \le \|df_{k+i}\|\,\big\|(K^{j-i}(g))_{k+i}\big\|_\infty + \big\|d(K^{j-i}(g))_{k+i}\big\|\,\|f_{k+i}\|_\infty \le 1 + \big\|d(K^{j-i}(g))_{k+i}\big\|.$$
Hence, applying (3.3.7), the function $(K^{j-i}(g))_{k+i}\,f_{k+i}/(1+C)$ belongs to $BV_1$. The result follows from Definition 2.5.

Application: uniformly expanding maps. A large class of uniformly expanding maps $T$ is given in Broise (1996) [31], Section 2.1, page 11. If Broise's conditions are satisfied and if $T$ is mixing in the ergodic-theoretic sense, then the Perron–Frobenius operator $L$ satisfies the assumption (3.3.4). Let us recall some well-known examples (see Section 2.2 in Broise):
1. $T(x) = \beta x - [\beta x]$ for $\beta > 1$. These maps are called $\beta$-transformations.
2. $I$ is the finite union of disjoint intervals $(I_k)_{1\le k\le n}$, and $T(x) = a_k x + b_k$ on $I_k$, with $|a_k| > 1$.
3. $T(x) = a(x^{-1} - 1) - [a(x^{-1} - 1)]$ for some $a > 0$. For $a = 1$, this transformation is known as the Gauss map.

Remark 3.3 (Expanding maps with a neutral fixed point). For some maps which are non-uniformly expanding, in the sense that there exists a point at which the right (or left) derivative of $T$ is equal to 1, Young (1999) [194] gives some sharp upper bounds for the covariances of Hölder functions of $T^n$. For instance, let us consider the maps introduced by Liverani et al. (1999) [123]: for $0 < \gamma < 1$,
$$T(x) = \begin{cases} x(1 + 2^\gamma x^\gamma) & \text{if } x\in[0,1/2[\,,\\ 2x - 1 & \text{if } x\in[1/2,1]\,, \end{cases}$$
for which there exists a unique invariant probability $\mu$. Contrary to uniformly expanding maps, these maps are not $\tilde\phi$-dependent. For $\lambda\in]0,1]$, let $\delta_\lambda(x,y) = |x-y|^\lambda$ and let $\theta_1^{(\lambda)}$ be the coefficient associated to the distance $\delta_\lambda$ (see Definition 2.3). Starting from the upper bounds given by Young, one can prove that there exist positive constants $C_1(\lambda,\gamma)$ and $C_2(\lambda,\gamma)$ such that
$$C_1(\lambda,\gamma)\,n^{\frac{\gamma-1}{\gamma}} \le \theta_1^{(\lambda)}\big(\sigma(T^n), T\big) \le C_2(\lambda,\gamma)\,n^{\frac{\gamma-1}{\gamma}}.$$
Approximating the indicator function $f_x(t) = \mathbf{1}_{x\le t}$ by $\lambda$-Hölder functions for small enough $\lambda$, one can prove that for any $\epsilon>0$ there exists a positive constant $C_3(\epsilon,\gamma)$ such that
$$C_1(1,\gamma)\,n^{\frac{\gamma-1}{\gamma}} \le \tilde\alpha\big(\sigma(T^n), T\big) \le C_3(\epsilon,\gamma)\,n^{\frac{\gamma-1}{\gamma}+\epsilon}.$$
3.4 Vector valued LARCH(∞) processes
A vast literature is devoted to the study of conditionally heteroskedastic models. One of the best-known models is the GARCH model (Generalized Autoregressive Conditionally Heteroskedastic) introduced by Engle (1982) [84] and Bollerslev (1986) [23]. A usual GARCH(p, q) model can be written
$$r_t = \sigma_t \xi_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j r_{t-j}^2,$$
where $\alpha_0 \ge 0$, $\beta_i \ge 0$, $\alpha_j \ge 0$, $p \ge 0$, $q \ge 0$ are the model's parameters and the $\xi_t$ are i.i.d. If the $\beta_i$ are null, we obtain an ARCH(q) model, which can be extended to the LARCH(∞) model (see Robinson, 1991 [165]; Giraitis, Kokoszka and Leipus, 2000 [92]). These models are often used in finance because their properties are close to those observed on empirical financial data, such as volatility clustering, white noise behaviour, or autocorrelation of the squares of the series. To reproduce other properties of empirical data, such as the leverage effect, many extensions of the GARCH model have been introduced: EGARCH, TGARCH, etc.

A single equation in terms of a vector valued process allows simplifications in the definition of various ARCH-type models. Let $(\xi_t)_{t\in\mathbb Z}$ be an i.i.d. sequence of random $d \times m$ matrices, $(a_j)_{j\in\mathbb N^*}$ be a sequence of $m \times d$ matrices, and $a$ be a vector in $\mathbb R^m$. A vector valued LARCH(∞) model is a solution of the recurrence equation
$$X_t = \xi_t \Big( a + \sum_{j=1}^{\infty} a_j X_{t-j} \Big). \tag{3.4.1}$$
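As a quick illustration, here is a minimal simulation sketch of the GARCH(p, q) recursion recalled at the beginning of this section. The function name, Gaussian choice for $\xi_t$ and the parameter values are ours, not from the text; the chosen parameters satisfy the usual second-order stationarity condition $\sum_j \alpha_j \mathbb E\xi_0^2 + \sum_i \beta_i < 1$.

```python
import numpy as np

def simulate_garch(alpha0, alphas, betas, n, rng):
    """Simulate r_t = sigma_t * xi_t with
    sigma_t^2 = alpha0 + sum_i betas[i]*sigma_{t-1-i}^2 + sum_j alphas[j]*r_{t-1-j}^2."""
    p, q = len(betas), len(alphas)
    burn = 500                                  # discard transient
    sigma2 = np.full(burn + n, alpha0)
    r = np.zeros(burn + n)
    xi = rng.standard_normal(burn + n)
    for t in range(max(p, q), burn + n):
        sigma2[t] = (alpha0
                     + sum(betas[i] * sigma2[t - 1 - i] for i in range(p))
                     + sum(alphas[j] * r[t - 1 - j] ** 2 for j in range(q)))
        r[t] = np.sqrt(sigma2[t]) * xi[t]
    return r[burn:]

rng = np.random.default_rng(0)
r = simulate_garch(0.1, [0.1], [0.8], 20_000, rng)
# Here alpha_1 + beta_1 = 0.9 < 1, so E r_t^2 = alpha0 / (1 - 0.9) = 1.
print(np.mean(r ** 2))
```

The sample second moment should be close to the stationary value $\alpha_0/(1 - \sum\alpha_j - \sum\beta_i)$.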
Some examples of LARCH(∞) models are now provided. Even if standard LARCH(∞) models simply correspond to the case of real valued $X_t$ and $a_j$, general LARCH(∞) models include a large variety of models, such as:

1. Bilinear models, precisely addressed in the forthcoming Subsection 3.4.2. They are solutions of the equation
$$X_t = \zeta_t \Big( \alpha + \sum_{j=1}^{\infty} \alpha_j X_{t-j} \Big) + \beta + \sum_{j=1}^{\infty} \beta_j X_{t-j},$$
where the variables are real valued and $\zeta_t$ is the innovation. It is easy to see that such models take the previous form with $m = 2$ and $d = 1$: write
$$\xi_t = \begin{pmatrix} \zeta_t & 1 \end{pmatrix}, \qquad a = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \qquad a_j = \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \quad \text{for } j = 1, 2, \ldots
$$
2. GARCH(p, q) models, defined by
$$r_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 + \gamma + \sum_{j=1}^{q} \gamma_j r_{t-j}^2,$$
where $\gamma > 0$, $\gamma_j \ge 0$, $\beta_j \ge 0$ (and the variables $\varepsilon_t$ are centered). They may be written as bilinear models: for this, set $\alpha_0 = \gamma/(1 - \sum_i \beta_i)$ and $\sum_i \alpha_i z^i = \sum_i \gamma_i z^i / (1 - \sum_i \beta_i z^i)$ (see Giraitis et al. (2006) [93]).

3. ARCH(∞) processes, given by the equations
$$r_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \beta_0 + \sum_{j=1}^{\infty} \beta_j \varepsilon_{t-j}^2.$$
They may be written as bilinear models: set
$$\xi_t = \begin{pmatrix} \varepsilon_t & 1 \end{pmatrix}, \qquad a = \begin{pmatrix} \kappa \beta_0 \\ \lambda_1 \beta_0 \end{pmatrix}, \qquad a_j = \begin{pmatrix} \kappa \beta_j \\ \lambda_1 \beta_j \end{pmatrix},$$
with $\lambda_1 = \mathbb E(\varepsilon_0^2)$ and $\kappa^2 = \operatorname{Var}(\varepsilon_0^2)$.
4. Models with vector valued innovations,
$$\begin{cases} X_t = \zeta_t^1 \Big( \alpha^1 + \displaystyle\sum_{j=1}^{\infty} \alpha_j^1 X_{t-j} \Big) + \mu_t^1 \Big( \beta^1 + \displaystyle\sum_{j=1}^{\infty} \beta_j^1 X_{t-j} \Big) + \gamma^1 + \displaystyle\sum_{j=1}^{\infty} \gamma_j^1 X_{t-j}, \\[2mm] Y_t = \zeta_t^2 \Big( \alpha^2 + \displaystyle\sum_{j=1}^{\infty} \alpha_j^2 Y_{t-j} \Big) + \mu_t^2 \Big( \beta^2 + \displaystyle\sum_{j=1}^{\infty} \beta_j^2 Y_{t-j} \Big) + \gamma^2 + \displaystyle\sum_{j=1}^{\infty} \gamma_j^2 Y_{t-j}, \end{cases}$$
may clearly be written as LARCH(∞) models with now $m = d = 2$.
3.4.1 Chaotic expansion of LARCH(∞) models
We provide sufficient conditions for the following chaotic expansion:
$$X_t = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} a_{j_2} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big). \tag{3.4.2}$$
For a $k \times l$ matrix $M$, and $\|\cdot\|$ a norm on $\mathbb R^k$, let as usual
$$\|M\| = \sup_{x \in \mathbb R^l,\ \|x\| \le 1} \|Mx\|.$$
For a random matrix $M$ we set $\|M\|_p^p = \mathbb E(\|M\|^p)$.
Theorem 3.1. Assume that either $\sum_{j\ge 1} \|a_j\|^p\, \mathbb E\|\xi_0\|^p < 1$ for some $p \le 1$, or $\sum_{j\ge 1} \|a_j\| \big(\mathbb E\|\xi_0\|^p\big)^{1/p} < 1$ for $p > 1$. Then one stationary solution of eqn. (3.4.1) in $\mathbb L^p$ is given by (3.4.2).

In this section we set
$$A(x) = \sum_{j \ge x} \|a_j\|, \qquad A = A(1), \qquad \lambda_p = A\, \|\xi_0\|_p. \tag{3.4.3}$$
Proof. We first prove that expression (3.4.2) is well defined. Set
$$S = \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \big\| a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k} \big\|.$$
Clearly
$$S \le \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \|a_{j_1}\| \cdots \|a_{j_k}\|\, \|\xi_{t-j_1}\| \cdots \|\xi_{t-j_1-\cdots-j_k}\|.$$
Using that the sequence $(\xi_n)_{n\in\mathbb Z}$ is i.i.d., we obtain for $p \ge 1$,
$$\|S\|_p \le \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \|a_{j_1}\| \cdots \|a_{j_k}\|\, \|\xi_{t-j_1}\|_p \cdots \|\xi_{t-j_1-\cdots-j_k}\|_p \le \sum_{k=1}^{\infty} \big( \|\xi_0\|_p\, A \big)^k.$$
If $\lambda_p < 1$, the last series is convergent and $S$ belongs to $\mathbb L^p$. If $p < 1$ we conclude as for $p = 1$, using the bound
$$\Big\| \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k} \Big\|_p^p \le \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \big\| a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k} \big\|_p^p.$$
Now we prove that the expression (3.4.2) is a solution of eqn. (3.4.1):
$$X_t = \xi_t \Big( a + \sum_{k \ge 1}\ \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big)$$
$$= \xi_t \Big( a + \sum_{j_1 \ge 1} a_{j_1} \xi_{t-j_1} a + \sum_{k \ge 2}\ \sum_{j_1 \ge 1} a_{j_1} \xi_{t-j_1} \sum_{j_2,\ldots,j_k \ge 1} a_{j_2} \xi_{(t-j_1)-j_2} \cdots a_{j_k} \xi_{(t-j_1)-j_2-\cdots-j_k}\, a \Big)$$
$$= \xi_t \Big( a + \sum_{j=1}^{\infty} a_j X_{t-j} \Big).$$
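In the particular scalar case with a single lag ($a_j = 0$ for $j \ge 2$), the chaotic expansion reduces to $X_t = \xi_t\, a\, (1 + \sum_{k\ge1} a_1^k \xi_{t-1}\cdots\xi_{t-k})$, and the algebra above can be checked numerically: a truncated expansion satisfies the recursion (3.4.1) up to truncation error. A small sketch, with names and values of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
a, a1 = 1.0, 0.4          # |a1| * ||xi_0||_1 < 1 guarantees convergence
T, K = 60, 40             # time horizon and chaos truncation order
xi = rng.uniform(-1.0, 1.0, size=T)

def chaos(t, K):
    """Truncated chaotic expansion of X_t for the one-lag scalar LARCH model."""
    s, prod = 1.0, 1.0
    for k in range(1, K + 1):
        if t - k < 0:
            break
        prod *= a1 * xi[t - k]
        s += prod
    return xi[t] * a * s

x_direct = chaos(T - 1, K)
x_recursed = xi[T - 1] * (a + a1 * chaos(T - 2, K - 1))  # plug into (3.4.1)
print(abs(x_direct - x_recursed))
```

Both quantities agree up to floating-point rounding, since the order-$K$ expansion at time $t$ is exactly the recursion applied to the order-$(K-1)$ expansion at time $t-1$.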
Theorem 3.2. Assume that $p \ge 1$ in the assumptions of Theorem 3.1, and assume that $\varphi = \sum_j \|a_j\|\, \|\xi_0\|_p < 1$. If a stationary solution $(Y_t)_{t\in\mathbb Z}$ to equation (3.4.1) exists (a.s.), and if $Y_t$ is independent of the σ-algebra generated by $\{\xi_s;\ s > t\}$ for each $t \in \mathbb Z$, then this solution is also in $\mathbb L^p$ and it is (a.s.) equal to the previous solution $(X_t)_{t\in\mathbb Z}$ defined by equation (3.4.2).

Proof. Step 1. We first prove that $\|Y_0\|_p < \infty$. From equation (3.4.1), the stationarity of $\{Y_t\}_{t\in\mathbb Z}$ and the independence assumption, we derive that
$$\|Y_0\|_p \le \|\xi_0\|_p \Big( \|a\| + \sum_{j=1}^{\infty} \|a_j\|\, \|Y_0\|_p \Big).$$
Hence, the first point in the theorem follows from
$$\|Y_0\|_p \le \frac{\|\xi_0\|_p\, \|a\|}{1 - \varphi} < \infty.$$
Step 2. As in Giraitis et al. (2000) [92] we write $Y_t = \xi_t \big( a + \sum_{j\ge 1} a_j Y_{t-j} \big) = X_t^m + S_t^m$ with
$$X_t^m = \xi_t \Big( a + \sum_{k=1}^{m} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big),$$
$$S_t^m = \xi_t \Big( \sum_{j_1,\ldots,j_{m+1} \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_m} \xi_{t-j_1-\cdots-j_m}\, a_{j_{m+1}} Y_{t-j_1-\cdots-j_{m+1}} \Big).$$
We have
$$\|S_t^m\|_p \le \|\xi_0\|_p \sum_{j_1,\ldots,j_{m+1} \ge 1} \|a_{j_1}\| \cdots \|a_{j_{m+1}}\|\, \|\xi_0\|_p^m\, \|Y_0\|_p = \|Y_0\|_p\, \varphi^{m+1}.$$
We recall the additive decomposition of the chaotic expansion $X_t$ in equation (3.4.2) as a finite expansion plus a negligible remainder that can be controlled: $X_t = X_t^m + R_t^m$, where
$$R_t^m = \xi_t \Big( \sum_{k > m} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big)$$
satisfies
$$\|R_t^m\|_p \le \|a\|\, \|\xi_0\|_p \sum_{k > m} \varphi^k \le \|a\|\, \|\xi_0\|_p\, \frac{\varphi^m}{1 - \varphi} \to 0.$$
Then the difference between those two solutions is controlled as a function of $m$: since $X_t - Y_t = R_t^m - S_t^m$,
$$\|X_t - Y_t\|_p \le \|R_t^m\|_p + \|S_t^m\|_p \le \|a\|\,\|\xi_0\|_p\, \frac{\varphi^m}{1-\varphi} + \|Y_0\|_p\, \varphi^m \le 2\, \frac{\varphi^m\, \|a\|\, \|\xi_0\|_p}{1 - \varphi};$$
thus $Y_t = X_t$ a.s.

Dependence properties. To our knowledge, there is no study of the weak dependence properties of ARCH or GARCH type models with an infinite number of coefficients. Mixing seems difficult to prove for such models except in the Markov case ($a_j = 0$ for large enough $j$), because this case is included in the Markov section, for which we already mentioned that additional assumptions are needed to derive mixing. This section refers to Doukhan, Teyssière and Winant (2005) [75], who improve on the special case of bilinear models precisely considered in Doukhan, Madré and Rosenbaum (2006) [69]. We use the notations given in (3.4.3).

Theorem 3.3. The solution (3.4.2) of eqn. (3.4.1) is θ-weakly dependent with
$$\theta(r) \le 2\, \mathbb E\|\xi_0\| \left( \mathbb E\|\xi_0\| \sum_{k=1}^{t-1} k\, \lambda^{k-1} A\!\Big(\frac{t}{k}\Big) + \frac{\lambda^{t}\, t}{1-\lambda} \right) \|a\| \qquad \text{for any } t \le r.$$

Proof. Consider $f : (\mathbb R^d)^u \to \mathbb R$ with $\|f\|_\infty < \infty$ and $g : (\mathbb R^d)^v \to \mathbb R$ with $\operatorname{Lip}(g) < \infty$, that is,
$$|g(x) - g(y)| \le \operatorname{Lip}(g)\, \big( \|x_1 - y_1\| + \cdots + \|x_v - y_v\| \big).$$
Let $i_1 < \cdots < i_u$ and $j_1 < \cdots < j_v$ be such that $j_1 - i_u \ge r$. To bound the weak dependence coefficients we use an approximation $\hat{\mathbf v}$ of the vector $\mathbf v = (X_{j_1}, \ldots, X_{j_v})$ in
such a way that, for each index $j \in \{j_1, \ldots, j_v\}$ and $s \le r$, the random variable $\hat X_j$ is independent of $X_{j-s}$. More precisely, let
$$\hat X_t = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1 + \cdots + j_k < s} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big).$$
Now, let $f(\mathbf u) = f(X_{i_1}, \ldots, X_{i_u})$ and $g(\mathbf v) = g(X_{j_1}, \ldots, X_{j_v})$. We have that
$$|\operatorname{Cov}(f(\mathbf u), g(\mathbf v))| = \big| \mathbb E\big( f(\mathbf u)(g(\mathbf v) - g(\hat{\mathbf v})) \big) - \mathbb E(f(\mathbf u))\, \mathbb E\big(g(\mathbf v) - g(\hat{\mathbf v})\big) \big| \le 2 \|f\|_\infty\, \mathbb E|g(\mathbf v) - g(\hat{\mathbf v})| \le 2 \|f\|_\infty \operatorname{Lip}(g) \sum_{k=1}^{v} \mathbb E\|X_{j_k} - \hat X_{j_k}\| \le 2 v\, \|f\|_\infty \operatorname{Lip}(g)\, \mathbb E\|X_0 - \hat X_0\|.$$
Hence $\theta(r) \le 2\, \mathbb E\|X_0 - \hat X_0\|$ for any $s \le r$, which implies the bound of the theorem. This bound is made explicit for simple decay rates. More precisely,
$$\theta(t) \le K t^{-b} \quad \text{under Riemannian decay } A(x) \le C x^{-b}, \qquad \theta(t) \le K (q \vee \lambda)^{\sqrt t} \quad \text{under geometric decay } A(x) \le C q^x.$$

Further approximations. In order to simulate those models, and also to better understand their behaviour, it is important to see how far they are from simple processes. Weak dependence was proved with the independent approximations
$$\hat X_t = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1 + \cdots + j_k < s} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big)$$
of the LARCH models; we make this approximation precise through coupling arguments, and we also prove below proximity to the Markov sequence obtained by truncating the series which defines them.

Coupling. The approximation $\hat X_t$ of $X_t$ does not have the same distribution as $X_t$. But we are in the case of causal Bernoulli shifts with independent inputs, so the method of Section 3.1.4 applies. Let $(\xi'_i)_{i\in\mathbb Z}$ be an independent copy of $(\xi_i)_{i\in\mathbb Z}$, and let $\tilde\xi_n = \xi_n$ if $n > 0$ and $\tilde\xi_n = \xi'_n$ for $n \le 0$. Finally, let
$$\tilde X_t = \tilde\xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \tilde\xi_{t-j_1} \cdots a_{j_k} \tilde\xi_{t-j_1-\cdots-j_k}\, a \Big).$$
Here $\tilde X_t$ has the same distribution as $X_t$ and is independent of the σ-algebra $\mathcal M_0 = \sigma(X_i,\ i \le 0)$. Consequently, if $\tilde\delta_{p,n}$ is a non increasing sequence satisfying
(3.2.2), we obtain the upper bounds $\tau_{p,\infty}(n) \le \tilde\delta_{p,n}$. Since for any $s \le n$ we have $\|X_n - \tilde X_n\|_p \le 2 \|X_0 - \hat X_0\|_p$, we obtain that
$$\tau_{p,\infty}(n) \le \inf_{t \le r}\ 2\, \|\xi_0\|_p \left( \|\xi_0\|_p \sum_{k=1}^{t-1} k\, \lambda_p^{k-1} A\!\Big(\frac{t}{k}\Big) + \frac{\lambda_p^{t}\, t}{1 - \lambda_p} \right) \|a\|.$$
For $p = 1$, we recover the upper bound of Theorem 3.3. If $d = 1$, we obtain the same upper bounds for $\tilde\alpha_k(n)$, $\tilde\beta_k(n)$ and $\tilde\phi_k(n)$ as in Section 3.1.4, by assuming that the distribution function of $X_0$ is continuous (similar bounds may be obtained for $d > 1$ by assuming that each component of $X_0$ has a continuous distribution function, see Lemma 5.1).

Markov approximation.
Consider equation (3.4.1) truncated at rank $N$:
$$X_t^N = \xi_t \Big( a + \sum_{j=1}^{N} a_j X_{t-j}^N \Big).$$
The previous solution rewrites as
$$X_t^N = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{N \ge j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big).$$
The corresponding error is then bounded by
$$\mathbb E\|X_t - X_t^N\| \le \sum_{k=1}^{\infty} A(N)^k.$$
In the Riemannian decay case, the error is $\sum_{k=1}^{\infty} N^{-bk}$, and in the geometric decay case, the error is $q^N/(1 - q^N)$.

3.4.2 Bilinear models
Those models are defined through the recurrence relation
$$X_t = \zeta_t \Big( \alpha + \sum_{j=1}^{\infty} \alpha_j X_{t-j} \Big) + \beta + \sum_{j=1}^{\infty} \beta_j X_{t-j},$$
where the variables are real valued and $\zeta_t$ now denotes the innovation. To see that this is still a LARCH(∞) model, we set as before
$$\xi_t = \begin{pmatrix} \zeta_t & 1 \end{pmatrix}, \qquad a = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \qquad a_j = \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \quad \text{for } j \ge 1.$$
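A minimal simulation sketch of a finite-lag version of this bilinear recurrence (sums truncated at lag $J$; the decaying coefficient sequences and Gaussian innovation are our own illustrative choices, small enough for the contraction condition of Theorem 3.1 to hold):

```python
import numpy as np

rng = np.random.default_rng(2)
J = 5
alpha, beta = 1.0, 0.0                       # constant terms; beta = 0 as usual
alphas = 0.3 * 0.5 ** np.arange(1, J + 1)    # alpha_j, summable
betas = 0.2 * 0.5 ** np.arange(1, J + 1)     # beta_j, summable

n, burn = 5_000, 500
zeta = rng.standard_normal(n + burn)         # innovations
x = np.zeros(n + burn)
for t in range(J, n + burn):
    past = x[t - J:t][::-1]                  # X_{t-1}, ..., X_{t-J}
    x[t] = zeta[t] * (alpha + alphas @ past) + beta + betas @ past
x = x[burn:]
print(x.mean(), x.std())
```

With centered $\zeta_t$ and $\beta = 0$, taking expectations in the recursion gives $\mathbb E X_t = \sum_j \beta_j \mathbb E X_t$, so the stationary mean is 0.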
One usually takes $\beta = 0$. ARCH(∞) and GARCH(p, q) models are particular cases of the bilinear models. Giraitis and Surgailis (2002) [95] prove that, under some restrictions, there is a unique second order stationary solution for these models; this solution has a chaotic expansion. Assume that the power series
$$A(z) = \sum_{j=1}^{\infty} \alpha_j z^j \qquad \text{and} \qquad B(z) = \sum_{j=1}^{\infty} \beta_j z^j$$
exist for $|z| \le 1$, and let
$$G(z) = \frac{A(z)}{1 - B(z)} = \sum_{j=1}^{\infty} g_j z^j \qquad \text{and} \qquad H(z) = \frac{1}{1 - B(z)} = \sum_{j=0}^{\infty} h_j z^j;$$
we will write $\|h\|_p^p = \sum_{j=0}^{\infty} |h_j|^p$. Then:

Proposition 3.3 (Giraitis, Surgailis, 2002 [95]). If $(\zeta_t)_t$ is i.i.d. and $\|h\|_2 < 1$, then there is a unique second order stationary solution:
$$X_t = \alpha \sum_{k=1}^{\infty} \sum_{s_k < \cdots < s_1 \le t} g_{t-s_1} h_{s_1-s_2} \cdots h_{s_{k-1}-s_k}\, \zeta_{s_1} \cdots \zeta_{s_k}. \tag{3.4.4}$$
Assuming that β = 0, the expansion (3.4.2) writes as:
Xt = ζt α +
∞
sk <···<s1
(ζt αt−s1 + βt−s1 )×
× (ζs1 αs1 −s2 + βs1 −s2 ) · · · (ζsk−1 αsk−1 −sk + βsk−1 −sk )ζsk α or Xt = ζt α + S1 + S2 with S1
=
ζt αt−s1 (ζs1 αs1 −s2 + βs1 −s2 ) · · · (ζsk−1 αsk−1 −sk + βsk−1 −sk )ζsk α
k≥1 sk < · · · < s1 < t
S2
=
βt−s1 (ζs1 αs1 −s2 + βs1 −s2 ) · · · (ζsk−1 αsk−1 −sk + βsk−1 −sk )ζsk α
k≥1 sk < · · · < s1 < t
Under additional assumptions Giraitis and Surgailis prove the expansion: Xt = α
∞
gt−s1 hs1 −s2 · · · hsk−1 −sk ζs1 · · · ζsk
k=1 sk <···<s1 t
50
CHAPTER 3. MODELS
rewritten as Xt = ζt α + T1 + T2 with T1
∞
=
T2
ζt ht−s1 ζs1 hs1 −s2 ζs2 · · · hsk−1 −sk ζsk α
k=1sk <···<s1
gt−s1 hs1 −s2 ζs1 · · · hsk−1 −sk ζsk α
=
k=1sk <···<s1
The terms in the sum (T1 ) take a form: ζt (αt−i(1) βi(1) −i(1) · · · βi(1) −s1 )ζs1 (αs1 −i(2) βi(2) −i(2) · · · βi(2) −s2 ) · · · 1
1
2
1
p1
1
2
p2
· · · ζsk−1 (αs
(k) k−1 −i1
βi(k) −i(k) · · · βi(k) −s )ζsk α 1
2
pk
k
hence between each couple ζsp , ζsp+1 read from the left to the right one founds a term (αj ) followed with several (βj )’s in such a way that the sum of all indices equals sp+1 − sp . On the one hand, quote that expanding terms in the (S1 ) yields a sum of products of such terms proves that (S1 )’s is included in (T1 )’s. On the other hand, expand the term k¯ = k + p1 + · · · + pk (¯ sk¯ , s¯k−1 , . . . , s¯1 ) = ¯ (k) (k) (1) (1) (1) (sk , ipk , . . . , i1 , sk−1 , . . . , s1 , ip1 , . . . , i2 , i1 , t) in the sum (S1 ) yields the generic term in (T1 ) expansion. Hence (S1 ) and (T1 )’s expansions coincide. Analogously (T2 ) = (S2 ), by quoting that the generic term in (T2 ) writes: (βt−i(1) βi(1) −i(1) · · · βi(1) −s1 )ζs1 (αs1 −i(2) βi(2) −i(2) · · · βi(2) −s2 ) · · · 1
1
2
1
p1
1
· · · ζsk−1 (αs
2
(k) k−1 −i1
p2
βi(k) −i(k) · · · βi(k) −s )ζsk α. 1
2
pk
k
Conditional densities of bilinear models. Another, more specific feature of those models is the following result on the existence of conditional densities (see Doukhan et al. [69]). This is relevant for subsampling techniques and density estimation. We use the fact that the bilinear equation may be written
$$X_t = \tilde A_t \zeta_t + \tilde B_t \qquad \text{with} \qquad \tilde A_t = a + \sum_{j=1}^{\infty} a_j X_{t-j}, \qquad \tilde B_t = \sum_{j=1}^{\infty} b_j X_{t-j}.$$
Hence, conditionally on the past σ-algebra (the history of $\{\zeta_s,\ s < t\}$), the distribution of $X_t$ is as smooth as that of $\zeta_t$ whenever $\tilde A_t \ne 0$. If we are interested in higher order marginal distributions, we need a bit more work.
Split $\tilde A_t$ and $\tilde B_t$ into
$$\tilde A_t = \sum_{j=1}^{t-1} a_j X_{t-j} + A_t, \qquad A_t = a + \sum_{j=t}^{\infty} a_j X_{t-j},$$
$$\tilde B_t = \sum_{j=1}^{t-1} b_j X_{t-j} + B_t, \qquad B_t = \sum_{j=t}^{\infty} b_j X_{t-j},$$
where $A_t$ and $B_t$ are measurable with respect to the σ-algebra of the past, $\sigma\{\zeta_s,\ s \le 0\}$. This entails, for instance,
$$X_1 = \tilde A_1 \zeta_1 + \tilde B_1, \qquad X_2 = (a_1 X_1 + A_2) \zeta_2 + (b_1 X_1 + B_2).$$
Thus, conditionally on $\tilde A_1$, $\tilde B_1$, $A_2$ and $B_2$, the previous system is triangular, and it may be solved if $\tilde A_1$ and $a_1 X_1 + A_2$ do not vanish. The following result extends this observation.

Lemma 3.7 (Conditional densities, Doukhan et al. [69]). Assume that the random variables $(\zeta_t)_{t\in\mathbb Z}$ and the coefficients $a_j$ are non negative for $j = 1, 2, \ldots$. Also suppose that the $\zeta_i$ are independent random variables with density $f_{\zeta_i}$ for all $i \in \{1, \ldots, n\}$. Then, conditionally with respect to the past of the process $\sigma\{\zeta_s,\ s \le 0\}$, the random vector $(X_1, \ldots, X_n)$ admits the density $f_n(x_1, \ldots, x_n)$ defined by
$$f_n(x_1, \ldots, x_n) = \frac{1}{|\alpha_1 \alpha_2 \cdots \alpha_n|}\, f_{\zeta_1}\!\Big(\frac{\beta_1}{\alpha_1}\Big) \cdots f_{\zeta_n}\!\Big(\frac{\beta_n}{\alpha_n}\Big)$$
with $\beta_j = x_j - b_1 x_{j-1} - b_2 x_{j-2} - \cdots - b_{j-1} x_1 - B_j$ and $\alpha_j = a_1 x_{j-1} + a_2 x_{j-2} + \cdots + a_{j-1} x_1 + A_j$ for $1 \le j \le n$ (here $\alpha_1 = A_1$).

Corollary 3.1 (Density). Under the same assumptions as in Lemma 3.7, if $a > 0$ and the $(\zeta_t)_t$ are i.i.d. with a density $f$ bounded by $M$, then $f_n(x_1, \ldots, x_n) \le (M/a)^n$ for all $(x_1, \ldots, x_n)$.

Corollary 3.2 (Density of a couple). Under the same assumptions as in Lemma 3.7, if the $\zeta_t$ are i.i.d. with density $f$, then the density $g_i$ of the couple $(X_1, X_i)$ satisfies $\|g_i\|_\infty \le \|f\|_\infty^2 / A_1$ for all $i > 1$.

Remark 3.4. The asymptotic properties of a standard kernel density estimate rely on such bounds. Indeed, an expression of its variance follows jointly from weak dependence properties and such assumptions on the two dimensional distributions.
Proof of Lemma 3.7. We work conditionally on the infinite past before $X_1$. We write
$$M \begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_1 \end{pmatrix} = \begin{pmatrix} A_n \zeta_n + B_n \\ A_{n-1} \zeta_{n-1} + B_{n-1} \\ \vdots \\ A_1 \zeta_1 + B_1 \end{pmatrix},$$
where
$$M = \begin{pmatrix} 1 & -a_1\zeta_n - b_1 & -a_2\zeta_n - b_2 & \cdots & -a_{n-1}\zeta_n - b_{n-1} \\ 0 & 1 & -a_1\zeta_{n-1} - b_1 & \cdots & -a_{n-2}\zeta_{n-1} - b_{n-2} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -a_1\zeta_2 - b_1 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$
This may be rewritten as
$$\zeta_1 = \frac{X_1 - B_1}{A_1}, \qquad \zeta_2 = \frac{X_2 - b_1 X_1 - B_2}{a_1 X_1 + A_2}, \qquad \ldots, \qquad \zeta_n = \frac{X_n - b_1 X_{n-1} - b_2 X_{n-2} - \cdots - b_{n-1} X_1 - B_n}{a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_{n-1} X_1 + A_n},$$
where the coefficients $A_t$ and $B_t$ are deterministic in this conditional setting. Thus,
$$\mathbb E\, g(X_1, X_2, \ldots, X_n) = \int g\big(\phi^{-1}(u_1, \ldots, u_n)\big)\, f_{\zeta_1}(u_1) \cdots f_{\zeta_n}(u_n)\, du_1 \cdots du_n,$$
with $f_{\zeta_i}$ the density of $\zeta_i$ and $(\zeta_1, \ldots, \zeta_n) = \phi(X_1, \ldots, X_n)$. Here, put $(u_1, \ldots, u_n) = \phi(x_1, \ldots, x_n)$. The function $\phi$ has a triangular Jacobian matrix, hence its determinant is the product of the diagonal terms
$$\frac{\partial u_j}{\partial x_j} = \frac{1}{\alpha_j}$$
for $1 \le j \le n$, and the result follows.

Proof of the corollaries. The first corollary follows by integration from Lemma 3.7. We prove the result for the density of the couple $(X_1, X_4)$; the general result is proved in the same way. With
$$f_4(x_1, x_2, x_3, x_4) = \frac{1}{|\alpha_1 \cdots \alpha_4|}\, f\!\Big(\frac{\beta_1}{\alpha_1}\Big) \cdots f\!\Big(\frac{\beta_4}{\alpha_4}\Big),$$
an integration with respect to $x_2$ and $x_3$ implies that
$$g_4(x_1, x_4) = \int f_4(x_1, x_2, x_3, x_4)\, dx_2\, dx_3 \le \frac{\|f\|_\infty^2}{A_1} \int f\!\Big(\frac{\beta_2}{\alpha_2}\Big) f\!\Big(\frac{\beta_3}{\alpha_3}\Big) \frac{dx_2\, dx_3}{|\alpha_2 \alpha_3|}.$$
Hence, with
$$u = \frac{x_2 - b_1 x_1 - B_2}{a_1 x_1 + A_2}, \qquad v = \frac{x_3 - b_1 x_2 - b_2 x_1 - B_3}{a_1 x_2 + a_2 x_1 + A_3},$$
we write
$$x_2 = u\,(a_1 x_1 + A_2) + b_1 x_1 + B_2,$$
$$x_3 = v\,\big( a_1 (u\,(a_1 x_1 + A_2) + b_1 x_1 + B_2) + a_2 x_1 + A_3 \big) + b_1 \big( u\,(a_1 x_1 + A_2) + b_1 x_1 + B_2 \big) + b_2 x_1 + B_3.$$
The Jacobian matrix is triangular and the absolute value of its determinant equals $|(a_1 x_1 + A_2)(a_1 x_2 + a_2 x_1 + A_3)|$, and thus
$$g_4(x_1, x_4) \le \frac{\|f\|_\infty^2}{A_1} \int f(u) f(v)\, du\, dv \le \frac{\|f\|_\infty^2}{A_1}.$$
3.5 ζ-dependent models
In this section, we give some classes of ζ-dependent models: associated processes, Gaussian processes and interacting particle systems.
3.5.1 Associated processes
An analogous formula to (1.4.1) proves that associated random variables belong to the class of ζ-dependent models. Several associated models are obtained from nondecreasing transformations of independent variables. For Gaussian vectors, Pitt (1982) [146] gave a necessary and sufficient condition for association; we discuss Pitt's result in the sequel.

Theorem 3.4. Let $X = (X_1, \ldots, X_n)$ be a Gaussian vector with mean vector 0 and covariance matrix $\Sigma = (\sigma_{i,j} = \operatorname{Cov}(X_i, X_j))_{1\le i,j\le n}$. The condition
$$\operatorname{Cov}(X_i, X_j) \ge 0 \quad \text{for all } i, j = 1, \ldots, n \tag{3.5.1}$$
is necessary and sufficient for the variables to be associated.

Proof of Theorem 3.4. The method of proof below is due to Pitt (1982). Assuming Condition (3.5.1), the task is to prove that $\operatorname{Cov}(f(X), g(X)) \ge 0$ for all nondecreasing functions $f$ and $g$ defined on $\mathbb R^n$ (the converse implication being trivial). We suppose, without loss of generality, that $\Sigma$ is non-singular and that the
functions $f$ and $g$ are continuously differentiable with bounded derivatives $\frac{\partial f}{\partial x_i}$ and $\frac{\partial g}{\partial x_i}$, for $i = 1, \ldots, n$. Let $Z$ be an independent copy of $X$. For any $\lambda \in [0, 1]$, let $Y(\lambda)$ be the random vector defined by
$$Y(\lambda) = \lambda X + \sqrt{1 - \lambda^2}\, Z.$$
Clearly, for each fixed $\lambda$, $Y(\lambda)$ is a Gaussian vector with covariance matrix $\Sigma$ and $\operatorname{Cov}(X_i, Y_j(\lambda)) = \lambda \sigma_{i,j}$. Set
$$F(\lambda) = \mathbb E\big( f(X)\, g(Y(\lambda)) \big).$$
Clearly
$$\operatorname{Cov}(f(X), g(X)) = F(1) - F(0). \tag{3.5.2}$$
It is sufficient to show that $F'(\lambda)$ exists and $F'(\lambda) \ge 0$ for $0 \le \lambda \le 1$. To this end, let $\phi$ and $p$ be respectively the density of $X$ and the conditional density of $Y(\lambda)$ given $X = x$. We have
$$\phi(x) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\Big( -\frac{1}{2} \sum_{i,j=1}^{n} c_{i,j} x_i x_j \Big), \quad \text{with } \Sigma^{-1} = (c_{i,j})_{1\le i,j\le n},$$
and
$$p(\lambda; x, y) = \frac{1}{(1 - \lambda^2)^{n/2}}\, \phi\Big( \frac{\lambda x - y}{\sqrt{1 - \lambda^2}} \Big).$$
Hence
$$F(\lambda) = \int_{\mathbb R^n} \phi(x) f(x)\, g(\lambda; x)\, dx, \tag{3.5.3}$$
where $g(\lambda; x)$ is defined by
$$g(\lambda; x) = \int_{\mathbb R^n} p(\lambda; x, y)\, g(y)\, dy,$$
which is equal to
$$g(\lambda; x) = \int_{\mathbb R^n} g(\lambda x - y)\, \phi_\lambda(y)\, dy, \tag{3.5.4}$$
where
$$\phi_\lambda(x) = \frac{1}{(1 - \lambda^2)^{n/2}}\, \phi\Big( \frac{x}{\sqrt{1 - \lambda^2}} \Big).$$
Now Equation (3.5.4) proves that the partial derivatives $\frac{\partial g(\lambda; x)}{\partial x_i}$ exist and are bounded. Moreover, since $g$ is increasing and $\lambda$ is positive, we have
$$\frac{\partial g(\lambda; x)}{\partial x_i} \ge 0. \tag{3.5.5}$$
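Theorem 3.4 is easy to illustrate by Monte Carlo: for a Gaussian pair with nonnegative correlation, any pair of coordinatewise nondecreasing functions has nonnegative covariance. A small sketch (the choices of $\rho$ and of the monotone functions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
rho = 0.6
z1, z2 = rng.standard_normal((2, n))
x1 = z1
x2 = rho * z1 + np.sqrt(1 - rho**2) * z2      # Cov(x1, x2) = rho >= 0

# Two coordinatewise nondecreasing functions of (x1, x2).
f = np.tanh(x1) + np.tanh(x2)
g = np.minimum(x1, 1.0) + np.minimum(x2, 1.0)
cov = np.mean(f * g) - np.mean(f) * np.mean(g)
print(cov)
```

The empirical covariance is clearly positive, as the theorem predicts; with $\rho < 0$ it can become negative, which is why (3.5.1) is also necessary.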
Next, an explicit calculation based on the heat equations
$$\frac{\partial \phi}{\partial \sigma_{i,i}} = \frac{1}{2}\, \frac{\partial^2 \phi}{\partial x_i^2}, \qquad \frac{\partial \phi}{\partial \sigma_{i,j}} = \frac{\partial^2 \phi}{\partial x_i\, \partial x_j} \quad \text{for } i \ne j,$$
shows that
$$\frac{\partial p}{\partial \lambda} = \frac{p}{1 - \lambda^2} \Big( k\lambda - \sum_{i,j} x_i c_{ij} (\lambda x_j - y_j) - \frac{\lambda}{1 - \lambda^2} \sum_{i,j} (\lambda x_i - y_i) c_{ij} (\lambda x_j - y_j) \Big) = \frac{1}{\lambda} \Big( \sum_{i,j} \sigma_{i,j} \frac{\partial^2 p}{\partial x_i\, \partial x_j} - \sum_i x_i \frac{\partial p}{\partial x_i} \Big). \tag{3.5.6}$$
We obtain, combining (3.5.3), (3.5.4) and (3.5.6),
$$F'(\lambda) = \frac{1}{\lambda} \int_{\mathbb R^n} \phi(x) f(x) \Big( \sum_{i,j} \sigma_{i,j} \frac{\partial^2 g(\lambda; x)}{\partial x_i\, \partial x_j} - \sum_i x_i \frac{\partial g(\lambda; x)}{\partial x_i} \Big)\, dx.$$
An integration by parts gives
$$F'(\lambda) = \frac{1}{\lambda} \int_{\mathbb R^n} \phi(x) \Big( \sum_{i,j} \sigma_{i,j}\, \frac{\partial f(x)}{\partial x_i}\, \frac{\partial g(\lambda; x)}{\partial x_j} \Big)\, dx. \tag{3.5.7}$$
We have $\frac{\partial f(x)}{\partial x_i} \ge 0$, since $f$ is increasing. This fact, together with (3.5.5), (3.5.1) and (3.5.7), proves that $F'(\lambda) \ge 0$ for any $\lambda \in [0, 1]$. This conclusion together with (3.5.2) proves Theorem 3.4.
Remark 3.5. Stable processes have the same linear structure as normal processes since arbitrary linear combinations of stable variables are stable. Lee, Rachev and Samorodnitsky (1990) [118] gave necessary and sufficient conditions for a stable random vector to be associated.
3.5.2 Gaussian processes
Gaussian processes belong to the class of ζ-dependent models. This property is a consequence of the following lemma.

Lemma 3.8. Denote $X_C = (X_i)_{i\in C}$ if $C \subset \mathbb Z$. Let $(X_n)_{n\in\mathbb Z}$ be a centered Gaussian process. Then for all real-valued functions $h, k$ with bounded first partial derivatives, one has
$$|\operatorname{Cov}(h(X_A), k(X_B))| \le \sum_{i\in A,\, j\in B} \Big\| \frac{\partial h}{\partial x_i} \Big\|_\infty \Big\| \frac{\partial k}{\partial x_j} \Big\|_\infty |\operatorname{Cov}(X_i, X_j)|. \tag{3.5.8}$$
Remark 3.6. The proof of Lemma 3.8 follows the paper of Pitt (1982) [146]. For more details, we refer the reader to the proof of Lemma 19 in Doukhan and Louhichi (1999) [67].
3.5.3 Interacting particle systems
In this subsection, we develop an example of ζ-dependent interacting particle systems (cf. Proposition 3.4 below). Before stating the main result of this paragraph, we briefly recall the basic construction of general interacting particle systems, described in Sections I.3 and I.4 of Liggett's book (1985) [122]. Let $S$ be a countable set of sites, $W$ a finite set of states, and $\mathcal X = W^S$ the set of configurations, endowed with its product topology, which makes it a compact set. On each site the state evolves as a Markov chain, but we are interested in the case where the evolutions of neighbouring sites are linked. We define a Feller process on $\mathcal X$ by specifying the local transition rates: to a configuration $\eta$ and a finite set of sites $T$ is associated a nonnegative measure $c_T(\eta, \cdot)$ on $W^T$. Loosely speaking, we want the configuration to change on $T$ after an exponential time with parameter
$$c_{T,\eta} = \sum_{\zeta \in W^T} c_T(\eta, \zeta).$$
After that time, the configuration becomes equal to $\zeta$ on $T$, with probability $c_T(\eta, \zeta)/c_{T,\eta}$. Let $\eta^\zeta$ denote the new configuration, which is equal to $\zeta$ on $T$ and to $\eta$ outside $T$. The infinitesimal generator should be
$$\Omega f(\eta) = \sum_{T \subset S}\ \sum_{\zeta \in W^T} c_T(\eta, \zeta)\, \big( f(\eta^\zeta) - f(\eta) \big). \tag{3.5.9}$$
For $\Omega$ to generate a Feller semigroup acting on continuous functions from $\mathcal X$ into $\mathbb R$, some hypotheses have to be imposed on the transition rates $c_T(\eta, \cdot)$. The first condition is that the mapping $\eta \mapsto c_T(\eta, \cdot)$ should be continuous (and thus bounded, since $\mathcal X$ is compact). Let us denote by $c_T$ its supremum norm:
$$c_T = \sup_{\eta \in \mathcal X} c_{T,\eta}.$$
It is the maximal rate of change of a configuration on $T$. One essential hypothesis is that the maximal rate of change of a configuration at one given site is bounded:
$$B = \sup_{x \in S} \sum_{T \ni x} c_T < \infty. \tag{3.5.10}$$
If $f$ is a continuous function on $\mathcal X$, one defines $\Delta_f(x)$ as the degree of dependence of $f$ on $x$:
$$\Delta_f(x) = \sup\{\, |f(\eta) - f(\zeta)| \ :\ \eta, \zeta \in \mathcal X \text{ and } \eta(y) = \zeta(y)\ \forall\, y \ne x \,\}.$$
Since $f$ is continuous, $\Delta_f(x)$ tends to 0 as $x$ tends to infinity, and $f$ is said to be smooth if $\Delta_f$ is summable:
$$|f| = \sum_{x \in S} \Delta_f(x) < \infty.$$
It can be proved that if $f$ is smooth, then $\Omega f$ defined by (3.5.9) is indeed a continuous function on $\mathcal X$, and moreover $\|\Omega f\| \le B\, |f|$.

We also need to control the dependence of the transition rates on the configuration at other sites. If $y \in S$ is a site and $T \subset S$ is a finite set of sites, one defines
$$c_T(y) = \sup\big\{\, \| c_T(\eta_1, \cdot) - c_T(\eta_2, \cdot) \|_{tv} \ :\ \eta_1(z) = \eta_2(z)\ \forall\, z \ne y \,\big\},$$
where $\|\cdot\|_{tv}$ is the total variation norm:
$$\| c_T(\eta_1, \cdot) - c_T(\eta_2, \cdot) \|_{tv} = \frac{1}{2} \sum_{\zeta \in W^T} | c_T(\eta_1, \zeta) - c_T(\eta_2, \zeta) |.$$
If $x$ and $y$ are two sites such that $x \ne y$, the influence of $y$ on $x$ is defined as
$$\gamma(x, y) = \sum_{T \ni x} c_T(y).$$
We set $\gamma(x, x) = 0$ for all $x$. The influences $\gamma(x, y)$ are assumed to be summable:
$$M = \sup_{x \in S} \sum_{y \in S} \gamma(x, y) < \infty. \tag{3.5.11}$$
Under both hypotheses (3.5.10) and (3.5.11), it can be proved that the closure of $\Omega$ generates a Feller semigroup $\{S_t,\ t \ge 0\}$ (Theorem 3.9 p. 27 of Liggett (1985)). A generic process with semigroup $\{S_t,\ t \ge 0\}$ will be denoted by $\{\eta_t,\ t \ge 0\}$. The expectations with respect to its distribution, starting from $\eta_0 = \eta$, will be denoted by $\mathbb E_\eta$. For each continuous function $f$, one has
$$S_t f(\eta) = \mathbb E_\eta[f(\eta_t)] = \mathbb E[f(\eta_t) \mid \eta_0 = \eta].$$
We now have all the ingredients to control the covariance of $f(\eta_s)$ and $g(\eta_t)$ for a finite range interacting particle system when the underlying graph structure has bounded degree. Proposition 3.4 shows that if $f$ and $g$ are mainly located on two finite sets $R_1$ and $R_2$, then the covariance of $f$ and $g$ decays exponentially in the distance between $R_1$ and $R_2$. From now on, we assume that the set of sites $S$ is endowed with an undirected
graph structure, and we denote by $d$ the natural distance on the graph. We assume not only that the graph is locally finite, but also that the degree of each vertex is uniformly bounded:
$$\forall x \in S, \quad \#\{ y \in S \ :\ d(x, y) = 1 \} \le r,$$
where $\#$ denotes the cardinality of a finite set. Thus the size of the sphere or ball with center $x$ and radius $n$ is uniformly bounded in $x$, and increases at most geometrically in $n$:
$$\#\{ y \in S \ :\ d(x, y) = n \} \le \frac{r}{r-1}\, (r-1)^n, \qquad \#\{ y \in S \ :\ d(x, y) \le n \} \le \frac{r}{r-2}\, (r-1)^n.$$
Let $R$ be a finite subset of $S$. We shall use the following upper bounds for the number of vertices at distance $n$, or at most $n$, from $R$:
$$\#\{ x \in S \ :\ d(x, R) = n \} \le \#\{ x \in S \ :\ d(x, R) \le n \} \le 2 e^{n\rho}\, \#R, \tag{3.5.12}$$
with $\rho = \log(r-1)$. In the case of an amenable graph (e.g. a lattice on $\mathbb Z^d$), the ball sizes have subexponential growth. Therefore, for all $\varepsilon > 0$, there exists $c$ such that
$$\#\{ x \in S \ :\ d(x, R) = n \} \le \#\{ x \in S \ :\ d(x, R) \le n \} \le c\, e^{n\varepsilon}.$$
What follows is written in the general case, using (3.5.12); it applies to the amenable case, replacing $\rho$ by $\varepsilon$, for any $\varepsilon > 0$. We are going to deal with smooth functions depending weakly on coordinates away from a fixed finite set $R$. Indeed, it is not sufficient to consider functions depending only on coordinates in $R$: if $f$ is such a function, then for any $t > 0$, $S_t f$ may depend on all coordinates.

Definition 3.2. Let $f$ be a function from $\mathcal X$ into $\mathbb R$, and $R$ be a finite subset of $S$. The function $f$ is said to be mainly located on $R$ if there exist two constants $\alpha > 0$ and $\beta > \rho$ such that for all $x \in S$:
$$\Delta_f(x) \le \alpha e^{-\beta d(x, R)}. \tag{3.5.13}$$
Since $\beta > \rho$, the sum $\sum_x \Delta_f(x)$ is finite; therefore a function mainly located on a finite set is necessarily smooth.

The system we are considering will be supposed to have finite range interactions in the following sense (cf. Definition 4.17 p. 39 of Liggett (1985)).

Definition 3.3. A particle system defined by the rates $c_T(\eta, \cdot)$ is said to have finite range interactions if there exists $k > 0$ such that, whenever $d(x, y) > k$:

1. $c_T = 0$ for all $T$ containing both $x$ and $y$;
2. $\gamma(x, y) = 0$.

The first condition imposes that two coordinates cannot change simultaneously if their distance is larger than $k$. The second one says that the influence of a site on the transition rates of another site cannot be felt beyond distance $k$. Under these conditions, the following covariance inequality holds.

Proposition 3.4. Assume (3.5.10) and (3.5.11). Assume moreover that the process has finite range. Let $R_1$ and $R_2$ be two finite subsets of $S$, and let $\beta$ be a constant such that $\beta > \rho$. Let $f$ and $g$ be two functions mainly located on $R_1$ and $R_2$, in the sense that there exist positive constants $\kappa_f, \kappa_g$ such that
$$\Delta_f(x) \le \kappa_f\, e^{-\beta d(x, R_1)} \qquad \text{and} \qquad \Delta_g(x) \le \kappa_g\, e^{-\beta d(x, R_2)}.$$
Then for all positive reals $s, t$,
$$\sup_{\eta \in \mathcal X} \operatorname{Cov}_\eta\big( f(\eta_s), g(\eta_t) \big) \le C \kappa_f \kappa_g\, (\#R_1 \wedge \#R_2)\, e^{D(t+s)}\, e^{-(\beta-\rho)\, d(R_1, R_2)}, \tag{3.5.14}$$
where
$$D = 2M e^{(\beta+\rho)k} \qquad \text{and} \qquad C = \frac{2B e^{\beta k}}{D} \Big( 1 + \frac{e^{\rho k}}{1 - e^{-\beta+\rho}} \Big).$$

Proof. We refer the reader to the proof of Proposition 3.3 in Doukhan et al. (2005) [64].

Remark 3.7. Shashkin (2005) [176] obtains a similar inequality for random fields indexed by $\mathbb Z^d$. For transitive graphs, the covariance inequality stated in Proposition 3.4 was studied by Doukhan et al. (2005) [64] in order to derive a functional central limit theorem for interacting particle systems.
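The elementary counting bounds (3.5.12) behind Proposition 3.4 can be sanity-checked on the infinite $r$-regular tree, where the sphere of radius $n$ around a vertex has exactly $r(r-1)^{n-1}$ elements. A short sketch (function names are ours):

```python
# Sphere and ball sizes around a single vertex of the infinite r-regular tree,
# compared with the bounds #{d(x,.) = n} <= r/(r-1)*(r-1)^n and
# #{d(x,.) <= n} <= r/(r-2)*(r-1)^n stated in the text.
def sphere(r, n):
    return 1 if n == 0 else r * (r - 1) ** (n - 1)

def ball(r, n):
    return sum(sphere(r, m) for m in range(n + 1))

r = 4
for n in range(1, 8):
    assert sphere(r, n) <= r / (r - 1) * (r - 1) ** n   # equality in a tree
    assert ball(r, n) <= r / (r - 2) * (r - 1) ** n
print(ball(4, 3))  # 1 + 4 + 12 + 36 = 53
```

The tree saturates the sphere bound, which is why the constant $\rho = \log(r-1)$ cannot be improved in general.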
3.6 Other models

3.6.1 Random AR models
Assume here that $X_t = A_t X_{t-1} + \xi_t$, where the sequence $\xi_t \in \mathbb R^d$ is still i.i.d. but now $(A_t)$ is assumed to be a stationary sequence of random $d \times d$ matrices. A stationary solution of the previous equation has the formal expansion
$$X_t = \sum_{k=0}^{\infty} A_t \cdots A_{t-k+1}\, \xi_{t-k}.$$
A first and simple case ensuring convergence of this series is $\mathbb E\|A_0\|^p < 1$ for a suitable matrix norm, in the case where the sequence $(A_t)$ is i.i.d. and $A_t, A_{t-1}, \ldots$ are independent of the inputs $(\xi_t)$. For this we also assume $\mathbb E\|\xi_0\|^p < \infty$ for some $p \ge 1$. This condition* also implies convergence of the previous series in $\mathbb L^p$. For $d = 1$, a simple example of this situation is the bilinear model $A_t = a + b\xi_{t-1}$.

If now $A_t = \zeta_t + \sum_{j=1}^{J} b_j \xi_{t-j}$ for a stationary sequence $(\zeta_t)$ independent of $(\xi_t)$, the condition
$$\mathbb E \Big\| \zeta_0 + \sum_{j=1}^{J} b_j \xi_j \Big\|^{pJ} < 1$$
implies absolute convergence of the previous series in $\mathbb L^p$ through Hölder's inequality. The previous relation holds if
$$\|\zeta_0\|_{pJ} + \|\xi_0\|_{pJ} \sum_{j=1}^{J} \|b_j\| < 1.$$
Those models are also suitable for the previous section related to Markov chains, but a special case of this situation is provided if the sequence $(A_t)$ is stationary and independent of the sequence $(\xi_t)$. In this case the assumption
$$\sum_{k=0}^{\infty} \mathbb E\| A_k A_{k-1} \cdots A_0 \|^p < \infty$$
implies the convergence of the previous series in $\mathbb L^p$ if $\mathbb E\|\xi_0\|^p < \infty$.

Extensions of such models, solutions of the non Markov equation
$$X_t = \sum_{j \in A} \alpha_t^j X_{t-j} + \zeta_t, \tag{3.6.1}$$
are considered in Doukhan and Truquet (2006) [76] as random fields† with infinitely many interactions. If $b = \sum_{j\in A} \|\alpha_0^j\|_p < 1$, the solution of equation (3.6.1) writes, a.s. and in $\mathbb L^p$,
$$X_t = \zeta_t + \sum_{i=1}^{\infty} \sum_{j_1,\ldots,j_i \in A} \alpha_t^{j_1} \alpha_{t-j_1}^{j_2} \cdots \alpha_{t-j_1-\cdots-j_{i-1}}^{j_i}\, \zeta_{t-(j_1+\cdots+j_i)}.$$
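A quick simulation sketch of the simplest case, the scalar random coefficient AR(1) with the bilinear choice $A_t = a + b\xi_{t-1}$ mentioned above (the parameter values are our own, chosen so that $\mathbb E|A_0| < 1$):

```python
import numpy as np

rng = np.random.default_rng(5)
n, burn = 10_000, 200
a, b = 0.3, 0.3                    # E|A_0| <= a + b*E|xi_0| < 1 for Gaussian xi
xi = rng.standard_normal(n + burn)
A = a + b * np.roll(xi, 1)         # A_t = a + b*xi_{t-1}
x = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = A[t] * x[t - 1] + xi[t]
x = x[burn:]
print(x.mean())
```

Note that here $A_t$ and $\xi_t$ are not independent across the same index; the stationary mean is not 0 but $b/(1-a)$, because $\mathbb E[\xi_{t-1} X_{t-1}] = \mathbb E\xi_0^2 = 1$.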
* Existence of the model in this case also relies on the weaker assumption $\mathbb E \log\|A_0\| < 0$; in this case the previous series only converges a.s., and dependence conditions are not easy to derive; for this a concentration inequality is needed and a log transformation should be applied to the obtained coefficients.

† Innovations $\zeta_t$ are vectors of $\mathbb R^k$ and the coefficients $\alpha_t^j$ are $k \times k$ matrices; $\|\cdot\|$ is an algebra norm on the set $M_{k\times k}$ of $k \times k$ matrices, and $X$ will be an $E$-valued random field. Let $A \subset \mathbb Z^d \setminus \{0\}$; we assume that the i.i.d. random field $\xi = \big( (\alpha_t^j)_{j\in A},\, \zeta_t \big)_{t\in\mathbb Z^d}$ takes its values in $(M_{k\times k})^A \times E$.
3.6.2 Integer valued models
The idea of Galton-Watson models led Alain Latour (see [116], [117], [65]) to the construction of integer valued extensions of the standard econometric models. As they are discrete valued, no mixing condition may usually be expected from such models (see Section 1.5), and this is why they fit nicely in the weak dependence frame.

Definition 3.4 (Steutel and van Harn operator). Let $(Y_j)_{j\in\mathbb N}$ be a sequence of independent and identically distributed (i.i.d.) non-negative integer-valued variables with mean $\alpha$ and variance $\lambda$, independent of $X$, a non-negative integer-valued variable. The Steutel and van Harn operator $\alpha\circ$ is defined by
$$\alpha \circ X = \begin{cases} \displaystyle\sum_{i=1}^{X} Y_i, & \text{if } X \ne 0, \\ 0, & \text{otherwise.} \end{cases}$$
The sequence $(Y_i)_{i\in\mathbb N}$ is called a counting sequence. Note that, as indicated in Definition 3.4, the mean of the summands $Y_i$ associated with the operator $\alpha\circ$ is denoted by $\alpha$. Suppose that $\beta\circ$ is another Steutel and van Harn operator based on a counting sequence $(Y'_i)_{i\in\mathbb N}$. The operators $\alpha\circ$ and $\beta\circ$ are said to be independent if, and only if, the counting sequences $(Y_i)_{i\in\mathbb N}$ and $(Y'_i)_{i\in\mathbb N}$ are mutually independent.

One may first think of Poisson distributed variables $Y_i$ with parameter $a$. The first example, the Galton-Watson process with immigration
$$X_t = a \circ X_{t-1} + \xi_t, \tag{3.6.2}$$
was extended in various papers by Alain Latour (see e.g. [116] or [117]) to bilinear type extensions (see Doukhan, Latour and Oraichi, 2006 [65]). We would like to extend the integer-valued model class to give a non-negative integer-valued bilinear process, denoted by INBL(p, q, m, n), similar to the real-valued bilinear process. A time series $(X_t)_{t\in\mathbb N}$ is generated by a bilinear model if it satisfies the equation
$$X_t = \alpha + \sum_{i=1}^{p} a_i X_{t-i} + \sum_{j=1}^{q} c_j \varepsilon_{t-j} + \sum_{k=1}^{m} \sum_{\ell=1}^{n} b_{k\ell}\, (\varepsilon_{t-\ell} X_{t-k}) + \varepsilon_t, \tag{3.6.3}$$
where $(\varepsilon_t)_{t\in\mathbb N}$ is a sequence of i.i.d. random variables, usually but not always with zero mean, and where $\alpha$, $a_i$ ($i = 1, \ldots, p$), $c_j$ ($j = 1, \ldots, q$), and $b_{k\ell}$ ($k = 1, \ldots, m$, $\ell = 1, \ldots, n$) are real constants. In (3.6.3), we can "formally" substitute Steutel and van Harn operators for some of the parameters, giving an equation of the form
$$X_t = \sum_{i=1}^{p} a_i \circ X_{t-i} + \sum_{j=1}^{q} c_j \circ \varepsilon_{t-j} + \sum_{k=1}^{m} \sum_{\ell=1}^{n} b_{k\ell} \circ (\varepsilon_{t-\ell} X_{t-k}) + \varepsilon_t, \tag{3.6.4}$$
CHAPTER 3. MODELS
where the operators a_i∘ (i = 1,…,p), c_j∘ (j = 1,…,q), and b_{kℓ}∘ (k = 1,…,m, ℓ = 1,…,n) are mutually independent and (ε_t)_{t∈N} is a sequence of i.i.d. non-negative integer-valued random variables with finite mean μ and finite variance σ², independent of the operators. As in Latour et al., we restrict to the first-order bilinear model

X_t = a∘X_{t−1} + b∘(ε_{t−1} X_{t−1}) + ε_t    (3.6.5)

where the counting sequences involved in the operators a∘ and b∘ are respectively of mean a and b and of variance α and β. Y and Y′ denote generic variables used in a∘ and b∘, respectively. If a + b·μ < 1, Doukhan, Latour and Oraichi (2006) [65] prove that this model is strictly stationary in L¹; it is θ-weakly dependent with θ(r) ≤ 2(a + b·μ)^r E(X_0). If moreover ‖Y‖_p + ‖ε_0‖_p ‖Y′‖_p < 1, this solution belongs to L^p. Moment estimators thus yield √n-consistent estimators of the parameters in the previously cited paper. We finally mention that in the case of non-negative coefficients such models are also associated sequences.
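The first-order model (3.6.5) can also be simulated by thinning; this is a hypothetical illustration with Poisson counting sequences and Poisson(1) innovations, so that a + b·μ = 0.5 < 1. For this particular Poisson setup a direct mean computation (using E(ε_{t−1}X_{t−1}) = E X − μ + Eε²) gives E X = 2.4:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's inversion method for a Poisson(lam) draw.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def thin(mean_count, x, rng):
    # Steutel-van Harn thinning with a Poisson counting sequence of mean mean_count.
    return sum(poisson(mean_count, rng) for _ in range(x)) if x > 0 else 0

def simulate_inbl1(a, b, mu_eps, n, seed=1):
    # X_t = a∘X_{t-1} + b∘(eps_{t-1} X_{t-1}) + eps_t ; contraction if a + b*mu_eps < 1.
    rng = random.Random(seed)
    x = 0
    eps_prev = poisson(mu_eps, rng)
    path = []
    for _ in range(n):
        eps = poisson(mu_eps, rng)
        x = thin(a, x, rng) + thin(b, eps_prev * x, rng) + eps
        eps_prev = eps
        path.append(x)
    return path

path = simulate_inbl1(a=0.3, b=0.2, mu_eps=1.0, n=5000)
mean = sum(path[500:]) / len(path[500:])
# For this setup the stationary mean solves m = 0.3 m + 0.2 (m + 1) + 1, i.e. m = 2.4.
```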
3.6.3 Random fields
Analogously, one may define some simple stationary random fields. Let T be any group (in additive notation) endowed with some metric d; then Bernoulli shifts still write X_t = H((ξ_{s−t})_{s∈T}) for a function H : R^T → R. If (ξ_t)_{t∈T} is stationary, this is also the case of (X_t)_{t∈T}. In order to derive dependence properties of such models one better considers i.i.d. innovations, and we assume that

E|H((ξ_s)_{s∈T}) − H((ξ_s^{(r)})_{s∈T})| → 0 as r → ∞,

if we set ξ_s^{(r)} = ξ_s for d(s,0) < r and ξ_s^{(r)} = z otherwise, where z is a fixed point of the set of values of ξ. Another option is to use an i.i.d. sequence ξ′ = (ξ′_t)_{t∈T}, independent of ξ and with the same distribution, and to set ξ_s^{(r)} = ξ′_s for d(s,0) ≥ r. Here again linear random fields as well as Volterra random fields are simple to define. Standard sets T are Z^d and (Z/nZ)^d. It is less natural to work here with continuous time processes because i.i.d. white noises are discontinuous processes: they are thus less natural to define. A nice example of this situation is given in the next subsection.

LARCH(∞) random fields

Let (ξ_t)_{t∈Z^d} be a stationary sequence of random d × m-matrices, (a_j)_{j≠0} be a sequence of m × d matrices, and a be a vector in R^m. A vector valued
LARCH(∞) random field model is a solution of the recurrence equation

X_t = ξ_t (a + Σ_{j≠0} a_j X_{t−j}),  t ∈ Z^d.    (3.6.6)

Such LARCH(∞) models include a large variety of models, as those in § 3.4, but the main point here is that causality is no more assumed in general. The same proof as in Section 3.4 entails the

Proposition 3.5 (Doukhan, Teyssière, Winant, 2006 [75]). Assume that

‖ξ_0‖_∞ Σ_{j≠0} ‖a_j‖ < 1;

then one stationary solution of eqn. (3.6.6) in L^p is given as

X_t = ξ_t (a + Σ_{k=1}^∞ Σ_{j_1,…,j_k≠0} a_{j_1} ξ_{t−j_1} a_{j_2} ξ_{t−j_1−j_2} ⋯ a_{j_k} ξ_{t−j_1−⋯−j_k} a).    (3.6.7)
In the following of this section we set A(x) = Σ_{‖j‖≥x} ‖a_j‖, A = A(1) and λ = A‖ξ_0‖_∞, where ‖(j_1,…,j_k)‖ = |j_1| + ⋯ + |j_k|.

Approximations. We assume here that the random field (ξ_t)_{t∈Z^d} is i.i.d. One first approximates X_t by a random variable independent of X_0. Set

X̃_t = ξ_t (a + Σ_{k=1}^∞ Σ_{‖(j_1,…,j_k)‖ < t} a_{j_1} ξ_{t−j_1} ⋯ a_{j_k} ξ_{t−j_1−⋯−j_k} a).
Proposition 3.6. One bound for the error is given by:

E‖X_t − X̃_t‖ ≤ E‖ξ_0‖ ( Σ_{k=1}^{t−1} k λ^{k−1} A(t/k) + λ^t/(1−λ) ) ‖a‖.

We now specialize this result. Assume that b, C > 0 are constants; then there exists some constant K > 0 such that

E‖X_t − X̃_t‖ ≤ K/t^b,  under Riemannian decay A(x) ≤ C x^{−b},
E‖X_t − X̃_t‖ ≤ K (q ∨ λ)^{√t},  under geometric decay A(x) ≤ C q^x.
Markov approximation. Consider equation (3.6.6) truncated at rank N:

X_t^N = ξ_t (a + Σ_{0<‖j‖≤N} a_j X_{t−j}^N).

The previous solution rewrites as

X_t^N = ξ_t (a + Σ_{k=1}^∞ Σ_{0<‖j_1‖,…,‖j_k‖≤N} a_{j_1} ξ_{t−j_1} ⋯ a_{j_k} ξ_{t−j_1−⋯−j_k} a).

Then E‖X_t − X_t^N‖ ≤ Σ_{k=1}^∞ A(N)^k. This error has rate Σ_{k=1}^∞ N^{−bk} for Riemannian decays and q^N/(1 − q^N) in the geometric case. Moreover:

Theorem 3.5. The solution (3.6.7) of eqn. (3.6.6) is η-weakly dependent with

η(r) = ‖ξ_0‖_∞ ( Σ_{k=1}^{r−1} k λ^{k−1} A(r/k) + λ^r/(1−λ) ) ‖a‖.
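Since eqn. (3.6.6) is noncausal, it cannot be iterated forward in time; but under the contraction λ = ‖ξ_0‖_∞ Σ‖a_j‖ < 1 of Proposition 3.5, a Picard (fixed-point) iteration converges geometrically to the solution. A hypothetical scalar sketch on the torus Z/nZ (all names and parameter values are illustrative, not from the book):

```python
import random

def larch_fixed_point(n=200, N=5, a=1.0, rho=0.4, iters=60, seed=2):
    # Noncausal scalar LARCH on Z/nZ: X_t = xi_t (a + sum_{0<|j|<=N} a_j X_{t-j}),
    # with a_j = rho^{|j|}; here lam = ||xi||_inf * sum_j |a_j| < 1 (contraction).
    rng = random.Random(seed)
    xi = [rng.uniform(-0.5, 0.5) for _ in range(n)]       # ||xi||_inf <= 0.5
    coeffs = {j: rho ** abs(j) for j in range(-N, N + 1) if j != 0}
    lam = 0.5 * sum(abs(c) for c in coeffs.values())
    x = [0.0] * n
    resid = []
    for _ in range(iters):
        new = [xi[t] * (a + sum(c * x[(t - j) % n] for j, c in coeffs.items()))
               for t in range(n)]
        resid.append(max(abs(new[t] - x[t]) for t in range(n)))
        x = new
    return x, resid, lam

x, resid, lam = larch_fixed_point()
```

The sup-norm change between successive iterates shrinks roughly by the factor λ per iteration, mirroring the geometric error rates stated above.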
This bound may be made explicit for the decays considered previously.

Models with infinite memory

We also mention rapidly here truly non linear extensions of LARCH(∞) models, namely the chains with infinite memory from Doukhan and Wintenberger (2006) [78] and the random fields with infinite interactions from Doukhan and Truquet (2007) [55]; those models are respectively solutions of the equations‡

X_t = F(X_{t−1}, X_{t−2}, …; ξ_t),  t ∈ Z,
X_t = F((X_{t−j})_{j≠0}; ξ_t),  t ∈ Z^d,

those models being usually excited by i.i.d. inputs ξ. Even if no explicit chaotic solution seems to be available in general, such models are well defined and L^p stationary if

‖F(x; ξ_0) − F(y; ξ_0)‖_m ≤ Σ_{j≠0} α_j |x_j − y_j|,  a = Σ_{j≠0} α_j < 1;

in the previous inequality§ one should take m = p both for causal random processes and causal random fields (accurately defined in the above mentioned works) and m = ∞ else. Moreover the weak dependence coefficients are proved to follow analogous decays, with now α_i = a_i in theorem 3.4.2. More precisely, the respectively τ or η weak dependence coefficients have rates driven by the relation

inf_p ( a^{r/p} + Σ_{|i|≥p} α_i ).

‡ Here the function F(x; u) is perhaps not defined over all R^N × R or R^{Z^d} × R, but it is enough that it is defined on trajectories of the solution.
§ For respectively j ∈ N* and j ∈ Z^d \ {0}.
With the previous geometric decays, the bound is the same as in the paragraph related to theorem 3.3, and for Riemannian decays Σ_{|i|≥p} α_i ≤ C p^{−b} a log loss appears: τ_{1,∞}(r) ≤ K log^{b∨1} r · r^{−b}.
3.6.4 Continuous time
In the continuous time setting one better considers a process with independent increments, (Z_t)_{t∈R}; then a simple extension of linear processes is defined through the Wiener integral

X_t = ∫_{−∞}^{∞} f(t − s) dZ_s.

It is for example simple to define such integrals for a Brownian motion, but other possibilities are all the classes of Lévy processes. Among them, the SαS Lévy motion is described in Samorodnitsky and Taqqu (1994) [172]. Analogues of Volterra processes are now multiple stochastic integrals. A complete theory is developed by Major (1981) [126]. More examples are provided in the monograph by Doukhan, Oppenheim and Taqqu (2003) [72].
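For instance, with f(u) = e^{−u} 1_{u≥0} and Z a standard Brownian motion, X_t = ∫_{−∞}^t e^{−(t−s)} dZ_s is the stationary Ornstein-Uhlenbeck process, which admits an exact simulation recursion. A sketch (the parameter values are illustrative; the stationary variance is ∫_0^∞ e^{−2u} du = 1/2):

```python
import math
import random

def simulate_ou(n_steps=100000, dt=0.01, seed=3):
    # X_t = integral of e^{-(t-s)} dZ_s over s <= t, Z a Brownian motion.
    # Exact recursion: X_{t+dt} = e^{-dt} X_t + N(0, (1 - e^{-2dt})/2).
    rng = random.Random(seed)
    phi = math.exp(-dt)
    sd = math.sqrt((1.0 - phi * phi) / 2.0)
    x, path = 0.0, []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, sd)
        path.append(x)
    return path

path = simulate_ou()
mean = sum(path) / len(path)
var = sum(v * v for v in path) / len(path) - mean ** 2
# Empirical variance of a long path should be close to 1/2.
```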
Chapter 4
Tools for non causal cases

Moment inequalities are the main tools when using non causal weak dependence. A first section addresses the weak dependence properties of indicators of processes, useful both for moment inequalities and for the empirical process. After this, separate sections address variances of sums, (2+δ)-order moments and higher order moments. They yield both Rosenthal type and Marcinkiewicz-Zygmund inequalities. Finally, sums of cumulants are also considered as dependence coefficients, and they are used in order to derive sharp exponential inequalities. A last section is devoted to proving tightness criteria for empirical processes through suitable moment inequalities.
4.1 Indicators of weakly dependent processes
Define, for a positive real number x, the function g_x : R → R by g_x(y) = 1_{x≤y} − 1_{x≤−y}. Throughout this chapter we are interested in (I, Ψ)-dependent sequences, where

I = { Π_{i=1}^u g_{x_i} : x_i > 0, u ∈ N* },

and Ψ(f, g) = c(d_f, d_g) for some positive function c defined on N* × N*; in this case we will simply say that the sequence is (I, c)-dependent. Set

Λ_0 = { Π_{i=1}^u f_i : f_i ∈ Λ ∩ L^∞, f_i : R → R, i = 1,…,u, u ∈ N* }.

The following lemma relates η, κ or θ weak dependence to I weak dependence under additional concentration assumptions.
Lemma 4.1. Let (X_n)_{n∈N} be a sequence of r.v.'s. Suppose that, for some positive real constants C, α and all λ > 0,

sup_{x∈R} sup_{i∈N} P(x ≤ X_i ≤ x + λ) ≤ C λ^α.    (4.1.1)

(i) If the sequence (X_n) is (Λ_0, η)-dependent, then it is (I, c)-dependent with ε(r) = η(r)^{α/(1+α)} and c(u,v) = 2(8C)^{1/(1+α)} (u+v).

(ii) If the sequence (X_n) is (Λ_0, κ)-dependent, then it is (I, c)-dependent with ε(r) = κ(r)^{α/(2+α)} and c(u,v) = 2(8C)^{2/(2+α)} (u+v)^{2(1+α)/(2+α)}.

(iii) If the sequence (X_n) is (Λ_0, θ)-dependent, then it is (I, c)-dependent with ε(r) = θ(r)^{α/(1+α)} and c(u,v) = 2(8C)^{1/(1+α)} (u+v)^{1/(1+α)}.

(iv) If the sequence (X_n) is (Λ_0, λ)-dependent (with λ(r) ≤ 1), then it is (I, c)-dependent with c(u,v) = 2( (8C)^{1/(1+α)} (u+v) + (8C)^{2/(2+α)} (u+v)^{2(1+α)/(2+α)} ) and ε(r) = λ(r)^{α/(2+α)}.

Proof of Lemma 4.1. First recall that for all real numbers 0 ≤ x_i, y_i ≤ 1,

|x_1 ⋯ x_m − y_1 ⋯ y_m| ≤ Σ_{i=1}^m |x_i − y_i|.

Let then g, f ∈ I, i.e. g(y_1,…,y_u) = g_{x_1}(y_1) ⋯ g_{x_u}(y_u) and f(y_1,…,y_v) = g_{x′_1}(y_1) ⋯ g_{x′_v}(y_v) for some u, v ∈ N* and x_i, x′_j ≥ 0. For fixed x > 0 and a > 0 let

f_x(y) = 1_{y>x} − 1_{y≤−x} + (y/a − x/a + 1) 1_{x−a<y≤x} + (y/a + x/a − 1) 1_{−x<y≤−x+a},

a Lipschitz interpolation of g_x, and set

h(y_1,…,y_u) = f_{x_1}(y_1) ⋯ f_{x_u}(y_u),  k(y_1,…,y_v) = f_{x′_1}(y_1) ⋯ f_{x′_v}(y_v);

then Lip(h) ≤ a^{−1}, Lip(k) ≤ a^{−1}. Consider i_1 ≤ ⋯ ≤ i_u ≤ i_u + r ≤ j_1 ≤ ⋯ ≤ j_v and set Cov(h,k) := Cov(h(X_{i_1},…,X_{i_u}), k(X_{j_1},…,X_{j_v})).

(i) η-weak dependence ⇒ |Cov(h,k)| ≤ (u+v) η(r)/a.
(ii) κ-weak dependence ⇒ |Cov(h,k)| ≤ ((u+v)/a)² κ(r).
(iii) θ-weak dependence ⇒ |Cov(h,k)| ≤ v θ(r)/a.

Inequality (4.1.1) yields |Cov(g,f) − Cov(h,k)| ≤ 8C a^α (u+v), and hence

(i) |Cov(g,f)| ≤ 8C a^α (u+v) + (u+v) η(r)/a,
(ii) |Cov(g,f)| ≤ 8C a^α (u+v) + ((u+v)/a)² κ(r), or
(iii) |Cov(g,f)| ≤ 8C a^α (u+v) + (v/a) θ(r).

The lemma follows by setting respectively

(i) a = (η(r)/(8C))^{1/(1+α)},  (ii) a = ((u+v)κ(r)/(8C))^{1/(2+α)},  (iii) a = (θ(r)/(8C(u+v)))^{1/(1+α)}.

The case of λ-dependence is obtained by the summation of both cases (i) and (ii).
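The optimization behind case (i) simply balances the concentration term 8Ca^α(u+v) against the regularity term (u+v)η(r)/a; the stated a makes the two terms equal and attains the constant of the lemma. A quick numerical check (the values are arbitrary):

```python
def bound_i(a, C, alpha, eta, u, v):
    # Case (i) of the proof: 8C a^alpha (u+v) + (u+v) eta / a.
    return 8 * C * a ** alpha * (u + v) + (u + v) * eta / a

C, alpha, eta, u, v = 0.7, 0.5, 1e-3, 3, 4
a_star = (eta / (8 * C)) ** (1 / (1 + alpha))
term_conc = 8 * C * a_star ** alpha * (u + v)   # concentration term at a_star
term_reg = (u + v) * eta / a_star               # regularity term at a_star
# Closed-form value of the bound at a_star, matching c(u,v) * eta^{alpha/(1+alpha)}:
closed_form = 2 * (8 * C) ** (1 / (1 + alpha)) * eta ** (alpha / (1 + alpha)) * (u + v)
```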
4.2 Low order moments inequalities
Our proof for central limit theorems is based on a truncation method. For a truncation level T ≥ 1 we shall denote X̄_k = f_T(X_k) − E f_T(X_k), with f_T(X) = (X ∨ (−T)) ∧ T. Let us simply remark that X̄_k has moments of any order because it is bounded. Suppose that μ = E|X_0|^m is finite for some m > 0. Furthermore, for any a ≤ m, we control the difference E|f_T(X_0) − X_0|^a with the Markov inequality:

E|f_T(X_0) − X_0|^a ≤ E|X_0|^a 1_{|X_0|≥T} ≤ μ T^{a−m};

thus using the Jensen inequality yields

‖X̄_0 − X_0‖_a ≤ 2 μ^{1/a} T^{1−m/a}.    (4.2.1)
Deriving from this truncation, we are now able to control the limiting variance as well as the higher order moments.
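The Markov step above can be checked in closed form for a concrete heavy-tailed law. For a Pareto density c x^{−(c+1)} on [1, ∞), both sides of E|X_0|^a 1_{|X_0|≥T} ≤ μ T^{a−m} are explicit (the numbers below are an illustrative choice, not from the text):

```python
def pareto_tail_moment(c, a, T):
    # E |X|^a 1_{X >= T} for X Pareto with density c x^{-(c+1)} on [1, inf),
    # valid for T >= 1 and a < c.
    return c / (c - a) * T ** (a - c)

c = 3.5           # tail index: moments of order m exist for m < c
m, a = 3.0, 1.0
mu = pareto_tail_moment(c, m, 1.0)   # mu = E|X|^m = c/(c-m) = 7 here
checks = []
for T in [1.0, 2.0, 5.0, 10.0, 100.0]:
    lhs = pareto_tail_moment(c, a, T)     # E|X|^a 1_{|X|>=T}, exact
    rhs = mu * T ** (a - m)               # Markov bound from the text
    checks.append(lhs <= rhs + 1e-15)
```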
4.2.1 Variances
Lemma 4.2 (Variances). If one of the following conditions holds then the series Σ_{k≥0} |Cov(X_0, X_k)| is convergent:

Σ_{k=0}^∞ κ(k) < ∞,    (4.2.2)
Σ_{k=0}^∞ λ(k)^{(m−2)/(m−1)} < ∞.    (4.2.3)
Proof. Using the fact that X̄_0 = g_T(X_0) is a function of X_0 with Lip g_T = 1 and ‖g_T‖_∞ ≤ 2T, we derive, for T large enough,

|Cov(X̄_0, X̄_k)| ≤ κ(k),  or  ≤ (2T+1) λ(k) ≤ 4T λ(k), respectively.    (4.2.4)
In the κ-dependent case, truncation can thus be omitted and

|Cov(X_0, X_k)| ≤ κ(k);    (4.2.5)

we only consider λ-dependence below. Now we develop

Cov(X_0, X_k) = Cov(X̄_0, X̄_k) + Cov(X_0 − X̄_0, X_k) + Cov(X̄_0, X_k − X̄_k)

and, using a truncation T to be determined, we use the two previous bounds (4.2.1) and (4.2.4) with the Hölder inequality for the exponents 1/a + 1/m = 1 to derive

|Cov(X_0, X_k)| ≤ 4T λ(k) + 2‖X_0‖_m ‖X̄_0 − X_0‖_a ≤ 4T λ(k) + 4 μ^{1/a+1/m} T^{1−m/a} ≤ 4(T λ(k) + μ T^{2−m}).

Note that we used the relation 1 − m/a = 2 − m. Thus using the truncation T such that T^{m−1} = μ/λ(k) yields the bound

|Cov(X_0, X_k)| ≤ 8 μ^{1/(m−1)} λ(k)^{(m−2)/(m−1)}.    (4.2.6)

4.2.2 A (2 + δ)-order moment bound
Lemma 4.3. Assume that the stationary and centered process (X_i)_{i∈Z} satisfies E|X_0|^{2+ζ} < ∞, and that it is either κ-weakly dependent with κ(r) = O(r^{−κ}) or λ-weakly dependent with λ(r) = O(r^{−λ}). If κ > 2 + 1/ζ, or λ > 4 + 2/ζ respectively, then for all δ ∈ ]0, A ∧ B ∧ 1[ (where A and B are constants smaller than ζ, only depending on ζ and respectively κ or λ; see (4.2.10) and (4.2.11)) there exists C > 0 such that:

‖S_n‖_Δ ≤ C √n,  where Δ = 2 + δ.

Remarks.

• The constant C satisfies C > (5/(2^{δ/2} − 1))^{1/Δ} c, where c ≡ Σ_{k∈Z} |Cov(X_0, X_k)|; under the conditions of this lemma, Lemma 4.2 entails c < ∞.

• The result is sketched from Bulinski and Sashkin (2005) [33]; notice, however, that their condition of dependence is of a causal nature while ours is not, which explains a loss with respect to the exponents λ and κ. In their ζ-weak dependence setting the best possible value of the exponent is 1 while it is 2 for our non causal dependence.
Proof of lemma 4.3. Analogously to Bulinski and Sashkin (2005) [33], who sketch Ibragimov (1962) [110], we proceed by recurrence on K, for n ≤ 2^K, to prove the property:

‖1 + |S_n|‖_Δ ≤ C √n.    (4.2.7)

We then assume (4.2.7) for all n ≤ 2^{K−1}. We write N = 2^K and we want to bound ‖1 + |S_N|‖_Δ. We can always share the sum S_N in three blocks, the first two with the same size n ≤ 2^{K−1}, denoted A and B, and the third, V, placed between the first two and of size q < n. We then have ‖1 + |S_N|‖_Δ ≤ ‖1 + |A| + |B|‖_Δ + ‖V‖_Δ. The term ‖V‖_Δ is directly bounded with ‖1 + |V|‖_Δ and the recurrence property, i.e. by C√q. Writing q = N^b with b < 1, this term is of order strictly smaller than √N. For ‖1 + |A| + |B|‖_Δ, we have:

E(1 + |A| + |B|)^Δ ≤ E(1 + |A| + |B|)² (1 + |A| + |B|)^δ ≤ E(1 + 2|A| + 2|B| + (|A| + |B|)²)(1 + |A| + |B|)^δ.

An expansion yields the terms:

• E(1 + |A| + |B|)^δ ≤ 1 + ‖A‖_2^δ + ‖B‖_2^δ ≤ 1 + 2c^δ (√n)^δ,

• E|A|(1 + |A| + |B|)^δ ≤ E|A|((1 + |B|)^δ + |A|^δ) = E|A|(1 + |B|)^δ + E|A|^{1+δ}. The term E|A|^{1+δ} is bounded with ‖A‖_2^{1+δ}, and then with c^{1+δ}(√n)^{1+δ}. The term E|A|(1 + |B|)^δ is bounded using Hölder by ‖A‖_{1+δ/2} ‖1 + |B|‖_Δ^δ, and then is at most of order c C^δ (√n)^{1+δ}.

• We have the analogous bounds with B instead of A.

• E(|A| + |B|)²(1 + |A| + |B|)^δ. For this term, we use an inequality from Bulinski:

E(|A| + |B|)²(1 + |A| + |B|)^δ ≤ E|A|^Δ + E|B|^Δ + 5(E A²(1 + |B|)^δ + E B²(1 + |A|)^δ).

The term E|A|^Δ is bounded using (4.2.7) by C^Δ(√n)^Δ. The second term is the analogue with B. The third is treated with particular care in the following.

We now want to control E A²(1 + |B|)^δ and the analogue with B. For this, we introduce the weak dependence; we then have to truncate the variables. We denote by X̄ the variable (X ∨ (−T)) ∧ T for a real T that will be determined later. We then denote by extension Ā and B̄ the sums of the truncated variables X̄_i. Remarking that |B| − |B̄| ≥ 0, we have:

E A²(1 + |B|)^δ ≤ E A²(|B| − |B̄|)^δ + E(A² − Ā²)(1 + |B̄|)^δ + E Ā²(1 + |B̄|)^δ.
We begin by controlling E A²(|B| − |B̄|)^δ. Set m = 2 + ζ; then using the Hölder inequality with 2/m + 1/m′ = 1 yields:

E A²(|B| − |B̄|)^δ ≤ ‖A‖_m² ‖(|B| − |B̄|)^δ‖_{m′}.

‖A‖_Δ is bounded using (4.2.7), and we remark that:

(|B| − |B̄|)^{δm′} ≤ (|B| − |B̄| 1_{∀i, |X_i|≤T})^{δm′} ≤ |B|^{δm′} 1_{∃i, |X_i|>T} ≤ |B|^{δm′} 1_{|B|>T}.

We then bound 1_{|B|>T} ≤ (|B|/T)^α with α = m − δm′. Then

E ||B| − |B̄||^{δm′} ≤ E|B|^m T^{δm′−m}.

Then, by convexity and stationarity, we have E|B|^m ≤ n^m E|X_0|^m. Then:

E A²(|B| − |B̄|)^δ ≲ n^{2+m/m′} T^{δ−m/m′}.

Finally, remarking that m/m′ = m − 2, we obtain:

E A²(|B| − |B̄|)^δ ≲ n^m T^{Δ−m}.

We obtain the same bound for the second term:

E(A² − Ā²)(1 + |B̄|)^δ ≲ n^m T^{Δ−m}.

For the third term, we introduce a covariance term:

E Ā²(1 + |B̄|)^δ ≤ Cov(Ā², (1 + |B̄|)^δ) + E Ā² E(1 + |B̄|)^δ.

The last term is bounded with ‖A‖_2² ‖1 + |B|‖_2^δ ≤ c^Δ (√n)^Δ. The covariance is controlled by the weak-dependence notions:

• in the κ-dependent case: ≲ n² T κ(q),
• in the λ-dependent case: ≲ n³ T² λ(q).

We then choose either the truncation T^{m−δ−1} = n^{m−2}/κ(q) or T^{m−δ} = n^{m−3}/λ(q). Now the three terms of the decomposition have the same order:

E A²(1 + |B|)^δ ≲ (n^{3m−2Δ} κ(q)^{m−Δ})^{1/(m−δ−1)}  under κ-dependence,
E A²(1 + |B|)^δ ≲ (n^{5m−3Δ} λ(q)^{m−Δ})^{1/(m−δ)}  under λ-dependence.

Set q = N^b; since n ≤ N/2 this term has order N^{(3m−2Δ+bκ(Δ−m))/(m−δ−1)} under κ-weak dependence and N^{(5m−3Δ+bλ(Δ−m))/(m−δ)} under λ-weak dependence. Those terms are thus negligible with respect to N^{Δ/2} if:

κ > (3m − 2Δ − (Δ/2)(m−δ−1)) / (b(m−Δ)),  under κ-dependence,    (4.2.8)
λ > (5m − 3Δ − (Δ/2)(m−δ)) / (b(m−Δ)),  under λ-dependence.    (4.2.9)
Finally, using this assumption, b < 1 and n ≤ N/2, we derive the bound, for some suitable constants a₁, a₂ > 0:

E(1 + |S_N|)^Δ ≤ (2^{−δ/2} C^Δ + 5·2^{−δ/2} c^Δ + a₁ N^{−a₂}) (√N)^Δ.

Using the relation linking C and c, we conclude that (4.2.7) is also true at the step N if the constant C satisfies 2^{−δ/2} C^Δ + 5·2^{−δ/2} c^Δ + a₁ N^{−a₂} ≤ C^Δ. Choose

C > ( (5c^Δ + a₁ 2^{δ/2}) / (2^{δ/2} − 1) )^{1/Δ}  with  c = Σ_{k∈Z} |Cov(X_0, X_k)|;

then the previous relation holds. Finally, we use eqns. (4.2.8) and (4.2.9) to find a condition on δ. In the case of κ-weak dependence, we rewrite inequality (4.2.8) as:

0 > δ² + δ(2κ − 3 − ζ) − κζ + 2ζ + 1.

It leads to the following condition on δ:

δ < ( √((2κ−3−ζ)² + 4(κζ−2ζ−1)) + ζ + 3 − 2κ ) / 2 = A.    (4.2.10)

We do the same in the case of λ-weak dependence:

δ < ( √((2λ−6−ζ)² + 4(λζ−4ζ−2)) + ζ + 6 − 2λ ) / 2 = B.    (4.2.11)

Remark: those bounds are always smaller than ζ.
4.3 Combinatorial moment inequalities
Let (X_n)_{n∈N} be a sequence of centered r.v.s and let S_n = Σ_{i=1}^n X_i. In this section, we obtain bounds for |E(S_n^q)| when q ∈ N and q ≥ 2. Our main references here are Doukhan and Portal (1983) [73], Doukhan and Louhichi (1999) [67], and Rio (2000) [161]. We first introduce the following coefficient of weak dependence.

Definition 4.1. Let (X_n) be a sequence of centered r.v.s. For a positive integer r, we define the coefficients of weak dependence as the non-decreasing sequences (C_{r,q})_{q≥2} such that

C_{r,q} := sup |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,    (4.3.1)

where the supremum is taken over all {t_1,…,t_q} such that 1 ≤ t_1 ≤ ⋯ ≤ t_q and m, r satisfy t_{m+1} − t_m = r.
Below, we provide explicit bounds of C_{r,q} in order to obtain inequalities for moments of the partial sums S_n. We shall assume that either there exist constants C, M > 0 such that, for any convenient q-tuple {t_1,…,t_q} as in the definition,

C_{r,q} ≤ C M^q ε(r),    (4.3.2)

or, denoting by Q_X the quantile function of |X| (inverse of the tail function t ↦ P(|X| > t), see (2.2.14)),

C_{r,q} ≤ c(q) ∫_0^{ε(r)} Q_{X_{t_1}}(x) ⋯ Q_{X_{t_q}}(x) dx.    (4.3.3)

The bound (4.3.2) holds mainly for bounded sequences. E.g. if ‖X_n‖_∞ ≤ M and X is (Λ ∩ L^∞, Ψ)-weakly dependent, we have:

C_{r,q} ≤ max_{1≤m<q} Ψ(j^{⊗m}, j^{⊗(q−m)}, m, q−m) M^q ε(r),

where j(x) = x 1_{|x|≤1} + 1_{x>1} − 1_{x<−1}. If Ψ(h,k,u,v) = c(u,v) Lip(h) Lip(k), the bound becomes

C_{r,q} ≤ max_{1≤m<q} c(m, q−m) M^{q−2} ε(r).

The bound (4.3.3) holds for more general r.v.s, using moment or tail assumptions. With Lemma 4.1, we derive that if the concentration property (4.1.1) holds then η (resp. κ) weak dependence implies (I, c)-weak dependence. Now relation (4.3.3) follows from the following lemma.

Lemma 4.4. If the sequence (X_n)_{n∈N} is (I, c)-weakly dependent, then

|Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ C_q ∫_0^{ε(r)} Q_{t_1}(u) ⋯ Q_{t_q}(u) du,
where C_q = (max_{u+v≤q} c(u,v)) ∨ 2. The quantity Q_{t_i} denotes the inverse of the tail function t ↦ P(|X_{t_i}| > t) (inverses are defined in eqn. (2.2.14)).

Proof of Lemma 4.4. Let Y⁺ = 0 ∨ Y and Y⁻ = 0 ∨ (−Y); then

Y⁺ = ∫_0^{+∞} 1_{x≤Y⁺} dx = ∫_0^{+∞} 1_{x≤Y} dx,  and  Y⁻ = ∫_0^{+∞} 1_{x≤Y⁻} dx = ∫_0^{+∞} 1_{x≤−Y} dx.

The inclusion-exclusion formula entails

Y_1 ⋯ Y_q = Π_{i=1}^q (Y_i⁺ − Y_i⁻) = Σ′ (−1)^{q−r} Y_{i_1}⁺ ⋯ Y_{i_r}⁺ Y_{i_{r+1}}⁻ ⋯ Y_{i_q}⁻,
where Σ′ denotes a summation over all the permutations {i_1,…,i_q} of {1,…,q}. Using Fubini's theorem, the preceding integral representation yields

Y_1 ⋯ Y_q = Σ′ (−1)^{q−r} ∫_{R₊^q} 1_{x_1≤Y_{i_1}} ⋯ 1_{x_r≤Y_{i_r}} 1_{x_{r+1}≤−Y_{i_{r+1}}} ⋯ 1_{x_q≤−Y_{i_q}} dx_1 ⋯ dx_q
    = ∫_{R₊^q} Π_{i=1}^q (1_{x_i≤Y_i} − 1_{x_i≤−Y_i}) dx_1 ⋯ dx_q.

Again Fubini's theorem yields

E(Y_1 ⋯ Y_q) = ∫_{R₊^q} E( Π_{i=1}^q (1_{x_i≤Y_i} − 1_{x_i≤−Y_i}) ) dx_1 ⋯ dx_q.    (4.3.4)
Now, eqn. (4.3.4) applied with Y_i = X_{t_i} for i = 1,…,q, together with Fubini's theorem, implies

Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q}) = ∫_{R₊^q} Cov( Π_{i=1}^m f_i(X_{t_i}), Π_{i=m+1}^q f_i(X_{t_i}) ) dx_1 ⋯ dx_q,

where f_i(y) = 1_{x_i≤y} − 1_{x_i≤−y}. Define

B = Cov( Π_{i=1}^m f_i(X_{t_i}), Π_{i=m+1}^q f_i(X_{t_i}) ).    (4.3.5)

In the sequel, we give two bounds of the quantity B. The first bound does not use the dependence structure, only that |f_i(y)| = 1_{x_i≤|y|}. Thus

B ≤ 2 inf(Φ_{X_{t_1}}(x_1), …, Φ_{X_{t_q}}(x_q)),    (4.3.6)

with Φ_X(x) = P(|X| ≥ x). The second bound is deduced from the (I, c)-weak dependence property. In fact, we have (recall that r = t_{m+1} − t_m)

B ≤ c(u,v) ε(r).    (4.3.7)

The bound (4.3.7) together with (4.3.6) yields B ≤ C_q inf(ε(r), Φ_{X_{t_1}}(x_1), …, Φ_{X_{t_q}}(x_q)). Hence

|Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ (c_q ∨ 2) ∫_0^{+∞} ⋯ ∫_0^{+∞} inf(ε(r), Φ_{X_{t_1}}(x_1), …, Φ_{X_{t_q}}(x_q)) dx_1 ⋯ dx_q.
The proof of Theorem 1-1 in Rio (1993) [157] can be completely implemented here. We give it for completeness. Let U be a uniform-[0,1] r.v.; then

ε(r) ∧ min_{1≤j≤q} Φ_{X_{t_j}}(x_j) = P(U ≤ ε(r), U ≤ Φ_{X_{t_1}}(x_1), …, U ≤ Φ_{X_{t_q}}(x_q))
    = P(U ≤ ε(r), x_1 ≤ Q_{X_{t_1}}(U), …, x_q ≤ Q_{X_{t_q}}(U)).

We obtain, collecting the above results,

|Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ C_q E( Q_{X_{t_1}}(U) ⋯ Q_{X_{t_q}}(U) 1_{U≤ε(r)} ).

The lemma is thus proved.

In order to make it possible to use such bounds, it will be convenient to express bounds of the quantities

s_{a,b,N} = Σ_{r=0}^N (r+1)^a ∫_0^{ε(r)} Q^b(s) ds    (4.3.8)

under conditions of summability of the series, for suitable constants a ≥ 0, b > 0, N, and a quantile function Q of a random variable X such that E|X|^{b+δ} < ∞ for some δ > 0. Set for convenience A_r = Σ_{i=0}^r (i+1)^a for r ≥ 0 (and = 0 for r < 0), and B_r = ∫_0^{ε(r)} Q^b(s) ds for r ≥ 0 (= 0 for r < 0); then the expression s_{a,b,N} rewrites as follows; an Abel transform with the Hölder inequality for the conjugate exponents p = 1 + b/δ and q = 1 + δ/b implies the succession of inequalities
N
(Ar − Ar−1 )Br
r=0
=
N −1 r=0
1
=
Ar (Br − Br+1 ) + AN BN
N −1
0
≤ ≤
Ar 1](r+1),(r)](s) + AN 1[0,(N )] (s) Qb (s)ds
r=0 1
−1 N
0 −1 N
1/p ds
Apr 1](r+1),(r)](s) + ApN 1[0,(N )] (s)
r=0
1/p 1/q E|X|bq Apr ( (r) − (r + 1)) + ApN (N )
r=0
Hence sa,b,N ≤
N r=0
1/p (Apr
−
Apr−1 ) (r)
b/(b+δ) E|X|b+δ
0
1
Qbq (s)ds
1/q
Now we note that r^{a+1}/(a+1) ≤ A_r ≤ (r+1)^{a+1}/(a+1), so that

A_r^p − A_{r−1}^p ≤ ((r+1)^{p(a+1)} − (r−1)^{p(a+1)})/(a+1)^p ≤ 2p(r+1)^{p(a+1)−1}/(a+1)^{p−1};

hence, setting c = (2p)^{1/p}(a+1)^{1/p−1},

s_{a,b,N} ≤ c ( Σ_{r=0}^N (r+1)^{a+(a+1)b/δ} ε(r) )^{δ/(b+δ)} ‖X‖_{b+δ}^b.

We summarize this in the following lemma.

Lemma 4.5. Let a > 0, b ≥ 0 be arbitrary; then there exists a constant c = c(a,b) such that, for any real valued random variable X with quantile function Q_X, we have for any N > 0

Σ_{r=0}^N (r+1)^a ∫_0^{ε(r)} Q_X^b(s) ds ≤ c ( Σ_{r=0}^N (r+1)^{a+(a+1)b/δ} ε(r) )^{δ/(b+δ)} ‖X‖_{b+δ}^b.

4.3.1 Marcinkiewicz-Zygmund type inequalities
Our first result is the following Marcinkiewicz-Zygmund inequality.

Theorem 4.1. Let (X_n)_{n∈N} be a sequence of centered r.v.s fulfilling, for some fixed q ∈ N, q ≥ 2,

C_{r,q} = O(r^{−q/2}), as r → ∞.    (4.3.9)

Then there exists a positive constant B, not depending on n, for which

|E(S_n^q)| ≤ B n^{q/2}.    (4.3.10)

Proof of Theorem 4.1. For any integer q ≥ 2, let

A_q(n) = Σ_{1≤t_1≤⋯≤t_q≤n} |E(X_{t_1} ⋯ X_{t_q})|.    (4.3.11)

Clearly,

|E(S_n^q)| ≤ q! A_q(n).    (4.3.12)

Hence, in order to bound |E(S_n^q)|, it remains to bound A_q(n). For this, we argue as in Doukhan and Portal (1983) [73]. Clearly

A_q(n) ≤ Σ_{1≤t_1≤⋯≤t_q≤n} |E(X_{t_1} ⋯ X_{t_m}) E(X_{t_{m+1}} ⋯ X_{t_q})| + Σ_{1≤t_1≤⋯≤t_q≤n} |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|.
The first term on the right hand side of the last inequality is bounded by:

Σ_{1≤t_1≤⋯≤t_q≤n} |E(X_{t_1} ⋯ X_{t_m}) E(X_{t_{m+1}} ⋯ X_{t_q})| ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n).

Hence

A_q(n) ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n) + V_q(n),    (4.3.13)

with

V_q(n) = Σ_{(t_1,…,t_q)∈G_r} |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,    (4.3.14)

where G_r is the set of {t_1,…,t_q} fulfilling 1 ≤ t_1 ≤ ⋯ ≤ t_q ≤ n with r = t_{m+1} − t_m = max_{1≤i<q}(t_{i+1} − t_i). Hence

V_q(n) ≤ Σ_{t_1=1}^n Σ* |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,

where Σ* denotes a sum over such collections 1 ≤ t_1 ≤ ⋯ ≤ t_q ≤ n with fixed t_1, and r = t_{m+1} − t_m = max_{1≤i≤q−1}(t_{i+1} − t_i) ∈ [0, n−1]. Again

Σ* |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ Σ_{r=0}^{n−1} Σ** |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,

where Σ** denotes the (q−2)-dimensional sums, each over

{(t_1,…,t_q) : t_{i−1} ≤ t_i ≤ t_{i−1} + r, i ≠ 1, m+1}.

Hence Σ** 1 = (r+1)^{q−2}, and with |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ C_{r,q} we deduce that

V_q(n) ≤ Σ_{t_1=1}^n Σ_{r=0}^{n−1} (r+1)^{q−2} C_{r,q}.    (4.3.15)

We obtain, collecting inequalities (4.3.13) and (4.3.15),

A_q(n) ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n) + n Σ_{r=0}^{n−1} (r+1)^{q−2} C_{r,q}.    (4.3.16)

By induction on q, and using the last inequality together with condition (4.3.9), it is easy to check that A_q(n) ≤ K_q n^{q/2}. Theorem 4.1 follows from (4.3.12).
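The n^{q/2} growth of Theorem 4.1 can be eyeballed by Monte Carlo in the simplest case, i.i.d. centered signs (for which all C_{r,q} vanish for r ≥ 1): there E(S_n^4) = 3n(n−1) + n, so E(S_n^4)/n² → 3. A hypothetical check, not from the book:

```python
import random

def fourth_moment_ratio(n, reps, seed=4):
    # Estimate E(S_n^4)/n^2 for i.i.d. +-1 signs; the exact value is
    # (3n(n-1) + n)/n^2, which tends to 3 as n grows.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
        total += s ** 4
    return total / reps / n ** 2

ratio = fourth_moment_ratio(n=100, reps=20000)
```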
4.3.2 Rosenthal type inequalities
The following lemma, which is a variant of Lemma 1, page 195, in Billingsley (1968) [20], gives moment inequalities of order q ∈ {2, 4}.

Lemma 4.6. If (X_n)_{n∈N} is a sequence of centered r.v.s, then

E(S_n²) ≤ 2n Σ_{r=0}^{n−1} C_{r,2},   E(S_n⁴) ≤ 4! { (n Σ_{r=0}^{n−1} C_{r,2})² + n Σ_{r=0}^{n−1} (r+1)² C_{r,4} }.    (4.3.17)

Proof of Lemma 4.6. We take respectively q = 2 and q = 4 in the relation (4.3.16). The obtained formulas, together with (4.3.12) and the fact that A_1(n) = 0 for any positive integer n, prove Lemma 4.6.

The following theorem deals with higher order moments.

Theorem 4.2. Let q be some fixed integer not less than 2. Assume that the dependence coefficients C_{r,p} associated to the sequence (X_n) satisfy, for every nonnegative integer p ≤ q and for some positive constants C and M,

C_{r,p} ≤ C M^p ε(r).    (4.3.18)

Then, for any integer n ≥ 2,

|E(S_n^q)| ≤ ((2q−2)!/(q−1)!) { (C n M² Σ_{r=0}^{n−1} ε(r))^{q/2} ∨ C n M^q Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r) }.    (4.3.19)

Proof of Theorem 4.2. The relation (4.3.16) together with Condition (4.3.18) gives

A_q(n) ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n) + C M^q n Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r).    (4.3.20)

In order to solve the above inductive relation, we need the following lemma.

Lemma 4.7. Let (U_q)_{q>0} and (V_q)_{q>0} be two sequences of positive real numbers satisfying, for all q ∈ N*,

U_q ≤ Σ_{m=1}^{q−1} U_m U_{q−m} + M^q V_q,    (4.3.21)
with U_1 = 0 ≤ V_1. Suppose that for every integers m, q fulfilling 2 ≤ m ≤ q−1, there holds

(V_2^{m/2} ∨ V_m)(V_2^{(q−m)/2} ∨ V_{q−m}) ≤ (V_2^{q/2} ∨ V_q).    (4.3.22)

Then, for any integer q ≥ 2,

U_q ≤ (M^q/q) \binom{2q−2}{q−1} (V_2^{q/2} ∨ V_q).    (4.3.23)

Remark 4.1. Note that a sufficient condition for (4.3.22) to hold is that for all p, q such that 2 ≤ p ≤ q−1,

V_p^{q−2} ≤ V_q^{p−2} V_2^{q−p}.    (4.3.24)

Proof of Lemma 4.7. Let (U_q)_{q>0} and (V_q)_{q>0} be two sequences of positive real numbers as in Lemma 4.7. We deduce from (4.3.21) and (4.3.22) that the sequence (Ũ_q), defined by Ũ_q = U_q / (M^q (V_2^{q/2} ∨ V_q)), satisfies the relation

Ũ_q ≤ Σ_{m=1}^{q−1} Ũ_m Ũ_{q−m} + 1,   Ũ_1 = 0.

In order to prove (4.3.23), it suffices to show that, for any integer q ≥ 2,

Ũ_q ≤ d_q := (1/q) \binom{2q−2}{q−1},    (4.3.25)

where d_q is called the q-th Catalan number. The proof of the last bound is done by induction on q. Clearly (4.3.25) is true for q = 2. Suppose now that (4.3.25) is true for every integer m less than q−1. The inductive hypothesis yields, with Ũ_1 = 0:

Ũ_q ≤ Σ_{m=2}^{q−2} d_m d_{q−m} + 1.    (4.3.26)

The last inequality, together with the identity d_q = Σ_{m=1}^{q−1} d_m d_{q−m} (cf. Comtet (1970) [39], page 64), implies Ũ_q ≤ d_q, proving (4.3.25) and thus Lemma 4.7.

We continue the proof of Theorem 4.2. We deduce from (4.3.20) that the sequence (A_q(n))_q satisfies (4.3.21) with

V_q := V_q(n) = C n Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r).

Hence, to prove Theorem 4.2, it suffices to check condition (4.3.22).
We have

(V_2^{m/2} ∨ V_m)(V_2^{(q−m)/2} ∨ V_{q−m}) ≤ V_2^{q/2} ∨ V_2^{m/2} V_{q−m} ∨ V_m V_2^{(q−m)/2} ∨ V_m V_{q−m}.
To control each of these terms, we use three inequalities. Let p be a positive integer, 2 ≤ p ≤ q−1. We deduce from

Σ_{r=0}^{n−1} (r+1)^{p−2} ε(r) ≤ ( Σ_{r=0}^{n−1} ε(r) )^{(q−p)/(q−2)} ( Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r) )^{(p−2)/(q−2)}

that, for 2 ≤ p ≤ q−1,

V_p ≤ V_q^{(p−2)/(q−2)} V_2^{(q−p)/(q−2)}.    (4.3.27)

Define the discrete r.v. Z by P(Z = r+1) = ε(r)/Σ_{i=0}^{n−1} ε(i). The Jensen inequality implies ‖Z‖_{p−2} ≤ ‖Z‖_{q−2} if 1 ≤ p−2 ≤ q−2, so that

V_p ≤ V_q^{(p−2)/(q−2)}.    (4.3.28)

For 0 < α < 1,

V_2^{αq/2} V_q^{1−α} ≤ V_2^{q/2} ∨ V_q.    (4.3.29)

Using (4.3.28), we get

V_m V_2^{(q−m)/2} ≤ V_2^{(q−m)q/(2(q−2))} V_q^{(m−2)/(q−2)},
V_2^{m/2} V_{q−m} ≤ V_2^{mq/(2(q−2))} V_q^{(q−m−2)/(q−2)}.

From (4.3.27) we obtain

V_m V_{q−m} ≤ V_2^{q/(q−2)} V_q^{(q−4)/(q−2)}.

Now (4.3.29) implies that these three bounds are less than V_2^{q/2} ∨ V_q.
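The combinatorial engine behind Lemma 4.7 is the Catalan convolution identity d_q = Σ_{m=1}^{q−1} d_m d_{q−m}, which is easy to verify numerically:

```python
from math import comb

def catalan_d(q):
    # d_q = (1/q) * C(2q-2, q-1), the Catalan number entering Lemma 4.7.
    return comb(2 * q - 2, q - 1) // q

# The convolution identity d_q = sum_{m=1}^{q-1} d_m d_{q-m} used in the induction:
identity_holds = all(
    catalan_d(q) == sum(catalan_d(m) * catalan_d(q - m) for m in range(1, q))
    for q in range(2, 12)
)
```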
Theorem 4.3. If (X_n)_{n∈N} is a centered and (I, c)-weakly dependent sequence, then

|E(S_n^q)| ≤ ((2q−2)!/(q−1)!) { C_q Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du ∨ ( C_2 Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n) Q_i²(u) du )^{q/2} },

where ε^{−1}(u) is the generalized inverse of ε_{[u]} (see eqn. (2.2.14)).

Proof of Theorem 4.3. We begin from relation (4.3.16), and we try to evaluate the coefficient of dependence C_{r,q} for (I, c)-weakly dependent sequences. For this, we need the forthcoming lemma.

Lemma 4.8. If the sequence (X_n)_{n∈N} is (I, c)-weakly dependent, then

V_q(n) ≤ C_q Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du.
Proof of Lemma 4.8. We use Lemma 4.4. Arguing exactly as in Rio (1993) [157], we obtain

V_q(n) ≤ C_q Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈G_r} ∫_0^{ε(r)} Q_{i_1}(u) ⋯ Q_{i_q}(u) du
    ≤ (C_q/q) Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈G_r} Σ_{j=1}^q ∫_0^{ε(r)} Q_{i_j}^q(u) du
    ≤ (C_q/q) Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈∪_{k≤r}G_k} Σ_{j=1}^q ∫_{ε(r+1)}^{ε(r)} Q_{i_j}^q(u) du.

Now, fixing i_j and noting that the number of ways of completing (i_1,…,i_{j−1},i_{j+1},…,i_q) to get a sequence in ∪_{k≤r}G_k is less than (r+1)^{q−1}:

Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈∪_{k≤r}G_k} ∫_{ε(r+1)}^{ε(r)} Q_{i_j}^q(u) du ≤ Σ_{i_j=0}^n Σ_{r=0}^{n−1} (r+1)^{q−1} ∫_{ε(r+1)}^{ε(r)} Q_{i_j}^q(u) du
    ≤ Σ_{i=0}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du.

Lemma 4.8 is now completely proved.

We continue the proof of Theorem 4.3. We deduce from Lemma 4.8 and Inequality (4.3.16) that the sequence (A_q(n))_q fulfills relation (4.3.21). So, as in the proof of Theorem 4.2, Theorem 4.3 is proved if we prove that the sequence

Ṽ_p(n) := (c_p ∨ 2) Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{p−1} Q_i^p(u) du

satisfies (4.3.22). We have

∫_0^1 (ε^{−1}(u) ∧ n)^{p−1} Q_i^p(u) du ≤ ( ∫_0^1 (ε^{−1}(u) ∧ n) Q_i²(u) du )^{(q−p)/(q−2)} ( ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du )^{(p−2)/(q−2)}.

The last bound proves that the sequence (Ṽ_p(n))_p fulfills the convexity inequality (4.3.27), which in turn implies (4.3.22).
4.3.3 A first exponential inequality

For any positive integers n and q ≥ 2, we consider the following assumption:

M_{q,n} = n Σ_{r=0}^{n−1} (r+1)^{q−2} C_{r,q} ≤ A_n q!/β^q,    (4.3.30)

where β is some positive constant and A_n is a sequence independent of q. As a consequence of Theorem 4.2, we obtain an exponential inequality.

Corollary 4.1. Suppose that (4.3.18) and (4.3.30) hold for some sequence A_n ≥ 1 for any n ≥ 2. Then for any positive real number x,

P(|S_n| ≥ x √A_n) ≤ A exp(−B √(βx)),    (4.3.31)

for universal positive constants A and B.

Remark 4.2.
• One may choose the explicit values A = e^{4+1/12} √(8π) and B = e^{−5/2}.

• Let us note that condition (4.3.30) holds if C_{r,q} ≤ C M^q e^{−br} for positive constants C, M, b. In such a case A_n is of order n. E.g. this holds if ‖X_n‖_∞ ≤ M and ‖X_n‖_2 ≤ σ under (Λ ∩ L^∞, Ψ)-weak dependence, if ε(r) = O(e^{−br}) and Ψ(h,k,u,v) ≤ e^{δ(u+v)} Lip(h) Lip(k) for some δ ≥ 0. For this, either compare the series Σ_r (r+1)^{q−2} e^{−ar} with integrals, or with derivatives of the function t ↦ 1/(1−t) = Σ_i t^i at the point t = e^{−a}.
• The use of combinatorics in those inequalities makes them relatively weak. For example, the Bernstein inequality, valid for independent sequences, allows one to replace the term $\sqrt{x}$ in the previous inequality by $x^2$ under the same assumption $n\sigma^2 \ge 1$; in the mixing cases analogous inequalities are obtained by using coupling arguments (not available here), e.g. the case of absolute regularity is studied in Doukhan (1994) [61].

Proof of Corollary 4.1. Theorem 4.2 written with $q = 2p$ yields
$$E(S_n^{2p}) \le \frac{(2p)!}{2p-1}\binom{4p-2}{2p}\bigl(M_{2p,n} \vee M_{2,n}^p\bigr). \qquad (4.3.32)$$
Hence inequality (4.3.32) together with condition (4.3.30) implies
$$E(S_n^{2p}) \le \frac{(4p-2)!}{(2p-1)!}\Bigl(A_n\frac{(2p)!}{\beta^{2p}} \vee \Bigl(\frac{2A_n}{\beta^2}\Bigr)^{p}\Bigr) \le \frac{(4p-2)!}{(2p-1)!}(A_n \vee A_n^p)\frac{(2p)!}{\beta^{2p}} \le (A_n \vee A_n^p)\frac{(4p)!}{\beta^{2p}}.$$
From the Stirling formula and the fact that $A_n \ge 1$ we obtain
$$\mathbb{P}(|S_n| \ge x) \le \frac{E(S_n^{2p})}{x^{2p}} \le \frac{A_n^p}{x^{2p}}\,e^{1/12-4p}\sqrt{8\pi p}\,\frac{(4p)^{4p}}{\beta^{2p}} \le e^{1/12}\sqrt{8\pi}\Bigl(\frac{16e^{-7/4}}{x\beta}\,p^2\sqrt{A_n}\Bigr)^{2p}.$$
CHAPTER 4. TOOLS FOR NON CAUSAL CASES
Now setting $h(y) = (C_ny)^{4y}$ with $C_n^2 = \frac{16e^{-7/4}}{x\beta}\sqrt{A_n}$, one obtains
$$\mathbb{P}(|S_n| \ge x) \le e^{1/12}\sqrt{8\pi}\,h(p).$$
Define the convex function $g(y) = \log h(y)$. Clearly
$$\inf_{y\in\mathbb{R}^+}g(y) = g\Bigl(\frac{1}{eC_n}\Bigr).$$
Suppose that $eC_n \le 1$ and let $p_0 = \bigl\lfloor\frac{1}{eC_n}\bigr\rfloor$; then
$$\mathbb{P}(|S_n| \ge x) \le e^{1/12}\sqrt{8\pi}\,h(p_0) \le e^{4+1/12}\sqrt{8\pi}\exp\Bigl(\frac{-4}{eC_n}\Bigr).$$
Suppose now that $eC_n \ge 1$; then $1 \le e^{4+1/12}\sqrt{8\pi}\exp\bigl(\frac{-4}{eC_n}\bigr)$. In both cases inequality (4.3.31) holds, and Corollary 4.1 is proved.
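As a sanity check (ours, not the authors'), the Stirling-type estimate used in the proof above, $(4p)! \le e^{1/12}\sqrt{8\pi p}\,(4p)^{4p}e^{-4p}$, can be verified numerically for small $p$:

```python
import math

# Numerical check of the Stirling-type bound used in the proof of
# Corollary 4.1: (4p)! <= e^(1/12) * sqrt(8*pi*p) * (4p)^(4p) * e^(-4p),
# which follows from n! <= sqrt(2*pi*n) (n/e)^n e^(1/(12n)) with n = 4p.
def stirling_bound(p: int) -> float:
    return math.exp(1/12) * math.sqrt(8 * math.pi * p) * (4*p)**(4*p) * math.exp(-4*p)

for p in range(1, 9):
    fact = math.factorial(4 * p)
    assert fact <= stirling_bound(p), p
    # the two sides stay within the factor e^(1/12) ~ 1.087 of each other
    assert stirling_bound(p) / fact <= math.exp(1/12) * 1.01
```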
Remark. More accurate applications of those inequalities are proposed in Louhichi (2003) [125] and Doukhan & Louhichi (1999) [67]. In particular, Section 3 of [67] checks conditions for (4.3.18), providing several other bounds for the coefficients $C_{r,q}$.
4.4 Cumulants
The main objective of this section is to reinterpret cumulants, which are classical expressions used to measure the dependence properties of a sequence.
4.4.1 General properties of cumulants
Let $Y = (Y_1,\dots,Y_k) \in \mathbb{R}^k$ be a random vector. Setting $\phi_Y(t) = E e^{it\cdot Y} = E\exp\bigl(i\sum_{j=1}^k t_jY_j\bigr)$ for $t = (t_1,\dots,t_k)\in\mathbb{R}^k$, we write $m_p(Y) = E\,Y_1^{p_1}\cdots Y_k^{p_k}$ for $p = (p_1,\dots,p_k)$ if $E(|Y_1|^s + \cdots + |Y_k|^s) < \infty$ and $|p| = p_1+\cdots+p_k = s$. Finally, if the previous moment condition holds for some $r\in\mathbb{N}^*$, then the function $\log\phi_Y(t)$ has a Taylor expansion
$$\log\phi_Y(t) = \sum_{|p|\le r}\frac{i^{|p|}\kappa_p(Y)}{p!}\,t^p + o(|t|^r), \quad\text{as } t\to0,$$
for some coefficients $\kappa_p(Y)$, called the cumulant of $Y$ of order $p\in\mathbb{N}^k$, defined if $|p|\le s$; here we set $p! = p_1!\cdots p_k!$ and $t^p = t_1^{p_1}\cdots t_k^{p_k}$ if $t = (t_1,\dots,t_k)\in\mathbb{R}^k$ and $p = (p_1,\dots,p_k)$.
In the case $p = (1,\dots,1)$, to which the others may be reduced, we denote $\kappa_{(1,\dots,1)}(Y) = \kappa(Y)$. Moreover, if $\mu = \{i_1,\dots,i_u\} \subset \{1,\dots,k\}$,
$$\kappa_\mu(Y) = \kappa(Y_{i_1},\dots,Y_{i_u}), \qquad m_\mu(Y) = m(Y_{i_1},\dots,Y_{i_u}).$$
Leonov and Shiryaev (1959) [119] (see also Rosenblatt, 1985, pages 33-34 [168]) obtained the following expressions:
$$\kappa(Y) = \sum_{u=1}^{k}(-1)^{u-1}(u-1)!\sum_{\mu_1,\dots,\mu_u}\prod_{j=1}^{u} m_{\mu_j}(Y), \qquad (4.4.1)$$
$$m(Y) = \sum_{u=1}^{k}\sum_{\mu_1,\dots,\mu_u}\prod_{j=1}^{u}\kappa_{\mu_j}(Y). \qquad (4.4.2)$$
The previous sums run over all partitions $\mu_1,\dots,\mu_u$ of the set $\{1,\dots,k\}$. We now recall some notions from Saulis and Statulevičius (1991) [173]; for this we reformulate their notation.

Definition 4.2. Centered moments of a random vector $Y = (Y_1,\dots,Y_k)$ are defined by setting $\bar{E}(Y_1,\dots,Y_l) = E\,Y_1\,c(Y_2,\dots,Y_l)$, where the centered random variables $c(Y_2,\dots,Y_l)$ are defined recursively by $c(\xi_1) = \bar\xi_1 = \xi_1 - E\xi_1$ and
$$c(\xi_j,\xi_{j-1},\dots,\xi_1) = \xi_j\,\overline{c(\xi_{j-1},\dots,\xi_1)} = \xi_j\bigl(c(\xi_{j-1},\dots,\xi_1) - E\,c(\xi_{j-1},\dots,\xi_1)\bigr).$$
We also write $Y_\mu = (Y_j : j\in\mu)$ as a $p$-tuple if $\mu\subset\{1,\dots,k\}$.
Note for comprehension that $\bar E(\xi) = 0$, $\bar E(\eta,\xi) = \mathrm{Cov}(\eta,\xi)$ and
$$\bar E(\zeta,\eta,\xi) = E(\zeta\eta\xi) - E(\zeta)E(\eta\xi) - E(\eta)E(\zeta\xi) - E(\xi)E(\zeta\eta).$$
A remarkable result from Saulis and Statulevičius (1991) will be informative.

Theorem 4.4 (Saulis, Statulevičius, 1991 [173]).
$$\kappa(Y_1,\dots,Y_k) = \sum_{u=1}^{k}(-1)^{u-1}\sum_{\mu_1,\dots,\mu_u}N_u(\mu_1,\dots,\mu_u)\prod_{j=1}^{u}\bar E\bigl(Y_{\mu_j}\bigr),$$
where the sums run over all partitions $\mu_1,\dots,\mu_u$ of the set $\{1,\dots,k\}$ and the integers $N_u(\mu_1,\dots,\mu_u)\in\bigl[0,\,(u-1)!\wedge\lfloor k/2\rfloor!\bigr]$, defined for any such partition, satisfy the relations
• $N(k,u) = \sum_{\mu_1,\dots,\mu_u}N_u(\mu_1,\dots,\mu_u) = \sum_{j=1}^{u-1}C_k^j\,(u-j)^{k-1}$,
• $\sum_{u=1}^{k}N(k,u) = (k-1)!$.
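The combinatorial formula (4.4.1) can be turned into a small numerical routine. The sketch below is our own illustration (empirical moments replace expectations, and all sample values are invented); it enumerates set partitions and checks that the order-2 joint cumulant reduces to the covariance:

```python
import itertools
import math

# Leonov-Shiryaev formula (4.4.1):
# kappa(Y_1,...,Y_k) = sum_u (-1)^(u-1) (u-1)! sum_{partitions} prod_j m_{mu_j}(Y).
# `samples` is a list of k equally long sample paths; moments are empirical
# means, so for k = 2 the joint cumulant must equal the empirical covariance.

def partitions(elements):
    """Enumerate all set partitions of `elements` (a list)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in partitions(rest):
        for i, block in enumerate(part):
            yield part[:i] + [[first] + block] + part[i+1:]
        yield [[first]] + part

def moment(samples, block):
    n = len(samples[0])
    return sum(math.prod(samples[j][i] for j in block) for i in range(n)) / n

def joint_cumulant(samples):
    k = len(samples)
    total = 0.0
    for part in partitions(list(range(k))):
        u = len(part)
        total += (-1)**(u-1) * math.factorial(u-1) * math.prod(moment(samples, b) for b in part)
    return total

y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [1.0, 0.0, 1.0, 2.0]
cov = sum((a - 2.5) * (b - 1.0) for a, b in zip(y1, y2)) / 4
assert abs(joint_cumulant([y1, y2]) - cov) < 1e-12
```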
Using this representation, the following bound will be useful.

Lemma 4.9. Let $Y_1,\dots,Y_k\in\mathbb{R}$ be centered random variables. For $k\ge1$, set $M_k = (k-1)!\,2^{k-1}\max_{1\le i\le k}E|Y_i|^k$; then
$$|\kappa(Y_1,\dots,Y_k)| \le M_k, \qquad (4.4.3)$$
$$M_k\,M_l \le M_{k+l}, \quad\text{if } k,l\ge2. \qquad (4.4.4)$$
Mention that a consequence of this lemma will be used in the following: for cumulants of respective orders $p_1,\dots,p_u$,
$$\prod_{i=1}^{u}|\kappa_{p_i}| \le M_{p_1+\cdots+p_u}. \qquad (4.4.5)$$
We shall use this inequality for components $Y_i = X_{k_i}^{(a_i)}$ of a stationary sequence of $\mathbb{R}^D$-valued random variables; hence $\max_{i\ge1}E|Y_i|^p \le \max_{1\le j\le D}E|X_0^{(j)}|^p$ and we may set
$$M_p = (p-1)!\,2^{p-1}\max_{1\le j\le D}E|X_0^{(j)}|^p. \qquad (4.4.6)$$
Proof of Lemma 4.9. The second point of this lemma follows from the elementary inequality $a!\,b! \le (a+b)!$, and the first one is a consequence of Theorem 4.4 and of the following lemma.

Lemma 4.10. For any $j,p\ge1$ and any real random variables $\xi_0,\xi_1,\xi_2,\dots$ with identical distribution,
$$\|c(\xi_j,\xi_{j-1},\dots,\xi_1)\|_p \le 2^j\max_{1\le i\le j}\|\xi_i\|_{pj}^{\,j},$$
with $\|\xi\|_q = E^{1/q}|\xi|^q$.

Proof of Lemma 4.10. For simplicity we omit suprema, replacing $\max_{i\le j}\|\xi_i\|_p$ by $\|\xi_0\|_p$. First of all, Hölder's inequality implies
$$\|c(\xi_1)\|_p \le \|\xi_1\|_p + |E\xi_1| \le 2\|\xi_1\|_p.$$
We now use recursion; setting $Z_j = c(\xi_j,\xi_{j-1},\dots,\xi_1)$ yields $Z_j = \xi_j(Z_{j-1} - EZ_{j-1})$, hence the Minkowski and Hölder inequalities entail
$$\|Z_j\|_p \le \|\xi_jZ_{j-1}\|_p + \|\xi_0\|_p\,|EZ_{j-1}| \le \|\xi_0\|_{pj}\|Z_{j-1}\|_q + \|\xi_0\|_{pj}\|Z_{j-1}\|_p \le 2^j\,\|\xi_0\|_{pj}\,\|\xi_0\|_{q(j-1)}^{\,j-1},$$
where $\frac{1}{q} + \frac{1}{pj} = \frac{1}{p}$; from $p\ge1$ we infer $q(j-1)\le pj$, which concludes the proof.
Proof of Lemma 4.9 (continued). Here again we omit suprema, replacing $\max_{j\le k}\|Y_j\|_p$ by $\|Y_0\|_p$. With Lemma 4.10 we deduce that $|\bar E(Y_\mu)| \le 2^{l-1}\|Y_0\|_l^l$ with $l = \#\mu$. Indeed, write $Z = c(Y_2,\dots,Y_l)$; then with $\frac1q + \frac1l = 1$ we get
$$|\bar E(Y_1,\dots,Y_l)| = |E\,Y_1Z| \le \|Y_0\|_l\,\|Z\|_q \le 2^{l-1}\|Y_0\|_l^l,$$
since $q(l-1) = l$. Hence Theorem 4.4 implies
$$|\kappa(Y)| \le \sum_{u=1}^{k}\sum_{\mu_1,\dots,\mu_u}N_u(\mu_1,\dots,\mu_u)\prod_{i=1}^{u}2^{\#\mu_i-1}\|Y_0\|_{\#\mu_i}^{\#\mu_i} \le \sum_{u=1}^{k}2^{k-u}N(k,u)\|Y_0\|_k^k \le 2^{k-1}\|Y_0\|_k^k\sum_{u=1}^{k}N(k,u) = 2^{k-1}(k-1)!\,\|Y_0\|_k^k.$$
The following lemmas were essentially proved in Doukhan and León (1989) [66] for real-valued sequences $(X_n)_{n\in\mathbb{Z}}$. Let now $(X_n)_{n\in\mathbb{Z}}$ denote a vector-valued stationary sequence (with values in $\mathbb{R}^D$). Extending Doukhan and Louhichi (1999)'s coefficients, we define*
$$c_{X,q}(r) = \sup\Bigl|\mathrm{Cov}\Bigl(X_{t_1}^{(a_1)}\cdots X_{t_l}^{(a_l)},\;X_{t_{l+1}}^{(a_{l+1})}\cdots X_{t_q}^{(a_q)}\Bigr)\Bigr|, \qquad (4.4.7)$$
where the supremum runs over $1\le l<q$, $1\le a_1,\dots,a_q\le D$ and $t_1\le\cdots\le t_q$ with $t_{l+1}-t_l\ge r$. For further convenience, we also define the decreasing coefficients
$$c^*_{X,q}(r) = \max_{1\le l\le q}c_{X,l}(r)\,\mu_{q-l}, \quad\text{with}\quad \mu_t = \max_{1\le d\le D}E|X_0^{(d)}|^t. \qquad (4.4.8)$$
In order to state the following results we set, for $1\le a_1,\dots,a_q\le D$,
$$\kappa^{(q)}_{(a_1,\dots,a_q)}(t_2,\dots,t_q) = \kappa_{(1,\dots,1)}\bigl(X_0^{(a_1)},X_{t_2}^{(a_2)},\dots,X_{t_q}^{(a_q)}\bigr).$$
The following decomposition lemma will be very useful. It explains how cumulants behave naturally as covariances. Precisely, it proves that a cumulant

*For $D = 1$ this coefficient was already defined in Definition 4.1 as $C_{r,q}$, but the present notation also recalls the underlying process.
$\kappa_Q(X_{k_1},\dots,X_{k_Q})$ is small if $k_{l+1}-k_l$ is large, $k_1\le\cdots\le k_Q$, and the process is weakly dependent. This is a natural extension of one essential property of cumulants, which states that such a cumulant vanishes if the index set can be partitioned into two strict subsets such that the random vectors determined this way are independent.

Definition 4.3. Let $t = (t_1,\dots,t_p)$ be any $p$-tuple in $\mathbb{Z}^p$ such that $t_1\le\cdots\le t_p$; we denote $r(t) = \max_{1\le l<p}(t_{l+1}-t_l)$ and set
$$\kappa_{X,p}(r) = \max_{\substack{t_1\le\cdots\le t_p\\ r(t_1,\dots,t_p)\ge r}}\;\max_{1\le a_1,\dots,a_p\le D}\Bigl|\kappa_p\bigl(X_{t_1}^{(a_1)},\dots,X_{t_p}^{(a_p)}\bigr)\Bigr|. \qquad (4.4.9)$$
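As an illustration of the simplest coefficient of (4.4.7) (our own example, with arbitrary numerical choices of the autoregression parameter, sample size and lags), the lag-$r$ covariance of a simulated AR(1) process decays geometrically in $r$:

```python
import random

# For a stationary AR(1) process X_t = a X_{t-1} + eps_t, the coefficient
# c_{X,2}(r) = |Cov(X_0, X_r)| equals a^r / (1 - a^2) and decays
# geometrically; the empirical lag-r covariances reproduce that decay.
random.seed(0)
a, n = 0.5, 200_000
x, xs = 0.0, []
for _ in range(n):
    x = a * x + random.gauss(0.0, 1.0)
    xs.append(x)

mean = sum(xs) / n

def emp_cov(r):
    return sum((xs[t] - mean) * (xs[t + r] - mean) for t in range(n - r)) / (n - r)

# the estimates shrink with the lag r, mirroring a^r / (1 - a^2)
assert emp_cov(1) > emp_cov(3) > emp_cov(6)
assert abs(emp_cov(6)) < 0.1
```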
Lemma 4.11. Let $(X_n)_{n\in\mathbb{Z}}$ be a stationary process, centered at expectation, with finite moments of any order. Then, if $Q\ge2$, we have, with the notation of Lemma 4.9,
$$\kappa_{X,Q}(r) \le c_{X,Q}(r) + \sum_{s=2}^{Q-2}M_{Q-s}\Bigl\lfloor\frac{Q}{2}\Bigr\rfloor^{Q-s+1}\kappa_{X,s}(r).$$
7 (a) (a ) Proof of lemma 4.11. We denote Xη = i∈η Xi i for any p−tuples η ∈ Zp and p a = (a1 , . . . , ap ) ∈ {1, . . . , D} (this way admits repetitions of the succession η). We assume that k1 ≤ · · · ≤ kQ satisfy kl+1 − kl = r = max1≤s
(a )
(a)
(a)
κ(Xk1 1 , . . . , XkQQ ) = Cov(Xη(k) , Xη(k) ) −
u
where Kμ,k =
κνμ (k) Kμ,k ,
(4.4.10)
{μ}
κμi (k) and where the previous sum extends to partitions
μi =νu
μ = {μ1 , . . . , μu } of {1, . . . , Q} such that there exists some 1 ≤ i ≤ u with μi ∩ν = ∅ and μi ∩ν = ∅. For simplicity the previous formulas omit the reference to indices (a1 , . . . , aQ ) which is implicit. We use the simple but essential remark that r νμ (k) ≥ r(k) to derive |κνμ (k) | ≤ κX,#νμ (r). We now use lemma 4.9 to deduce |Mμ | ≤ MQ−#μν as in eqn. (4.4.5). This
yields the bound
$$\Bigl|\kappa\bigl(X_{k_1}^{(a_1)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr| \le c_{X,Q}(r) + \sum_{u=2}^{[Q/2]}(u-1)!\sum_{\mu_1,\dots,\mu_u}M_{Q-\#\nu_\mu}\,\bigl|\kappa_{\nu_\mu(k)}(X^{(a)})\bigr|$$
$$\le c_{X,Q}(r) + \sum_{u=2}^{[Q/2]}(u-1)!\sum_{s=2}^{Q-2}\;\sum_{\substack{\mu_1,\dots,\mu_u\\ \#\nu_\mu=s}}M_{Q-s}\,\kappa_{X,s}(r)$$
$$\le c_{X,Q}(r) + \sum_{s=2}^{Q-2}\sum_{u=2}^{[Q/2]}(u-1)^{Q-s}\,M_{Q-s}\,\kappa_{X,s}(r)$$
$$\le c_{X,Q}(r) + \sum_{s=2}^{Q-2}\frac{1}{Q-s+1}\Bigl\lfloor\frac{Q}{2}\Bigr\rfloor^{Q-s+1}M_{Q-s}\,\kappa_{X,s}(r),$$
since the inequality $\sum_{u=1}^{U}(u-1)^p \le \frac{1}{p+1}U^{p+1}$ follows from a comparison between a sum and an integral.
Rewrite now Lemma 4.11 as
$$\kappa_{X,Q}(r) \le c_{X,Q}(r) + \sum_{s=2}^{Q-2}B_{Q,s}\,\kappa_{X,s}(r);$$
thus the following bounds follow:
$$\kappa_{X,2}(r) \le c_{X,2}(r),$$
$$\kappa_{X,3}(r) \le c_{X,3}(r),$$
$$\kappa_{X,4}(r) \le c_{X,4}(r) + B_{4,2}\,\kappa_{X,2}(r) \le c_{X,4}(r) + B_{4,2}\,c_{X,2}(r),$$
$$\kappa_{X,5}(r) \le c_{X,5}(r) + B_{5,3}\,\kappa_{X,3}(r) + B_{5,2}\,\kappa_{X,2}(r) \le c_{X,5}(r) + B_{5,3}\,c_{X,3}(r) + B_{5,2}\,c_{X,2}(r),$$
$$\kappa_{X,6}(r) \le c_{X,6}(r) + B_{6,4}\,\kappa_{X,4}(r) + B_{6,3}\,\kappa_{X,3}(r) + B_{6,2}\,\kappa_{X,2}(r) \le c_{X,6}(r) + B_{6,4}\bigl(c_{X,4}(r) + B_{4,2}\,c_{X,2}(r)\bigr) + B_{6,3}\,c_{X,3}(r) + B_{6,2}\,c_{X,2}(r) \le c_{X,6}(r) + B_{6,4}\,c_{X,4}(r) + B_{6,3}\,c_{X,3}(r) + (B_{6,2}+B_{6,4}B_{4,2})\,c_{X,2}(r).$$
A main corollary of Lemma 4.11 is the following; it is proved by induction.

Corollary 4.2. For any $Q\ge2$, there exists a constant $A_Q\ge0$, depending only on $Q$, such that $\kappa_{X,Q}(r) \le A_Q\,c^*_{X,Q}(r)$.
Remark 4.3. This corollary explains an equivalence between the coefficients $c_{X,Q}(r)$ and the $\kappa_Q(r)$, which may also be preferred as a dependence coefficient. A way to derive sharp bounds for the constants involved is, using Theorem 4.4, to decompose the corresponding sum of centered moments into two terms, the first of them involving the maximal covariance of a product. Section 12.3.2 will provide multivariate extensions devoted to spectral analysis. For completeness' sake, remark that formula (4.4.10) implies, with $B_{Q,Q} = 1$, that
$$c_{X,Q}(r) \le \sum_{s=2}^{Q}B_{Q,s}\,\kappa_{X,s}(r).$$
Hence there exists some constant $\tilde A_Q$ such that
$$c^*_{X,Q}(r) \le \tilde A_Q\,\kappa^*_{X,Q}(r), \qquad \kappa^*_{X,Q}(r) = \max_{2\le l\le Q}\kappa_{X,l}(r)\,\mu_{Q-l}.$$
Finally, we have proved that some constants $a_Q, A_Q > 0$ satisfy
$$a_Q\,c^*_{X,Q}(r) \le \kappa^*_{X,Q}(r) \le A_Q\,c^*_{X,Q}(r);$$
hence, for fixed $Q$, those coefficients are equivalent.

The previous formula (4.4.10) implies that the cumulant
$$\kappa\bigl(X_{k_1}^{(a_1)},\dots,X_{k_Q}^{(a_Q)}\bigr) = \sum_{\alpha,\beta}K_{\alpha,\beta,k}\,\mathrm{Cov}\bigl(X_{\alpha(k)}^{(a)},X_{\beta(k)}^{(a)}\bigr)$$
writes as a linear combination of covariances with $\alpha\subset\{1,\dots,l\}$ and $\beta\subset\{l+1,\dots,Q\}$, where the coefficients $K_{\alpha,\beta,k}$ are polynomials in the cumulants. For this, one replaces the $Q$-tuple $\bigl(X_{k_1}^{(a_1)},\dots,X_{k_Q}^{(a_Q)}\bigr)$ by $\bigl(X_i^{(a)}\bigr)_{i\in\nu_\mu(k)}$ for each partition $\mu$ in formula (4.4.10), and uses recursion. Such a representation is useful in particular if one knows such covariances precisely; let us mention the cases of Gaussian and associated processes, for which additional information is provided. The main attraction of cumulants with respect to covariances of products is that, if a sample $(X_{k_1},\dots,X_{k_q})$ is given, the behaviour of the cumulant is that of $c_{X,q}(r(k))$, which appears as a supremum over the position $l$ of the maximal lag in the sequence $k$. This means an automatic search for this maximal lag $r(k)$ may be performed with the help of cumulants.

Examples. The constants $A_Q$, which are not explicit, may be determined for some low orders. A careful analysis of the previous proof allows the sharp
bounds†
$$\kappa_{X,2}(r) = c_{X,2}(r),$$
$$\kappa_{X,3}(r) = c_{X,3}(r),$$
$$\kappa_{X,4}(r) \le c_{X,4}(r) + 3\mu_2\,c_{X,2}(r),$$
$$\kappa_{X,5}(r) \le c_{X,5}(r) + 10\mu_2\,c_{X,3}(r) + 10\mu_3\,c_{X,2}(r),$$
$$\kappa_{X,6}(r) \le c_{X,6}(r) + 15\mu_2\,c_{X,4}(r) + 20\mu_3\,c_{X,3}(r) + 150\mu_4\,c_{X,2}(r).$$
Our main application of those inequalities is the following.

Lemma 4.12. Set
$$\kappa_Q = \sum_{k_2=0}^{\infty}\cdots\sum_{k_Q=0}^{\infty}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr|. \qquad (4.4.11)$$
We use notation (4.4.8). For each $Q\ge2$, there exists a constant $B_Q$ such that
$$\kappa_Q \le B_Q\sum_{r=0}^{\infty}(r+1)^{Q-2}\,c^*_{X,Q}(r).$$
Proof of Lemma 4.12. We simply decompose the sums:
$$\kappa_Q \le (Q-1)!\sum_{k_2\le\cdots\le k_Q}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr| \equiv (Q-1)!\,\tilde\kappa_Q,$$
by considering the partition of the index set $E = \{k = (k_2,\dots,k_Q)\in\mathbb{N}^{Q-1} : k_2\le\cdots\le k_Q\}$ into $E_r = \{k\in E : r(k) = r\}$ for $r\ge0$; then
$$\tilde\kappa_Q = \sum_{r=0}^{\infty}\sum_{k\in E_r}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr|.$$
The previous lemma implies that there exists some constant $A_Q>0$ such that
$$\sum_{k\in E_r}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr| \le A_Q\,\#E_r\,c^*_{X,Q}(r),$$
and the simple bound $\#E_r \le (Q-1)(r+1)^{Q-2}$ concludes the proof.

†To bound higher-order cumulants we shall prefer the rough bounds of Lemma 4.11 to those involving combinatorial coefficients.
Lemma 4.13. Assume now that $D = 1$ (the process is real-valued, and we omit the superindices $a_j$). If the series (4.4.11) is finite for each $Q\le p$, set $q = [p/2]$ ($q = p/2$ if $p$ is even and $q = (p-1)/2$ if $p$ is odd); then
$$\Bigl|E\Bigl(\sum_{j=1}^{n}X_j\Bigr)^{p}\Bigr| \le \sum_{u=1}^{q}n^u\,\gamma_u, \quad\text{where}\qquad (4.4.12)$$
$$\gamma_u = \sum_{\substack{p_1+\cdots+p_u=p\\ p_i\ge2}}\frac{p!}{p_1!\cdots p_u!}\,\kappa_{p_1}\cdots\kappa_{p_u}.$$
Proof. As in Doukhan and Louhichi (1999) [67], we bound
$$\bigl|E(X_1+\cdots+X_n)^p\bigr| = \Bigl|\sum_{1\le k_1,\dots,k_p\le n}E\,X_{k_1}\cdots X_{k_p}\Bigr| = |A_{p,n}|, \quad\text{with}\quad A_{p,n} \equiv \sum_{1\le k_1,\dots,k_p\le n}E\,X_{k_1}\cdots X_{k_p}.$$
Let now $\mu = \{i_1,\dots,i_v\}\subset\{1,\dots,p\}$ and $k = (k_1,\dots,k_p)$; we set for convenience
$$\mu(k) = (k_{i_1},\dots,k_{i_v})\in\mathbb{N}^v. \qquad (4.4.13)$$
In order to count the terms with their order of multiplicity, it is indeed not suitable to define the previous item as a set; cumulants and moments are defined equivalently in this case. Hence, as in Doukhan and León (1989) [66], we compute, using formula (4.4.2) and using all the partitions $\mu_1,\dots,\mu_u$ of $\{1,\dots,p\}$ with exactly $1\le u\le p$ elements,
$$A_{p,n} = \sum_{1\le k_1,\dots,k_p\le n}\sum_{u=1}^{p}\sum_{\mu_1,\dots,\mu_u}\prod_{j=1}^{u}\kappa_{\mu_j(k)}(X) = \sum_{u=1}^{p}\sum_{\mu_1,\dots,\mu_u}\sum_{1\le k_1,\dots,k_p\le n}\prod_{j=1}^{u}\kappa_{\mu_j(k)}(X)$$
$$= \sum_{r=1}^{p}\sum_{p_1+\cdots+p_r=p}\frac{p!}{p_1!\cdots p_r!}\prod_{u=1}^{r}\;\sum_{1\le k_1,\dots,k_{p_u}\le n}\kappa_{p_u}(X_{k_1},\dots,X_{k_{p_u}}), \qquad (4.4.14)$$
$$|A_{p,n}| \le \sum_{u=1}^{q}n^u\sum_{p_1+\cdots+p_u=p}\frac{p!}{p_1!\cdots p_u!}\prod_{j=1}^{u}\kappa_{p_j}. \qquad (4.4.15)$$
Identity (4.4.14) follows from a simple change of variables and takes into account that the number of partitions of $\{1,\dots,p\}$ into $u$ subsets with respective cardinalities $p_1,\dots,p_u$ is simply the corresponding multinomial coefficient. Remark that, for $\lambda\in\mathbb{N}$ and taking the stationarity of $X$ into account, we obtain
$$\sum_{1\le k_1,\dots,k_\lambda\le n}\bigl|\kappa_\lambda(X_{k_1},\dots,X_{k_\lambda})\bigr| \le n\,\kappa_\lambda.$$
Using this remark, the fact that cumulants of order 1 vanish, and the fact that the only non-zero terms are those for which $p_1,\dots,p_u\ge2$ (thus $2u\le p$, hence $u\le q$), we finally deduce inequality (4.4.15).

Remark 4.4. If $\kappa_s \le C^s$ for $s\le p$ and for a constant $C>0$, the bound (4.4.15) rewrites as $C^p\sum_{u=1}^{q}u^p\,n^u$ by using the multinomial identity.
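The multinomial identity behind Remark 4.4, $\sum_{p_1+\cdots+p_u=p}p!/(p_1!\cdots p_u!) = u^p$, can be checked directly (a small verification of ours, enumerating weak compositions):

```python
import itertools
import math

# Multinomial identity: summing p!/(p_1!...p_u!) over all compositions
# p_1 + ... + p_u = p with nonnegative parts gives u^p.
def multinomial_sum(p, u):
    total = 0
    # u - 1 sorted cut points in {0,...,p} enumerate each weak composition once
    for cuts in itertools.combinations_with_replacement(range(p + 1), u - 1):
        parts = [b - a for a, b in zip((0,) + cuts, cuts + (p,))]
        total += math.factorial(p) // math.prod(math.factorial(q) for q in parts)
    return total

for p in range(1, 7):
    for u in range(1, 5):
        assert multinomial_sum(p, u) == u ** p
```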
4.4.2 A second exponential inequality
This section is based on Doukhan and Neumann (2005) [71]. Here we are concerned with probability and moment inequalities for $S_n = X_1+\cdots+X_n$, where $X_1,\dots,X_n$ are zero-mean random variables which fulfill appropriate weak dependence conditions. We denote by $\sigma_n^2$ the variance of $S_n$. Results are stated here without proof; we refer the reader to [71]. The first result is a Bernstein-type inequality which generalizes and improves previous inequalities of this chapter.

Theorem 4.5. Suppose that $X_1,\dots,X_n$ are real-valued random variables defined on a probability space $(\Omega,\mathcal{A},\mathbb{P})$ with $EX_i = 0$ and $\mathbb{P}(|X_i|\le M) = 1$, for all $i = 1,\dots,n$ and some $M<\infty$. Let $\Psi:\mathbb{N}^2\to\mathbb{N}$ be one of the following functions:
(a) $\Psi(u,v) = 2v$, (b) $\Psi(u,v) = u+v$, (c) $\Psi(u,v) = uv$, or (d) $\Psi(u,v) = \alpha(u+v) + (1-\alpha)uv$, for some $\alpha\in(0,1)$.
We assume that there exist constants $K, L_1, L_2 < \infty$, $\mu\ge0$, and a nonincreasing sequence of real coefficients $(\rho(n))_{n\ge0}$ such that, for all $u$-tuples $(s_1,\dots,s_u)$ and all $v$-tuples $(t_1,\dots,t_v)$ with $1\le s_1\le\cdots\le s_u\le t_1\le\cdots\le t_v\le n$, the following inequality is fulfilled:
$$\bigl|\mathrm{Cov}\bigl(X_{s_1}\cdots X_{s_u},\,X_{t_1}\cdots X_{t_v}\bigr)\bigr| \le K^2M^{u+v-2}\,\Psi(u,v)\,\rho(t_1-s_u), \qquad (4.4.16)$$
where
$$\sum_{s=0}^{\infty}(s+1)^k\rho(s) \le L_1L_2^k(k!)^\mu, \qquad \forall k\ge0. \qquad (4.4.17)$$
Then
$$\mathbb{P}(S_n\ge t) \le \exp\Bigl(-\frac{t^2/2}{A_n + B_n^{1/(\mu+2)}\,t^{(2\mu+3)/(\mu+2)}}\Bigr), \qquad (4.4.18)$$
where $A_n$ can be chosen as any number greater than or equal to $\sigma_n^2$ and
$$B_n = 2(K\vee M)L_2\Bigl(\frac{2^{4+\mu}\,nK^2L_1}{A_n}\vee1\Bigr).$$

Remark 4.5. (i) Inequality (4.4.18) resembles the classical Bernstein inequality for independent random variables. Asymptotically, $\sigma_n^2$ is usually of order $O(n)$, and $A_n$ can be chosen equal to $\sigma_n^2$, while $B_n$ is usually $O(1)$ and hence negligible. In cases where $\sigma_n^2$ is very small, or where the knowledge of the value of $A_n$ is required for some statistical procedure, it might, however, be better to choose $A_n$ larger than $\sigma_n^2$. It follows from (4.4.16) and (4.4.17) that a rough bound for $\sigma_n^2$ is given by
$$\sigma_n^2 \le 2nK^2\Psi(1,1)L_1. \qquad (4.4.19)$$
Hence, taking $A_n = 2nK^2\Psi(1,1)L_1$, we obtain from (4.4.18) that
$$\mathbb{P}\Bigl(\sum_{i=1}^{n}X_i\ge t\Bigr) \le \exp\Bigl(-\frac{t^2}{C_1n + C_2\,t^{(2\mu+3)/(\mu+2)}}\Bigr), \qquad (4.4.20)$$
where $C_1 = 4K^2\Psi(1,1)L_1$ and $C_2 = 2B_n^{1/(\mu+2)}$ with $B_n = 2(K\vee M)L_2\bigl((2^{3+\mu}/\Psi(1,1))\vee1\bigr)$. Inequality (4.4.20) is then more of Hoeffding type.

(ii) In the causal case, we obtain in Theorem 5.2 of Chapter 5 a Bennett-type inequality for $\tau$-dependent random variables. This also implies a Bernstein-type inequality, however with different constants. In particular, the leading term in the denominator of the exponent differs from $\sigma_n^2$. This is a consequence of the method of proof, which consists of replacing weakly dependent blocks of random variables by independent ones according to some coupling device (an analogous argument is used in [61] for the case of absolute regularity).

(iii) Condition (4.4.16), in conjunction with (4.4.17), may be interpreted as a weak dependence condition, in the sense that the covariances on the left-hand side tend to zero as the time gap between the two blocks of observations increases. Note that the supremum of expression (4.4.16) over all $u$-tuples $(s_1,\dots,s_u)$ and all $v$-tuples $(t_1,\dots,t_v)$ with $1\le s_1\le\cdots\le s_u\le t_1\le\cdots\le t_v<\infty$ such that $t_1-s_u = r$ is denoted by $C_{u+v,r}$ in (4.3.1). Conditions (4.4.16) and (4.4.17) are typically fulfilled for truncated versions of random variables from many time series models; see also Proposition 4.1 below. The constant $K$ in (4.4.16) is included to possibly take advantage of a sparsity of data as it appears, for example, in nonparametric curve estimation.
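To get a feel for the shape of the bound (4.4.18), one can simply evaluate its right-hand side; all numerical constants below are made up for illustration (e.g. $A_n$ playing the role of $\sigma_n^2$), not taken from the text:

```python
import math

# Right-hand side of the Bernstein-type bound (4.4.18); sanity checks:
# it equals 1 at t = 0, decreases in t, and is close to the sub-Gaussian
# bound exp(-t^2 / (2 A_n)) when the A_n term dominates the denominator.
def bernstein_bound(t, A_n, B_n, mu):
    denom = A_n + B_n ** (1 / (mu + 2)) * t ** ((2 * mu + 3) / (mu + 2))
    return math.exp(-(t * t / 2) / denom)

A_n, B_n, mu = 100.0, 5.0, 1.0   # hypothetical values
values = [bernstein_bound(t, A_n, B_n, mu) for t in (0.0, 5.0, 10.0, 20.0, 40.0)]
assert values[0] == 1.0
assert all(a > b for a, b in zip(values, values[1:]))   # decreasing in t
# for small t the bound is close to the sub-Gaussian exp(-t^2 / (2 A_n))
assert abs(bernstein_bound(1.0, A_n, B_n, mu) - math.exp(-1 / (2 * A_n))) < 1e-2
```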
(iv) For unbounded random variables, the coefficients $C_{p,r}$ may still be bounded by an explicit function of the index $p$ under a weak dependence assumption; see Lemma 4.14 below. For example, assume that $E\exp(A|X_t|^a)\le L$ holds for some constants $A,a>0$, $L<\infty$. Since the inequality $u^p\le p!\,e^u$ ($p\in\mathbb{N}$, $u\ge0$) implies that
$$u^m = (Aa)^{-m/a}\bigl(Aau^a\bigr)^{m/a} \le (Aa)^{-m/a}(m!)^{1/a}e^{Au^a}, \qquad \forall m\in\mathbb{N},$$
we obtain that $E|X_t|^m \le L\,(m!)^{1/a}(Aa)^{-m/a}$ holds for all $m\in\mathbb{N}$. Lemma 4.14 below then provides appropriate estimates for $C_{p,r}$.

Note that the variance of the sum does not explicitly show up in the Rosenthal-type inequality given in Theorem 4.2. Using the formula of Leonov and Shiryaev (1959) [119], we are able to obtain a more precise inequality which resembles the Rosenthal inequality of the independent case (see Rosenthal (1970) [170] and Theorem 2.12 in Hall and Heyde (1980) [100] for the case of martingales).

Theorem 4.6. Suppose that $X_1,\dots,X_n$ are real-valued random variables on a probability space $(\Omega,\mathcal{A},\mathbb{P})$ with zero mean, and let $p$ be a positive integer. We assume that there exist finite constants $K$, $M$, and a nonincreasing sequence of real coefficients $(\rho(n))_{n\ge0}$ such that, for all $u$-tuples $(s_1,\dots,s_u)$ and all $v$-tuples $(t_1,\dots,t_v)$ with $1\le s_1\le\cdots\le s_u\le t_1\le\cdots\le t_v\le n$ and $u+v\le p$, condition (4.4.16) is fulfilled. Furthermore, we assume that $E|X_i|^{p-2}\le M^{p-2}$. Then, with $Z\sim\mathcal{N}(0,1)$,
$$\bigl|E\,S_n^p - \sigma_n^p\,E\,Z^p\bigr| \le B_{p,n}\sum_{1\le u\le p/2}A_{u,p}\,K^{2u}(M\vee K)^{p-2u}\,n^u,$$
where $B_{p,n} = (p!)^2\,2^p\max_{2\le k\le p}\{\rho_{k,n}^{p/k}\}$, $\rho_{k,n} = \sum_{s=0}^{n-1}(s+1)^{k-2}\rho(s)$, and
$$A_{u,p} = \frac{1}{u!}\sum_{k_1+\cdots+k_u=p,\;k_i\ge2\,\forall i}\frac{p!}{k_1!\cdots k_u!}.$$

Remark 4.6. For even $p$, the above result implies that
$$E\,S_n^p \le (p-1)(p-3)\cdots1\;\sigma_n^p + B_{p,n}\sum_{1\le u\le p/2}A_{u,p}\,K^{2u}(M\vee K)^{p-2u}\,n^u,$$
which resembles the classical Rosenthal inequality of the independent case. If $\sup_nB_{p,n}<\infty$ and $\sigma_n^2\asymp n$, then the first term on the right-hand side is asymptotically dominating as $n\to\infty$. This term is equal to the $p$-th moment of a Gaussian random variable with mean 0 and variance $\sigma_n^2$.
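The elementary inequality $u^p\le p!\,e^u$ invoked in Remark 4.5-(iv) above (the $p$-th Taylor term of $e^u$ is at most $e^u$ itself) is easy to confirm numerically:

```python
import math

# Check of u^p <= p! * e^u for p in N and u >= 0, used to bound moments
# under the exponential-moment condition E exp(A|X|^a) <= L.
for p in range(0, 10):
    for u in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 25.0]:
        assert u ** p <= math.factorial(p) * math.exp(u) + 1e-9
```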
Remark 4.7. The inequality of Theorem 4.6 is well suited for proving a central limit theorem via the method of moments. Assume first that the random variables $X_i$ are uniformly bounded, centered, stationary, and satisfy condition (4.4.16) with $\lim_{s\to\infty}s^p\rho(s) = 0$ for all $p>0$. Then
$$\lim_{n\to\infty}\frac{\sigma_n^2}{n} = \sigma^2 = \sum_{k=-\infty}^{\infty}E(X_0X_k)$$
is a convergent series, and thus the method of moments implies a central limit theorem,
$$\frac{1}{\sqrt n}\,S_n \xrightarrow{\;d\;} \sigma Z \quad\text{as } n\to\infty.$$
4.4.3 From weak dependence to the exponential bound
A large class of weakly dependent sequences satisfies assumption (4.4.16) of Theorem 4.5. If $\delta(x,y) = |x-y|$, we write $\Lambda^{(1)}$ instead of $\Lambda^{(1)}(\delta)$.

(i) Assume that $(X_t)_{t\in\mathbb{Z}}$ is an $\mathbb{R}^d$-valued stationary process which is $(\Lambda^{(1)},\Psi)$-weakly dependent. Then for any Lipschitz-continuous function $F:\mathbb{R}^d\to\mathbb{R}$ with $\|F\|_\infty = M<\infty$ and $\mathrm{Lip}\,F\le1$, the process $Y_t = F(X_t)$ is real-valued, stationary, and $\|Y_t\|_\infty\le M$. Moreover, it is also $(\Lambda^{(1)},\Psi)$-weakly dependent.

(ii) In the more general case when $\mathrm{Lip}\,F$ possibly exceeds 1 (e.g. if the function $F$ depends on the sample size in a statistical context), weak dependence still holds, where only $\Psi(a,b,u,v)$ has to be replaced by $\psi_Y(a,b,u,v) = \Psi(a\,\mathrm{Lip}\,F,\,b\,\mathrm{Lip}\,F,\,u,\,v)$. For the special cases of $\eta$, $\kappa$ and $\lambda$ weak dependence conditions, one may reformulate this as follows: $(Y_t)_{t\in\mathbb{Z}}$ is still an $\eta$, $\kappa$ or $\lambda$-weakly dependent sequence, but now with the respective coefficients
$$\eta_Y(r) = \mathrm{Lip}\,F\cdot\eta(r), \qquad \kappa_Y(r) = \mathrm{Lip}^2F\cdot\kappa(r), \qquad \lambda_Y(r) = \max\bigl(\mathrm{Lip}\,F,\,\mathrm{Lip}^2F\bigr)\,\lambda(r).$$

Now we relate the conditions of weak dependence to condition (4.4.16). Suppose that $(X_t)_{t\in\mathbb{Z}}$ is a stationary sequence of real-valued random variables with $\|X_t\|_\infty\le M$ which satisfies a weak dependence condition. To see the connection to (4.4.16), we consider the functions $g_1$ and $g_2$ with $g_1(x_1,\dots,x_u) = \prod_{i=1}^{u}f(x_i/M)$ and $g_2(x_1,\dots,x_v) = \prod_{i=1}^{v}f(x_i/M)$, where $f(u) = (u\vee(-1))\wedge1$. These functions satisfy $\mathrm{Lip}\,g_i\le1/M$ and $\|g_i\|_\infty\le1$. The covariance in the definition of weak dependence can be expressed as in equation (4.4.16), up to a factor $M^{u+v}$, since $g_1(X_{i_1},\dots,X_{i_u}) = X_{i_1}\cdots X_{i_u}/M^u$ and $g_2(X_{i_1},\dots,X_{i_v}) = X_{i_1}\cdots X_{i_v}/M^v$.
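The truncation $f(u) = (u\vee(-1))\wedge1$ and the product functions $g_1, g_2$ just described can be checked to be $1/M$-Lipschitz for the $\ell^1$ distance, as claimed; a small numerical sketch of ours (the value of $M$, the dimension, and the sampled points are arbitrary):

```python
import random

# f clips to [-1, 1]; g(x) = prod_i f(x_i / M) has all factors in [-1, 1],
# so |g(x) - g(y)| <= sum_i |f(x_i/M) - f(y_i/M)| <= (1/M) * sum_i |x_i - y_i|.
def f(u):
    return max(-1.0, min(1.0, u))

def g(x, M):
    out = 1.0
    for xi in x:
        out *= f(xi / M)
    return out

random.seed(1)
M, d = 3.0, 4
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(d)]
    y = [random.uniform(-10, 10) for _ in range(d)]
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    assert abs(g(x, M) - g(y, M)) <= l1 / M + 1e-12
```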
Proposition 4.1. Assume that the real-valued sequence $(X_t)_{t\in\mathbb{Z}}$ is $(\Lambda^{(1)},\Psi)$-weakly dependent and that $\|X_t\|_\infty\le M$. Then
$$\bigl|\mathrm{Cov}\bigl(X_{s_1}\cdots X_{s_u},\,X_{t_1}\cdots X_{t_v}\bigr)\bigr| \le M^{u+v}\,\Psi(M^{-1},M^{-1},u,v)\,\epsilon(t_1-s_u). \qquad (4.4.21)$$
Moreover, if $\epsilon(r) = e^{-ar}$ for some $a>0$, then we may choose in inequality (4.4.17) $\mu = 1$ and $L_1 = L_2 = 1/(1-e^{-a})$. If $\epsilon(r) = e^{-ar^b}$ for some $a>0$, $b\in(0,1)$, then we may choose $\mu = 1/b$ and $L_1, L_2$ appropriately, depending only on $a$ and $b$.

Remark 4.8. (i) Notice that Proposition 4.1 implies that $(\Lambda^{(1)},\Psi)$-dependence implies (4.4.16) with
(a) $\Psi(u,v) = 2v$, $K^2 = M$ and $\epsilon(r) = \theta(r)/2$, under $\theta$-dependence,
(b) $\Psi(u,v) = u+v$, $K^2 = M$ and $\epsilon(r) = \eta(r)$, under $\eta$-dependence,
(c) $\Psi(u,v) = uv$, $K = 1$ and $\epsilon(r) = \kappa(r)$, under $\kappa$-dependence,
(d) $\Psi(u,v) = (u+v+uv)/2$, $K^2 = M\vee1$ and $\epsilon(r) = 2\lambda(r)$, under $\lambda$-dependence.

(ii) Now if the vector-valued process $(X_t)_{t\in\mathbb{Z}}$ is an $\eta$, $\kappa$ or $\lambda$-weakly dependent sequence, then for any Lipschitz function $F:\mathbb{R}^d\to\mathbb{R}$ with $\|F\|_\infty = M<\infty$, the process $Y_t = F(X_t)$ is real-valued and relation (4.4.16) holds with
(a) $\Psi(u,v) = 2v$, $K^2 = M\,\mathrm{Lip}\,F$ and $\epsilon(r) = \theta(r)/2$, under $\theta$-dependence,
(b) $\Psi(u,v) = u+v$, $K^2 = M\,\mathrm{Lip}\,F$ and $\epsilon(r) = \eta(r)$, under $\eta$-dependence,
(c) $\Psi(u,v) = uv$, $K = \mathrm{Lip}\,F$ and $\epsilon(r) = \kappa(r)$, under $\kappa$-dependence,
(d) $\Psi(u,v) = (u+v+uv)/2$, $K^2 = (M\vee1)(\mathrm{Lip}^2F\vee\mathrm{Lip}\,F)$ and $\epsilon(r) = 2\lambda(r)$, under $\lambda$-dependence.
Those bounds allow one to use the Bernstein-type inequality of Theorem 4.5 for sums of functions of weakly dependent sequences. A last problem in this setting is to determine sharp bounds for the coefficients $C_{p,r}$. This is possible even when the variables $X_i$ are unbounded, as stated in the following lemma.

Lemma 4.14. Assume that the real-valued sequence $(X_t)_{t\in\mathbb{Z}}$ is $\eta$, $\kappa$ or $\lambda$-weakly dependent and that $E|X_i|^m\le M_m$, for any $m>p$. Then, according to the type of the weak dependence condition:
$$C_{p,r} \le 2^{p+3}p^2\,M_m^{\frac{p-1}{m-1}}\,\eta(r)^{1-\frac{p-1}{m-1}}, \qquad (4.4.22)$$
$$C_{p,r} \le 2^{p+3}p^4\,M_m^{\frac{p-2}{m-2}}\,\kappa(r)^{1-\frac{p-2}{m-2}}, \qquad (4.4.23)$$
$$C_{p,r} \le 2^{p+3}p^4\,M_m^{\frac{p-1}{m-1}}\,\lambda(r)^{1-\frac{p-1}{m-1}}. \qquad (4.4.24)$$
Remark 4.9. This lemma is the essential tool to provide a version of Theorem 4.6 which yields both a Rosenthal-type moment inequality and a rate of convergence for moments in the central limit theorem. We also note that this result does not require the random variables to be a.s. bounded. In fact, even the use of Theorem 4.5 does not really require such boundedness; see Remark 4.5-(iv).
4.5 Tightness criteria

Following Andrews and Pollard (1994) [5], we give in this section a general criterion based on Rosenthal-type inequalities and on chaining arguments. Let $d\in\mathbb{N}^*$ and let $(X_i)_{i\in\mathbb{Z}}$ be a stationary sequence with values in $\mathbb{R}^d$. Let $\mathcal{F}$ be a class of functions from $\mathbb{R}^d$ to $\mathbb{R}$. We define the empirical process $\{Z_n(f),\,f\in\mathcal{F}\}$ by
$$Z_n(f) := \sqrt n\,(P_n(f) - P(f)),$$
with $P$ the common law of the $(X_i)_{i\in\mathbb{Z}}$ and, for $f\in\mathcal{F}$,
$$P_n(f) := \frac1n\sum_{i=1}^{n}f(X_i), \qquad P(f) = \int_{\mathbb{R}^d}f(x)\,P(dx).$$
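A minimal numerical illustration (with invented choices of $f$, the law $P$, and the i.i.d. sampling scheme): for uniform observations on $[0,1]$ and $f(x)=x^2$, the empirical process $Z_n(f)$ stays of order one as $n$ grows, as the central limit theorem suggests:

```python
import math
import random

# Z_n(f) = sqrt(n) * (P_n(f) - P(f)) with f(x) = x^2 and X_i ~ U[0,1],
# so P(f) = 1/3 and Z_n(f) fluctuates like N(0, Var f(X)) with Var ~ 0.089.
random.seed(2)
f = lambda x: x * x
P_f = 1.0 / 3.0

def Z_n(n):
    sample = [random.random() for _ in range(n)]
    P_n_f = sum(f(x) for x in sample) / n
    return math.sqrt(n) * (P_n_f - P_f)

for n in (1_000, 10_000, 100_000):
    assert abs(Z_n(n)) < 5.0   # bounded fluctuations, not shrinking to 0
```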
We study the process $\{Z_n(f),\,f\in\mathcal{F}\}$ on the space $\ell^\infty(\mathbb{R}^d)$ of bounded functions from $\mathbb{R}^d$ to $\mathbb{R}$, equipped with the uniform norm $\|\cdot\|_\infty$. For more details on tightness on the non-separable space $\ell^\infty(\mathbb{R}^d)$, we refer to van der Vaart and Wellner (1996) [183]. In particular, we shall not discuss any measurability problems, which can be handled by using the outer probability. The process $\{Z_n(f),\,f\in\mathcal{F}\}$ is tight on $(\ell^\infty(\mathbb{R}^d),\|\cdot\|_\infty)$ as soon as there exists a semi-metric $\rho$ such that $(\mathcal{F},\rho)$ is totally bounded and for which
$$\lim_{\delta\to0}\limsup_{n\to\infty}\,\mathbb{P}\Bigl(\sup_{f,g\in\mathcal{F},\,\rho(f,g)\le\delta}|Z_n(f)-Z_n(g)| > \varepsilon\Bigr) = 0, \qquad \forall\varepsilon>0.$$
We recall the following definition of bracketing number.

Definition 4.4. Let $Q$ be a finite measure on a measurable space $\mathcal{X}$. For any measurable function $f$ from $\mathcal{X}$ to $\mathbb{R}$, let $\|f\|_{Q,r} = Q(|f|^r)^{1/r}$. If $\|f\|_{Q,r}$ is finite, one says that $f$ belongs to $\mathbb{L}^r_Q$. Let $\mathcal{F}$ be some subset of $\mathbb{L}^r_Q$. The number of brackets $N_{Q,r}(\varepsilon,\mathcal{F})$ is the smallest integer $N$ for which there exist some functions $f_1^-\le f_1,\dots,f_N^-\le f_N$ in $\mathcal{F}$ such that: for any integer $1\le i\le N$ we have $\|f_i-f_i^-\|_{Q,r}\le\varepsilon$, and for any function $f$ in $\mathcal{F}$ there exists an integer $1\le i\le N$ such that $f_i^-\le f\le f_i$.
Proposition 4.2. Let $(X_i)_{i\ge1}$ be a sequence of identically distributed random variables with values in a measurable space $\mathcal{X}$, with common distribution $P$. Let $P_n$ be the empirical measure $P_n = n^{-1}\sum_{i=1}^{n}\delta_{X_i}$, and let $Z_n$ be the normalized empirical process $Z_n = \sqrt n(P_n-P)$. Let $Q$ be any finite measure on $\mathcal{X}$ such that $Q-P$ is a positive measure. Let $\mathcal{F}$ be a class of functions from $\mathcal{X}$ to $\mathbb{R}$ and $\mathcal{G} = \{f-l : (f,l)\in\mathcal{F}\times\mathcal{F}\}$. Assume that there exist $r\ge2$, $p\ge1$ and $q>2$ such that for any function $g$ of $\mathcal{G}$ we have
$$\|Z_n(g)\|_p \le C\bigl(\|g\|_{Q,1}^{1/r} + n^{1/q-1/2}\bigr), \qquad (4.5.1)$$
where the constant $C$ depends neither on $g$ nor on $n$. If moreover
$$\int_0^1x^{(1-r)/r}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx < \infty \quad\text{and}\quad \lim_{x\to0}x^{p(q-2)/q}N_{Q,1}(x,\mathcal{F}) = 0,$$
then
$$\lim_{\delta\to0}\limsup_{n\to\infty}\,E\Bigl(\sup_{g\in\mathcal{G},\,\|g\|_{Q,1}\le\delta}|Z_n(g)|^p\Bigr) = 0. \qquad (4.5.2)$$
Proof of Proposition 4.2. It follows the lines of Andrews and Pollard (1994) [5] and Louhichi (2000) [124]. It is based on the following inequality: given $N$ real-valued random variables,
$$\Bigl\|\max_{1\le i\le N}|Z_i|\Bigr\|_p \le N^{1/p}\max_{1\le i\le N}\|Z_i\|_p. \qquad (4.5.3)$$
For any positive integer $k$, denote by $N_k = N_{Q,1}(2^{-k},\mathcal{F})$ and by $\mathcal{F}_k$ a family of functions $f_1^{k,-}\le f_1^k,\dots,f_{N_k}^{k,-}\le f_{N_k}^k$ in $\mathcal{F}$ such that $\|f_i^k-f_i^{k,-}\|_{Q,1}\le2^{-k}$ and, for any $f$ in $\mathcal{F}$, there exists an integer $1\le i\le N_k$ such that $f_i^{k,-}\le f\le f_i^k$.

First step. We shall construct a sequence $h_{k(n)}(f)$ belonging to $\mathcal{F}_{k(n)}$ such that
$$\lim_{n\to\infty}\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(h_{k(n)}(f))|\Bigr\|_p = 0. \qquad (4.5.4)$$
For any $f$ in $\mathcal{F}$, there exist two functions $g_k^-$ and $g_k^+$ in $\mathcal{F}_k$ such that $g_k^-\le f\le g_k^+$ and $\|g_k^+-g_k^-\|_{Q,1}\le2^{-k}$. Since $Q-P$ is a positive measure, we have
$$Z_n(f)-Z_n(g_k^-) \le Z_n(g_k^+)-Z_n(g_k^-) + \frac{1}{\sqrt n}\sum_{i=1}^{n}E\bigl((g_k^+-f)(X_i)\bigr) \le |Z_n(g_k^+)-Z_n(g_k^-)| + \sqrt n\,2^{-k}.$$
Since $g_k^-\le f$, we also have that $Z_n(g_k^-)-Z_n(f)\le\sqrt n\,2^{-k}$, which enables us to conclude that $|Z_n(f)-Z_n(g_k^-)|\le|Z_n(g_k^+)-Z_n(g_k^-)|+\sqrt n\,2^{-k}$. Consequently,
$$\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(g_k^-)| \le \max_{1\le i\le N_k}|Z_n(f_i^k)-Z_n(f_i^{k,-})| + \sqrt n\,2^{-k}. \qquad (4.5.5)$$
Combining (4.5.3) and (4.5.5), we obtain
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(g_k^-)|\Bigr\|_p \le N_k^{1/p}\max_{1\le i\le N_k}\bigl\|Z_n(f_i^k)-Z_n(f_i^{k,-})\bigr\|_p + \sqrt n\,2^{-k}. \qquad (4.5.6)$$
Starting from (4.5.6) and applying inequality (4.5.1), we obtain
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(g_k^-)|\Bigr\|_p \le C\bigl(N_k^{1/p}2^{-k/r} + N_k^{1/p}n^{1/q-1/2}\bigr) + \sqrt n\,2^{-k}. \qquad (4.5.7)$$
From the integrability condition on $N_{Q,1}(x,\mathcal{F})$, and since the function $x\mapsto x^{(1-r)/r}N_{Q,1}(x,\mathcal{F})^{1/p}$ is nonincreasing, we infer that $N_k^{1/p}2^{-k/r}$ tends to 0 as $k$ tends to infinity. Take $k(n)$ such that $2^{k(n)} = \sqrt n/a_n$ for some sequence $a_n$ decreasing to zero. Then $\sqrt n\,2^{-k(n)}$ tends to 0 as $n$ tends to infinity. It remains to control the second term on the right-hand side of (4.5.7). By definition of $N_{k(n)}$, we have
$$N_{k(n)}\,n^{p(1/q-1/2)} = N_{Q,1}\Bigl(\frac{a_n}{\sqrt n},\mathcal{F}\Bigr)\Bigl(\frac{1}{\sqrt n}\Bigr)^{p(q-2)/q}. \qquad (4.5.8)$$
Since $x^{p(q-2)/q}N_{Q,1}(x,\mathcal{F})$ tends to 0 as $x$ tends to zero, we can find a sequence $a_n$ such that the right-hand term in (4.5.8) converges to 0. The function $h_{k(n)}(f) = g_{k(n)}^-$ satisfies (4.5.4).

Second step. We shall prove that, for any $\epsilon>0$ and $n$ large enough, there exists a function $h_m(f)$ in $\mathcal{F}_m$ such that
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(h_m(f))-Z_n(h_{k(n)}(f))|\Bigr\|_p \le \epsilon. \qquad (4.5.9)$$
Given $h$ in $\mathcal{F}_k$, choose a function $T_{k-1}(h)$ in $\mathcal{F}_{k-1}$ such that $\|h-T_{k-1}(h)\|_{Q,1}\le2^{-k+1}$. Denote by $\pi_{k,k} = \mathrm{Id}$ and, for $l<k$, $\pi_{l,k}(h) = T_l\circ\cdots\circ T_{k-1}(h)$. We consider the function $h_m(f) = \pi_{m,k(n)}(h_{k(n)}(f))$. We have
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(h_m)-Z_n(h_{k(n)})|\Bigr\|_p \le \sum_{l=m+1}^{k(n)}\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(\pi_{l,k(n)}(h_{k(n)}))-Z_n(\pi_{l-1,k(n)}(h_{k(n)}))|\Bigr\|_p. \qquad (4.5.10)$$
Clearly,
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(\pi_{l,k(n)}(h_{k(n)}))-Z_n(\pi_{l-1,k(n)}(h_{k(n)}))|\Bigr\|_p \le \Bigl\|\max_{f\in\mathcal{F}_l}|Z_n(f)-Z_n(T_{l-1}(f))|\Bigr\|_p.$$
Applying inequality (4.5.1) to (4.5.10), we obtain
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(h_m)-Z_n(h_{k(n)})|\Bigr\|_p \le C\sum_{l=m+1}^{k(n)}\bigl(2^{1/r}N_l^{1/p}2^{-l/r} + N_l^{1/p}n^{1/q-1/2}\bigr).$$
Clearly,
$$\sum_{l=m+1}^{\infty}N_l^{1/p}2^{-l/r} \le 2\int_0^{2^{-m-1}}x^{(1-r)/r}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx,$$
which by assumption is as small as we wish. To control the second term, write
$$n^{1/q-1/2}\sum_{l=m+1}^{k(n)}N_l^{1/p} \le n^{1/q-1/2}\sum_{l=0}^{k(n)}2^l\,N_l^{1/p}\,2^{-l} \le 2\,n^{1/q-1/2}\int_{2^{-k(n)}}^{1}\frac{1}{x}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx.$$
It is easy to see that if $x^{p(q-2)/q}N_{Q,1}(x,\mathcal{F})$ tends to 0 as $x$ tends to 0, then
$$\lim_{x\to0}\,x^{(q-2)/q}\int_x^1\frac{1}{y}\bigl(N_{Q,1}(y,\mathcal{F})\bigr)^{1/p}dy = 0.$$
Consequently, we can choose the decreasing sequence $a_n$ such that
$$\lim_{n\to\infty}\Bigl(\frac{1}{\sqrt n}\Bigr)^{\frac{q-2}{q}}\int_{a_nn^{-1/2}}^{1}\frac{1}{x}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx = 0.$$
The function $h_m(f) = \pi_{m,k(n)}(h_{k(n)}(f))$ satisfies (4.5.9).

Third step. From steps 1 and 2, we infer that for any $\epsilon>0$ and $n$ large enough there exists $h_m(f)$ in $\mathcal{F}_m$ such that
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(h_m(f))|\Bigr\|_p \le 2\epsilon.$$
Using the same argument as in Andrews and Pollard (1994) [5] (see the paragraph "Comparison of pairs", page 124), we obtain that, for any $f$ and $g$ in $\mathcal{F}$,
$$\Bigl\|\sup_{\|f-g\|_{Q,1}\le\delta}|Z_n(f)-Z_n(g)|\Bigr\|_p \le 8\epsilon + N_m^{2/r}\sup_{\|f-g\|_{Q,1}\le\delta}\|Z_n(f)-Z_n(g)\|_p.$$
We conclude the proof by noting that
$$\limsup_{\delta\to0}\limsup_{n\to\infty}\Bigl\|\sup_{g\in\mathcal{G},\,\|g\|_{Q,1}\le\delta}|Z_n(g)|\Bigr\|_p \le 8\epsilon.$$
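The elementary maximal inequality (4.5.3) on which the chaining argument rests holds pathwise for empirical $L^p$ norms, since $E\max_i|Z_i|^p\le\sum_iE|Z_i|^p\le N\max_iE|Z_i|^p$; a quick numerical check of ours (the distributions and sizes are arbitrary):

```python
import random

# Check of || max_i |Z_i| ||_p <= N^(1/p) max_i ||Z_i||_p with empirical
# L^p norms computed over a common sample of n_obs joint observations.
random.seed(3)
N, n_obs, p = 5, 400, 4.0
Z = [[random.gauss(0, 1 + i) for _ in range(n_obs)] for i in range(N)]

def lp_norm(values, p):
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1 / p)

max_abs = [max(abs(Z[i][j]) for i in range(N)) for j in range(n_obs)]
lhs = lp_norm(max_abs, p)
rhs = N ** (1 / p) * max(lp_norm(Z[i], p) for i in range(N))
assert lhs <= rhs + 1e-12
```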
Chapter 5
Tools for causal cases

The purpose of this chapter is to give several technical tools useful for deriving limit theorems in causal cases. The first section gives comparison results between different causal coefficients. The second section deals with covariance inequalities for the coefficients $\gamma_1$, $\tilde\beta$ and $\tilde\phi$ already defined in Chapter 2. Section 3 discusses a coupling result for $\tau_1$-dependent random variables; this coupling result is generalized to the case of variables with values in any Polish space. Section 4 gives various inequalities, mainly of Bennett, Fuk-Nagaev, Burkholder and Rosenthal type, for different dependent sequences. Finally, Section 5 gives a maximal inequality extending Doob's inequality for martingales.
5.1 Comparison results
Weak dependence conditions may be compared as in the case of mixing conditions.

Lemma 5.1. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space and let $(X_i)_{i\in\mathbb Z}$ be a sequence of real-valued random variables. We have the following comparison results:

1. For all $r\in\mathbb N^*$ and $k\ge0$,
\[
\tilde\alpha_r(k)\le\tilde\beta_r(k)\le\tilde\phi_r(k).
\]
Assume now that $X_i\in\mathbb L^1(\mathbb P)$ for all $i\in\mathbb Z$. If $Y$ is a real-valued random variable, $Q_Y$ denotes the generalized inverse of the tail function $t\mapsto\mathbb P(|Y|>t)$; see (2.2.14).

2. Let $Q_X\ge\sup_{k\in\mathbb Z}Q_{X_k}$. Then, for all $k\ge0$,
\[
\tau_{1,1}(k)\le 2\int_0^{\tilde\alpha_1(k)}Q_X(u)\,du. \qquad (5.1.1)
\]
Assume now that the sequence $(X_i)_{i\in\mathbb Z}$ takes values in $\mathbb R^d$ for some $d\in\mathbb N^*$ and that each $X_i$ belongs to $\mathbb L^1(\mathbb P)$. On $\mathbb R^d$, we put the distance $d_1(x,y)=\sum_{p=1}^d|x^{(p)}-y^{(p)}|$, where $x^{(p)}$ (resp. $y^{(p)}$) denotes the $p$th coordinate of $x$ (resp. $y$). Assume that for all $i\in\mathbb Z$, each component of $X_i$ has a continuous distribution function, and let $\omega$ be the supremum of the moduli of continuity, that is,
\[
\omega(x)=\sup_{i\in\mathbb Z}\ \max_{1\le k\le d}\ \sup_{|y-z|\le x}\big|F_{X_i^{(k)}}(y)-F_{X_i^{(k)}}(z)\big|,
\]
where for all $i\in\mathbb Z$, $X_i=(X_i^{(1)},\dots,X_i^{(d)})$. Define $g(x)=x\,\omega(x)$. Then we have

3. For all $r\in\mathbb N^*$ and $k\ge0$,
\[
\tilde\beta_r(k)\le\frac{2r\,\tau_{1,r}(k)}{g^{-1}\big(\tau_{1,r}(k)/d\big)}\,,\qquad
\tilde\phi_r(k)\le\frac{2r\,\tau_{\infty,r}(k)}{g^{-1}\big(\tau_{\infty,r}(k)/d\big)}\,,
\]
where $g^{-1}$ denotes the generalized inverse of $g$ defined in (2.2.14).

4. Assume now that there exists some positive constant $K$ such that for all $i\in\mathbb Z$, each component of $X_i$ has a density bounded by $K$. Then, for all $r\in\mathbb N^*$ and $k\ge0$,
\[
\tilde\alpha_r(k)\le 4r\sqrt{K\,d\,\theta_{1,r}(k)}.
\]

Remark 5.1. If each marginal distribution satisfies a concentration condition $|F_{X_i^{(k)}}(y)-F_{X_i^{(k)}}(z)|\le K|y-z|^a$ with $a\le1$ and $K>0$, then Item 3 yields
\[
\tilde\beta_r(k)\le 2r\,\tau_{1,r}(k)^{a/(1+a)}(Kd)^{1/(1+a)},\qquad
\tilde\phi_r(k)\le 2r\,\tau_{\infty,r}(k)^{a/(1+a)}(Kd)^{1/(1+a)}.
\]
If, e.g., for all $i\in\mathbb Z$ each component of $X_i$ has a density bounded by $K>0$, then those relations write more simply with $a=1$, as in Item 4.

In order to prove Lemma 5.1, we introduce some general comments related to the notation (2.2.14). Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $\mathcal M$ a $\sigma$-algebra of $\mathcal A$ and $X$ a real-valued random variable. Let $F_{\mathcal M}(t,\omega)=\mathbb P_{X|\mathcal M}(]-\infty,t],\omega)$ be a conditional distribution function of $X$ given $\mathcal M$. For any $\omega$, $F_{\mathcal M}(\cdot,\omega)$ is a distribution function, and for any $t$, $F_{\mathcal M}(t,\cdot)$ is an $\mathcal M$-measurable random variable. Hence for any $\omega$, define the generalized inverse $F_{\mathcal M}^{-1}(u,\omega)$ as in (2.2.14). Now, from the equality $\{\omega : t\ge F_{\mathcal M}^{-1}(u,\omega)\}=\{\omega : F_{\mathcal M}(t,\omega)\ge u\}$, we infer
that $F_{\mathcal M}^{-1}(u,\cdot)$ is $\mathcal M$-measurable. In the same way, $\{(t,\omega): F_{\mathcal M}(t,\omega)\ge u\}=\{(t,\omega): t\ge F_{\mathcal M}^{-1}(u,\omega)\}$, which implies that the mapping $(t,\omega)\mapsto F_{\mathcal M}(t,\omega)$ is measurable with respect to $\mathcal B(\mathbb R)\otimes\mathcal M$. The same arguments imply that the mapping $(u,\omega)\mapsto F_{\mathcal M}^{-1}(u,\omega)$ is measurable with respect to $\mathcal B([0,1])\otimes\mathcal M$. Denote by $F_{\mathcal M}(t)$ (resp. $F_{\mathcal M}^{-1}(u)$) the random variable $F_{\mathcal M}(t,\cdot)$ (resp. $F_{\mathcal M}^{-1}(u,\cdot)$), and let $F_{\mathcal M}(t-0)=\sup_{s<t}F_{\mathcal M}(s)$.
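The identity $\{t\ge F^{-1}(u)\}=\{F(t)\ge u\}$ used above can be checked numerically for an empirical distribution function. The sketch below is ours, not from the book (the helper names `ecdf` and `gen_inv` are illustrative); it implements the generalized inverse $F^{-1}(u)=\inf\{t : F(t)\ge u\}$ for a small sample and verifies the equivalence on a grid.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function t -> F(t) of a sample."""
    s = np.sort(sample)
    return lambda t: np.searchsorted(s, t, side="right") / len(s)

def gen_inv(sample, u):
    """Generalized inverse F^{-1}(u) = inf{t : F(t) >= u} for the empirical F."""
    s = np.sort(sample)
    k = int(np.ceil(u * len(s))) - 1   # smallest index with F(s[k]) >= u
    return s[max(k, 0)]

sample = np.array([3.0, 1.0, 2.0, 2.0])
F = ecdf(sample)
# Check the equivalence {t >= F^{-1}(u)} = {F(t) >= u} on a grid of (u, t).
for u in (0.25, 0.5, 0.75, 1.0):
    q = gen_inv(sample, u)
    for t in np.linspace(0.0, 4.0, 41):
        assert (t >= q) == (F(t) >= u)
```

The assertions pass because, for a right-continuous $F$, the sublevel condition $F(t)\ge u$ is exactly the half-line $t\ge F^{-1}(u)$; this is the measurability argument above in computational form.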
Proof of Lemma 5.1.
• Item 1 follows from the definitions of $\tilde\alpha(\mathcal M,X)$, $\tilde\beta(\mathcal M,X)$ and $\tilde\phi(\mathcal M,X)$.
• Let us now prove Item 2. We first prove that if $\mathcal M$ is a $\sigma$-algebra of $\mathcal A$ and $X$ is a real-valued random variable in $\mathbb L^1(\mathbb P)$, then
\[
\tau(\mathcal M,X)\le 2\int_0^{\tilde\alpha(\mathcal M,X)}Q_X(u)\,du. \qquad (5.1.2)
\]
The proof follows from arguments in Peligrad (2002) [140]. Denote $X^+=\sup(X,0)$ and $X^-=\sup(-X,0)$. Let $F$ denote the distribution function of $X$ and $F_{\mathcal M}(t,\omega)$ the conditional distribution function of $X$ given $\mathcal M$. Assume that there exists a random variable $\delta$, uniformly distributed over $[0,1]$ and independent of the $\sigma$-algebra generated by $X$ and $\mathcal M$. Then
\[
U=F_{\mathcal M}(X-0)+\delta\big(F_{\mathcal M}(X)-F_{\mathcal M}(X-0)\big)
\]
is uniformly distributed over $[0,1]$ conditionally on $\mathcal M$. So $U$ is independent of $\mathcal M$ and uniformly distributed over $[0,1]$ (see Major (1978) [126] and also Rio (2000) [161]). Hence
\[
X^*=F^{-1}(U) \qquad (5.1.3)
\]
is independent of $\mathcal M$ and distributed as $X$. Moreover, $F_{\mathcal M}^{-1}(U)=X$, $\mathbb P$-almost surely. It yields
\[
\|X-X^*\|_1=\mathbb E\int_0^1\big|F_{\mathcal M}^{-1}(u)-F^{-1}(u)\big|\,du. \qquad (5.1.4)
\]
We start with equality (5.1.4):
\[
\|X-X^*\|_1=\mathbb E\int_0^1\big|F_{\mathcal M}^{-1}(u)-F^{-1}(u)\big|\,du
=\mathbb E\int_{\mathbb R}\big|F_{\mathcal M}(t)-F(t)\big|\,dt
=\mathbb E\int_0^\infty\big|\mathbb P(X^+>u)-\mathbb P(X^+>u\mid\mathcal M)\big|\,du+\mathbb E\int_0^\infty\big|\mathbb P(X^->u)-\mathbb P(X^->u\mid\mathcal M)\big|\,du.
\]
Now, by definition of $\tilde\alpha$, we have the inequalities
\[
\mathbb E\big|\mathbb P(X^+>u)-\mathbb P(X^+>u\mid\mathcal M)\big|\le\tilde\alpha(\mathcal M,X^+)\wedge 2\mathbb P(X^+>u),
\]
\[
\mathbb E\big|\mathbb P(X^->u)-\mathbb P(X^->u\mid\mathcal M)\big|\le\tilde\alpha(\mathcal M,X^-)\wedge 2\mathbb P(X^->u).
\]
It is clear that $\sup\big(\tilde\alpha(\mathcal M,X^+),\tilde\alpha(\mathcal M,X^-)\big)\le\tilde\alpha(\mathcal M,X)$. Define $H_X(x)=\mathbb P(|X|>x)$. Next, using the inequality $a\wedge b+c\wedge d\le(a+c)\wedge(b+d)$, we obtain
\[
\mathbb E|X-X^*|\le 2\int_0^\infty\tilde\alpha(\mathcal M,X)\wedge H_X(u)\,du\le 2\int_0^\infty\!\!\int_0^{\tilde\alpha(\mathcal M,X)}\mathbf 1_{t<H_X(u)}\,dt\,du.
\]
Then, since $\mathbb P(|X|>u)>t$ if and only if $u<Q_X(t)$, and applying the Fubini Theorem, we get Inequality (5.1.2). Let us now prove (5.1.1). For $i\in\mathbb Z$, let $\mathcal M_i=\sigma(X_j,\ j\le i)$. For $i+k\le j$, we infer from Inequality (5.1.2) that
\[
\tau_1(\mathcal M_i,X_j)\le 2\int_0^{\tilde\alpha_1(k)}Q_{X_j}(u)\,du\le 2\int_0^{\tilde\alpha_1(k)}Q_X(u)\,du,
\]
and the result follows from the definition of $\tau_{1,1}(k)$.
• For Item 3 we write the proof for $r=1,2$; the generalization to $r$ points is straightforward. We will make use of the following proposition:

Proposition 5.1. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $X=(X_1,\dots,X_d)$ and $Y=(Y_1,\dots,Y_d)$ two random variables with values in $\mathbb R^d$, and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. If $(X^*,Y^*)$ is distributed as $(X,Y)$ and independent of $\mathcal M$ then, assuming that each component $X_k$ and $Y_k$ has a continuous distribution function $F_{X_k}$ and $F_{Y_k}$, we get, for any $x_1,\dots,x_d,y_1,\dots,y_d$ in $[0,1]$,
\[
\tilde\beta(\mathcal M,X)\le\sum_{k=1}^d\Big(x_k+\big\|\mathbb P\big(|F_{X_k}(X_k^*)-F_{X_k}(X_k)|>x_k\,\big|\,\mathcal M\big)\big\|_1\Big), \qquad (5.1.5)
\]
\[
\tilde\beta(\mathcal M,X,Y)\le\sum_{k=1}^d\Big(x_k+\big\|\mathbb P\big(|F_{X_k}(X_k^*)-F_{X_k}(X_k)|>x_k\,\big|\,\mathcal M\big)\big\|_1\Big)
+\sum_{k=1}^d\Big(y_k+\big\|\mathbb P\big(|F_{Y_k}(Y_k^*)-F_{Y_k}(Y_k)|>y_k\,\big|\,\mathcal M\big)\big\|_1\Big). \qquad (5.1.6)
\]
We prove Proposition 5.1 at the end of this section and continue with the proof of Item 3. Starting from (5.1.5) with $x_1=\dots=x_d=\omega(x)$ and applying Markov's inequality, we infer that
\[
\tilde\beta(\mathcal M,X)\le d\,\omega(x)+\frac1x\Big\|\mathbb E\Big(\sum_{k=1}^d|X_k-X_k^*|\,\Big|\,\mathcal M\Big)\Big\|_1.
\]
Now, from Proposition 6 in Rüschendorf (1985) [171] (see also Equality (5.3.5) of Lemma 5.3), one can choose $X^*$ such that
\[
\Big\|\mathbb E\Big(\sum_{k=1}^d|X_k-X_k^*|\,\Big|\,\mathcal M\Big)\Big\|_1=\tau_1(\mathcal M,X).
\]
Hence
\[
\tilde\beta(\mathcal M,X)\le d\,\omega(x)+\frac{\tau_1(\mathcal M,X)}{x},
\]
and Item 3 follows for $r=1$ by noting that $d\,x\,\omega(x)=\tau_1(\mathcal M,X)$ for $x=g^{-1}\big(\tau_1(\mathcal M,X)/d\big)$. Starting now from (5.1.6), Item 3 may be proved in the same way for $r=2$. Proposition 5.1 is still true when replacing $\tilde\beta$ by $\tilde\phi$ and $\|\cdot\|_1$ by $\|\cdot\|_\infty$ (see Proposition 7 in Dedecker and Prieur, 2006 [49]). The proof for $\tilde\phi$ then follows the same lines and is therefore omitted here.
• It remains to prove Item 4. We write the proof for $r=1$ and $d=1$, and then generalize to any dimension $d\ge1$ and any $r\ge1$.

Case $r=1$, $d=1$. For $i\in\mathbb Z$, let $\mathcal M_i=\sigma(X_j,\ j\le i)$ and let $i+k\le j_1$. We know from Definition 2.5 of Chapter 2 that
\[
\tilde\alpha(\mathcal M_i,X_{j_1})=\sup_{t\in\mathbb R}\big\|L_{X_{j_1}|\mathcal M_i}(t)-L_{X_{j_1}}(t)\big\|_1.
\]
Define $\varphi_t(x)=\mathbf 1_{x\le t}-\mathbb P(X_{j_1}\le t)$. We want to smooth the function $\varphi_t$, which is not Lipschitz. Let $\varepsilon>0$, and consider the following Lipschitz function $\varphi_t^\varepsilon$ smoothing $\varphi_t$:
\[
\varphi_t^\varepsilon(x)=\varphi_t(x)\ \ \text{for }x\in\,]-\infty,t]\cup[t+\varepsilon,+\infty[\,,\qquad
\varphi_t^\varepsilon(x)=\varphi_t(x)+\frac{t-x+\varepsilon}{\varepsilon}\,\mathbf 1_{t<x<t+\varepsilon}\ \ \text{for }t<x<t+\varepsilon.
\]
Then $\|\varphi_t^\varepsilon\|_\infty\le1$ and $\mathrm{Lip}(\varphi_t^\varepsilon)\le1/\varepsilon$, so that
\[
\big\|L_{X_{j_1}|\mathcal M_i}(t)-L_{X_{j_1}}(t)\big\|_1\le 2K\varepsilon+\frac1\varepsilon\,\theta_{1,1}(k)+2K\varepsilon.
\]
Then, if we take $\varepsilon=\sqrt{\theta_{1,1}(k)/(4K)}$ and consider the supremum in $t$, we get
\[
\tilde\alpha_1(k)\le 4\sqrt{K\,\theta_{1,1}(k)}\,.
\]
Case $r\ge1$, $d\ge1$. The proof follows essentially the same lines. Let $i+k\le j_1<\dots<j_r$ and let $X=(X_{j_1},\dots,X_{j_r})$, with values in $(\mathbb R^d)^r$. We know from Definition 2.5 of Chapter 2 that
\[
\tilde\alpha(\mathcal M_i,X)=\sup_{t\in(\mathbb R^d)^r}\big\|L_{X|\mathcal M_i}(t)-L_X(t)\big\|_1.
\]
For $t=(t_1,\dots,t_r)\in(\mathbb R^d)^r$ and $x=(x_1,\dots,x_r)\in(\mathbb R^d)^r$, define
\[
\varphi_t(x)=\prod_{i=1}^r\big(\mathbf 1_{x_i\le t_i}-\mathbb P(X_{j_i}\le t_i)\big)=\prod_{i=1}^r\varphi_{t_i}(x_i).
\]
We want to smooth the function $\varphi_t$, which is not Lipschitz. Let $\varepsilon>0$. We first smooth each of the functions $\varphi_{t_i}$. In the following, if $x$ is in $\mathbb R^d$, $x^{(p)}$ denotes its $p$th coordinate. We consider the following Lipschitz function $\varphi_{t_i}^\varepsilon$ smoothing $\varphi_{t_i}$: $\varphi_{t_i}^\varepsilon$ is equal to $\varphi_{t_i}$ on $]-\infty,t_i]\cup[t_i+\varepsilon,+\infty[$, and for $x_i\notin\,]-\infty,t_i]\cup[t_i+\varepsilon,+\infty[$,
\[
\varphi_{t_i}^\varepsilon(x_i)=\prod_{j=1}^d\Big(\mathbf 1_{x_i^{(j)}\le t_i^{(j)}}+\frac{t_i^{(j)}-x_i^{(j)}+\varepsilon}{\varepsilon}\,\mathbf 1_{t_i^{(j)}<x_i^{(j)}<t_i^{(j)}+\varepsilon}\Big)-\mathbb P(X_{j_i}\le t_i),
\]
so that $\varphi_{t_i}^\varepsilon$ is $1/\varepsilon$-Lipschitz with respect to the distance $d_1(x,y)=\sum_{j=1}^d|x^{(j)}-y^{(j)}|$. We then smooth $\varphi_t$ by $\varphi_t^\varepsilon(x)=\prod_{i=1}^r\varphi_{t_i}^\varepsilon(x_i)$. We have $\|\varphi_t^\varepsilon\|_\infty\le1$ and $\mathrm{Lip}(\varphi_t^\varepsilon)\le1/\varepsilon$. We get
\[
\big\|L_{X|\mathcal M_i}(t)-L_X(t)\big\|_1=\mathbb E\big|\mathbb E\big(\varphi_t(X)-\mathbb E\varphi_t(X)\,\big|\,\mathcal M_i\big)\big|\le 2rKd\varepsilon+\frac1\varepsilon\,r\,\theta_{1,r}(k)+2rKd\varepsilon.
\]
We conclude by taking the supremum in $t\in(\mathbb R^d)^r$ and $\varepsilon=\sqrt{\theta_{1,r}(k)/(4Kd)}$.
Proof of Proposition 5.1. Let $Z$ be a random variable with values in $\mathbb R^m$ and let $f:\mathbb R^m\to\mathbb R$ be such that
\[
|f(z_1,\dots,z_i,\dots,z_m)-f(z_1,\dots,z_i',\dots,z_m)|\le|\mathbf 1_{z_i\le a_i}-\mathbf 1_{z_i'\le a_i}|
\]
for some real numbers $a_1,\dots,a_m$. Let $\mathcal U$ be a $\sigma$-algebra and let $Z^*$ be a random variable distributed as $Z$ and independent of $\mathcal U$. Then
\[
|f(Z)-f(Z^*)|=\Big|\sum_{k=1}^m f(Z_1,\dots,Z_k,Z_{k+1}^*,\dots,Z_m^*)-f(Z_1,\dots,Z_{k-1},Z_k^*,\dots,Z_m^*)\Big|\le\sum_{k=1}^m|\mathbf 1_{Z_k\le a_k}-\mathbf 1_{Z_k^*\le a_k}|.
\]
Hence
\[
|\mathbb E(f(Z)\mid\mathcal U)-\mathbb E(f(Z))|\le\mathbb E\big(|f(Z)-f(Z^*)|\,\big|\,\mathcal U\big)\le\sum_{k=1}^m\mathbb E\big(|\mathbf 1_{Z_k\le a_k}-\mathbf 1_{Z_k^*\le a_k}|\,\big|\,\mathcal U\big). \qquad (5.1.7)
\]
Let $t\in\mathbb R^d$. We first apply (5.1.7) to $Z=X$, $Z^*=X^*$, $\mathcal U=\mathcal M$, and $f(z)=\mathbf 1_{z\le t}$, with $a_1=t_1,\dots,a_d=t_d$. Since $F_{X_k}^{-1}(F_{X_k}(X_k))=X_k$ almost surely, we obtain
\[
|\mathbb E(\mathbf 1_{X\le t}\mid\mathcal M)-\mathbb P(X\le t)|\le\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{X_k\le t_k}-\mathbf 1_{X_k^*\le t_k}|\,\big|\,\mathcal M\big) \qquad (5.1.8)
\]
\[
\le\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{F_{X_k}(X_k)\le F_{X_k}(t_k)}-\mathbf 1_{F_{X_k}(X_k^*)\le F_{X_k}(t_k)}|\,\big|\,\mathcal M\big).
\]
Define $T_k=F_{X_k}(X_k)$ and $T_k^*=F_{X_k}(X_k^*)$. We have
\[
\mathbb E\big(|\mathbf 1_{T_k\le F_{X_k}(t_k)}-\mathbf 1_{T_k^*\le F_{X_k}(t_k)}|\,\big|\,\mathcal M\big)\le\max\Big(F_{T_k|\mathcal M}(F_{X_k}(t_k))-F_{X_k}(t_k),\ \big(1-F_{T_k|\mathcal M}(F_{X_k}(t_k))\big)-\big(1-F_{X_k}(t_k)\big)\Big).
\]
Now, for any $y$ in $[0,1]$,
\[
F_{T_k|\mathcal M}(y)=\int\mathbf 1_{u\le y}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv)
\le\int\mathbf 1_{v\le y+x_k}\,\mathbb P_{T_k^*|\mathcal M}(dv)+\int\mathbf 1_{v-u>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv)
\le y+x_k+\int\mathbf 1_{v-u>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv).
\]
In the same way,
\[
1-F_{T_k|\mathcal M}(y)\le 1-(y-x_k)+\int\mathbf 1_{u-v>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv).
\]
Consequently, taking $y=F_{X_k}(t_k)$,
\[
\mathbb E\big(|\mathbf 1_{T_k\le F_{X_k}(t_k)}-\mathbf 1_{T_k^*\le F_{X_k}(t_k)}|\,\big|\,\mathcal M\big)\le x_k+\int\mathbf 1_{|u-v|>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv), \qquad (5.1.9)
\]
and Inequality (5.1.5) follows from (5.1.9) and (5.1.8) by taking the supremum in $t$ and the expectation. Let $s,t\in\mathbb R^d$. In the same way, applying (5.1.7) to $Z=(Z^{(1)},Z^{(2)})=(X,Y)$, $Z^*=(X^*,Y^*)$, $\mathcal U=\mathcal M$ and
\[
f(z^{(1)},z^{(2)})=\big(\mathbf 1_{z^{(1)}\le s}-F_X(s)\big)\big(\mathbf 1_{z^{(2)}\le t}-F_Y(t)\big),
\]
we obtain that
\[
\big|\mathbb E\big((\mathbf 1_{X\le s}-F_X(s))(\mathbf 1_{Y\le t}-F_Y(t))\,\big|\,\mathcal M\big)-\mathbb E\big((\mathbf 1_{X\le s}-F_X(s))(\mathbf 1_{Y\le t}-F_Y(t))\big)\big|
\le\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{X_k\le s_k}-\mathbf 1_{X_k^*\le s_k}|\,\big|\,\mathcal M\big)+\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{Y_k\le t_k}-\mathbf 1_{Y_k^*\le t_k}|\,\big|\,\mathcal M\big),
\]
and we conclude the proof of (5.1.6) by using the same arguments as for (5.1.5).
5.2 Covariance inequalities

In this section we present some covariance inequalities for the coefficients $\gamma_1$, $\tilde\beta$ and $\tilde\phi$. We begin with the weakest of those coefficients.
5.2.1 A covariance inequality for $\gamma_1$

Definition 5.1. Let $X,Y$ be real-valued random variables. Denote by
– $Q_X$ the generalized inverse of the tail function $H_X:x\mapsto\mathbb P(|X|>x)$;
– $G_X$ the inverse of $x\mapsto\int_0^x Q_X(u)\,du$;
– $H_{X,Y}$ the generalized inverse of $x\mapsto\mathbb E(|X|\mathbf 1_{|Y|>x})$.

Proposition 5.2. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. Let $X$ be an integrable random variable and $Y$ an $\mathcal M$-measurable random variable such that $|XY|$ is integrable. The following inequalities hold:
\[
|\mathbb E(YX)|\le\int_0^{\|\mathbb E(X|\mathcal M)\|_1}H_{X,Y}(u)\,du\le\int_0^{\|\mathbb E(X|\mathcal M)\|_1}Q_Y\circ G_X(u)\,du. \qquad (5.2.1)
\]
If furthermore $Y$ is integrable, then
\[
|\mathrm{Cov}(Y,X)|\le\int_0^{\gamma_1(\mathcal M,X)}Q_Y\circ G_{X-\mathbb E(X)}(u)\,du\le 2\int_0^{\gamma_1(\mathcal M,X)/2}Q_Y\circ G_X(u)\,du. \qquad (5.2.2)
\]
Combining Proposition 5.2 with the comparison results (2.2.13), (2.2.18) and Item 2 of Lemma 5.1, we easily derive covariance inequalities for $\theta_1$, $\tau_1$ and $\tilde\alpha$.

Proof of Proposition 5.2. We start from the inequality
\[
|\mathbb E(YX)|\le\mathbb E\big(|Y\,\mathbb E(X|\mathcal M)|\big)=\mathbb E\int_0^\infty|\mathbb E(X|\mathcal M)|\,\mathbf 1_{|Y|>t}\,dt.
\]
Clearly we have that $\mathbb E\big(|\mathbb E(X|\mathcal M)|\,\mathbf 1_{|Y|>t}\big)\le\|\mathbb E(X|\mathcal M)\|_1\wedge\mathbb E(|X|\mathbf 1_{|Y|>t})$. Hence
\[
|\mathbb E(YX)|\le\int_0^\infty\!\!\int_0^{\|\mathbb E(X|\mathcal M)\|_1}\mathbf 1_{u<\mathbb E(|X|\mathbf 1_{|Y|>t})}\,du\,dt\le\int_0^{\|\mathbb E(X|\mathcal M)\|_1}\!\!\int_0^\infty\mathbf 1_{t<H_{X,Y}(u)}\,dt\,du,
\]
and the first inequality in (5.2.1) is proved. In order to prove the second one, we use Fréchet's inequality (1957) [88]:
\[
\mathbb E(|X|\mathbf 1_{|Y|>t})\le\int_0^{\mathbb P(|Y|>t)}Q_X(u)\,du. \qquad (5.2.3)
\]
We infer from (5.2.3) that $H_{X,Y}(u)\le Q_Y\circ G_X(u)$, which yields the second inequality in (5.2.1). We now prove (5.2.2). The first inequality in (5.2.2) follows directly from (5.2.1). To prove the second one, note that $Q_{X-\mathbb E(X)}\le Q_X+\|X\|_1$ and consequently
\[
\int_0^x Q_{X-\mathbb E(X)}(u)\,du\le\int_0^x Q_X(u)\,du+x\|X\|_1. \qquad (5.2.4)
\]
Set $R(x)=\int_0^x Q_X(u)\,du-x\|X\|_1$. Clearly, $R'$ is non-increasing over $]0,1]$, $R'(\varepsilon)\ge0$ for $\varepsilon$ small enough and $R'(1)\le0$. We infer that $R$ is first non-decreasing and next non-increasing, and that for any $x\in[0,1]$, $R(x)\ge\min(R(0),R(1))$. Since $\int_0^1 Q_X(u)\,du=\|X\|_1$, we have that $R(1)=R(0)=0$, and we infer from (5.2.4) that
\[
\int_0^x Q_{X-\mathbb E(X)}(u)\,du\le\int_0^x Q_X(u)\,du+x\|X\|_1\le 2\int_0^x Q_X(u)\,du.
\]
This implies $G_{X-\mathbb E(X)}(u)\ge G_X(u/2)$, which concludes the proof of (5.2.2).

Combining Proposition 5.2 and the comparison results (2.2.13), (2.2.18) and Item 2 of Lemma 5.1, we easily derive covariance inequalities for $\theta_1$, $\tau_1$ and $\tilde\alpha$.

Corollary 5.1. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space. Let $X$ and $Y$ be real-valued integrable random variables, and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. We have that
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^{\tilde\alpha(\mathcal M,X)}Q_Y(u)Q_X(u)\,du. \qquad (5.2.5)
\]
Proof of Corollary 5.1. To prove (5.2.5), put $z=G_X(u)$ in the second integral of (5.2.2), and use the comparison results (2.2.13), (2.2.18) and Item 2 of Lemma 5.1.
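The two facts used in the proof of (5.2.2), namely $Q_{X-\mathbb E(X)}\le Q_X+\|X\|_1$ and the doubling bound $\int_0^x Q_{X-\mathbb E(X)}\le 2\int_0^x Q_X$, can be sanity-checked numerically on an empirical law. The sketch below is ours (the helper `tail_quantile` is an illustrative implementation of the tail quantile $Q$, not the book's code):

```python
import numpy as np

def tail_quantile(sample, u):
    """Q(u) = inf{t >= 0 : P(|X| > t) <= u} for the empirical law of the sample."""
    d = np.sort(np.abs(sample))[::-1]          # |x_i| in decreasing order
    n = len(d)
    if u >= 1:
        return 0.0
    return d[min(int(np.floor(u * n)), n - 1)]

x = np.array([1.0, 2.0, 3.0, 10.0])
xc = x - x.mean()                              # X - E(X)
grid = np.linspace(0.0, 1.0, 401)[1:-1]        # avoid the endpoints 0 and 1
Qx  = np.array([tail_quantile(x,  u) for u in grid])
Qxc = np.array([tail_quantile(xc, u) for u in grid])

# Pointwise bound Q_{X-E(X)} <= Q_X + ||X||_1 ...
assert np.all(Qxc <= Qx + np.abs(x).mean() + 1e-12)
# ... and the doubling bound on partial integrals (Riemann sums on the grid).
cum_xc = np.cumsum(Qxc) / len(grid)
cum_x  = np.cumsum(Qx)  / len(grid)
assert np.all(cum_xc <= 2 * cum_x + 1e-12)
```

Both assertions hold for any integrable sample, in line with (5.2.4) and the step that follows it.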
5.2.2 A covariance inequality for $\tilde\beta$ and $\tilde\phi$

Proposition 5.3. Let $X$ and $Y$ be two real-valued random variables on the probability space $(\Omega,\mathcal A,\mathbb P)$. Let $F_{X|Y}:t\mapsto\mathbb P_{X|Y}(]-\infty,t])$ be a distribution function of $X$ given $Y$ and let $F_X$ be the distribution function of $X$. Define the random variable $b(\sigma(Y),X)=\sup_{x\in\mathbb R}|F_{X|Y}(x)-F_X(x)|$. For any conjugate exponents $p$ and $q$, we have the inequalities
\[
|\mathrm{Cov}(Y,X)|\le 2\big\{\mathbb E\big(|X|^p\,b(\sigma(X),Y)\big)\big\}^{1/p}\big\{\mathbb E\big(|Y|^q\,b(\sigma(Y),X)\big)\big\}^{1/q} \qquad (5.2.6)
\]
\[
\le 2\,\tilde\phi(\sigma(X),Y)^{1/p}\,\tilde\phi(\sigma(Y),X)^{1/q}\,\|X\|_p\,\|Y\|_q. \qquad (5.2.7)
\]
Remark 5.2. Inequality (5.2.6) is a weak version of that of Delyon (1990) [57] (see also Viennet (1997) [185], Lemma 4.1), in which two random variables $b_1(\sigma(Y),\sigma(X))$ and $b_2(\sigma(X),\sigma(Y))$ appear, each having mean $\beta(\sigma(Y),\sigma(X))$. Inequality (5.2.7) is a weak version of that of Peligrad (1983) [139], where the dependence coefficients are $\phi(\sigma(Y),\sigma(X))$ and $\phi(\sigma(X),\sigma(Y))$.

Proof of Proposition 5.3. We start from the equality
\[
\mathrm{Cov}(Y,X)=\int_0^\infty\!\!\int_0^\infty\mathrm{Cov}\big(\mathbf 1_{X>x}-\mathbf 1_{X<-x},\,\mathbf 1_{Y>y}-\mathbf 1_{Y<-y}\big)\,dx\,dy. \qquad (5.2.8)
\]
Since $\|b(\sigma(Y),X)\|_\infty=\tilde\phi(\sigma(Y),X)$, we only need to prove (5.2.6). Here, note that the value of $b(\sigma(Y),X)$ does not change if we replace $F_{X|Y}(x)$ and $F_X(x)$ by $\mathbb P_{X|Y}(]-\infty,x[)$ and $\mathbb P_X(]-\infty,x[)$ respectively. Consequently, the following inequalities hold:
\[
\mathbb E\big(\mathbf 1_{X>x}\mathbf 1_{Y>y}-\mathbb P(X>x)\mathbb P(Y>y)\big)\le\mathbb E\big(\mathbf 1_{Y>y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X>x}\,b(\sigma(X),Y)\big),
\]
\[
\mathbb E\big(\mathbf 1_{X<-x}\mathbf 1_{Y>y}-\mathbb P(X<-x)\mathbb P(Y>y)\big)\le\mathbb E\big(\mathbf 1_{Y>y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X<-x}\,b(\sigma(X),Y)\big),
\]
\[
\mathbb E\big(\mathbf 1_{X>x}\mathbf 1_{Y<-y}-\mathbb P(X>x)\mathbb P(Y<-y)\big)\le\mathbb E\big(\mathbf 1_{Y<-y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X>x}\,b(\sigma(X),Y)\big),
\]
\[
\mathbb E\big(\mathbf 1_{X<-x}\mathbf 1_{Y<-y}-\mathbb P(X<-x)\mathbb P(Y<-y)\big)\le\mathbb E\big(\mathbf 1_{Y<-y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X<-x}\,b(\sigma(X),Y)\big).
\]
Since $a_1\wedge b_1+a_1\wedge b_2+a_2\wedge b_1+a_2\wedge b_2\le 2(a_1+a_2)\wedge(b_1+b_2)$, we infer from (5.2.8) that
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^\infty\!\!\int_0^\infty\mathbb E\big(\mathbf 1_{|X|>x}\,b(\sigma(X),Y)\big)\wedge\mathbb E\big(\mathbf 1_{|Y|>y}\,b(\sigma(Y),X)\big)\,dx\,dy. \qquad (5.2.9)
\]
Let $\beta_1=\tilde\beta(\sigma(X),Y)$ and $\beta_2=\tilde\beta(\sigma(Y),X)$. Note that $\beta_1$ (or $\beta_2$) is 0 if and only if $X$ is independent of $Y$, and (5.2.6) is true in that case. Otherwise, let $\mathbb P_{\beta_1}$, $\mathbb P_{\beta_2}$ be the probabilities with densities $b(\sigma(X),Y)/\beta_1$ and $b(\sigma(Y),X)/\beta_2$ with respect to $\mathbb P$. Inequality (5.2.9) writes
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^\infty\!\!\int_0^\infty\beta_1\,\mathbb P_{\beta_1}(|X|>x)\wedge\beta_2\,\mathbb P_{\beta_2}(|Y|>y)\,dx\,dy. \qquad (5.2.10)
\]
Let $Q_{\beta_1,X}$ and $Q_{\beta_2,Y}$ be the generalized inverses of $x\mapsto\mathbb P_{\beta_1}(|X|>x)$ and $y\mapsto\mathbb P_{\beta_2}(|Y|>y)$. Starting from (5.2.10), we have successively
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^\infty\!\!\int_0^\infty\!\!\int_0^{\beta_1\wedge\beta_2}\mathbf 1_{u<\beta_1\mathbb P_{\beta_1}(|X|>x)}\,\mathbf 1_{u<\beta_2\mathbb P_{\beta_2}(|Y|>y)}\,du\,dx\,dy
\]
\[
\le 2\int_0^\infty\!\!\int_0^\infty\!\!\int_0^{\beta_1\wedge\beta_2}\mathbf 1_{Q_{\beta_1,X}(u/\beta_1)>x}\,\mathbf 1_{Q_{\beta_2,Y}(u/\beta_2)>y}\,du\,dx\,dy
\le 2\int_0^{\beta_1\wedge\beta_2}Q_{\beta_1,X}(u/\beta_1)\,Q_{\beta_2,Y}(u/\beta_2)\,du.
\]
Applying Hölder's inequality, and setting $s=u/\beta_1$ and $t=u/\beta_2$, we obtain
\[
|\mathrm{Cov}(Y,X)|\le 2\Big(\int_0^1\beta_1\,Q_{\beta_1,X}^p(s)\,ds\Big)^{1/p}\Big(\int_0^1\beta_2\,Q_{\beta_2,Y}^q(t)\,dt\Big)^{1/q}.
\]
To complete the proof of (5.2.6), note that, by definition of $Q_{\beta_1,X}$,
\[
\int_0^1\beta_1\,Q_{\beta_1,X}^p(s)\,ds=\mathbb E\big(|X|^p\,b(\sigma(X),Y)\big).
\]
Corollary 5.2. Let $f_1,f_2,g_1,g_2$ be four increasing functions, and let $f=f_1-f_2$ and $g=g_1-g_2$. For any random variable $Z$, let $\Delta_p(Z)=\inf_{a\in\mathbb R}\|Z-a\|_p$ and $\Delta_{p,\sigma(X),Y}(Z)=\inf_{a\in\mathbb R}\big(\mathbb E(|Z-a|^p\,b(\sigma(X),Y))\big)^{1/p}$. For any conjugate exponents $p$ and $q$, we have the inequalities
\[
|\mathrm{Cov}(g(Y),f(X))|\le 2\big(\Delta_{p,\sigma(X),Y}(f_1(X))+\Delta_{p,\sigma(X),Y}(f_2(X))\big)\times\big(\Delta_{q,\sigma(Y),X}(g_1(Y))+\Delta_{q,\sigma(Y),X}(g_2(Y))\big),
\]
\[
|\mathrm{Cov}(g(Y),f(X))|\le 2\,\phi(\sigma(X),Y)^{1/p}\,\phi(\sigma(Y),X)^{1/q}\times\big(\Delta_p(f_1(X))+\Delta_p(f_2(X))\big)\big(\Delta_q(g_1(Y))+\Delta_q(g_2(Y))\big).
\]
In particular, if $\mu$ is a signed measure with total variation $\|\mu\|$ and $f(x)=\mu(]-\infty,x])$, we have
\[
|\mathrm{Cov}(Y,f(X))|\le\|\mu\|\,\mathbb E\big(|Y|\,b(\sigma(Y),X)\big)\le\tilde\phi(\sigma(Y),X)\,\|\mu\|\,\|Y\|_1. \qquad (5.2.11)
\]
Proof of Corollary 5.2. For the first two inequalities, note that, for any $a_1,a_2,b_1,b_2$,
\[
|\mathrm{Cov}(g(Y),f(X))|\le|\mathrm{Cov}(g_1(Y)-b_1,f_1(X)-a_1)|+|\mathrm{Cov}(g_1(Y)-b_1,f_2(X)-a_2)|+|\mathrm{Cov}(g_2(Y)-b_2,f_1(X)-a_1)|+|\mathrm{Cov}(g_2(Y)-b_2,f_2(X)-a_2)|.
\]
The functions $f_1-a_1$, $f_2-a_2$, $g_1-b_1$, $g_2-b_2$ being nondecreasing, we infer that $b(\sigma(f_i(X)),g_j(Y)-b_j)\le\mathbb E\big(b(\sigma(X),Y)\,\big|\,\sigma(f_i(X))\big)$ almost surely. Now apply (5.2.6) and (5.2.7), and take the infimum over $a_1,b_1,a_2,b_2$. To show (5.2.11), take $q=1$ and $p=\infty$. Let $\mu=\mu^+-\mu^-$ be the Jordan decomposition of $\mu$. We have $f(x)=f_1(x)-f_2(x)$, with $f_1(x)=\mu^+(]-\infty,x])$ and $f_2(x)=\mu^-(]-\infty,x])$. To conclude, apply the preceding inequalities and note that $\|\mu\|=\mu^+(\mathbb R)+\mu^-(\mathbb R)$, $\Delta_{1,\sigma(Y),X}(Y)\le\mathbb E(|Y|\,b(\sigma(Y),X))$, $2\Delta_\infty(f_1(X))\le\mu^+(\mathbb R)$, and $2\Delta_\infty(f_2(X))\le\mu^-(\mathbb R)$.
5.3 Coupling

There exist several methods to obtain limit theorems for sequences of dependent random variables. One of the most popular and useful is the coupling of the initial sequence with an independent one. The main result in Section 5.3.1 is a coupling result (Dedecker and Prieur (2004) [45]) allowing one to replace a sequence of $\tau_1$-dependent random variables by an independent one having the same marginals. Moreover, each variable in the newly constructed sequence is independent of the past of the initial one and is close, in the $\mathbb L^1$ norm, to the variable having the same rank. The price to pay for replacing the initial sequence by an independent one depends on the $\tau_1$-dependence properties of the initial sequence. Various approaches to coupling have been developed by different authors; we refer to a recent paper by Merlevède and Peligrad (2002) [130] for a survey of the various coupling methods and their applications. The approach used in this chapter relies on the quantile transform of Major (1978) [126]. It has been used for strongly mixing sequences by Rio (1995) [159] and Peligrad (2002) [140] to obtain a coupling result in $\mathbb L^1$. In Section 5.3.2, the coupling result of Section 5.3.1 is generalized to the case of variables with values in any Polish space $\mathcal X$.
5.3.1 A coupling result for real valued random variables

The main result of this section is a coupling result for the coefficient $\tau_1$, which appears to be the appropriate coefficient for coupling in $\mathbb L^1$. We first recall the nice coupling properties known for the usual mixing coefficients. Berbee (1979) [16] and Goldstein (1979) [96] proved: if $\Omega$ is rich enough, there exists a random variable $X^*$ distributed as $X$ and independent of $\mathcal M$ such that $\mathbb P(X\ne X^*)=\beta(\mathcal M,\sigma(X))$. For the mixing coefficient $\alpha(\mathcal M,\sigma(X))$, Bradley (1983) [29] proved the following result: if $\Omega$ is rich enough, then for each $1\le p\le\infty$ and each $\lambda<\|X\|_p$, there exists $X^*$ distributed as $X$ and independent of $\mathcal M$ such that
\[
\mathbb P(|X-X^*|\ge\lambda)\le 18\Big(\frac{\|X\|_p}{\lambda}\Big)^{p/(2p+1)}\big(\alpha(\mathcal M,\sigma(X))\big)^{2p/(2p+1)}.
\]
For the weaker coefficient $\tilde\alpha(\mathcal M,X)$, Rio (1995, 2000) [159, 160] obtained the following upper bound, which is not directly comparable to Bradley's: if $X$ belongs to $[a,b]$ and if $\Omega$ is rich enough, there exists $X^*$ independent of $\mathcal M$ and distributed as $X$ such that $\|X-X^*\|_1\le(b-a)\,\tilde\alpha(\mathcal M,X)$. This result has then been extended by Peligrad (2002) [140] to the case of unbounded variables. Recall that the random variable $X^*$ appearing in the results of Rio (1995, 2000) [159, 160] and Peligrad (2002) [140] is based on Major's quantile transformation (1978) [126]. $X^*$ has the following remarkable property: $\|X-X^*\|_1$ is the infimum of $\|X-Y\|_1$ over all $Y$ independent of $\mathcal M$ and distributed as $X$. Starting from the exact expression of $X^*$, Dedecker and Prieur (2004) [45] proved that $\tau_1(\mathcal M,X)$ is the appropriate coefficient for the coupling in $\mathbb L^1$ (see Lemma 5.2 below).

Lemma 5.2. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $X$ an integrable real-valued random variable, and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. Assume that there exists a random variable $\delta$ uniformly distributed over $[0,1]$, independent of the $\sigma$-algebra generated by $X$ and $\mathcal M$. Then there exists a random variable $X^*$, measurable with respect to $\mathcal M\vee\sigma(X)\vee\sigma(\delta)$, independent of $\mathcal M$ and distributed as $X$, such that
\[
\|X-X^*\|_1=\tau_1(\mathcal M,X). \qquad (5.3.1)
\]
This coupling result is a useful tool to obtain suitable inequalities, to prove various limit theorems and to obtain upper bounds for $\tilde\beta(\mathcal M,X)$ (see Dedecker and Prieur (2004) [48]).

Remark 5.3. From Berbee's lemma and Lemma 5.2 above, we see that both $\beta(\mathcal M,\sigma(X))$ and $\tau_1(\mathcal M,X)$ have a property of optimality: they are equal to the infimum of $\mathbb E(d_0(X,Y))$ over all $Y$ independent of $\mathcal M$ and distributed as $X$, for the distances $d_0(x,y)=\mathbf 1_{x\ne y}$ and $d_0(x,y)=|x-y|$ respectively.
Proof of Lemma 5.2. We first construct $X^*$ using the conditional quantile transform of Major (1978); see (5.1.3) in the proof of Lemma 5.1. The choice of this transform is due to its property of minimizing the distance between $X$ and $X^*$ in $\mathbb L^1(\mathbb R)$. We recall equality (5.1.4):
\[
\|X-X^*\|_1=\mathbb E\int_0^1\big|F_{\mathcal M}^{-1}(u)-F^{-1}(u)\big|\,du. \qquad (5.3.2)
\]
For two distribution functions $F$ and $G$, denote by $M(F,G)$ the set of all probability measures on $\mathbb R\times\mathbb R$ with marginals $F$ and $G$. Define
\[
d(F,G)=\inf\Big\{\int|x-y|\,\mu(dx,dy)\ \Big|\ \mu\in M(F,G)\Big\},
\]
and recall that (see Dudley (1989) [80], Section 11.8, Problems 1 and 2, page 333)
\[
d(F,G)=\int_{\mathbb R}|F(t)-G(t)|\,dt=\int_0^1|F^{-1}(u)-G^{-1}(u)|\,du. \qquad (5.3.3)
\]
On the other hand, Kantorovich and Rubinstein (Theorem 11.8.2 in Dudley (1989) [80]) have proved that
\[
d(F,G)=\sup\Big\{\int f\,dF-\int f\,dG\ \Big|\ f\in\Lambda^{(1)}(\mathbb R)\Big\}. \qquad (5.3.4)
\]
Combining (5.1.4), (5.3.3) and (5.3.4), we have that
\[
\|X-X^*\|_1=\mathbb E\,\sup\Big\{\int f\,dF_{\mathcal M}-\int f\,dF\ \Big|\ f\in\Lambda^{(1)}(\mathbb R)\Big\},
\]
and the proof of Lemma 5.2 is complete.
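The identity (5.3.3), which makes the quantile coupling optimal, can be illustrated numerically: for two empirical distributions (of equal sample size) the quantile-coupling cost $\int_0^1|F^{-1}(u)-G^{-1}(u)|\,du$ is the mean absolute difference of the sorted samples, and it coincides with $\int|F(t)-G(t)|\,dt$. The sketch below is our own NumPy illustration (sample sizes, seed and grid are arbitrary choices):

```python
import numpy as np

# Two samples of equal size; their empirical laws play the roles of F and G.
rng = np.random.default_rng(0)
a = np.sort(rng.normal(size=500))
b = np.sort(rng.normal(loc=1.0, size=500))

# Quantile-coupling cost: int_0^1 |F^{-1}(u) - G^{-1}(u)| du = mean |a_(i) - b_(i)|.
w_quantile = float(np.mean(np.abs(a - b)))

# CDF formulation: int |F(t) - G(t)| dt, via a Riemann sum on a fine grid.
grid = np.linspace(-6.0, 8.0, 20001)
Fa = np.searchsorted(a, grid, side="right") / len(a)
Fb = np.searchsorted(b, grid, side="right") / len(b)
w_cdf = float(np.sum(np.abs(Fa - Fb)[:-1] * np.diff(grid)))

# The two expressions of d(F, G) agree up to the grid discretization error.
assert abs(w_quantile - w_cdf) < 1e-2
```

For empirical laws of equal size the equality is exact; the small tolerance only absorbs the Riemann-sum error of the step-function integral.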
5.3.2 Coupling in higher dimension

We show that the coupling properties of $\tau_1$ described in the previous section remain valid when $X$ takes values in any Polish space $\mathcal X$. This is due to a conditional version of the Kantorovich-Rubinstein Theorem (Theorem 5.1).

Lemma 5.3. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $\mathcal M$ a $\sigma$-algebra of $\mathcal A$ and $X$ a random variable with values in a Polish space $(\mathcal X,d)$. Assume that $\int d(x,x_0)\,\mathbb P_X(dx)$ is finite for some (and therefore any) $x_0\in\mathcal X$. Assume that there exists a random variable $\delta$ uniformly distributed over $[0,1]$, independent of the $\sigma$-algebra generated by $X$ and $\mathcal M$. Then there exists a random variable $X^*$, measurable with respect to $\mathcal M\vee\sigma(X)\vee\sigma(\delta)$, independent of $\mathcal M$ and distributed as $X$, such that
\[
\tau_p(\mathcal M,X)=\big\|\mathbb E\big(d(X,X^*)\,\big|\,\mathcal M\big)\big\|_p. \qquad (5.3.5)
\]
We first need some notation. We denote by $\mathcal P(\Omega)$ the set of probability measures on a space $\Omega$ and define
\[
\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X)=\big\{\mu\in\mathcal P(\Omega\times\mathcal X,\ \mathcal A\otimes\mathcal B_{\mathcal X})\ :\ \forall A\in\mathcal A,\ \mu(A\times\mathcal X)=\mathbb P(A)\big\}.
\]
Recall that every $\mu\in\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X)$ is disintegrable, that is, there exists a (unique, up to $\mathbb P$-a.s. equality) $\mathcal A^*_\mu$-measurable mapping $\omega\mapsto\mu_\omega$ from $\Omega$ to $\mathcal P(\mathcal X)$ such that
\[
\mu(f)=\int_\Omega\int_{\mathcal X}f(\omega,x)\,d\mu_\omega(x)\,d\mathbb P(\omega)
\]
for every measurable $f:\Omega\times\mathcal X\to[0,+\infty]$ (see Valadier (1973) [184]). Moreover, the mapping $\omega\mapsto\mu_\omega$ can be chosen $\mathcal A$-measurable. If $\mathcal X$ is endowed with the distance $d$, let
\[
\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)=\Big\{\mu\in\mathcal Y\ :\ \int_{\Omega\times\mathcal X}d(x,x_0)\,d\mu(\omega,x)<\infty\Big\},
\]
where $x_0$ is some fixed element of $\mathcal X$ (this definition is independent of the choice of $x_0$). For any $\mu,\nu\in\mathcal Y$, let $D(\mu,\nu)$ be the set of probability laws $\pi$ on $\Omega\times\mathcal X\times\mathcal X$ such that $\pi(\cdot\times\cdot\times\mathcal X)=\mu$ and $\pi(\cdot\times\mathcal X\times\cdot)=\nu$. We now define the parametrized versions of $\Delta^{(d)}_{KR}$ and $\Delta^{(d)}_{L}$. Set, for $\mu,\nu\in\mathcal Y^{d,1}$,
\[
\Delta^{(d)}_{KR}(\mu,\nu)=\inf_{\pi\in D(\mu,\nu)}\int_{\Omega\times\mathcal X\times\mathcal X}d(x,y)\,d\pi(\omega,x,y).
\]
Let also $\Lambda^{(1)}$ denote the set of measurable integrands $f:\Omega\times\mathcal X\to\mathbb R$ such that $f(\omega,\cdot)\in\Lambda^{(1)}\cap\mathbb L^\infty$ for every $\omega\in\Omega$. We denote
\[
\Delta^{(d)}_{L}(\mu,\nu)=\sup_{f\in\Lambda^{(1)}}\big(\mu(f)-\nu(f)\big).
\]
We now state the parametrized Kantorovich-Rubinstein Theorem of Dedecker, Prieur and Raynaud De Fitte (2004) [47], which is the main tool to prove the coupling result of Lemma 5.3. The proof of that theorem mainly relies on ideas contained in Rüschendorf (1985) [171].

Theorem 5.1. (Parametrized Kantorovich-Rubinstein Theorem) Let $\mu,\nu\in\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)$ and let $\omega\mapsto\mu_\omega$ and $\omega\mapsto\nu_\omega$ be disintegrations of $\mu$ and $\nu$ respectively.

1. Let $G:\omega\mapsto\Delta^{(d)}_{KR}(\mu_\omega,\nu_\omega)=\Delta^{(d)}_{L}(\mu_\omega,\nu_\omega)$ and let $\mathcal A^*$ be the universal completion of $\mathcal A$. There exists an $\mathcal A^*$-measurable mapping $\omega\mapsto\lambda_\omega$ from $\Omega$ to $\mathcal P(\mathcal X\times\mathcal X)$ such that $\lambda_\omega$ belongs to $D(\mu_\omega,\nu_\omega)$ and
\[
G(\omega)=\int_{\mathcal X\times\mathcal X}d(x,y)\,d\lambda_\omega(x,y).
\]
2. The following equalities hold:
\[
\Delta^{(d)}_{KR}(\mu,\nu)=\int_{\Omega\times\mathcal X\times\mathcal X}d(x,y)\,d\lambda(\omega,x,y)=\Delta^{(d)}_{L}(\mu,\nu),
\]
where $\lambda$ is the element of $\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X\times\mathcal X)$ defined by $\lambda(A\times B\times C)=\int_A\lambda_\omega(B\times C)\,d\mathbb P(\omega)$ for any $A$ in $\mathcal A$, $B$ and $C$ in $\mathcal B_{\mathcal X}$. In particular, $\lambda$ belongs to $D(\mu,\nu)$, and the infimum in the definition of $\Delta^{(d)}_{KR}(\mu,\nu)$ is attained for this $\lambda$.

Remark 5.4. This theorem is proved in a more general frame in Dedecker, Prieur and Raynaud De Fitte (2004) [47]. It also allows more general coupling results when working with more general cost functions than the metric $d$ of $\mathcal X$.

Proof of Lemma 5.3. First notice that the assumption that $\int d(x,x_0)\,\mathbb P_X(dx)<\infty$ for some $x_0\in\mathcal X$ means that the unique measure of $\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X)$ with disintegration $\mathbb P_{X|\mathcal M}(\cdot,\omega)$ belongs to $\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)$. We now prove that if $Q$ is any element of $\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)$, there exists a $\sigma(\delta)\vee\sigma(X)\vee\mathcal M$-measurable random variable $Y$ such that $Q_\bullet$ is a regular conditional probability of $Y$ given $\mathcal M$, and
\[
\mathbb E\big(d(X,Y)\,\big|\,\mathcal M\big)=\sup_{f\in\Lambda^{(1)}\cap\mathbb L^\infty}\Big(\int f(x)\,\mathbb P_{X|\mathcal M}(dx)-\int f(x)\,Q_\bullet(dx)\Big)\quad\mathbb P\text{-a.s.} \qquad (5.3.6)
\]
Applying (5.3.6) with $Q=\mathbb P\otimes\mathbb P_X$, we get the result of Lemma 5.3.

Proof of Equation (5.3.6). We apply Theorem 5.1 to the probability space $(\Omega,\mathcal M,\mathbb P)$ and to the disintegrated measures $\mu_\omega(\cdot)=\mathbb P_{X|\mathcal M}(\cdot,\omega)$ and $\nu_\omega=Q_\omega$. From point 1 of Theorem 5.1 we infer that there exists a mapping $\omega\mapsto\lambda_\omega$ from $\Omega$ to $\mathcal P(\mathcal X\times\mathcal X)$, measurable for $\mathcal M^*$ and $\mathcal B_{\mathcal P(\mathcal X\times\mathcal X)}$, such that $\lambda_\omega$ belongs to $D(\mathbb P_{X|\mathcal M}(\cdot,\omega),Q_\omega)$ and $G(\omega)=\int_{\mathcal X\times\mathcal X}d(x,y)\,\lambda_\omega(dx,dy)$. On the measurable space $(M,\mathcal T)=(\Omega\times\mathcal X\times\mathcal X,\ \mathcal M^*\otimes\mathcal B_{\mathcal X}\otimes\mathcal B_{\mathcal X})$ we put the probability
\[
\pi(A\times B\times C)=\int_A\lambda_\omega(B\times C)\,\mathbb P(d\omega).
\]
If $I=(I_1,I_2,I_3)$ is the identity on $M$, we see that a regular conditional distribution of $(I_2,I_3)$ given $I_1$ is $\mathbb P_{(I_2,I_3)|I_1=\omega}=\lambda_\omega$. Since $\mathbb P_{X|\mathcal M}(\cdot,\omega)$ is the first margin of $\lambda_\omega$, a regular conditional probability of $I_2$ given $I_1$ is $\mathbb P_{I_2|I_1=\omega}(\cdot)=\mathbb P_{X|\mathcal M}(\cdot,\omega)$. Let $\lambda_{\omega,x}=\mathbb P_{I_3|I_1=\omega,I_2=x}$ be a regular conditional distribution of $I_3$ given $(I_1,I_2)$, so that $(\omega,x)\mapsto\lambda_{\omega,x}$ is measurable for $\mathcal M^*\otimes\mathcal B_{\mathcal X}$ and $\mathcal B_{\mathcal P(\mathcal X)}$. From the uniqueness (up to $\mathbb P$-a.s. equality) of regular conditional probabilities, it follows that
\[
\lambda_\omega(B\times C)=\int_B\lambda_{\omega,x}(C)\,\mathbb P_{X|\mathcal M}(dx,\omega)\quad\mathbb P\text{-a.s.} \qquad (5.3.7)
\]
Assume that we can find a random variable $\tilde Y$ from $\Omega$ to $\mathcal X$, measurable for $\sigma(\delta)\vee\sigma(X)\vee\mathcal M^*$ and $\mathcal B_{\mathcal X}$, such that $\mathbb P_{\tilde Y|\sigma(X)\vee\mathcal M^*}(\cdot,\omega)=\lambda_{\omega,X(\omega)}(\cdot)$. Since $\omega\mapsto\mathbb P_{X|\mathcal M}(\cdot,\omega)$ is measurable for $\mathcal M^*$ and $\mathcal B_{\mathcal P(\mathcal X)}$, one can check that $\mathbb P_{X|\mathcal M}$ is a regular conditional probability of $X$ given $\mathcal M^*$. For $A$ in $\mathcal M^*$, $B$ and $C$ in $\mathcal B_{\mathcal X}$, we thus have
\[
\mathbb E\big(\mathbf 1_A\mathbf 1_{X\in B}\mathbf 1_{\tilde Y\in C}\big)=\mathbb E\Big(\mathbf 1_A\,\mathbb E\big(\mathbf 1_{X\in B}\,\mathbb E(\mathbf 1_{\tilde Y\in C}\mid\sigma(X)\vee\mathcal M^*)\,\big|\,\mathcal M^*\big)\Big)
=\int_A\int_B\lambda_{\omega,x}(C)\,\mathbb P_{X|\mathcal M}(dx,\omega)\,\mathbb P(d\omega)
=\int_A\lambda_\omega(B\times C)\,\mathbb P(d\omega).
\]
We infer that $\lambda_\omega$ is a regular conditional probability of $(X,\tilde Y)$ given $\mathcal M^*$. By definition of $\lambda_\omega$, we obtain that
\[
\mathbb E\big(d(X,\tilde Y)\,\big|\,\mathcal M^*\big)=\sup_{f\in\Lambda^{(1)}\cap\mathbb L^\infty}\Big(\int f(x)\,\mathbb P_{X|\mathcal M}(dx)-\int f(x)\,Q_\bullet(dx)\Big)\quad\mathbb P\text{-a.s.} \qquad (5.3.8)
\]
As $\mathcal X$ is Polish, there exists a $\sigma(\delta)\vee\sigma(X)\vee\mathcal M$-measurable modification $Y$ of $\tilde Y$, so that (5.3.8) still holds for $\mathbb E(d(X,Y)\mid\mathcal M^*)$. We obtain (5.3.6) by noting that $\mathbb E(d(X,Y)\mid\mathcal M^*)=\mathbb E(d(X,Y)\mid\mathcal M)$ $\mathbb P$-a.s.

It remains to build $\tilde Y$. Since $\mathcal X$ is Polish, there exists a one-to-one map $f$ from $\mathcal X$ to a Borel subset of $[0,1]$, such that $f$ and $f^{-1}$ are measurable for $\mathcal B([0,1])$ and $\mathcal B_{\mathcal X}$. Define $F(t,\omega)=\lambda_{\omega,X(\omega)}(f^{-1}([0,t]))$. The map $F(\cdot,\omega)$ is a distribution function with generalized inverse $F^{-1}(\cdot,\omega)$, and the map $(u,\omega)\mapsto F^{-1}(u,\omega)$ is $\mathcal B([0,1])\otimes(\mathcal M^*\vee\sigma(X))$-measurable. Let $T(\omega)=F^{-1}(\delta(\omega),\omega)$ and $\tilde Y=f^{-1}(T)$. It remains to see that $\mathbb P_{\tilde Y|\sigma(X)\vee\mathcal M^*}(\cdot,\omega)=\lambda_{\omega,X(\omega)}(\cdot)$. For any $A$ in $\mathcal M^*$, $B$ in $\mathcal B_{\mathcal X}$ and $t$ in $\mathbb R$, we have
\[
\mathbb E\big(\mathbf 1_A\mathbf 1_{X\in B}\mathbf 1_{\tilde Y\in f^{-1}([0,t])}\big)=\int_A\mathbf 1_{X(\omega)\in B}\,\mathbf 1_{\delta(\omega)\le F(t,\omega)}\,\mathbb P(d\omega).
\]
Since $\delta$ is independent of $\sigma(X)\vee\mathcal M$, it is also independent of $\sigma(X)\vee\mathcal M^*$. Hence
\[
\mathbb E\big(\mathbf 1_A\mathbf 1_{X\in B}\mathbf 1_{\tilde Y\in f^{-1}([0,t])}\big)=\int_A\mathbf 1_{X(\omega)\in B}\,F(t,\omega)\,\mathbb P(d\omega)=\int_A\mathbf 1_{X(\omega)\in B}\,\lambda_{\omega,X(\omega)}(f^{-1}([0,t]))\,\mathbb P(d\omega).
\]
Since $\{f^{-1}([0,t])\ :\ t\in[0,1]\}$ is a separating class, the result follows.

5.4 Exponential and moment inequalities
The first theorem of this section extends Bennett's inequality for independent sequences to the case of $\tau_1$-dependent sequences. For any positive integer $q$, we obtain an upper bound involving two terms: the first one is the classical Bennett bound at level $\lambda$ for a sum $\sum_{i=1}^n\xi_i$ of independent variables $\xi_i$ such that $\mathrm{Var}(\sum_{i=1}^n\xi_i)=v_q$ and $\|\xi_i\|_\infty\le qM$, and the second one is equal to $n\lambda^{-1}\tau_{1,q}(q+1)$. Using Item 2 of Lemma 5.1, we obtain the same inequalities as those established by Rio (2000) [161] for strongly mixing sequences. This is not surprising: we follow the proof of Rio, using Lemma 5.2 instead of Rio's coupling lemma. Note that the same approach has previously been used by Bosq (1993) [26], starting from Bradley's coupling lemma (1983) [29]. Theorem 5.2 and Theorem 5.3 below are due to Dedecker and Prieur (2004) [45].
5.4.1 Bennett-type inequality
Theorem 5.2. Let $(X_i)_{i>0}$ be a sequence of real-valued random variables such that $\|X_i\|_\infty\le M$, and let $\mathcal M_i=\sigma(X_k,\ 1\le k\le i)$. Let $S_k=\sum_{i=1}^k(X_i-\mathbb E(X_i))$ and $\overline S_n=\max_{1\le k\le n}|S_k|$. Let $q$ be some positive integer, $v_q$ some nonnegative number such that
\[
v_q\ge\|X_{q[n/q]+1}+\dots+X_n\|_2^2+\sum_{i=1}^{[n/q]}\|X_{(i-1)q+1}+\dots+X_{iq}\|_2^2,
\]
and let $h$ be the function defined by $h(x)=(1+x)\log(1+x)-x$.

1. For $\lambda>0$,
\[
\mathbb P(|S_n|\ge3\lambda)\le 4\exp\Big(-\frac{v_q}{(qM)^2}\,h\Big(\frac{\lambda qM}{v_q}\Big)\Big)+\frac n\lambda\,\tau_{1,q}(q+1).
\]
2. For $\lambda\ge Mq$,
\[
\mathbb P\big(\overline S_n\ge(\mathbf 1_{q>1}+3)\lambda\big)\le 4\exp\Big(-\frac{v_q}{(qM)^2}\,h\Big(\frac{\lambda qM}{v_q}\Big)\Big)+\frac n\lambda\,\tau_{1,q}(q+1).
\]
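The bound of Item 1 is easy to evaluate numerically. The sketch below is ours (parameter values are arbitrary, chosen only to exercise the formula); it codes the function $h$ and the two-term bound, and checks that the bound decreases in $\lambda$, since $h$ is increasing and the coupling term is $n\lambda^{-1}\tau_{1,q}(q+1)$.

```python
import math

def h(x):
    """h(x) = (1 + x) log(1 + x) - x, the function in Bennett's inequality."""
    return (1.0 + x) * math.log1p(x) - x

def bennett_tau_bound(lam, n, q, M, vq, tau):
    """Upper bound of Theorem 5.2(1) for P(|S_n| >= 3*lam): a Bennett term for
    blocks of length q, plus the coupling price (n / lam) * tau_{1,q}(q+1)."""
    bennett = 4.0 * math.exp(-(vq / (q * M) ** 2) * h(lam * q * M / vq))
    coupling = (n / lam) * tau
    return bennett + coupling

# Larger lam gives a smaller bound: both terms are decreasing in lam.
b1 = bennett_tau_bound(lam=5.0,  n=1000, q=10, M=1.0, vq=200.0, tau=1e-4)
b2 = bennett_tau_bound(lam=50.0, n=1000, q=10, M=1.0, vq=200.0, tau=1e-4)
assert b2 < b1
assert h(0.0) == 0.0
```

In practice one would balance $q$ against the decay of $\tau_{1,q}$: larger blocks make the Bennett term worse but pay a smaller coupling price per block.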
Proof of Theorem 5.2. We proceed as in Rio (2000) [161], page 83. For $1\le i\le[n/q]$, define the variables $U_i=S_{iq}-S_{iq-q}$, and let $U_{[n/q]+1}=S_n-S_{q[n/q]}$. Let $(\delta_i)_{1\le i\le[n/q]+1}$ be independent random variables, uniformly distributed over $[0,1]$ and independent of $(U_i)_{1\le i\le[n/q]+1}$. We apply Lemma 5.2: for any $1\le i\le[n/q]+1$, there exists a measurable function $F_i$ such that $U_i^*=F_i(U_1,\dots,U_{i-2},U_i,\delta_i)$ satisfies the conclusions of Lemma 5.2 with $\mathcal M=\sigma(U_l,\ l\le i-2)$. The sequence $(U_i^*)_{1\le i\le[n/q]+1}$ has the following properties:
a. For any $1\le i\le[n/q]+1$, the random variable $U_i^*$ is distributed as $U_i$.
b. The variables $(U_{2i}^*)_{2\le 2i\le[n/q]+1}$ are independent, and so are the variables $(U_{2i-1}^*)_{1\le 2i-1\le[n/q]+1}$.
c. Moreover, $\|U_i-U_i^*\|_1\le\tau_1(\sigma(U_l,\ l\le i-2),U_i)$.
Since for $1\le i\le[n/q]$ we have $\tau_1(\sigma(U_l,\ l\le i-2),U_i)\le q\,\tau_{1,q}(q+1)$, we infer that, for $1\le i\le[n/q]$,
\[
\|U_i-U_i^*\|_1\le q\,\tau_{1,q}(q+1) \qquad (5.4.1)
\]
and
\[
\|U_{[n/q]+1}-U_{[n/q]+1}^*\|_1\le(n-q[n/q])\,\tau_{1,n-q[n/q]}(q+1).
\]
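The blocking step above can be sketched in a few lines. The code below is our own illustration (not the book's): it forms the blocks $U_i=S_{iq}-S_{(i-1)q}$ of $q$ consecutive centered variables plus the shorter remainder block, and checks that the blocks partition $S_n$ exactly.

```python
import numpy as np

# Blocking used in the proof: U_i = S_{iq} - S_{(i-1)q} is the sum of the i-th
# block of q consecutive (centered) variables; the last, shorter block is
# U_{[n/q]+1} = S_n - S_{q[n/q]}.
rng = np.random.default_rng(1)
n, q = 23, 5
x = rng.standard_normal(n)
x = x - x.mean()                               # centered, so S_k = sum of first k terms

S = np.concatenate(([0.0], np.cumsum(x)))      # S[k] = S_k, with S[0] = 0
blocks = [S[i * q] - S[(i - 1) * q] for i in range(1, n // q + 1)]
blocks.append(S[n] - S[q * (n // q)])          # remainder block of length n - q[n/q]

# The block sums telescope back to S_n.
assert abs(sum(blocks) - S[n]) < 1e-12
assert len(blocks) == n // q + 1
```

In the proof each even-indexed (resp. odd-indexed) block is then replaced by its coupled copy $U_i^*$, which is why the two alternating subsequences become independent.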
Proof of 1. Clearly
\[
|S_n|\le\sum_{i=1}^{[n/q]+1}|U_i-U_i^*|+\Big|\sum_{i=1}^{([n/q]+1)/2}U_{2i}^*\Big|+\Big|\sum_{i=1}^{[n/q]/2+1}U_{2i-1}^*\Big|. \qquad (5.4.2)
\]
Combining (5.4.1) with the fact that $\tau_{1,n-q[n/q]}(q+1)\le\tau_{1,q}(q+1)$, we obtain
\[
\mathbb P\Big(\sum_{i=1}^{[n/q]+1}|U_i-U_i^*|\ge\lambda\Big)\le\frac n\lambda\,\tau_{1,q}(q+1). \qquad (5.4.3)
\]
The result follows by applying Bennett's inequality to the two other sums in (5.4.2). The proof of the second item is omitted; it is similar to the proof of Theorem 6.1 in Rio (2000) [161], page 83, for $\alpha$-mixing sequences.

Proceeding as in Theorem 5.2, we establish Fuk-Nagaev type inequalities (see Fuk and Nagaev (1971) [89]) for sums of $\tau_1$-dependent sequences. Applying Item 2 of Lemma 5.1, we obtain the same inequalities (up to some numerical constant) as those established by Rio (2000) [161] for strongly mixing sequences.

Notations 5.1. For any non-increasing sequence $(\delta_i)_{i\ge0}$ of nonnegative numbers, define $\delta^{-1}(u)=\sum_{i\ge0}\mathbf 1_{u<\delta_i}=\inf\{k\in\mathbb N\ :\ \delta_k\le u\}$. Note that $\delta^{-1}$ is the generalized inverse (see (2.2.14)) of the càdlàg function $x\mapsto\delta_{[x]}$, $[\cdot]$ denoting the integer part.

Theorem 5.3. Let $(X_i)_{i>0}$ be a sequence of centered and square-integrable random variables, and define $(\mathcal M_i)_{i>0}$ and $\overline S_n$ as in Theorem 5.2. Let $X$ be some positive random variable such that $Q_X\ge\sup_{k\ge1}Q_{|X_k|}$ and
\[
s_n^2=\sum_{i=1}^n\sum_{j=1}^n|\mathrm{Cov}(X_i,X_j)|.
\]
Let $R=\big((\tau/2)^{-1}\circ G_X^{-1}\big)Q_X$ and $S=R^{-1}$. For any $\lambda>0$ and $r\ge1$,
\[
\mathbb P(\overline S_n\ge5\lambda)\le 4\Big(1+\frac{\lambda^2}{r\,s_n^2}\Big)^{-r/2}+\frac{4n}\lambda\int_0^{S(\lambda/r)}Q_X(u)\,du. \qquad (5.4.4)
\]
CHAPTER 5. TOOLS FOR CAUSAL CASES
Proof of Theorem 5.3. Let q be any positive integer, and M > 0. As for the proof of Theorem 5.2 we define Ui = Siq − Siq−q , for 1 ≤ i ≤ [n/q]. We also define U i = (Ui ∧ qM ) ∨ (−qM ). Let ϕM (x) = (|x| − M )+ . Following the proof of Rio (2000) [161] for strongly mixing sequences, we first prove that Sn ≤
max
1≤j≤[n/q]
|
j
U i | + qM +
i=1
n
ϕM (Xk ) .
(5.4.5)
k=1
To prove (5.4.5), we just have to notice that, if S n = Sk0 , then for j0 = [k0 /q], Sn ≤ |
j0
U i| +
i=1
j0
|Ui − Ui | +
k0
|Xk | ,
(5.4.6)
k=j0 +1
i=1
and then, as ϕM is convex, that j0
|Ui − U i | ≤
i=1
qj0
ϕM (Xk ),
(5.4.7)
k=1
and, by definition of ϕM , that k0
k0
|Xk | ≤ (k0 − qj0 )M +
k=j0 +1
ϕM (Xk ).
(5.4.8)
k=j0 +1
Now, to be able to apply Theorem 5.2, we need to center the variables U i . Then as the random variables Ui are centered, we get max
1≤j≤[n/q]
|
j
U i| ≤
i=1
≤
max
1≤j≤[n/q]
max
1≤j≤[n/q]
| |
j
[n/q]
(U i − E(U i ))| +
i=1
i=1
j
n
(U i − E(U i ))| +
i=1
E(|Ui − U i |) E(ϕM (Xk )) ,
k=1
using the convexity of ϕM . Hence we have proved that Sn ≤
max
1≤j≤[n/q]
|
j i=1
(U i − E(U i ))| + qM +
n
(E(ϕM (Xk )) + ϕM (Xk )). (5.4.9)
k=1
Let us now choose the size q of the blocks and the constant of truncation M . Let v = S(λ/r), q = (τ /2)−1 ◦ G−1 X (v) and M = QX (v). Clearly, we have that qM = R(v) = R(S(λ/r)) ≤ λ/r. Since M = QX (v), n 2n v P (E(ϕM (Xk )) + ϕM (Xk )) ≥ λ ≤ QX (u)du . (5.4.10) λ 0 k=1
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
123
j We are now interested in the term P max1≤j≤[n/q] | i=1 (U i − E(U i ))| ≥ 3λ . We apply Theorem 5.2 to the sequence (U i − E(U i ))i∈Z with n = [n/q] and q = 1. We have U i = h(Ui ) where h : R → R is a Lipschitz function such that Lip (h) ≤ 1. Hence, we have τ1 (σ(U l , l ≤ i − 2), U i ) ≤ qτ1,q (q + 1). Since s2n ≥ U 1 22 + · · · + U [n/q] 22 we obtain: P
j λ2 −r/2 n + τ1,∞ (q +1). (5.4.11) max (U i −E(U i )) ≥ 3λ ≤ 4 1+ 2 1≤j≤[n/q] rsn λ i=1
To conclude, let us notice that the choice of q implies that τ1,∞ (q + 1) ≤ v 2 0 QX (u)du. Hence, since qM ≤ λ, combining (5.4.11), (5.4.10) and (5.4.9), we get the result.
5.4.2
Burkholder’s inequalities
The next result extends Theorem 2.5 of Rio (2000) [161] to non-stationary sequences. Proposition 5.4. Let (Xi )i∈N be a sequence of centered and square integrable random variables, and Mi = σ(Xj , 0 ≤ j ≤ i). Define Sn = X1 + · · · + Xn and l
bi,n = max Xi E(Xk |Mi )
i≤l≤n
k=i
. p/2
For any p ≥ 2, the following inequality holds n 1/2 Sn p ≤ 2p bi,n .
(5.4.12)
i=1
Proof. We proceed as in Rio (2000) [161] pages 46-47. For any t in [0, 1] and p ≥ 2, let hn (t) = Sn−1 + tXn pp . Our induction hypothesis at step n − 1 is the following: for any k < n hk (t) ≤ (2p)p/2
k−1
p/2 bi,k + tbk,k
.
i=1
This assumption is true at step 1. Assuming that it holds for n − 1, we have
n−1 to check it at step n. Setting G(i, n, t) = Xi (tE(Xn | Mi ) + k=i E(Xk | Mi )) and applying Theorem (2.3) in Rio (2000) with ψ(x) = |x|p , we get t n−1 hn (t) 1 p−2 ≤ E |Si−1 +sXi | G(i, n, t) ds+ E(|Sn−1 +sXn |p−2 Xn2 )ds . p2 0 0 i=1 (5.4.13)
124
CHAPTER 5. TOOLS FOR CAUSAL CASES
Note that the function t → E(|G(i, n, t)|p/2 ) is convex, so that for any t in p/2 [0, 1], E(|G(i, n, t)|p/2 ) ≤ E(|G(i, n, 0)|p/2 ) ∨ E(|G(i, n, 1)|p/2 ) ≤ bi,n . Applying H¨older’s inequality, we obtain E |Si−1 +sXi |p−2 G(i, n, t) ≤ (hi (s))(p−2)/p G(i, n, t)p/2 ≤ (hi (s))(p−2)/p bi,n . This bound together with (5.4.13) and the induction hypothesis yields 1 t n−1 hn (t) ≤ p2 bi,n (hi (s))(p−2)/p ds + bn,n (hn (s))(p−2)/p ds 0
i=1
≤ p2
n−1
0
p
(2p) 2 −1 bi,n
i=1
1 i 0
bj,n + sbi,n
t p2 −1 2 ds + bn,n (hn (s))1− p ds . 0
j=1
Integrating with respect to s we find 1 i i i−1 p2 −1 p2 p2 2 2 bj,n + sbi,n ds = bj,n − bj,n , bi,n p j=1 p j=1 0 j=1 and summing in j we finally obtain t p2 n−1 2 hn (t) ≤ 2p bj,n + p2 bn,n (hn (s))1− p ds .
(5.4.14)
0
j=1
Clearly the function u(t) = (2p)p/2 (b1,n + · · · + tbn,n )p/2 solves the equation associated to Inequality (5.4.14). A classical argument ensures that hn (t) ≤ u(t) which concludes the proof. Corollary 5.3. Let (Xi )i∈N and (Mi )i∈N be as in Proposition 5.4. Define ˜ ˜ γ1,i = sup γ1 (Mk , Xi+k ), α ˜ i = sup α(M ˜ k , Xi+k ) and φi = sup φ(Mk , Xi+k ). k≥0
k≥0
k≥0
1. Let X be any random variable such that QX ≥ supk≥1 QXk , and let
n
n −1 γ1,n (u) = k=0 1u≤λ1,k and α ˜ −1 ˜ k . For p ≥ 2 we have n (u) = k=0 1u≤α the inequalities X1 1/p # −1 Sn p ≤ 2pn (γ1,n (u))p/2 Qp−1 X ◦ GX (u)du 0
≤
1 1/p # p/2 p 2pn (˜ α−1 QX du . n (u)) 0
2. Let Mq = supk≥1 Xi q . For q ≥ p ≥ 2 we have the inequality n−1 1/2 (q−1)/q Sn p ≤ 2 pMq Mqp/(2q−p) (n − k)φ˜k . k=0
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
125
Proof of 1. Let r = p/(p − 2). By duality there exists an Mi -mesurable Y such that Y r = 1, and n E(|Y Xi E(Xk |Mi )|) . bi,n ≤ k=i
Let λi = GX (γ1,i ). Applying (5.2.1) and Fr´echet’s inequality (1957) [88], we obtain n γ1,k−i n λk−i QY Xi ◦ GX (u)du ≤ QY (u)Q2X (u)du . bi,n ≤ k=i
0
k=i
0
Using the duality once more, we get p/2
bi,n ≤
0
1
n
p/2 1u≤λk
QpX (u)du =
k=0
X1
0
−1 (γ1,n (u))p/2 Qp−1 X ◦ GX (u)du .
The first inequality follows. To prove the second one, note that λk ≤ α ˜k . Proof of 2. First, note that bi,n ≤
n
Xi E(Xk | Mi )p/2 . Let r = p/(p − 2).
k=i
By duality, there exist a Mi -measurable variable Y such that Y r = 1 and Xi E(Xk | Mi )p/2 = |Cov(Y Xi , Xk )| . Applying inequality (5.2.7), and next H¨ older’s inequality, we obtain that (q−1)/q
Xi E(Xk | Mi )p/2 ≤ 2φ˜k−i
(q−1)/q
Y Xi q/(q−1) Xk q ≤ 2Mq Mqp/(2q−p) φ˜k−i
.
The result follows.
5.4.3
Rosenthal inequalities using Rio techniques
We suppose that the sequence (Xn )n∈N fulfills the following covariance inequality, |Cov(f (X1 , . . . , Xn ), g(X1 , . . . , Xn ))| ≤
∂f ∂g ∞ ∞ |Cov(Xi , Xj )| , ∂xi ∂xj i∈I j∈J
(5.4.15) for all real valued functions f and g defined on Rn having bounded first differentials and depending respectively on (xi )i∈I and on (xi )i∈J where I and J are ∂f disjoints subsets of N. Sequences fulfilling (5.4.15) with sup ∞ < ∞ and ∂x i∈I i ∂g ∞ are κ-dependent with κ(r) = sup |Cov(Xi , Xj )| , they are also sup ∂x j j∈J |i−j|≥r
126
CHAPTER 5. TOOLS FOR CAUSAL CASES
ζ-dependent with ζ(r) = sup i∈N
|Cov(Xi , Xj )| .
{j/ |i−j|≥r}
Our task in this section, is to extend Doukhan and Portal (1983) [73] method in order to provide moment bounds of order r, where r is any positive real number not less than two. The main result of this paragraph is the following theorem. Theorem 5.4. Let r > 2 be a fixed real number. Let (Xn ) be a strictly stationary sequence of centered r.v’s fulfilling (5.4.15). Suppose moreover that this sequence is bounded by M . Then there exists a positive constant Cr depending only on r, such that n k−1 r r r−2 r−2 M (i + 1) |Cov(X1 , X1+i )| , (5.4.16) E|Sn | ≤ Cr sn + k=1 i=0
where s2n := n
n i=0
|Cov(X1 , X1+i )|.
Theorem 5.4 gives, in particular, a unifying Rosenthal-type inequality for at least two models: associated or negatively associated processes. An immediate consequence of Theorem 5.4 is the following MarcinkiewiczZygmund bound. Corollary 5.4. Let r > 2 be a fixed real number. Let (Xn ) be a strictly stationary sequence of centered r.v’s bounded by M and fulfilling (5.4.15). Suppose that (5.4.17) |Cov(X1 , X1+i )| = O(i−r/2 ), as i → +∞. Then E|Sn |r = O(nr/2 ).
(5.4.18)
For bounded associated sequences, condition (5.4.17) is shown to be optimal for the Marcinkiewicz-Zygmund bound (5.4.18) (cf. Birkel (1988) [22]). Proof of Theorem 5.4. The method is a generalization of the Lindeberg decomposition to an order r > 2. This method was first developed by Rio (1995) [158] for mixing sequences and for r ∈]2, 3]. The restriction to sequences fulfilling the bound (5.4.15) is only for the sake of clarity and the method can be adapted successfully to other dependent sequences (cf. Proposition 5.5 below). We give here the great lines of the proof and we refer to Louhichi (2003) [125] for more details. Let p ≥ 2 be a fixed integer. Let Φp be the class of functions φ : R+ → R+ such that φ(0) = φ (0) = · · · = φ(p−1) (0) = 0 and φ(p) is non decreasing and concave. Let φ be a function of the set Φp . Theorem 5.4 is proved if we suitably control Eφ(|Sn |) (since the function x → xr , for r ∈]p, p + 1] is one of those functions φ). Such a control will be done into the following steps.
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
127
Step 1. The purpose of Step 1 is to reduce the control of Eφ(|Sn |) to that of a suitable polynomial function of |Sn |. For this, define gp : R+ × R → R+ by, gp (t, x) :=
3 p+1 4 1 x 10≤x≤t + (xp+1 − (x − t)p+1 )1t<x , (p + 1)!
(5.4.19)
for any x ≥ 0 and gp (t, x) = gp (t, −x). The following lemma is a generalization of Equality (4.3) of Rio (1995) [158], which was written for p = 2. Lemma 5.4. Let p ≥ 2 be a fixed integer. Let φ ∈ Φp . Suppose that φ(p+1) exists and that limx→∞ φ(p+1) (x) = 0. Then φ(x) = 0
+∞
gp (t, x)νp (dt), (p+1)
where νp is the Stieltjes measure of −φp
defined by νp (dt) = −dφ(p+1) (t).
Lemma 5.4 reduces then the estimation of Eφ(|Sn |) to that of Egp (t, Sn ). Step 2. The purpose of Step 2 is then to give bounds of Ef (Sn ), for real-valued functions f belonging to a suitable set containing the functions x → gp (t, x). For this, we denote by Cp the class of real-valued, p times continuously differentiable functions f such that f (0) = · · · = f (p) (0) = 0. Let Fp (b1 , b2 ) be the subclass of Cp+1 such that f (p) ∞ ≤ b1 and that f (p+1) ∞ ≤ b2 , where f (i) ∞ = supx∈R |f (i) (x)| and f (i) is the differential of order i of f . In this step, we give an estimation of Ef (Sn ), for f ∈ Fp (b1 , b2 ). Let us note that the function gp as defined by (5.4.19) belongs to the set Fp (t, 1) We first exhibit the mains terms of our calculations. Notations. We denote by (p−2) the sum over i1 , . . . , ip−2 such that 0 := i0 ≤
i1 ≤ · · · ≤ ip−2 ≤ k − 1, that is 0≤i1 ≤···≤ip−2 ≤k−1 . We define Ep−2,k (Δf ) =
EXk Xk−i1 · · · Xk−ip−2 Δp−2,k (f, u) ,
sup
0≤u≤1
(p−2)
where 3 4 Δp−2,k (f ) := Δp−2,k (f, u) = f (Sk−ip−2 −1 + uXk−ip−2 ) − f (Sk−ip−2 −1 ) 1 = uXk−ip−2 f Sk−ip−2 −1 + uvXk−ip−2 dv. 0
We set for p ≥ 2, Ep−2,k (f ) =
(p−2)
|EXk Xk−i1 · · · Xk−ip−2 f (Sk−ip−2 −1 )|.
128
CHAPTER 5. TOOLS FOR CAUSAL CASES
Finally, recall that i0 = 0, E0,k (f ) = |EXk f (Sk−1 )| and that E0,k (Δf ) = sup |EXk Δ0,k (f, u)|. 0≤u≤1
For any real-valued function f of the set Fp (b1 , b2 ), the quantity |E(f (Sn ))| is evaluated by means of the main terms Ep−2,k (f (p−1) ) and Ep−2,k (Δf (p−1) ) as shows the following lemma. Lemma 5.5. Let p ≥ 2 be a fixed integer. Let (Xn ) be a sequence of r.v’s fulfilling (5.4.15), centered and bounded by M . There exists a positive constant Cp depending only on p, such that for any f ∈ Fp (b1 , b2 ), $ |E(f (Sn ))|
≤ Cp
spn (b1
∧ b2 sn ) + (b1 ∧ b2 M )M
p−2
k−1 n
|Cov(Xk , Xk−i )|
k=1 i=0
+
n
Ep−2,k (f
k=1
(p−1)
)+
n
9
Ep−2,k (Δf
(p−1)
) .
k=1
From now Cp denotes a positive constant depending only on p and that will be different from line to line. Evaluation of the main terms Ep−2,k (f ) and Ep−2,k (Δf ). The object of this step is to evaluate the main terms Ep−2,k (f ) and Ep−2,k (Δf ) of Lemma 5.5. This evaluation involves the following covariance quantities: Mm,n := M m−2
k−1 n (i + 1)m−2 |Cov(X1 , Xi+1 )|, for 2 ≤ m ≤ p, (5.4.20) k=1 i=0
Mm,n (b1 , b2 ) :=
n k−1
(b1 ∧ b2 (r + 1)M )(r + 1)m−2 M m−2 |Cov(X1 , Xr+1 )|.
k=1 r=0
(5.4.21) Let us note that M2,n is close to Var Sn and that M2,n = nVar X1 in the i.i.d. case. Those covariance quantities satisfy the following analogous of H¨ older’s inequality: (r−4)/(r−2) Mr,n ≤ srn + Mr,n , Mm,n Mr−m,n ≤ s2r/(r−2) n
(5.4.22)
for any r > 4, 2 < m < r. We now state the basic technical lemma of the proof of Theorem 5.4. Lemma 5.6. Let f be a real valued function of the set F1 (b1 , b2 ). Let (Xn ) be a centered sequence of random variables fulfilling (5.4.15). Suppose that (Xn )
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
129
is uniformly bounded by M . Then, for any integer p ≥ 2, there exists a positive constant Cp depending only on p, for which n
Ep−2,k (Δf ) +
k=1
Ep−2,k (f )
(5.4.23)
k=1
$ ≤ Cp
n
spn (b1
∧ b 2 sn ) +
p−2
9 Mm,n Mp−m,n (b1 , b2 ) + Mp,n (b1 , b2 ) ,
m=2
where the sum
p−2 m=2
equals to 0, whenever p ∈ {2, 3}.
End of the proof of Theorem 5.4. Finally, we combine the three previous steps in order to finish the proof of Theorem 5.4. Let us explain. We first make use of Lemma 5.4, together with Fubini’s theorem, to obtain, +∞ Egp (t, Sn ) νp (dt). (5.4.24) Eφ(|Sn |) = 0
(p−1)
We recall that the functions x → gp (t, x) and x → gp (t, x) belong respectively to Fp (t, 1) and to F1 (t, 1) (in fact if f ∈ Fp (t, 1), then f (p−1) ∈ F1 (t, 1)). Hence we deduce, applying Lemma 5.5 to the function x → gp (t, x) and Lemma (p−1) (t, x), 5.6 to the function x → gp 9 $ p−2 p Egp (t, Sn ) ≤ Cp Mm,n Mp−m,n (t, 1) + Mp,n (t, 1) + sn (t ∧ sn ) . (5.4.25) m=2
Taking into account Lemma 5.4 and the fact g (p) (x) = x ∧ t, we deduce that +∞ φ(p) (x) = (t ∧ x)νp (dt). (5.4.26) 0
Inequalities (5.4.24), (5.4.25) and (5.4.26) yield: Eφ(|Sn |) ≤ Cp spn φ(p) (sn ) +
+
n k−1 (M (i + 1))p−2 φ(p) (M (i + 1))|Cov(X1 , X1+i )| k=1 i=0 p−2
Mm,n
m=2
n k−1
(5.4.27)
(M (i + 1))p−m−2 φ(p) (M (i + 1))|Cov(X1 , X1+i )| .
k=1 i=0
Now, we use the concavity property of the function φ(p) , 1 1 t(1 − t)p−1 xp dt, (1 − t)p−1 φ(p) (tx)dt ≥ xp φ(p) (x) φ(x) = (p − 1)! 0 (p − 1)! 0
130
CHAPTER 5. TOOLS FOR CAUSAL CASES
to deduce that
xp φ(p) (x) ≤ Cp φ(x).
(5.4.28)
We conclude, combining inequalities (5.4.27) and (5.4.28), $ Eφ(|Sn |)
≤
Cp
n k−1 k=1 i=0
p−2
+
(M (i + 1))−2 φ(M (i + 1))|Cov(X1 , X1+i )| + φ(sn )
Mm,n
m=2
9 k−1 n −m−2 (M (i + 1)) φ(M (i + 1))|Cov(X1 , X1+i )| k=1 i=0
The last inequality applied to φ(x) = xr , $ E|Sn | ≤ Cr r
for r ∈]p, p + 1], leads to
p−2 n k−1 (M (i + 1))r−2 |Cov(X1 , X1+i )| + srn + Mm,n Mr−m,n k=1 i=0
9 .
m=2
The proof of Theorem 5.4 is now complete, using the last inequality together with (5.4.22) (recall that M2,n ≤ s2n ). In the case where the sequence (Xn )n∈N∗ is θ−dependent, the proof of the inequalities of Theorem 5.4 can be adapted. We then get a variation of the technical proof of Theorem 5.4 written just above. Hence it will be omitted here. Let us state the inequalities we obtained in the case where (Xn )n∈N∗ is θ−dependent. Proposition 5.5. Let r be a fixed real number > 2. Let (Xn ) be a stationary sequence of θ1,∞ −dependent centered random variables. Suppose moreover that this sequence is bounded by 1. Let Sn := X1 + X2 + · · · + Xn , for n ≥ 1 and S0 = X0 = 0. Then there exists a positive constant Cr depending only on r, such that srn + Mr,n ) , (5.4.29) E|Sn |r ≤ Cr (˜
n−1
n−1 where Mr,n := n i=0 (i + 1)r−2 θ(i), and s˜2n := M2,n = n i=0 θ(i). We refer to Prieur (2002) [155] for a detailed proof of Proposition 5.5.
5.4.4
Rosenthal inequalities for τ1 -dependent sequences
We give here a corollary of Theorem 5.3. Corollary 5.5. Let (Xi )i>0 be a sequence of centered random variables belonging to Lp for some p ≥ 2. Define (Mi )i>0 , S n , QX and sn as in Theorem 5.3. Recall that τ −1 has been defined in Notations 5.1. We have S n pp ≤ ap spn + nbp
0
X1
((τ /2)−1 (u))p−1 Qp−1 X ◦ GX (u)du ,
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
131
where ap = 4p5p (p + 1)p/2 and (p − 1)bp = 4p5p (p + 1)p−1 . Moreover we have that s2n ≤ 4n
X1
0
(τ /2)−1 (u)QX ◦ GX (u)du .
Proof of Corollary 5.5. It suffices to integrate (5.4.4) (as done in Rio (2000) [161] page 88) and to note that
1
Q(u)(R(u))p−1 (u)du = 0
0
X1
((τ /2)−1 (u))p−1 Qp−1 X ◦ GX (u)du .
The bound for s2n holds with θ1,1 instead of τ1,∞ (see Dedecker and Doukhan (2002) [43]).
5.4.5
Rosenthal inequalities under projective conditions
We recall two moment inequalities given in Dedecker (2001) [42]. We use these inequalities in Chapter 10 to prove the tightness of the empirical process for α, ˜ β˜ and φ˜ dependent sequences. Proposition 5.6. Let (Xi )i∈Z be a stationary sequence of centered and square integrable random variables and let Sn = X1 + · · ·+ Xn . Let Mi = σ(Xj , j ≤ i). The following upper bound holds 1/3 Sn p ≤ (pnV∞ )1/2 + 3p2 n X03 p/3 + M1 (p) + M2 (p) + M3 (p) , where VN = E(X02 ) + 2
N
|E(X0 Xk )| and
k=1
M1 (p)
M2 (p)
M3 (p)
=
=
=
l−1 +∞ l=1 m=0 +∞ +∞
X0 Xm E(Xl+m | Mm )p/3 X0 E(Xm Xl+m − E(Xm Xl+m )| M0 )p/3
l=1 m=l +∞
1 2
X0 E(Xk2 − E(Xk2 )| M0 )p/3 .
k=1
Proposition 5.7. We keep the same notations as in Proposition 5.6. For any positive integer N , the following upper bound holds 1/3 ˜2 (p)+M3 (p) Sn p ≤ (pn (VN −1 + 2M0 (p)))1/2 + 3p2 n X03 p/3 + M˜1 (p)+ M ,
132
CHAPTER 5. TOOLS FOR CAUSAL CASES
where M0 (p) =
+∞
X0 E(Xl | M0 )p/2
l=N
˜1 (p) = M
N −1 l−1
X0 Xm E(Xl+m | Mm )p/3
l=1 m=0
˜2 (p) = M
+∞ N −1
X0 E(Xm Xl+m − E(Xm Xl+m )| M0 )p/3 .
l=1 m=l
5.5
Maximal inequalities
Our first result is an extension of Doob’s inequality for martingales. This maximal inequality is stated in the nonstationary case. Proposition 5.8. Let (Xi )i∈Z be a sequence of square-integrable and centered random variables, adapted to a nondecreasing filtration (Fi )i∈Z . Let λ be any nonnegative real number and Gk = (Sk∗ > λ). We have E((Sn∗ − λ)2+ ) ≤ 4
n
E(Xk2 1Gk ) + 8
k=1
n−1
Xk 1Gk E(Sn − Sk |Fk )1 .
k=1
Proof of Proposition 5.8. We proceed as in Garsia (1965) [90]: (Sn∗ − λ)2+ =
n
∗ ((Sk∗ − λ)2+ − (Sk−1 − λ)2+ ).
(5.5.1)
k=1
Since the sequence (Sk∗ )k≥0 is nondecreasing, the summands in (5.5.1) are nonnegative. Now ∗ ∗ − λ)+ )((Sk∗ − λ)+ + (Sk−1 − λ)+ ) > 0 ((Sk∗ − λ)+ − (Sk−1 ∗ . In that case Sk = Sk∗ , whence if and only if Sk > λ and Sk > Sk−1 ∗ ∗ − λ)2+ ≤ 2(Sk − λ)((Sk∗ − λ)+ − (Sk−1 − λ)+ ). (Sk∗ − λ)2+ − (Sk−1
Consequently (Sn∗ − λ)2+
≤
2
n
(Sk − λ)(Sk∗ − λ)+ − 2
k=1
≤
2(Sn − λ)+ (Sn∗ − λ)+ − 2
n
∗ ((Sk − λ)(Sk−1 − λ)+ )
k=1 n
∗ (Sk−1 − λ)+ Xk .
k=1
5.5. MAXIMAL INEQUALITIES
133
Noting that 2(Sn − λ)+ (Sn∗ − λ)+ ≤ (Sn∗
−
λ)2+
≤ 4(Sn −
1 ∗ (S − λ)2+ + 2(Sn − λ)2+ , we infer 2 n
λ)2+
−4
n
∗ (Sk−1 − λ)+ Xk .
(5.5.2)
k=1
In order to bound (Sn − λ)2+ , we adapt the decomposition (5.5.1) and next we apply Taylor’s formula: (Sn − λ)2+
n
=
((Sk − λ)2+ − (Sk−1 − λ)2+ )
k=1 n
=
2
(Sk−1 − λ)+ Xk + 2
k=1
n
Xk2
1
0
k=1
(1 − t)1Sk−1 +tXk >λ dt.
Since 1Sk−1 +tXk >λ ≤ 1Sk∗ >λ , it follows that (Sn − λ)2+ ≤ 2
n
(Sk−1 − λ)+ Xk +
k=1
n
Xk2 1Sk∗ >λ .
(5.5.3)
k=1
Hence, by (5.5.2) and (5.5.3) (Sn∗ − λ)2+ ≤ 4
n
∗ (2(Sk−1 − λ)+ − (Sk−1 − λ)+ )Xk + 4
k=1
n
Xk2 1Sk∗ >λ .
k=1
Let D0 = 0 and Dk = 2(Sk − λ)+ − (Sk∗ − λ)+ for k > 0. Clearly Dk−1 Xk =
k−1
(Di − Di−1 )Xk .
i=1
Hence (Sn∗
−
λ)2+
≤4
n−1
n
i=1
k=1
(Di − Di−1 )(Sn − Si ) + 4
Xk2 1Sk∗ >λ .
(5.5.4)
Since the random variables Di − Di−1 are Fi -measurable, we have: E((Di − Di−1 )(Sn − Si )) = ≤
E((Di − Di−1 )E(Sn − Si | Fi )) E|(Di − Di−1 )E(Sn − Si | Fi )|. (5.5.5)
∗ − λ)+ , then It remains to bound |Di − Di−1 |. If (Si∗ − λ)+ = (Si−1
|Di − Di−1 | = 2|(Si − λ)+ − (Si−1 − λ)+ | ≤ 2|Xi |1Si∗ >λ ,
134
CHAPTER 5. TOOLS FOR CAUSAL CASES
because Di − Di−1 = 0 whenever Si ≤ λ and Si−1 ≤ λ. Otherwise Si = Si∗ > λ ∗ and Si−1 ≤ Si−1 < Si , which implies that ∗ Di − Di−1 = (Si − λ)+ + (Si−1 − λ)+ − 2(Si−1 − λ)+ .
Hence Di − Di−1 belongs to [0, 2((Si − λ)+ − (Si−1 − λ)+ ) ]. In any case |Di − Di−1 | ≤ 2|Xi |1Si∗ >λ , which together with (5.5.4) and (5.5.5) implies Proposition 5.8. Consider the projection operators Pi : for any f in L2 , Pi (f ) = E(f | Fi ) − E(f | Fi−1 ). Combining Proposition 5.8 and a decomposition due to McLeish, we obtain the following maximal inequality. Proposition 5.9. Let (Xi )i∈Z be a sequence of square-integrable and centered filtration. Define the σrandom variables, : and (Fi )i∈Z be any nondecreasing : algebras F−∞ = i∈Z Fi and F∞ = σ( i∈Z Fi ). Define the random variables Sn = X1 + · · · + Xn and Sn∗ = max{0, S1 , . . . , Sn }. For any i in Z, let
j ∗ (Yi,j )j≥1 be the martingale Yi,j = k=1 Pk−i (Xk ) and Yi,n = max{0, Yi,1 , . . . , ∗ > λ}. AsYi,n }. Let λ be any nonnegative real number and G(i, k, λ) = {Yi,k sume that the sequence is regular: for any integer k, E(Xk |F−∞ ) = 0 and E(Xk |F∞ ) = Xk . For any two sequences of nonnegative numbers (ai )i≥0 and is finite and bi = 1 we have (bi )i≥0 such that K = a−1 i ∞ n 2 E (Sn∗ − λ)2+ ≤ 4K ai E(Pk−i (Xk )1G(i,k,bi λ) ) . i=0
k=1
Proof of Proposition 5.9. Since the sequence is regular, we decompose Xk =
+∞
Pk−i (Xk ).
i=−∞
Consequently Sj =
Yi,j and therefore: (Sj −λ)+ ≤
i∈Z
(Yi,j −bi λ)+ . Applying
i∈Z
H¨older inequality and taking the maximum on both sides, we get ∗ (Sn∗ − λ)2+ ≤ K ai (Yi,n − bi λ)2+ . i∈Z
Taking the expectation and applying Proposition 5.8 to the martingale (Yi,n )n≥1 , we obtain Proposition 5.9.
Chapter 6
Applications of strong laws of large numbers We consider in this chapter a stochastic algorithm with weakly dependent input noise (according to Definition 2.2). In particular, the case of γ1 -dependence is considered. The ODE (ordinary differential equation) method is generalized to such situation. For this, we use tools for causal and non causal sequences developed in the previous chapters. Illustrations to the linear regression frame and to the law of large numbers for triangular arrays of weighted dependent random variables are also given.
6.1
Stochastic algorithms with non causal dependent input
We consider the Rd -valued stochastic algorithm, defined on a probability space (Ω, A, P) and driven by the recurrence equation Zn+1 = Zn + γn h(Zn ) + ζn+1 , where • h is a continuous function from an open set G ⊆ Rd to Rd , • (γn ) a decreasing to zero deterministic real sequence satisfying γn = ∞.
(6.1.1)
(6.1.2)
n≥0
• (ζn ) is a “small” stochastic disturbance. The ordinary differential equation (ODE) method associates (we refer for instance to Benveniste et al. (1987) [15], Duflo (1996) [82], Kushner and Clark 135
136
CHAPTER 6. APPLICATIONS OF SLLN
(1978) [114]) the possible limit sets of (6.1.1) with the properties of the associated ODE dz = h(z). dt
(6.1.3)
These sets are compact connected invariant and “chain-recurrent” in the Bena¨ım sense for the ODE (cf. Bena¨ım (1996) [14]). These sets are more or less complicated. Various situations may then happen. The most simple case is an equilibrium : z is a solution of h(z) = 0, but equilibria cycle, or a finite set of equilibria is linked to the ODE’s trajectories, connected sets of equilibria or, periodic cycles for the ODE may also happen. . . In order to use the ODE method, we suppose that (Zn ) is a.s. bounded and ζn+1
=
cn (ξn+1 + rn+1 ),
where (cn ) denotes a nonnegative deterministic sequence such that γn = O(cn ), c2n < ∞,
(6.1.4)
(6.1.5)
(ξn ) and (rn ) are Rd -valued sequences, defined on (Ω, A, P), and adapted with respect to an increasing sequence of σ-fields (Fn )n≥0 and satisfying almost surely (a.s.) on A ⊂ Ω, ∞
cn ξn+1 < ∞
a.s.
and
(6.1.6)
n=0
lim rn = 0 a.s.
n→∞
(6.1.7)
The classical theory of algorithms is related to a noise (ξn ) which is a martingale difference sequence. Our aim is to replace this condition about the noise by weakly dependence conditions as being introduced in Chapter 2. In Section 6.1.1, we suppose that the sequence (ξn ) is (Λ(1) ∩ L∞ , Ψ)-dependent according to Definition 2.2, where Ψ(f, g) = C(df , dg )(Lip(f ) + Lip(g)), for some function C : N∗ × N∗ → R. Various examples of this situation may be found in Chapter 3, they include general Bernoulli shifts, stable Markov chains such as, ξt = G(ξt−1 , . . . , ξt−p ) +
ζt , ξt = a0 + j≥1 aj ξt−j ζt generated by some i.i.d. sequence (ζt ), or ARCH(∞) models. In Section 6.1.2 below, we consider a weakly dependent noise in the sense of the γ1 -weak coefficients of Dedecker and Doukhan (2003) [43] defined by (2.2.17).
6.1. STOCHASTIC ALGORITHMS
137
Note that (cf. Remark 2.5) a causal version of (F , G, Ψ)-dependence implies γ1 -dependence, where the left hand side in definition of weak dependence writes ≤ C(v)Lip(g) (r). Counter-examples of γ1 -dependent sequences which are not θ-dependent may also be found there. The notion of γ1 -dependence is generalized to Rd valued sequences. Proposition 6.1. The two following assertions are equivalent: (i) A Rd -valued sequence (Xn ) is γ1 -dependent, (ii) Each component (Xn ) ( = 1, . . . , d) of (Xn ) is γ1 - dependent. Proof. Clearly, E(Xn+r −E(Xn+r )|Fn )1 ≤ E(Xn+r −E(Xn+r )|Fn )1 , hence (i) implies (ii). The second implication follows from,
; < d < 2 E(Xn+r E(Xn+r − E(Xn+r )|Fn )1 = E= − E(Xn+r )|Fn ) .
=1
The two forthcoming sections are devoted to provide moment inequalities of the Marcinkiewicz-Zygmund type adapted to deduce the relation (6.1.6) in those two frames. The following sections are devoted to study the examples of RobbinsMonro and Kiefer-Wolfowitz algorithms and to obtain sufficient conditions for the complete convergence of triangular arrays, extending on Chow (1966) [37]. Finally, the last section is devoted to the specific of the linear regression algorithm with dependent entries. In [36], Chen (1985) has also studies this topic. He works in a more general matrix valued framework. Assuming only the stationarity and the ergodicity of entries, he derives the a.s. convergence of the algorithm. We get the same result with a γ1 -dependence assumption, but this assumption, more restrictive, allow us to reach, thanks to a moment technic, a precise n−1/2 -convergence rate.
6.1.1
Weakly dependent noise
n Let (ξn ) be a sequence of centered random variables. Let Sn be the sum i=1 ξi and Cq = maxu+v≤q C(u, v). Suppose that an analogous of the bounds (4.3.2) and (4.3.3) are satisfied by the process ξ: sup |Cov(ξt1 · · · ξtm , ξtm+1 · · · ξtq )| ≤ Cq q γ M q−2 (r),
(6.1.8)
where the supremum is taken over all {t1 , . . . , tq } such that 1 ≤ t1 ≤ · · · ≤ tq , and 1 ≤ m < q such that tm+1 − tm = r, or |Cov(ξt1 · · · ξtm , ξtm+1 · · · ξtq )| ≤ (Cq ∨ 2)
0
(r)∧1
Qξt1 (x) · · · Qξtq (x)dx. (6.1.9)
138
CHAPTER 6. APPLICATIONS OF SLLN
Those bounds respectively imply moment inequalities in Theorems 4.2 and
n 4.3. Denoting Σn = i=1 ci−1 ξi , and using similar techniques as in § 4.3 (see Doukhan and Louhichi (1999) [67]) one has, Proposition 6.2. Let p ≥ 2 be some fixed integer and let (ξn ) be a centered sequence of real random variables such that (6.1.8) holds for all q ≤ p. Then for n ≥ 2, |EΣpn |
(2p − 2)! ≤ (p − 1)!
$ γ
Cp p M
p−2
n
cpi−1
i=1
n−1
p−2
(r + 1)
(r)
r=0
∨
C2 2γ
n i=1
c2i−1
n−1 r=0
p/2 ⎫ ⎬
(r) . ⎭
This result is mainly adapted to bounded sequences. Proof of proposition 6.2. The proof is done in Brandi`ere and Doukhan (2004) [32]. We have, using arguments from Doukhan and Louhichi’s (1999) [67] as done for the proof of Theorem 4.2, p n ci ξi ≤ p! ct1 · · · ctp |E(ξt1 · · · ξtp )|. (6.1.10) E 1≤t1 ≤···≤tp ≤n
i=1
Denote Ap (n) = tp−1 , Ap (n)
≤
1≤t1 ≤···≤tp ≤n ct1
· · · ctp |E(ξt1 · · · ξtp )|, so for any t2 ≤ tm ≤
ct1 · · · ctp |E(ξt1 · · · ξtm )E(ξtm+1 · · · ξtp )|
1≤t1 ≤···≤tp ≤n
+
ct1 · · · ctp |Cov(ξt1 · · · ξtm , ξtm+1 · · · ξtp )|.
1≤t1 ≤···≤tp ≤n
Denote A1p (n)
=
ct1 · · · ctp |E(ξt1 · · · ξtm )E(ξtm+1 . . . ξtp )|,
1≤t1 ≤···≤tp ≤n
A2p (n)
=
ct1 · · · ctp |cov(ξt1 · · · ξtm , ξtm+1 · · · ξtp )|.
1≤t1 ≤···≤tp ≤n
Since the sequence (cn ) is decreasing to 0, we deduce, as in Doukhan and Louhichi (1999) [67], A1p (n) ≤ Am (n)Ap−m (n).
(6.1.11)
6.1. STOCHASTIC ALGORITHMS
139
n p n−1 γ p−2 (r + 1)p−2 (r), and By (6.1.8) we obtain A2p (n) ≤ t1 =1 ct1 r=0 Cp p M
n p n−1 γ p−2 the expression (r + 1)p−2 (r) = Vp (n), verifies, for i=1 ci r=0 Cp p M q−2
p−q
any integer 2 ≤ q ≤ p − 1 : Vq (n) ≤ Vpp−2 (n)V2p−2 (n). Now, Lemma 4.7 (see 12 of Doukhan and Louhichi (1999) [67]) leads to Ap (n) ≤ also Lemma p 2p − 2 1 (V22 (n) ∨ Vp (n)), hence p p−1 n p (2p − 2)! p2 (V (n) ∨ Vp (n)). E ci ξi ≤ (p − 1)! 2 i=1 This ensures the result. The following result is appropriate to more general real-valued random variables but require a moment assumption and a tail condition. Proposition 6.3. Let p > 2 be a fixed integer and (ξn ) be a centered sequence of random variables. Assume that for all 2 < q ≤ p, Inequality (6.1.9) holds with Mq
q−2
≤
p−q
Mpp−2 M2p−2
(6.1.12)
and there exists a constant c > 0 such that ∃k > p, Then for n ≥ 2, |EΣpn |
≤
∀i ≥ 0 :
P(|ξi | > t) ≤
c . tk
(6.1.13)
$ n n−1 k−p (2p − 2)! 1/k p p−2 c ci−1 (r + 1) (r) k Mp (p − 1)! r=0 i=1 p/2 ⎫ n−1 n ⎬ k−2 (6.1.14) ∨ M2 c2i−1
(r) k ⎭ r=0
i=1
Note that (6.1.13) holds as soon as the ξn ’s have a k-th order moment such that supi≥0 E|ξi |k ≤ c. Now we argue as in Billingsley (1968) [20]: if (6.1.8) holds for some p such that $ γ
Cp p M
p−2
n i=1
cpi−1
n−1
p−2
(r + 1)
r=0
(r) ⎛
∨ ⎝ C2 2γ
n i=1
c2i−1
n−1 r=0
p/2 ⎫ ⎬
(r) <∞ ⎭
140
CHAPTER 6. APPLICATIONS OF SLLN
then for any t > 0, limn→∞ P(supk≥1 |Σn+k − Σn | > t) = 0. Thus (Σn ) is a.s. a Cauchy sequence, hence it converges. In the same way, if (6.1.9) holds for some p such that n p/2 n n−1 ∞ p k−p k−2 p−2 2 ci−1 (r + 1) (r) k ci−1
(r) k < ∞, (6.1.15) ∨ i=1
r=0
i=1
r=0
then (Σn ) converges with probability 1. Proof of proposition 6.3. by (6.1.9)
Using the same notations as in the previous proof,
Vp (n) ≤ Mp
n
cpi
1
0
i=1
( −1 (u) ∧ n)p−1 Qpi (u)du,
where (u) = [u] ([u] denotes the integer part of u). Denote Wp (n) = Mp
n
cpi
i=1
1
0
( −1 (u) ∧ n)p−1 Qpi (u)du.
If (6.1.12) is verified, then q−2
p−q
Wq (n) ≤ Wpp−2 (n)W2p−2 (n), which completes the proof.
6.1.2
γ1 −dependent noise
Let (ξn )n≥0 be a sequence of integrable real-valued random variables, and (γ1 (r))r≥0 be the associated mixingale-coefficients defined in (2.2.17). Then the following moment inequality holds. Proposition 6.4. Let p > 2 and (ξn )n∈N be a sequence of centered random variables such that (6.1.13) holds. Then for any n ≥ 2, ⎞p/2 ⎛ n 2(k−p) n−1 2(k−p) 2− ci p(k−1) γ1 (j) p(k−1) ⎠ , (6.1.16) |EΣpn | ≤ ⎝2pK1 i=1
j=0
where K1 depends on r, p and c. Notice that here p ∈ R, and is not necessarily an integer. If now (6.1.16) holds for some p such that ∞ i=1
c2−m i
∞ j=0
γ1 (j)m < ∞,
(6.1.17)
6.1. STOCHASTIC ALGORITHMS
141
where m = 2(k−p) p(k−1) < 1, then (Σn ) converges with probability 1. The result extends to Rd . Indeed, if we consider a centered Rd -valued and γ1 -dependent sequence (ξn )n≥0 , one has as previously EΣn = E p
n d =1
p ci ξi
,
i=1
and if each component (ξn )n≥0 ( = 1, . . . , d) is γ1 -dependent and verifies (6.1.13) and (6.1.17), EΣn p < ∞ and we conclude as before that (Σn )n≥0 converges a.s. Proof of proposition 6.4. we deduce
Proceeding as in Dedecker and Doukhan (2003) [43], we deduce
$$|\mathbb{E}(\Sigma_n^p)| \le \Big(2p\sum_{i=1}^{n} b_{i,n}\Big)^{p/2},$$
where
$$b_{i,n} = \max_{i\le \ell\le n}\Big\| c_i\xi_i \sum_{k=0}^{\ell-i} \mathbb{E}(c_{i+k}\xi_{i+k}\mid\mathcal{F}_i)\Big\|_{p/2}.$$
Let $q = p/(p-2)$; then there exists $Y$ such that $\|Y\|_q = 1$. Applying Proposition 1 of Dedecker and Doukhan (2003) [43], we obtain
$$b_{i,n} \le \sum_{k=0}^{n-i}\int_0^{\gamma_1(k)} Q_{Yc_i\xi_i}\circ G_{c_{i+k}\xi_{i+k}}(u)\,du,$$
where $G_X$ is the inverse of $x\mapsto\int_0^x Q_X(u)\,du$. Since $G_{c_i\xi_i}(u) = G_{\xi_i}(u/c_i)$, we get
$$b_{i,n} \le \sum_{k=0}^{n-i}\int_0^{\gamma_1(k)} Q_{Yc_i\xi_i}\circ G\Big(\frac{u}{c_{i+k}}\Big)du \le \sum_{k=0}^{n-i} c_{i+k}\int_0^{\gamma_1(k)/c_{i+k}} Q_{Yc_i\xi_i}\circ G(u)\,du,$$
and the Fréchet inequality (1957) [88] yields
$$b_{i,n} \le \sum_{k=0}^{n-i} c_{i+k}\int_0^{G(\gamma_1(k)/c_{i+k})} Q_Y(u)\,Q_{c_i\xi_i}(u)\,Q(u)\,du \le c_i\sum_{k=0}^{n-i} c_{i+k}\int_0^1 \mathbf{1}_{\{u\le G(\gamma_1(k)/c_{i+k})\}}\, Q^2(u)\,Q_Y(u)\,du,$$
where $Q = Q_{\xi_i}$. Using Hölder's inequality, we also obtain
$$b_{i,n} \le c_i\sum_{k=0}^{n-i} c_{i+k}\Big(\int_0^1 \mathbf{1}_{\{u\le G(\gamma_1(k)/c_{i+k})\}}\, Q^p(u)\,du\Big)^{2/p}.$$
By (6.1.13), $Q(u)\le c^{1/r}u^{-1/r}$; setting $K = \frac{r-1}{r}\,c^{-1/r}$, this yields
$$b_{i,n} \le c_i\sum_{k=0}^{n-i} c_{i+k}\Big(\int_0^1 \mathbf{1}_{\{u\le G(\gamma_1(k)/c_{i+k})\}}\, c^{p/r}u^{-p/r}\,du\Big)^{2/p} \le K^{\frac{2(r-p)}{p(r-1)}}\, c_i\sum_{k=0}^{n-i} c_{i+k}^{\,1-\frac{2(r-p)}{p(r-1)}}\,\gamma_1(k)^{\frac{2(r-p)}{p(r-1)}}.$$
Noting that $(c_n)_{n\ge 0}$ is decreasing, the result follows with $K_1 = K^{\frac{2(r-p)}{p(r-1)}}$.

Equip $\mathbb{R}^d$ with its $p$-norm $\|(x_1,\dots,x_d)\|_p^p = |x_1|^p + \cdots + |x_d|^p$. Let $(\xi_n)_{n\ge 0}$ be an $\mathbb{R}^d$-valued and $(\mathcal{F},\Psi)$-dependent sequence. Set $\xi_n = (\xi_n^1,\dots,\xi_n^d)$; then $\big\|\sum_{i=1}^n c_i\xi_i\big\|_p^p = \sum_{\ell=1}^d \big|\sum_{i=1}^n c_i\xi_i^{\ell}\big|^p$. If each component $(\xi_n^{\ell})_{n\ge 0}$ is $(\mathcal{F},\Psi)$-dependent and such that a relation like (6.1.15) holds, then $\mathbb{E}\|\Sigma_n\|_p^p < \infty$. Arguing as before, we deduce that the sequence $(\Sigma_n)_{n\ge 0}$ converges with probability 1.
6.2 Examples of application

6.2.1 Robbins-Monro algorithm

The Robbins-Monro algorithm is used for dosage, to obtain the level $a$ of a function $f$ which is usually unknown. It is also used in mechanics, for adjustments, as well as in statistics to fit a median (Duflo (1996) [82], page 50). It writes
$$Z_{n+1} = Z_n - c_n\big(f(Z_n) - a\big) + c_n\xi_{n+1}, \qquad (6.2.1)$$
with $\sum c_n = \infty$ and $\sum c_n^2 < \infty$. It is usually assumed that the prediction errors $(\xi_n)$ form an independent and identically distributed sequence, but this assumption does not look natural: weak dependence seems more reasonable. Hence the previous results ensure the a.s. convergence of this algorithm, under the usual assumptions and the conditions yielding the a.s. convergence of $\sum_{n\ge 0} c_n\xi_{n+1}$. Under the assumptions of Proposition 6.2, if for some integer $p > 2$
$$\sum_{r=0}^{\infty}(r+1)^{p-2}\,\epsilon(r) < \infty, \qquad (6.2.2)$$
then the algorithm (6.2.1) converges a.s. If the assumptions of Proposition 6.3 hold, then the a.s. convergence of the algorithm (6.2.1) is ensured as soon as, for some $p > 2$,
$$\sum_{r=0}^{\infty}(r+1)^{p-2}\,\epsilon(r)^{\frac{k-p}{k}} \ \vee\ \sum_{r=0}^{\infty}\epsilon(r)^{\frac{k-2}{k}} < \infty.$$
Under the assumptions of Proposition 6.4, as soon as (6.1.17) is satisfied, the algorithm (6.2.1) converges with probability 1.
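To fix ideas, here is a small simulation of the recursion (6.2.1). It is our illustrative sketch, not taken from the book: the target function tanh, the AR(1) form of the weakly dependent prediction error, and all numeric choices are assumptions.

```python
import math, random

def robbins_monro(f, a, z0, n_iter=20000, g=1.0, phi=0.5, seed=0):
    """Robbins-Monro recursion Z_{n+1} = Z_n - c_n (f(Z_n) - a) + c_n xi_{n+1},
    with steps c_n = g/n (so sum c_n = inf and sum c_n^2 < inf) and an AR(1)
    prediction error xi: a weakly dependent, non-i.i.d., perturbation."""
    rng = random.Random(seed)
    z, xi = z0, 0.0
    for n in range(1, n_iter + 1):
        c = g / n
        xi = phi * xi + rng.uniform(-1.0, 1.0)  # centered, bounded AR(1) noise
        z = z - c * (f(z) - a) + c * xi
    return z

# Dosage example: find the level z* with f(z*) = a for an increasing f.
z_hat = robbins_monro(f=math.tanh, a=0.5, z0=0.0)
print(z_hat)  # close to atanh(0.5) ~ 0.5493
```

Despite the serial correlation of the innovations, the iterates settle near the root, which is exactly what the weak dependence conditions above guarantee for $\sum_n c_n\xi_{n+1}$.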
6.2.2 Kiefer-Wolfowitz algorithm

It is also a dosage algorithm. Here we want to reach the minimum $z^*$ of a real function $V$ which is $C^2$ on an open set $G$ of $\mathbb{R}^d$. The Kiefer-Wolfowitz algorithm (Duflo (1996) [82], page 53) is stated as:
$$Z_{n+1} = Z_n - 2c_n\nabla V(Z_n) - \zeta_{n+1}, \qquad (6.2.3)$$
where $\zeta_{n+1} = \frac{c_n}{b_n}\xi_{n+1} + c_nb_n^2\,q(n, Z_n)$ with $\|q(n, Z_n)\| \le K$ (for some $K > 0$), $\sum c_n = \infty$, $\sum_n c_nb_n^2 < \infty$ and $\sum_n(c_n/b_n)^2 < \infty$ (for instance, $c_n = \frac1n$ and $b_n = n^{-b}$ with $0 < b < \frac12$). Usually, the prediction error $(\xi_n)$ is assumed to be i.i.d., centered, square integrable and independent of $Z_0$. The previous results weaken this assumption to weakly dependent innovations. It is now enough to ensure the a.s. convergence of $\sum\frac{c_n}{b_n}\xi_{n+1}$. The weak dependence assumptions are the same as for the Robbins-Monro algorithm. Concerning the $\gamma_1$-weak dependence, the condition (6.1.17) is replaced by
$$\sum_{i=1}^{\infty}\Big(\frac{c_i}{b_i}\Big)^{2-m}\ \sum_{i=1}^{\infty}\gamma_1(i)^m < \infty.$$
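A minimal simulation sketch of the recursion (6.2.3) follows (ours, not from the book). The gradient of the unobserved $V$ is replaced by a noisy two-point finite difference; the quadratic test function and the numeric tunings are assumptions.

```python
import random

def kiefer_wolfowitz(V, z0, n_iter=20000, g=0.5, b_exp=0.25, seed=1):
    """Kiefer-Wolfowitz recursion: descend a smooth V observed only through
    noisy evaluations, via two-point finite differences, with c_n = g/n and
    b_n = n^{-b}, 0 < b < 1/2, so that sum c_n b_n^2 < inf and
    sum (c_n / b_n)^2 < inf."""
    rng = random.Random(seed)
    z = z0
    for n in range(1, n_iter + 1):
        c, b = g / n, n ** (-b_exp)
        noisy = lambda x: V(x) + rng.uniform(-0.1, 0.1)   # noisy evaluation
        grad = (noisy(z + b) - noisy(z - b)) / (2.0 * b)  # ~ V'(z) + xi/b
        z = z - 2.0 * c * grad
    return z

# Reach the minimum z* = 2 of V(z) = (z - 2)^2.
z_hat = kiefer_wolfowitz(lambda z: (z - 2.0) ** 2, z0=0.0)
print(z_hat)  # close to 2
```

The effective noise entering the recursion is $\xi_{n+1}/b_n$, which is why the condition on $\sum(c_n/b_n)^2$ replaces the plain $\sum c_n^2$ of Robbins-Monro.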
6.3 Weighted dependent triangular arrays

In this section, we consider a sequence $(\xi_i)_{i\ge 1}$ of random variables and a triangular array of non-negative constant weights $\{(c_{ni})_{1\le i\le n};\ n\ge 1\}$. Let
$$U_n = \sum_{i=1}^{n} c_{ni}\xi_i.$$
If the $\xi_i$'s are i.i.d., Chow (1966) [37] has established the following complete convergence result.

Theorem (Chow (1966) [37]). Let $(\xi_i)_i$ be independent and identically distributed random variables with $\mathbb{E}\xi_i = 0$ and $\mathbb{E}|\xi_i|^q < \infty$ for some $q \ge 2$. If for some constant $K$ (not depending on $n$), $n^{1/q}\max_{1\le i\le n}|c_{ni}| \le K$ and $\sum_{i=1}^{n} c_{ni}^2 \le K$, then
$$\forall t > 0, \qquad \sum_{n=1}^{\infty}\mathbb{P}\big(n^{-1/q}|U_n| \ge t\big) < \infty.$$

The last inequality is a result of complete convergence of $n^{-1/q}|U_n|$ to 0. This notion was introduced by Hsu and Robbins (1947) [109]; complete convergence implies almost sure convergence by the Borel-Cantelli lemma. Li et al. (1995) [120] extend this result to arrays $(c_{ni})_{n\ge 1,\ i\in\mathbb{Z}}$ for $q = 2$. Quote also Yu (1990) [196], who obtains a result analogous to Chow's for martingale differences. Ghosal and Chandra (1998) [91] extend the previous results and prove results similar to those of Li et al. (1995) [120] for martingale differences. As in [120], their main tool is the Hoffmann-Jørgensen inequality (Hoffmann-Jørgensen (1974) [107]). Peligrad and Utev (1997)
[143] propose a central limit theorem for partial sums $U_n = \sum_{i=1}^{n} c_{ni}\xi_i$, where $\sup_n\sum_i c_{ni}^2 < \infty$, $\max_{1\le i\le n}|c_{ni}| \to 0$ as $n\to\infty$, and the $\xi_i$'s are, in turn, pairwise mixing martingale differences, mixing sequences or associated sequences. McLeish (1975) [128], de Jong (1996) [56] and, more recently, Shixin (1999) [177] extend the previous results to the case of $L^q$-mixingale arrays. Those results have various applications; for instance, they are used to prove strong convergence of kernel estimators. The results of Li et al. (1995) [120] are now extended to our weakly dependent frame. A straightforward consequence of Proposition 6.3 is the following.

Corollary 6.1. Under the assumptions of Proposition 6.3, suppose that $q$ is an even integer such that $k > q > p$ and that, for some constant $K$ not depending on $n$, $\sum_{i=1}^{n} c_{n,i-1}^2 < K$. If $\epsilon(r) = O(r^{-\alpha})$ with $\alpha > \frac{(q-1)k}{k-q}$, or if $\epsilon(r) = O(e^{-r})$, then for all positive real numbers $t$, $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| \ge t\big) < \infty$.

Proof. Proposition 6.3 implies
$$\mathbb{E}|U_n|^q \le \frac{(2q-2)!}{(q-1)!}\left(M_q^{1/k}\sum_{i=1}^{n} c_{n,n-i}^q\sum_{r=0}^{n-1}(r+1)^{q-2}\,\epsilon(r)^{(k-q)/k}\ \vee\ \Big(M_2\sum_{i=1}^{n} c_{n,n-i}^2\sum_{r=0}^{n-1}\epsilon(r)^{(k-2)/k}\Big)^{q/2}\right).$$
If $\sum_{i=1}^{n} c_{n,i-1}^2 < K$ and $\epsilon(r) = O(r^{-\alpha})$ with $\alpha > \frac{(q-1)k}{k-q}$, then there is some $K_1 > 0$ with $\mathbb{E}|U_n|^q < K_1$, and the result follows from
$$\mathbb{P}\big(n^{-1/p}|U_n| > t\big) \le \frac{\mathbb{E}|U_n|^q}{t^q\,n^{q/p}}.$$
If $\sum_{i=1}^{n} c_{n,i-1}^2 < K$ and $\epsilon(r) = O(e^{-\theta r})$, then $\mathbb{E}|U_n|^q < K_2$ for a real constant $K_2$,
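A Monte-Carlo illustration of complete convergence in the simplest setting is easy to write (ours, not from the book): with the assumed weights $c_{ni} = n^{-1/2}$, both of Chow's conditions hold with $K = 1$ for $q = 2$, and the tail probabilities collapse rapidly in $n$.

```python
import random

def tail_prob(n, t, reps=2000, seed=2):
    """Monte-Carlo estimate of P(n^{-1/q} |U_n| >= t) for q = 2, where
    U_n = sum_i c_ni xi_i with c_ni = n^{-1/2} (so n^{1/q} max|c_ni| = 1 and
    sum_i c_ni^2 = 1, i.e. Chow's two conditions) and i.i.d. Gaussian xi_i."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n ** 0.5  # U_n
        if abs(u) / n ** 0.5 >= t:                                 # n^{-1/2} |U_n|
            hits += 1
    return hits / reps

# The probabilities decay sharply with n, fast enough to be summable over n,
# which is the content of complete convergence.
p_small, p_large = tail_prob(50, 0.2), tail_prob(500, 0.2)
print(p_small, p_large)
```

Here $n^{-1/2}U_n$ has the law of $n^{-1/2}\mathcal{N}(0,1)$, so the tail probability is Gaussian in $t\sqrt n$ and the series over $n$ converges for every $t > 0$.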
and $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| > t\big) < \infty$.

The following corollary is a direct consequence of Proposition 6.4.

Corollary 6.2. Suppose that all the assumptions of Proposition 6.4 are satisfied. If $q > p$, $k > q > 1$, and $\sum_{i=1}^{\infty} c_{ni}^{2-m}\sum_{j=0}^{\infty}\gamma_1(j)^m < \infty$, where $m = \frac{2(k-q)}{q(k-1)}$, then for any positive real number $t$, $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| \ge t\big) < \infty$.

Proof. We have from Proposition 6.4,
$$\mathbb{E}|U_n|^q \le \Big(2qK_1\sum_{i=1}^{n} c_{ni}^{2-m}\sum_{j=0}^{n-1}\gamma_1(j)^m\Big)^{q/2}.$$
Now the relation $\sum_{i=1}^{\infty} c_{ni}^{2-m}\sum_{j=0}^{\infty}\gamma_1(j)^m < \infty$ implies $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| > t\big) < \infty$. This concludes the proof.
6.4 Linear regression

We observe a stationary bounded sequence $(y_n, x_n)\in\mathbb{R}\times\mathbb{R}^d$, defined on a probability space $(\Omega,\mathcal{A},\mathbb{P})$. We look for the vector $Z^*$ which minimizes the linear prediction error of $y_n$ by $x_n$. We identify the $\mathbb{R}^d$-vector $x_n$ and its column matrix in the canonical basis, so
$$Z^* = \arg\min_{Z\in\mathbb{R}^d}\mathbb{E}\big[(y_n - x_n^TZ)^2\big].$$
This problem leads to the study of the gradient algorithm
$$Z_{n+1} = Z_n + c_n\big(y_{n+1} - x_{n+1}^TZ_n\big)x_{n+1},$$
where $c_n = \frac gn$ with $g > 0$ (so $(c_n)$ verifies (6.1.2) and (6.1.5)). Setting $C_{n+1} = x_{n+1}x_{n+1}^T$, we obtain:
$$Z_{n+1} = Z_n + c_n\big(y_{n+1}x_{n+1} - C_{n+1}Z_n\big). \qquad (6.4.1)$$
Denote $U = \mathbb{E}(y_{n+1}x_{n+1})$, $C = \mathbb{E}(C_{n+1})$, $Y_n = Z_n - C^{-1}U$ and $h(Y) = -CY$; then (6.4.1) becomes:
$$Y_{n+1} = Y_n + c_nh(Y_n) + c_n\zeta_{n+1}, \quad\text{with} \qquad (6.4.2)$$
$$\zeta_{n+1} = \big(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\big) + (C - C_{n+1})Y_n. \qquad (6.4.3)$$
Note that the solutions of (6.1.3) are the trajectories $z(t) = z_0e^{-Ct}$,
so every trajectory converges to 0, the unique equilibrium point of the differentiable function $h$ ($Dh(0) = -C$, and 0 is an attractive zero of $h$). Denoting $\mathcal{F}_n = \sigma(Y_i,\ i\le n)$, we also define the following assumption:

A-lr: $C$ is non-singular, and $(C_n)$ and $(y_nx_n)$ are $\gamma_1$-dependent sequences with $\gamma_1(r) = O(a^r)$ for some $a < 1$.

Note that if $(y_n, x_n)_{n\in\mathbb{N}}$ is $\theta_{1,1}$-dependent (see Definition 2.3) then A-lr is satisfied. First, note that if an $\mathbb{R}^d$-valued sequence $(X_n)$ is $\theta_{1,1}$-dependent, then any $\mathbb{R}^j$-valued sequence ($j = 1,\dots,d-1$) $(Y_n) = (X_n^{t_1},\dots,X_n^{t_j})$ is $\theta_{1,1}$-dependent. So, if $(y_n, x_n)$ is $\theta_{1,1}$-dependent, then so are $(y_n)$ and $(x_n^j)$ ($j = 1,\dots,d$). Let $f$ be a bounded 1-Lipschitz function defined on $\mathbb{R}$ and let $g$ be the function defined on $\mathbb{R}^2$ by $g(x,y) = f(xy)$. It is enough to prove that $g$ is a Lipschitz function on $\mathbb{R}^2$:
$$\frac{|g(x,y) - g(x',y')|}{|x-x'| + |y-y'|} \le \frac{|xy - x'y'|}{|x-x'| + |y-y'|} \le \frac{|x|\,|y-y'| + |y'|\,|x-x'|}{|x-x'| + |y-y'|} \le \max(|x|, |y'|),$$
so $g$ is Lipschitz as soon as $x$ and $y$ are bounded. Thus, since $(x_n)$ and $(y_n)$ are bounded, the result follows. Denoting $M = \sup_n\|x_n\|^2$, one has:

Proposition 6.5. Under Assumption A-lr, $(Y_n)$ is a.s. bounded, and the perturbation $(\zeta_n)$ of algorithm (6.4.2) splits into three terms, two of which are $\gamma_1$-dependent while the third one is a remainder tending to zero. So the ODE method ensures the a.s. convergence of $Y_n$ to zero (hence $Z_n\to Z^* = C^{-1}U$). Moreover, if $g < \frac{1}{2M}$, then
$$\sqrt{n}\,Y_n = O(1) \quad\text{a.s.} \qquad (6.4.4)$$
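Before the proof, here is a self-contained simulation sketch of the recursion (6.4.1) with a bounded, weakly dependent regressor (our toy model, not from the book; the AR(1) driver, the gain and the step shift are assumptions, and the larger gain used here only speeds up the visible convergence, it is not the $g < \frac{1}{2M}$ regime of the rate claim in Proposition 6.5):

```python
import random

def gradient_algorithm(data, g=2.0, d=2):
    """Stochastic gradient recursion (6.4.1):
    Z_{n+1} = Z_n + c_n (y_{n+1} - x_{n+1}^T Z_n) x_{n+1},  here c_n = g/(n+10).
    (Gain and shift are illustrative tuning choices.)"""
    z = [0.0] * d
    for n, (y, x) in enumerate(data, start=1):
        c = g / (n + 10.0)
        err = y - sum(xj * zj for xj, zj in zip(x, z))
        z = [zj + c * err * xj for zj, xj in zip(z, x)]
    return z

# Bounded, weakly dependent regressors driven by an AR(1) sequence:
rng, s, data = random.Random(3), 0.0, []
for _ in range(50000):
    s = 0.6 * s + rng.uniform(-1.0, 1.0)
    x = (1.0, max(-2.0, min(2.0, s)))                        # bounded in R^2
    y = 1.5 * x[0] - 0.7 * x[1] + 0.05 * rng.uniform(-1, 1)  # y_n = x_n^T Z* + noise
    data.append((y, x))
z_hat = gradient_algorithm(data)
print(z_hat)  # close to Z* = C^{-1} U = (1.5, -0.7)
```

The iterates approach $Z^* = C^{-1}U$ even though the regressors are serially dependent, which is the point of Assumption A-lr.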
Proof of Proposition 6.5. To start with, we prove that $Y_n\to 0$ a.s. by assuming that $(Y_n)$ is a.s. bounded; then we justify this assumption, and finally we prove (6.4.4). The perturbation $\zeta_{n+1}$ splits into two terms: $(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U)$ and $(C - C_{n+1})Y_n$. The first term is centered and obviously $\gamma_1$-dependent with a dependence coefficient $\gamma_1(r)$; now $\gamma_1(r) = O(a^r)$ thanks to Assumption A-lr. It remains to study $(C - C_{n+1})Y_n$.

Study of $(C - C_{n+1})Y_n$: write $(C - C_{n+1})Y_n = \xi_{n+1} + r_{n+1}$ with $\xi_{n+1} = (C - C_{n+1})Y_n - \mathbb{E}[(C - C_{n+1})Y_n]$ and $r_{n+1} = \mathbb{E}[(C - C_{n+1})Y_n]$. We will prove that the sequence $(\xi_n)$ is $\gamma_1$-dependent with an appropriate dependence coefficient and that $\lim_{n\to\infty}r_n = 0$. Notice that
$$r_{n+1} = \mathbb{E}\Big[(C - C_{n+1})\sum_{j=n/2}^{n-1}(Y_{j+1} - Y_j)\Big] + \mathbb{E}\big[(C - C_{n+1})Y_{n/2}\big],$$
and since $Y_{j+1} - Y_j = -c_jC_{j+1}Y_j + c_j(y_{j+1}x_{j+1} - C_{j+1}C^{-1}U)$, we obtain
$$r_{n+1} = \mathbb{E}(C - C_{n+1})Y_{n/2} - \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jC_{j+1}Y_j + \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j\big(y_{j+1}x_{j+1} - C_{j+1}C^{-1}U\big).$$
If $\frac n2$ is not an integer, we replace it by $\frac{n-1}2$. In the same way, in the first sum we replace $Y_j$ by $\sum_{i=j/2}^{j-1}(Y_{i+1} - Y_i) + Y_{j/2}$, with the same remark as above if $j/2$ is not an integer. So
$$r_{n+1} = \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j\big(y_{j+1}x_{j+1} - C_{j+1}C^{-1}U\big) + \mathbb{E}(C - C_{n+1})Y_{n/2}$$
$$\quad - \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jC_{j+1}\Big[\sum_{i=j/2}^{j-1}\big(-c_iC_{i+1}Y_i + c_i(y_{i+1}x_{i+1} - C_{i+1}C^{-1}U)\big) + Y_{j/2}\Big].$$
Taking expectations conditionally with respect to $\mathcal{F}_{j+1}$ in each term of the second sum and with respect to $\mathcal{F}_{n/2}$ in the last term gives, by assuming that $(Y_n)$ is bounded:
$$\|r_{n+1}\| \le A + K_1\sum_{j=n/2}^{n-1}c_j\gamma_1(n+1-j)\gamma_1(j+1) + K_2\gamma_1(n/2+1), \qquad (6.4.5)$$
where $A$ denotes the last sum of the previous representation of $r_{n+1}$ and the $K_i$ (for $i = 1, 2, \dots$) are non-negative constants. Moreover,
$$A = -\sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j(C_{j+1} - C)\Big[\sum_{i=j/2}^{j-1}\big(-c_iC_{i+1}Y_i + c_i(y_{i+1}x_{i+1} - C_{i+1}C^{-1}U)\big)\Big] - \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j(C_{j+1} - C)Y_{j/2}$$
$$\quad + \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jC\Big[\sum_{i=j/2}^{j-1}\big(-c_iC_{i+1}Y_i + c_i(y_{i+1}x_{i+1} - C_{i+1}C^{-1}U)\big)\Big] + \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jCY_{j/2}.$$
Taking expectations conditionally, successively with respect to $\mathcal{F}_{j+1}$ and $\mathcal{F}_{i+1}$ in each term of the first and third sums, and with respect to $\mathcal{F}_{j+1}$ then $\mathcal{F}_{j/2}$ in the second and fourth sums, gives, by assuming that $(Y_n)$ is bounded,
$$A \le K_3\sum_{j=n/2}^{n-1}c_j\gamma_1(n-j)\sum_{i=j/2}^{j-1}c_i\gamma_1(j-i) + K_4\sum_{j=n/2}^{n-1}c_j\gamma_1(n-j)\gamma_1(j/2). \qquad (6.4.6)$$
Since $c_j = \frac gj$, (6.4.5) and (6.4.6) imply that $r_n = O(n^{-2})$, so $r_n$ converges to zero. On the other hand, for $r \ge 6$:
$$\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n) = \mathbb{E}\big[(C - C_{n+r})Y_{n+r-1}\mid\mathcal{F}_n\big] - r_{n+r} = \sum_{j=n+r/2}^{n+r-2}\mathbb{E}\big[(C - C_{n+r})(Y_{j+1} - Y_j)\mid\mathcal{F}_n\big] + \mathbb{E}\big[(C - C_{n+r})Y_{n+r/2}\mid\mathcal{F}_n\big] - r_{n+r}.$$
Note also that if $\frac r2$ is not an integer, we replace it by $\frac{r+1}2$. Using the same techniques as above, we obtain
$$\mathbb{E}\big\|\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n)\big\| \le K_5\Big(\sum_{j=n+r/2}^{n+r-2}c_j\gamma_1(n+r-j-1)\sum_{i=j/2}^{j-1}c_i\gamma_1(j-i) + \gamma_1(r/2)\Big) + O\big((n+r)^{-2}\big),$$
hence
$$\mathbb{E}\big\|\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n)\big\| \le O\Big(\big(n + \tfrac r2\big)^{-2}\Big) + K_5\gamma_1\big(\tfrac r2\big) + O\big((n+r)^{-2}\big),$$
and $\|\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n)\|_1 = \gamma_1(r)$ with $\gamma_1(r) = O(r^{-2})$. So $(\xi_n)$ is $\gamma_1$-dependent and, since (6.1.17) is satisfied, the ODE method may be used and $Y_n$ converges to 0 a.s.

Now we prove that $Y_n$ is a.s. bounded. Let $V(Y) = Y^TCY = \|\sqrt{C}\,Y\|^2$. Since $C$ is non-singular, $V$ is a Lyapunov function and $\nabla V(Y) = 2CY$ is a Lipschitz function, so we have
$$V(Y_{n+1}) \le V(Y_n) + (Y_{n+1} - Y_n)^T\nabla V(Y_n) + K_6\|Y_{n+1} - Y_n\|^2.$$
Furthermore,
$$\|Y_{n+1} - Y_n\|^2 \le 2c_n^2\big\|y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\big\|^2 + 2c_n^2\|C_{n+1}Y_n\|^2.$$
Since $(y_n, x_n)$ is bounded, $(C_n)$ and $\|y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\|^2$ are also bounded. Moreover,
$$\|C_{n+1}Y_n\|^2 \le K_7\|Y_n\|^2 \le \frac{K_7V(Y_n)}{\lambda_{\min}(C)},$$
where $\lambda_{\min}(C)$ is the smallest eigenvalue of $C$. So
$$V(Y_{n+1}) \le V(Y_n)\big(1 + K_8c_n^2\big) + K_9c_n^2 + 2(Y_{n+1} - Y_n)^TCY_n.$$
The last term becomes
$$2(Y_{n+1} - Y_n)^TCY_n = -2c_n\|CY_n\|^2 + 2c_n\big(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\big)^TCY_n + 2c_nY_n^T(C - C_{n+1})CY_n$$
$$\le -2c_n\|CY_n\|^2 + c_nK_{10}\|CY_n\| + 2c_nY_n^T(C - C_{n+1})CY_n \le -2c_n\|CY_n\|^2 + c_nK_{10}\|CY_n\| + 2c_nu_{n+1}V(Y_n),$$
where $u_n = \max\{|X_i^T(C - C_n)X_i|,\ 1\le i\le d\}$ and $\{X_1,\dots,X_d\}$ is an orthogonal basis of unit eigenvectors of $C$. We now obtain
$$V(Y_{n+1}) \le V(Y_n)\big(1 + K_8c_n^2 + 2c_nu_{n+1}\big) + K_9c_n^2 - c_n\big(2\|CY_n\|^2 - K_{10}\|CY_n\|\big). \qquad (6.4.7)$$
Note that under Assumption A-lr, $(u_n)$ is a $\gamma_1$-dependent sequence with a weak dependence coefficient $\gamma_1(r) = O(a^r)$, and $\sum_{n=0}^{\infty}c_nu_{n+1} < \infty$. If $V(Y_n) \ge \frac{K_{10}^2}{4\lambda_{\min}(C)}$, then $\|CY_n\| \ge K_{10}/2$ and $-(2\|CY_n\|^2 - K_{10}\|CY_n\|) \le 0$.

Denote $T = \inf\{n\ /\ V(Y_n) \le \frac{K_{10}^2}{4\lambda_{\min}(C)}\}$. By the Robbins-Siegmund theorem, $V(Y_n)$ converges a.s. to a finite limit on $\{T = +\infty\}$, so $(Y_n)$ is bounded since $V$ is a Lyapunov function. On $\{\liminf_nV(Y_n) \le \frac{K_{10}^2}{4\lambda_{\min}(C)}\}$, $V(Y_n)$ does not converge to $\infty$ and, using Theorem 2 of Delyon (1996) [58], we deduce that $V(Y_n)$ converges to a finite limit as soon as:
$$\forall k > 0,\quad \sum_nc_n^2\big\|h(Y_n) + \zeta_{n+1}\big\|^2\,\mathbf{1}_{\{V(Y_n)<k\}} < \infty \ \text{a.s.}, \qquad (6.4.8)$$
$$\forall k > 0,\quad \sum_nc_n\big\langle\zeta_{n+1},\nabla V(Y_n)\big\rangle\,\mathbf{1}_{\{V(Y_n)<k\}} \ \text{converges a.s.} \qquad (6.4.9)$$
Using the relation $\sum c_n^2 < \infty$ and the fact that on $\{V(Y_n) < k\}$, $\|h(Y_n) + \zeta_{n+1}\|^2$ is bounded, we deduce (6.4.8). To prove (6.4.9), it is enough, by Proposition 6.4, to check its assumptions for $e_{n+1} = \langle\zeta_{n+1},\nabla V(Y_n)\rangle\,\mathbf{1}_{\{V(Y_n)<k\}}$. To apply Proposition 6.4 it is necessary to center $e_{n+1}$. So we are going to prove that $\sum c_n|\mathbb{E}e_{n+1}| < \infty$ and that $(e_{n+1} - \mathbb{E}e_{n+1})$ is a $\gamma_1$-dependent sequence with a dependence coefficient $\gamma_1(r)$ equal to $O(r^{-2})$.

Study of $\mathbb{E}(e_{n+1})$. First of all, we must note a few elements. Denoting by $I$ the unit matrix of $\mathbb{R}^d$,
$$Y_n = (I - c_{n-1}C_n)Y_{n-1} + c_{n-1}\big(x_ny_n - C_nC^{-1}U\big).$$
Note that $\lambda_{\max}(C_n) = \|x_n\|^2 \le M$, where $\lambda_{\max}(C_n)$ denotes the largest eigenvalue of $C_n$. For $n$ large enough, $c_{n-1}M < 1$ and $(I - c_{n-1}C_n)$ is non-singular. So, if $M_1 = \sup_n\|x_ny_n - C_nC^{-1}U\|$, then we obtain
$$\|Y_{n-1}\| \le \frac{1}{1 - c_{n-1}M}\big(\|Y_n\| + c_{n-1}M_1\big) \le (1 + bc_{n-1})\big(\|Y_n\| \vee M_1\big),$$
where $b \ge 0$ does not depend on $n$. Moreover,
$$V(Y_n) < k \implies \|Y_n\|^2 < \frac{k}{\lambda_{\min}(C)}, \qquad\text{and}\qquad \|Y_n\| < k \implies V(Y_n) < \lambda_{\max}(C)\,k^2.$$
So that, iterating the previous bound,
$$\mathbf{1}_{\{V(Y_n)<k\}}\,\|Y_{n-j}\| \le k_{n-j} := (1 + bc_{n-1})\cdots(1 + bc_{n-j})\Big(\sqrt{\tfrac{k}{\lambda_{\min}(C)}}\ \vee\ M_1\Big).$$
And since $c_n = \frac gn$, for any $0\le j\le n$, $(1 + bc_{n-1})^j$ is bounded independently of $n$, and so is $k_{n-j}$. Now
$$\mathbb{E}(e_{n+1}) = \mathbb{E}\big[(x_{n+1}y_{n+1} - C_{n+1}C^{-1}U)^TCY_n\,\mathbf{1}_{\{V(Y_n)<k\}}\big] + \mathbb{E}\big[Y_n^T(C - C_{n+1})CY_n\,\mathbf{1}_{\{V(Y_n)<k\}}\big],$$
and, expanding $Y_n$ as before,
$$\mathbb{E}(e_{n+1}) = \sum_{j=n/2}^{n-1}\mathbb{E}\big[(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U)^TC(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\big] + \mathbb{E}\big[(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U)^TCY_{n/2}\big]$$
$$\quad + \sum_{j=n/2}^{n-1}\mathbb{E}\big[(Y_{j+1} - Y_j)^T(C - C_{n+1})C(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\big] + 2\sum_{j=n/2}^{n-1}\sum_{i=j+1}^{n-2}\mathbb{E}\big[(Y_{j+1} - Y_j)^T(C - C_{n+1})C(Y_{i+1} - Y_i)\,\mathbf{1}_{\{V(Y_j)<k\}}\big]$$
$$\quad + 2\sum_{j=n/2}^{n-1}\mathbb{E}\big[Y_{n/2}^T(C - C_{n+1})C(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\big] + \mathbb{E}\big[Y_{n/2}^T(C - C_{n+1})CY_{n/2}\,\mathbf{1}_{\{V(Y_{n/2})<k\}}\big].$$
Note that if $\frac n2$ is not an integer, we replace it by $\frac{n-1}2$. Using always the same technique, we obtain
$$\mathbb{E}e_{n+1} = O(n^{-2}) + O(a^{n/2}) + O(n^{-2}) + O(n^{-2}) + O(a^{n/2}),$$
hence $\sum c_n|\mathbb{E}e_n| < \infty$.

Study of $(e_n - \mathbb{E}e_n)$. We now prove that this sequence is $\gamma_1$-dependent with a relevant dependence coefficient. Write $\mathbb{E}(e_{n+r} - \mathbb{E}e_{n+r}\mid\mathcal{F}_n) = D_{n+r} + G_{n+r} - \mathbb{E}e_{n+r}$, with
$$D_{n+r} = \mathbb{E}\big[(y_{n+r}x_{n+r} - C_{n+r}C^{-1}U)^TCY_{n+r-1}\,\mathbf{1}_{\{V(Y_{n+r-1})<k\}}\mid\mathcal{F}_n\big],$$
$$G_{n+r} = \mathbb{E}\big[Y_{n+r-1}^T(C - C_{n+r})CY_{n+r-1}\,\mathbf{1}_{\{V(Y_{n+r-1})<k\}}\mid\mathcal{F}_n\big].$$
As above,
$$D_{n+r} = \sum_{j=n+r/2}^{n+r-2}\mathbb{E}\big[(y_{n+r}x_{n+r} - C_{n+r}C^{-1}U)^TC(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\mid\mathcal{F}_n\big] + \mathbb{E}\big[(y_{n+r}x_{n+r} - C_{n+r}C^{-1}U)^TCY_{n+r/2}\,\mathbf{1}_{\{V(Y_{n+r/2})<k\}}\mid\mathcal{F}_n\big].$$
Here again, if $\frac r2$ is not an integer, we replace it by $\frac{r-1}2$. Again, the same techniques as for $r_n$ give
$$\mathbb{E}\|D_{n+r}\| = O\big((n+r)^{-2}\big) + O\big(a^{n+r/2}\big).$$
We study $G_{n+r}$ in the same way and $\mathbb{E}\|G_{n+r}\| = O\big((n+r)^{-2}\big)$; since $\mathbb{E}e_{n+r} = O\big((n+r)^{-2}\big)$, (6.1.17) is satisfied and the result is proved.

Proof of (6.4.4). For $n > N$, denote $\Pi_n^N = (I - c_nC_{n+1})\cdots(I - c_NC_{N+1})$. Since $g < \frac{1}{2M}$, for $N \ge 1$, $\Pi_n^N$ is non-singular and
$$Y_{n+1} = \Pi_n^NY_N + \sum_{j=N}^{n}c_j\,\Pi_n^N\big(\Pi_j^N\big)^{-1}\xi_{j+1}^1,$$
where $\xi_{j+1}^1 = y_{j+1}x_{j+1} - C_{j+1}C^{-1}U$. We obtain, since $Y_n\to 0$,
$$-Y_N = \sum_{j=N}^{\infty}c_j\big(\Pi_j^N\big)^{-1}\xi_{j+1}^1.$$
Now $(\Pi_j^N)^{-1} = (I - c_NC_{N+1})^{-1}\cdots(I - c_jC_{j+1})^{-1}$ and
$$\big\|(\Pi_j^N)^{-1}\big\| \le \prod_{i=N}^{j}\frac{1}{1 - c_iM}.$$
Hence
$$\big\|(\Pi_j^N)^{-1}\big\| = O\Big(\exp\Big(M\sum_{i=N}^{j}c_i\Big)\Big) = O\Big(\Big(\frac jN\Big)^{gM}\Big), \qquad (6.4.10)$$
and
$$\sqrt{N}\,Y_N = -\sum_{j=N}^{\infty}\frac{g\sqrt{N}}{j}\big(\Pi_j^N\big)^{-1}\xi_{j+1}^1.$$
Since $g < \frac{1}{2M}$, (6.4.10) implies that the sum converges: indeed, (6.1.17) is verified with $k = 5$ and $p = 3$ (so $m = \frac13$), since $\xi_{j+1}^1$ is $\gamma_1$-dependent with a mixingale coefficient $\gamma_1(r) = O(a^r)$. Hence the result is proved.
Chapter 7

Central Limit Theorem

In this chapter, we give sufficient conditions for the central limit theorem in the non-causal and causal contexts. In Sections 7.1 and 7.2, we give sufficient conditions for $\kappa$- and $\lambda$-dependent sequences, for random variables having moments of order $2 + \zeta$. The proof is based on a decomposition which combines Bernstein blocks with the Lindeberg method. In Section 7.3, we prove a central limit theorem for random fields, under an exponential decay of the covariance of Lipschitz functions of the variables. The proof is based on Stein's method, as described in Bolthausen (1982) [24]. In Section 7.4, we focus on the causal case: in Theorem 7.5, we give necessary and sufficient conditions for the conditional central limit theorem. This notion is more precise than convergence in distribution and implies stable convergence in the sense of Rényi (1963) [156]. In the last Section 7.5, we give some applications of Theorem 7.5: in particular, we give sufficient conditions for the central limit theorem for $\gamma$-, $\tilde\alpha$- and $\tilde\phi$-dependent sequences.
7.1 Non causal case: stationary sequences

In all this section, we consider a centered and stationary real-valued sequence $(X_n)_{n\in\mathbb{Z}}$ such that
$$\mu = \mathbb{E}|X_0|^m < \infty, \quad\text{for a real number } m = 2 + \zeta > 2. \qquad (7.1.1)$$
We also define
$$\sigma^2 = \sum_{k\in\mathbb{Z}}\mathrm{Cov}(X_0, X_k),$$
whenever it exists. Let $S_n = X_1 + \cdots + X_n$. The following results come from Doukhan and Wintenberger (2005) [77].
Theorem 7.1 ($\kappa$-dependence). Assume that the $\kappa$-weakly dependent stationary process satisfies (7.1.1) and that $\kappa(r) = O(r^{-\kappa})$ for $\kappa > 2 + \frac1\zeta$. Then $\sigma^2$ is well defined and $n^{-1/2}S_n$ converges in distribution to the normal distribution with mean 0 and variance $\sigma^2$.

Remark 7.1. Under the more restrictive $\zeta$-dependence condition, Bulinski and Shashkin (2005) [33] obtain the central limit theorem with the sharper assumption $\zeta(r) = O(r^{-\kappa})$ for $\kappa > 1 + 1/(m-2)$ (beware of the notations in this remark). The difference between the two conditions is natural, since it may be proved for $\zeta$-weakly dependent sequences that $\zeta(r) \ge \sum_{s\ge r}\kappa(s)$. This simple bound, checked from the definitions, explains the loss in the rate of convergence of $\kappa(r)$ to 0.

The following result relaxes the previous dependence assumptions at the cost of a faster decay of the dependence coefficients.

Theorem 7.2 ($\lambda$-dependence). Assume that the $\lambda$-weakly dependent stationary process satisfies (7.1.1) and that $\lambda(r) = O(r^{-\lambda})$ for $\lambda > 4 + \frac2\zeta$. Then the conclusion of Theorem 7.1 holds.

As stressed in Section 3.1.3, the coefficient $\lambda$ is very useful to work out the case of Bernoulli shifts with weakly dependent innovations $(\xi_i)_i$.

Theorem 7.3. Denote by $\lambda_\xi(r)$ the $\lambda$-dependence coefficients of the sequence $(\xi_i)_i$. Assume that $H:\mathbb{R}^{\mathbb{Z}}\to\mathbb{R}$ satisfies the condition (3.1.11) for some $m > 2$ and some $l \le m - 1$, with $\mathbb{E}|\xi_0|^m < \infty$, and some sequence $b_i \ge 0$ such that $\sum_i|i|b_i < \infty$. Then $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ satisfies the central limit theorem in the following cases:
• Geometric case: $b_r = O\big(e^{-rb}\big)$ and $\lambda_\xi(r) = O\big(e^{-rc}\big)$.
• Mixed case: $b_r = O\big(e^{-rb}\big)$ and $\lambda_\xi(r) = O\big(r^{-c}\big)$ with $c > 4 + 2/(m-2)$.
• Riemanian case: $b_r = O\big(r^{-b}\big)$ for some $b > 2$ and $\lambda_\xi(r) = O\big(r^{-c}\big)$ with
$$c > \frac{(10-4m)\,b\,(m-1)}{(2-m)(b-2)(m-1-l)}.$$
The constants $b > 0$ and $c > 0$ obtained are different for each case. This theorem is useful to derive the weak invariance principle in many cases (see again Section 3.1.3). We now look in more detail at the following example.
Example. Consider the two-sided sequence $X_t = \sum_{i=-\infty}^{\infty}a_i\xi_{t-i}$ with ARCH($\infty$) innovations:
$$\xi_t = \tilde\xi_t\Big(a + \sum_{j=1}^{\infty}a_j'\xi_{t-j}\Big),$$
where the process $(\tilde\xi_i)_i$ is i.i.d. Under Riemanian decays ($a_r = O(r^{-a})$ and $a_r' = O(r^{-a'})$), we infer from Theorem 7.3 that the central limit theorem holds as soon as:
$$a' > \frac{(10-4m)\,a\,(m-1)}{(2-m)(a-2)(m-1-l)} + 1.$$

Remark 7.2. The technique of the proofs is based on the Lindeberg method, and we prove in fact that $|\mathbb{E}(f(S_n/\sqrt n) - f(\sigma N))| \le Cn^{-c^*}$ for $f(x) = e^{itx}$, where the constants $c^*, C > 0$ depend only on the parameters $\zeta$ and $\kappa$ or $\lambda$ respectively, and where $c^* < \frac12$ (see Proposition 7.1 in Section 7.2.2 for more details). When $\kappa$ or $\lambda$ tends to infinity, we have $c^* = \zeta/(4+\zeta)$; for $\zeta \ge 2$ and $\kappa$ or $\lambda$ tending to infinity, we notice that $c^*\to\frac13$. Using a smoothing lemma, this also yields an analogue bound for the uniform distance in the real case ($d = 1$):
$$\sup_{t\in\mathbb{R}}\Big|\mathbb{P}\Big(\frac{S_n}{\sqrt n}\le t\Big) - \mathbb{P}(\sigma N\le t)\Big| \le Cn^{-c}.$$
A first and easy way to control $c$ is to set $c = c^*/4$, but the corresponding rate is really a bad one. Petrov (1995) [144] obtains the exponent $\frac12$ in the i.i.d. case and Rio (2000) [161] reaches the exponent $\frac13$ for strongly mixing sequences. In Section 7.2.2, we achieve $c = c^*/3$. Analogous bad convergence rates have been settled in the case of weakly dependent random fields in [63]. The following subsections are devoted to the proofs. We first describe in detail the Lindeberg method with Bernstein blocks in Section 7.2 (another version of the Lindeberg method will be presented in the causal framework in Section 7.4.3). The main tools are the controls of the variance of $S_n$ and of $\|S_n\|_{2+\delta}$ obtained in Lemmas 4.2 and 4.3 of Section 4.2. Rates of convergence for the central limit theorem are obtained in Section 7.2.2.
7.2 Lindeberg method

Let $x_1,\dots,x_k$ be random variables with values in $\mathbb{R}^d$ (equipped with the Euclidean norm $\|(x_1,\dots,x_d)\| = \sqrt{x_1^2+\cdots+x_d^2}$), centered at expectation and such that for some $0 < \delta \le 1$:
$$\sum_{i=1}^{k}\mathbb{E}\|x_i\|^{2+\delta} \le A < \infty. \qquad (7.2.1)$$
We consider independent random variables $y_1,\dots,y_k$, independent of the variables $x_1,\dots,x_k$ and such that $y_i\sim\mathcal{N}_d(0,\mathrm{Var}\,x_i)$. Denote by $C_b^3$ the set of bounded functions from $\mathbb{R}^d$ to $\mathbb{R}$ with bounded and continuous partial derivatives up to order 3. The Lindeberg method relies on the following lemma.

Lemma 7.1 (Bardet et al., 2006 [10]). For any $f\in C_b^3$, let
$$\Delta = \big|\mathbb{E}\big(f(x_1+\cdots+x_k) - f(y_1+\cdots+y_k)\big)\big| \qquad (7.2.2)$$
and
$$T_j = \sum_{i=1}^{k}\mathrm{Cov}\big(f_i^{(j)}(x_1+\cdots+x_{i-1}),\,x_i^j\big), \qquad j = 1, 2, \qquad (7.2.3)$$
where $f_i(t) = \mathbb{E}f(t + y_{i+1}+\cdots+y_k)$ and
$$\mathrm{Cov}\big(f'(x),\,y\big) = \sum_{\ell=1}^{d}\mathrm{Cov}\Big(\frac{\partial f}{\partial x_\ell}(x),\,y_\ell\Big), \qquad \mathrm{Cov}\big(f''(x),\,y^2\big) = \sum_{k'=1}^{d}\sum_{\ell=1}^{d}\mathrm{Cov}\Big(\frac{\partial^2f}{\partial x_{k'}\partial x_\ell}(x),\,y_{k'}y_\ell\Big),$$
where by convention the empty sums are equal to 0. The following upper bound holds:
$$\Delta \le |T_1| + \frac12|T_2| + 4\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}A.$$

Remark 7.3. If $k$ tends to $\infty$, we denote the variables by $(x_{i,k})_{1\le i\le k}$ and we set $A = A(k)$, $T_j = T_j(k)$. Assume that $A(k)$ and $T_j(k)$ tend to 0 as $k$ tends to $\infty$, and assume moreover that $\sigma_k^2 = \sum_{i=1}^{k}\mathbb{E}\|x_{i,k}\|^2$ converges to $\sigma^2$ as $k$ tends to $\infty$. Then
$$S_k = \sum_{i=1}^{k}x_{i,k}\ \longrightarrow\ \mathcal{N}(0,\sigma^2) \quad\text{in distribution, as } k\to\infty.$$
The condition $A(k)\to 0$ implies the usual Lindeberg condition, the condition $\sigma_k^2\to\sigma^2$ is only the convergence of variances, while the conditions $T_j(k)\to 0$ are the only ones related to dependence.

Examples. Assume that $(X_t)_{t\in\mathbb{Z}}$ is a stationary time series; the following examples are widely developed in [10].
• The first example of application of such a situation is the Bernstein block method used below for proving the central limit theorem.
• Functional estimation also enters this frame. In that case we write $x_{i,k} = f_k(X_i)$ for functions $f_k$ which approximate the Dirac distribution; see Chapter 11.
• A final example is provided by subsampling, in which $x_{i,k} = X_{im_n}$ for $1\le im_n\le n$ and $1\ll m_n\ll n$; this example fits Chapter 13, but we refer the reader to [10] for shortness.

Proof of Lemma 7.1. We first notice that
$$\Delta \le \Delta_1+\cdots+\Delta_k, \qquad (7.2.4)$$
where
$$\Delta_i = \big|\mathbb{E}\big(f_i(w_i+x_i) - f_i(w_i+y_i)\big)\big|, \qquad w_i = x_1+\cdots+x_{i-1}, \qquad i = 1,\dots,k. \qquad (7.2.5)$$
Let $x, w\in\mathbb{R}^d$. The Taylor formula writes in the two following distinct ways (for suitable values $w_1, w_2$):
$$f(w+x) = f(w) + xf'(w) + \frac12f''(w_1)(x,x)$$
$$\phantom{f(w+x)} = f(w) + xf'(w) + \frac12f''(w)(x,x) + \frac16f'''(w_2)(x,x,x);$$
here $f^{(j)}(x)(y_1,\dots,y_j)$ stands for the value of the symmetric $j$-linear form $f^{(j)}(x)$ at $(y_1,\dots,y_j)$. Let $\|f^{(j)}\|_\infty = \sup_x\|f^{(j)}(x)\|$ with
$$\|f^{(j)}(x)\| = \sup_{\|y_1\|,\dots,\|y_j\|\le 1}\big|f^{(j)}(x)(y_1,\dots,y_j)\big|.$$
Thus for $w, x, y\in\mathbb{R}^d$ we may write:
$$f(w+x) - f(w+y) = f'(w)(x-y) + \frac12f''(w)(x^2-y^2) + \frac{\big(f''(w_1) - f''(w)\big)(x,x) - \big(f''(w_1') - f''(w)\big)(y,y)}{2}$$
$$\phantom{f(w+x) - f(w+y)} = f'(w)(x-y) + \frac12f''(w)(x^2-y^2) + \frac{f'''(w_2)(x,x,x) - f'''(w_2')(y,y,y)}{6}.$$
Thus
$$T = f(w+x) - f(w+y) - f'(w)(x-y) - \frac12f''(w)(x^2-y^2)$$
satisfies
$$|T| \le 2\big(\|x\|^2+\|y\|^2\big)\|f''\|_\infty \,\wedge\, \frac{\|x\|^3+\|y\|^3}{3}\|f'''\|_\infty \le 2\|f''\|_\infty\Big(\|x\|^2\Big(1\wedge\frac{\|f'''\|_\infty\|x\|}{3\|f''\|_\infty}\Big) + \|y\|^2\Big(1\wedge\frac{\|f'''\|_\infty\|y\|}{3\|f''\|_\infty}\Big)\Big)$$
$$\le \frac{2}{3^\delta}\,\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}\big(\|x\|^{2+\delta}+\|y\|^{2+\delta}\big),$$
the last inequality following from the inequality $1\wedge a \le a^\delta$, valid for $a\ge 0$ and $\delta\in[0,1[$. Here we set $f''(w)(x^2-y^2) = f''(w)(x,x) - f''(w)(y,y)$ for notational convenience. This relation, together with the decomposition (7.2.4) and the upper bound $\|f_i^{(j)}\|_\infty \le \|f^{(j)}\|_\infty$ (valid for $1\le i\le k$ and $0\le j\le 3$), entails
$$\Delta \le |T_1| + \frac12|T_2| + \frac{2}{3^\delta}\,\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}\sum_{i=1}^{k}\big(\mathbb{E}\|x_i\|^{2+\delta} + \mathbb{E}\|y_i\|^{2+\delta}\big) \le |T_1| + \frac12|T_2| + 4\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}A,$$
where we have used the bound $\mathbb{E}\|y_i\|^{2+\delta} \le \big(\mathbb{E}\|y_i\|^4\big)^{(2+\delta)/4} \le \big(3(\mathbb{E}\|x_i\|^2)^2\big)^{\frac12+\frac\delta4}$.
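A small numerical illustration of the lemma (ours, not from the book): for independent summands the covariance terms $T_j$ vanish, so $\mathbb{E}f(x_1+\cdots+x_k)$ should be close to its Gaussian counterpart, governed by the $A$ term alone. For $f = \cos$, the Gaussian side is available in closed form, $\mathbb{E}\cos(\mathcal{N}(0,s^2)) = e^{-s^2/2}$; the uniform summands and all sizes below are assumptions.

```python
import math, random

def lindeberg_gap(k=200, reps=10000, seed=4):
    """Monte-Carlo estimate of Delta = |E f(x_1+...+x_k) - E f(N(0, s^2))|
    for f(x) = cos(x) (a C_b^3 function) and i.i.d. centered uniform summands
    scaled so that Var(x_1+...+x_k) = 1/3."""
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    sigma2 = k * scale ** 2 / 3.0       # variance of the sum
    acc = 0.0
    for _ in range(reps):
        s = sum(rng.uniform(-scale, scale) for _ in range(k))
        acc += math.cos(s)
    return abs(acc / reps - math.exp(-sigma2 / 2.0))

gap = lindeberg_gap()
print(gap)  # small: the two expectations nearly coincide
```

The observed gap is of the order of the Monte-Carlo error plus the $O(A)$ bound of the lemma, both far below the raw magnitude $e^{-1/6}\approx 0.85$ of the expectation itself.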
7.2.1 Proof of the main results

In this section we first prove Theorems 7.1 and 7.2, and then we give rates for these central limit results. Some useful moment inequalities, proved in Section 4.1, are essential in the following proof.

Proof of Theorems 7.1 and 7.2. Let $S = \frac{1}{\sqrt n}S_n$ and consider $p = p(n)$ and $q = q(n)$ in such a way that
$$\lim_{n\to\infty}\frac{1}{q(n)} = \lim_{n\to\infty}\frac{q(n)}{p(n)} = \lim_{n\to\infty}\frac{p(n)}{n} = 0.$$
Let $k = k(n) = \lfloor n/(p(n)+q(n))\rfloor$ and
$$Z = \frac{1}{\sqrt n}\big(U_1+\cdots+U_k\big), \qquad\text{with}\quad U_j = \sum_{i\in B_j}X_i,$$
where $B_j = \big](p+q)(j-1),\ (p+q)(j-1)+p\big]\cap\mathbb{N}$ is a subset of $p$ successive integers from $\{1,\dots,n\}$ such that, for $j\ne j'$, $B_j$ and $B_{j'}$ are at least distant of $q = q(n)$. We denote by $B_j'$ the block between $B_j$ and $B_{j+1}$, and set $V_j = \sum_{i\in B_j'}X_i$; $V_k$ is the last block of $X_i$'s between the end of $B_k$ and $n$. Let $\sigma_p^2 = \mathrm{Var}(U_1)/p$ and
$$Y = \frac{V_1^*+\cdots+V_k^*}{\sqrt n}, \qquad V_j^*\sim\mathcal{N}\big(0,\ p\,\sigma_p^2\big),$$
where the Gaussian variables $V_j^*$ are independent and independent of the sequence $(X_n)_{n\in\mathbb{Z}}$. We fix $t\in\mathbb{R}^d$ and define $f:\mathbb{R}^d\to\mathbb{C}$ by $f(x) = e^{it\cdot x}$. Then:
$$\mathbb{E}f(S) - \mathbb{E}f(\sigma N) = \mathbb{E}\big(f(S) - f(Z)\big) + \mathbb{E}\big(f(Z) - f(Y)\big) + \mathbb{E}\big(f(Y) - f(\sigma N)\big).$$
The Lindeberg method consists in proving that this expression converges to 0 as $n\to\infty$. The first and the last terms are referred to as auxiliary terms in this Bernstein-Lindeberg method: they come from the replacement of the individual initial random variables, non-Gaussian and Gaussian respectively. The second term is analogous to that obtained with decoupling and turns the proof of the central limit theorem into the independent case; it is referred to as the main term, and following the proof under independence it will be bounded above by using a Taylor expansion. Because of the dependence structure, some additional covariance terms will appear in the corresponding bounds.

Auxiliary terms. Using Taylor expansions up to the second order, we bound:
$$\big|\mathbb{E}\big(f(S) - f(Z)\big)\big| \le \frac{\|f''\|_\infty}{2}\,\mathbb{E}|S - Z|^2 \qquad\text{and}\qquad \big|\mathbb{E}\big(f(Y) - f(\sigma N)\big)\big| \le \frac{\|f''\|_\infty}{2}\,\mathbb{E}|Y - \sigma N|^2.$$
Firstly,
$$\mathbb{E}|Z - S|^2 = \frac1n\,\mathbb{E}\big(V_1+\cdots+V_k\big)^2.$$
Note that the set of indices of the $X_i$'s appearing in $V_1+\cdots+V_k$ has cardinality smaller than $(k+1)q + p$. Using the bounds (4.2.6) and (4.2.5), we infer that under conditions (4.2.3) and (4.2.2) respectively,
$$\mathbb{E}|Z - S|^2 \lesssim \frac{(k+1)q + p}{n}.$$
We notice that $Y$ follows the distribution of $\sqrt{\frac{kp}{n}}\,\sigma_pN$; then, working with Gaussian random variables,
$$\mathbb{E}|Y - \sigma N|^2 \le \Big|\frac{kp}{n} - 1\Big|\sigma_p^2 + \big|\sigma_p^2 - \sigma^2\big|.$$
Since $|kp/n - 1| \lesssim q/p$, we need to bound
$$\big|\sigma_p^2 - \sigma^2\big| \le \sum_{|i|\le p}\frac{|i|}{p}\,\big|\mathbb{E}(X_0X_i)\big| + \sum_{|i|>p}\big|\mathbb{E}(X_0X_i)\big|.$$
Let $a_i = |\mathbb{E}(X_0X_i)|$. Under conditions (4.2.3) and (4.2.2) respectively, the series $\sum_{i=0}^{\infty}a_i$ converges and $s_j = \sum_{i=j}^{\infty}a_i$ converges to 0 as $j$ converges to infinity. Consequently,
$$\big|\sigma_p^2 - \sigma^2\big| \le 2\sum_{i=0}^{p-1}\frac ip\,a_i + 2s_p \le \frac2p\sum_{i=0}^{p-1}s_i + 2s_p.$$
The Cesàro lemma implies that this term converges to 0. Hence $|\mathbb{E}f(S) - \mathbb{E}f(Z)| + |\mathbb{E}f(Y) - \mathbb{E}f(\sigma N)|$ tends to 0 as $n\uparrow\infty$. To make the rate of convergence precise, we now assume that $a_i = O(i^{-\alpha})$ with $\alpha > 1$; then $|\sigma_p^2 - \sigma^2| \lesssim p^{1-\alpha}$. The convergence rate is then given by
$$\frac qp + \frac pn + p^{1-\alpha} \qquad\text{if } \mathbb{E}(X_0X_i) = O(i^{-\alpha}).$$
Since $\mathbb{E}(X_0X_i) = \mathrm{Cov}(X_0, X_i)$, we then use equations (4.2.5) and (4.2.6) and find $\alpha = \kappa$ or $\alpha = \lambda(m-2)/(m-1)$, depending on the weak dependence setting. With $p = n^a$, $q = n^b$, those bounds become:
$$n^{b-a} + n^{a-1} + n^{a(1-\kappa)} \quad\text{in the $\kappa$-weak dependence setting},$$
$$n^{b-a} + n^{a-1} + n^{a(1-\lambda(m-2)/(m-1))} \quad\text{under $\lambda$-weak dependence}.$$

Main terms. It remains to control the expression $|T_1| + \frac12|T_2|$ and $A$ in Lemma 7.1 with now, for $1\le j\le k$:
$$x_j = \frac{1}{\sqrt n}\,U_j, \qquad y_j = \frac{1}{\sqrt n}\,V_j^*.$$
• The terms $T_j$ are bounded by using the weak dependence properties. The expressions of these bounds are obtained by rewriting $|\mathrm{Cov}(F(X_m,\ m\in B_i,\ i<j),\ G(X_m,\ m\in B_j))|$. Note that $\|F\|_\infty\le 1$ and that we can compute a bound for $\mathrm{Lip}\,F$ with $F(x_1,\dots,x_{kp}) = f\big(\frac{1}{\sqrt n}\sum_{i<j}x_i\big)$ (with possible repetitions in the sequence $(x_1,\dots,x_{kp})$):
$$\Big|f\Big(\frac{1}{\sqrt n}\sum_{i<j}x_i\Big) - f\Big(\frac{1}{\sqrt n}\sum_{i<j}y_i\Big)\Big| \le \Big|1 - \exp\Big(it\cdot\frac{1}{\sqrt n}\sum_{i<j}(y_i - x_i)\Big)\Big| \le \frac{\|t\|_2}{\sqrt n}\sum_{i=1}^{kp}\|y_i - x_i\|_2.$$
For $G(x_1,\dots,x_p) = f\big(\sum_{i=1}^{p}x_i/\sqrt n\big)$, we have $\|G\|_\infty\le 1$ and $\mathrm{Lip}\,G \lesssim 1/\sqrt n$. We then distinguish the two cases, noting that the gap between blocks is at least $q$. Since the bounds of the different covariances do not depend on $j$, we then obtain the controls:
– in the $\kappa$-dependence setting: $|T_1| + \frac12|T_2| \lesssim kp\cdot\kappa(q)$;
– in the $\lambda$-dependence setting: $|T_1| + \frac12|T_2| \lesssim kp\big(1 + \sqrt{k/p}\big)\cdot\lambda(q)$.
Reminding that $p = n^a$, $q = n^b$, $\kappa(r) = O(r^{-\kappa})$ or $\lambda(r) = O(r^{-\lambda})$, the bounds become $n^{1-\kappa b}$ or $n^{1+(1/2-a)_+-\lambda b}$ in the $\kappa$ or $\lambda$ context respectively.
• Using the stationarity of the sequence $(X_n)$ we obtain:
$$|A| \lesssim k\,n^{-1-\frac\delta2}\big(\mathbb{E}|S_p|^{2+\delta} \vee p^{1+\frac\delta2}\big).$$
We then use the result of Lemma 4.3 to bound the moment $\mathbb{E}|S_p|^{2+\delta}$. If $\kappa(r) = O(r^{-\kappa})$ or $\lambda(r) = O(r^{-\lambda})$ with $\kappa > 2 + \frac1\zeta$ or $\lambda > 4 + \frac2\zeta$, then there exist $\delta\in]0,\zeta\wedge1[$ and $C > 0$ such that:
$$\mathbb{E}|S_p|^{2+\delta} \le Cp^{1+\delta/2}.$$
We then obtain $A \lesssim k(p/n)^{1+\delta/2}$. Reminding that $p = n^a$, the bound is of order $n^{(a-1)\delta/2}$ in both the $\kappa$- and $\lambda$-weak dependence settings.
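The block construction used in this proof can be made concrete. The following helper is ours, with the exponents $a$ and $b$ as free parameters; it builds big blocks $B_j$ of length $p = \lfloor n^a\rfloor$ separated by gaps of length $q = \lfloor n^b\rfloor$:

```python
def bernstein_blocks(n, a=0.4, b=0.2):
    """Big blocks B_j of p = floor(n^a) consecutive indices, separated by
    gaps of length q = floor(n^b), mirroring the choice p = n^a, q = n^b
    with 1/q, q/p and p/n all tending to 0."""
    p, q = int(n ** a), int(n ** b)
    blocks, start = [], 1
    while start + p - 1 <= n:
        blocks.append(range(start, start + p))  # B_j: p successive integers
        start += p + q                          # skip a gap of length q
    return p, q, blocks

p, q, blocks = bernstein_blocks(10**6)
print(p, q, len(blocks))
# distinct blocks are separated by at least q indices:
gaps_ok = all(blocks[j + 1][0] - blocks[j][-1] - 1 >= q
              for j in range(len(blocks) - 1))
print(gaps_ok)
```

With $n = 10^6$ this gives $p = 251$, $q = 15$ and about $n/(p+q)$ blocks, so $q/p$ and $p/n$ are indeed small while $k p/n \to 1$.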
7.2.2 Rates of convergence

We present two propositions that give rates of convergence in the central limit theorem.

Proposition 7.1. Assume that the weakly dependent stationary process $(X_n)_n$ satisfies (7.1.1). Then the difference between the characteristic functions is bounded by
$$\big|\mathbb{E}\big(f(S_n/\sqrt n) - f(\sigma N)\big)\big| \le Cn^{-c^*},$$
where $C$ is some positive constant and $c^*$ depends on the weak dependence coefficients as follows:
• in the $\lambda$-dependence case with $\lambda(r) = O(r^{-\lambda})$ for $\lambda > 4 + \frac2\zeta$,
$$c^* = \frac{A\,(2\lambda-1)}{2(2+A)(\lambda+1)}, \qquad\text{where}\quad A = \frac{\sqrt{(2\lambda-6-\zeta)^2 + 4(\lambda\zeta-4\zeta-2)}\ +\ \zeta + 6 - 2\lambda}{2}\ \wedge\ 1;$$
• in the $\kappa$-dependence case with $\kappa(r) = O(r^{-\kappa})$ for $\kappa > 2 + \frac1\zeta$,
$$c^* = \frac{(\kappa-1)B}{\kappa(2+B)}, \qquad\text{where}\quad B = \frac{\sqrt{(2\kappa-3-\zeta)^2 + 4(\kappa\zeta-2\zeta-1)}\ +\ \zeta + 3 - 2\kappa}{2}\ \wedge\ 1.$$

In the case $d = 1$, we use Theorem 5.1 of Petrov (1995) [144] to obtain:

Proposition 7.2 (A rate in the Berry-Esseen bound). Assume that the real weakly dependent stationary process $(X_n)_n$ satisfies the same assumptions as in Proposition 7.1. We obtain:
$$\sup_x\big|F_n(x) - \Phi(x)\big| = O\big(n^{-c^*/4}\big),$$
where $c^*$ is defined in Proposition 7.1.

Proof of Proposition 7.1. In the previous section, we have expressed the rates of the different terms; let us recall them.
• In the $\lambda$-dependence case, we finally only have to consider the three largest rates: $(a-1)\delta/2$, $1 + (1/2-a)_+ - \lambda b$ and $b - a$. The optimal choice of $a^*$ below is smaller than $1/2$, so we have to consider the rate $3/2 - a - \lambda b$ and not $1 - \lambda b$. We find:
$$a^* = \frac{(1+\lambda)\delta + 3}{(2+\delta)(\lambda+1)}\in\Big]0,\frac12\Big[, \qquad b^* = \frac{3}{2(\lambda+1)}\in\,]0,a^*[.$$
Finally, we obtain the rate $n^{-c^*}$.
• In the $\kappa$-dependence case:
– auxiliary terms: $b - a$, $a - 1$ and $a(1-\kappa)$;
– main terms: $1 - \kappa b$ and $(a-1)\delta/2$.
The idea is to choose carefully $a^*$ and $b^*\in]0,1[$ such that the main rates are equal. Because $\delta < 1$ and $a > b$, we directly see that $(a-1)\delta/2 > a - 1$ and $1 - \kappa b > a(1-\kappa)$, so that the only rate of the auxiliary terms that remains to be considered is $b - a$. We obtain
$$a^* = 1 - \frac{2\kappa-2}{(2+\delta)\kappa+\delta}\in\,]0,1[, \qquad b^* = a^*\,\frac{2+2\delta}{2+\delta+\delta\kappa}\in\,]0,a^*[.$$
Finally, we obtain the proposed rate. Proof of proposition 7.2. We have seen that for t fix, we control the distance between the characteristic functions of S and σN by a term proportional to ∗ t2 n−c . Here, t2 appear because |t| was included in the constants (not depending of n) of the bound of the Lipschitz coefficients. Let Φ be the distribution function of σN and Fn be the distribution function S. Theorem 5.1 p. 142 in Petrov (1995) [144] gives, for every T > 0: ∗
  sup_x |F_n(x) − Φ(x)| ≲ n^{−c*} T³ + 1/T.
We optimize in T (taking T = n^{c*/4}, which balances the two terms) to obtain the rate n^{−c*/4} of Proposition 7.2.
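This balancing can be sketched numerically. In the bound g(T) = n^{−c}T³ + 1/T, the grid minimizer sits at T = n^{c/4}/3^{1/4} and the minimal value is of order n^{−c/4}; the value c = 0.5 below is an arbitrary stand-in for the unknown exponent c*:

```python
# Minimize g(T) = n^{-c} T^3 + 1/T by grid search and compare with the
# balancing choice T = n^{c/4}.  c = 0.5 is an arbitrary stand-in for c*.
c = 0.5
n = 10_000

def g(T):
    return n ** (-c) * T ** 3 + 1.0 / T

# Log-spaced grid for T between 0.1 and 1000.
grid = [10 ** (k / 400.0) for k in range(-400, 1201)]
T_star = min(grid, key=g)
T_balance = n ** (c / 4)

# Exact calculus gives T_star = n^{c/4} / 3^{1/4}, and then
# g(T_star) = (3^{-3/4} + 3^{1/4}) n^{-c/4}, i.e. the rate n^{-c/4}.
print(T_star, T_balance, g(T_star))
```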
7.3 Non causal random fields

Let (B_n)_{n∈N} be an increasing sequence of finite subsets of Z^d fulfilling

  Z^d = ∪_{n=1}^∞ B_n,   lim_{n→+∞} #∂B_n / #B_n = 0,   (7.3.1)

where ∂B_n = {i ∈ B_n / ∃ j ∉ B_n, d(i, j) = 1} and, for i = (i(l))_{1≤l≤d} and j = (j(l))_{1≤l≤d} in Z^d, d(i, j) = max_{1≤l≤d} |i(l) − j(l)|. Let (Y_i)_{i∈Z^d} be a real valued random field. Suppose that (Y_i)_{i∈Z^d} satisfies the following two assumptions:

A) A covariance inequality. Recall that for a real valued function h defined on R^n,

  Lip(h) = sup_{x≠y} |h(x) − h(y)| / Σ_{i=1}^n |x_i − y_i|.

Let R_1 and R_2 be two disjoint finite subsets of Z^d, and let f and g be two real valued functions defined respectively on R^{#R_1} and R^{#R_2} such that
Lip(f) < ∞ and Lip(g) < ∞. We suppose that for any positive real number δ, there exists a constant C_δ (not depending on f, g, R_1 and R_2) such that

  |Cov(f(Y_i, i ∈ R_1), g(Y_i, i ∈ R_2))| ≤ C_δ Lip(f) Lip(g) (#R_1 + #R_2) exp(−δ d(R_1, R_2)),   (7.3.2)

where d(R_1, R_2) = min_{i∈R_1, j∈R_2} d(i, j). We refer the reader to the book of Liggett (1985) [122] for some interacting particle models fulfilling such a covariance inequality.

B) Weak stationarity. Suppose that for any i, j ∈ Z^d,

  Cov(Y_i, Y_j) = Cov(Y_0, Y_{j−i}).   (7.3.3)
Theorem 7.4 gives a central limit theorem for the random field (Y_i)_{i∈Z^d}.

Theorem 7.4. Let (B_n)_{n∈N} be an increasing sequence of finite subsets of Z^d fulfilling (7.3.1). Let (Y_i)_{i∈Z^d} be a real valued random field satisfying (7.3.2) and (7.3.3). Suppose that, for any i ∈ Z^d, EY_i = 0 and sup_{i∈Z^d} ‖Y_i‖_∞ < M. Let S_n = Σ_{i∈B_n} Y_i. Then Σ_{k∈Z^d} |Cov(Y_0, Y_k)| < ∞ and (#B_n)^{−1/2} S_n converges in distribution to a centered normal law with variance σ² = Σ_{k∈Z^d} Cov(Y_0, Y_k).

Proof of Theorem 7.4. The proof of Theorem 7.4 follows from Proposition 7.3 and Proposition 7.4 below.

Proposition 7.3. Let (Y_i)_{i∈Z^d} be a real valued random field such that EY_i = 0 and EY_i² < ∞ for any i ∈ Z^d. Suppose that, for any i, j ∈ Z^d, (7.3.3) holds and that, for any positive real number δ, there exists a positive constant C_δ such that

  |Cov(Y_i, Y_j)| ≤ C_δ e^{−δ d(i,j)}.   (7.3.4)
Let (B_n)_n be a sequence of finite and increasing sets of Z^d fulfilling (7.3.1). Let S_n = Σ_{i∈B_n} Y_i. Then

  Σ_{k∈Z^d} |Cov(Y_0, Y_k)| < ∞

and

  lim_{n→+∞} (1/#B_n) Var S_n = Σ_{k∈Z^d} Cov(Y_0, Y_k).
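Both conclusions of Proposition 7.3 can be verified exactly for a finite-order moving-average field, for which Var S_n is a finite sum of squared coefficients. The field below, Y_i = Z_i + Z_{i+e₁} + Z_{i+e₂} on Z² with (Z_j) i.i.d. standard normal, is an arbitrary illustrative choice; for it, Σ_k Cov(Y_0, Y_k) = 3² = 9:

```python
from collections import Counter

# Y_i = sum of Z over the taps i + a, a in A, with (Z_j) i.i.d. N(0,1).
# Then sum_k Cov(Y_0, Y_k) = (#A)^2 and, by Proposition 7.3,
# Var S_n / #B_n should tend to (#A)^2 = 9 for B_n = [0, n)^2.
A = [(0, 0), (1, 0), (0, 1)]  # taps of the moving average (illustrative)

def var_ratio(n):
    # S_n = sum_{i in B_n} Y_i = sum_j c_j Z_j, hence Var S_n = sum_j c_j^2.
    coeff = Counter()
    for x in range(n):
        for y in range(n):
            for ax, ay in A:
                coeff[(x + ax, y + ay)] += 1
    return sum(v * v for v in coeff.values()) / (n * n)

print([round(var_ratio(n), 3) for n in (5, 20, 80)])
```

The ratio approaches 9 from below, the deficit being a boundary effect of order #∂B_n/#B_n, exactly as in the proof below.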
Proof of Proposition 7.3. The first conclusion of Proposition 7.3 follows from
the bound (7.3.4), together with the following elementary calculations:

  Σ_{k∈Z^d} |Cov(Y_0, Y_k)| ≤ C Σ_{k∈Z^d} exp(−δ d(0,k))
    ≤ C Σ_{k∈Z^d} Σ_{r=0}^∞ exp(−δr) 1_{r ≤ d(0,k) < r+1}
    ≤ C Σ_{r=0}^∞ exp(−δr) #{k ∈ Z^d / d(0,k) < r + 1}
    ≤ C Σ_{r=0}^∞ exp(−δr) (r + 1)^d,   (7.3.5)
where C is a positive constant depending only on δ and d. We now prove the second part of Proposition 7.3. Thanks to (7.3.1), we can find a sequence u = (u_n) of positive real numbers such that

  lim_{n→∞} u_n = +∞,   lim_{n→+∞} (#∂B_n/#B_n) exp(u_n) = 0.   (7.3.6)
Let (∂_u B_n)_n be the sequence of subsets of Z^d defined by ∂_u B_n = {s ∈ B_n / d(s, ∂B_n) < u_n}. The bound #∂_u B_n ≤ C_d (#∂B_n) u_n^d, together with the suitable choice of the sequence (u_n) and the limit (7.3.1), ensures

  lim_{n→∞} #∂_u B_n / #B_n = 0;   (7.3.7)

we shall use this fact below without further comment. Let B_n^u = B_n \ ∂_u B_n. We decompose the quantity Var S_n as in Newman (1980) [135]:

  (1/#B_n) Var S_n = (1/#B_n) Σ_{i∈B_n} Σ_{j∈B_n} Cov(Y_i, Y_j) = T_{1,n} + T_{2,n} + T_{3,n},
where

  T_{1,n} = (1/#B_n) Σ_{i∈B_n^u} Σ_{j∈B_n: d(i,j)≥u_n} Cov(Y_i, Y_j),
  T_{2,n} = (1/#B_n) Σ_{i∈B_n^u} Σ_{j∈B_n: d(i,j)<u_n} Cov(Y_i, Y_j),
  T_{3,n} = (1/#B_n) Σ_{i∈∂_u B_n} Σ_{j∈B_n} Cov(Y_i, Y_j).
Control of T_{1,n}. Since #B_n^u ≤ #B_n, applying (7.3.4) we have

  |T_{1,n}| ≤ sup_{i∈Z^d} Σ_{j∈Z^d: d(i,j)≥u_n} |Cov(Y_i, Y_j)| ≤ C_δ sup_{i∈Z^d} Σ_{j∈Z^d: d(i,j)≥u_n} e^{−δ d(i,j)}.   (7.3.8)

For any fixed i ∈ Z^d, we argue as for (7.3.5) and we obtain

  Σ_{j∈Z^d: d(i,j)≥u_n} e^{−δ d(i,j)} ≤ C Σ_{r=[u_n]}^∞ e^{−δr} r^d ≤ C e^{−δ[u_n]} u_n^d.   (7.3.9)

Collecting (7.3.8), (7.3.9) together with the first limit in (7.3.6), we obtain

  lim_{n→+∞} T_{1,n} = 0.   (7.3.10)
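Both (7.3.5) and (7.3.9) rest on the same counting fact: in the sup-norm metric, the ball {k ∈ Z^d : d(0,k) ≤ r} contains exactly (2r+1)^d lattice points, so the exponential factor e^{−δr} dominates the polynomial volume growth. A direct check (d = 2 and δ = 0.5 are arbitrary illustrative values):

```python
import math
from itertools import product

# Count lattice points k in Z^d with sup-norm distance d(0, k) <= r.
def ball_count(r, d):
    return sum(1 for k in product(range(-r, r + 1), repeat=d)
               if max(abs(c) for c in k) <= r)

d, delta = 2, 0.5  # illustrative dimension and decay rate

for r in range(6):
    assert ball_count(r, d) == (2 * r + 1) ** d

# The series sum_r e^{-delta r} (2r+1)^d therefore converges: its partial
# sums stabilize, which is the summability used in (7.3.5) and (7.3.9).
partial_60 = sum(math.exp(-delta * r) * (2 * r + 1) ** d for r in range(60))
partial_120 = sum(math.exp(-delta * r) * (2 * r + 1) ** d for r in range(120))
print(partial_60, partial_120)
```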
Control of T_{3,n}. Using (7.3.3) and (7.3.4), we obtain

  |T_{3,n}| ≤ (#∂_u B_n / #B_n) Σ_{k∈Z^d} |Cov(Y_0, Y_k)|.   (7.3.11)

The last bound, together with the limit (7.3.7) and the first conclusion of Proposition 7.3, gives

  lim_{n→∞} T_{3,n} = 0.   (7.3.12)
Control of T_{2,n}. Using the following implication: if i ∈ B_n^u and j does not belong to B_n, then d(i, j) ≥ u_n, we deduce that

  T_{2,n} = (1/#B_n) Σ_{i∈B_n^u} Σ_{j∈Z^d: d(i,j)<u_n} Cov(Y_i, Y_j).

Equality (7.3.3) ensures

  Σ_{j∈Z^d: d(i,j)<u_n} Cov(Y_i, Y_j) = Σ_{k∈Z^d: d(0,k)<u_n} Cov(Y_0, Y_k).

Hence

  T_{2,n} = (#B_n^u / #B_n) Σ_{k∈Z^d: d(0,k)<u_n} Cov(Y_0, Y_k).

The last equality, together with the first limit in (7.3.6) and (7.3.7), implies that

  lim_{n→∞} T_{2,n} = Σ_{k∈Z^d} Cov(Y_0, Y_k).   (7.3.13)
The second conclusion of Proposition 7.3 is proved by collecting the limits (7.3.10), (7.3.12) and (7.3.13).

Proposition 7.4. Let (B_n)_{n∈N} be an increasing sequence of finite subsets of Z^d such that #B_n tends to infinity as n goes to infinity. Let (Y_i)_{i∈Z^d} be a real valued random field satisfying (7.3.2). Suppose that EY_i = 0 for any i ∈ Z^d, and that sup_{i∈Z^d} ‖Y_i‖_∞ < M. Let S_n = Σ_{i∈B_n} Y_i. If there exists a finite real number σ² such that

  lim_{n→∞} Var S_n / #B_n = σ²,   (7.3.14)

then (#B_n)^{−1/2} S_n converges in distribution to a centered normal law with variance σ².

Proof of Proposition 7.4. We need the following notation. Let (m_n) be a sequence of positive integers to be fixed later. We suppose for the moment that lim_{n→∞} m_n = ∞. For any i ∈ Z^d, we define a neighborhood V_i of i in B_n as

  V_i = B(i, m_n) ∩ B_n,   (7.3.15)

where B(i, m_n) = {j ∈ Z^d / d(i, j) < m_n}. Let V_i^c denote the complement of V_i in B_n, i.e. V_i^c = B_n \ V_i. For I a subset of B_n, we denote

  S(I) = Σ_{i∈I} Y_i,   S(I^c) = Σ_{i∈B_n\I} Y_i.

Hence S(B_n) = Σ_{i∈B_n} Y_i =: S_n for the sake of brevity. Finally, we denote by F(b_2, b_3) the set of real valued functions h defined on R, three times differentiable, such that h(0) = 0, b_2 := ‖h''‖_∞ < +∞ and b_3 := ‖h^{(3)}‖_∞ < +∞. We also need the following proposition.
Proposition 7.5. Let h be a fixed function of the set F(b_2, b_3). Let (B_n)_{n∈N} be a sequence of finite subsets of Z^d. For any i ∈ B_n, let V_i be the set defined by (7.3.15). Let (Y_i)_{i∈Z^d} be a real valued random field. Suppose that EY_i = 0 and EY_i² < +∞ for any i ∈ Z^d. Let S_n = Σ_{i∈B_n} Y_i. Then

  |E(h(S_n)) − Var S_n ∫_0^1 t E(h''(tS_n)) dt|
    ≤ Σ_{i∈B_n} ∫_0^1 |Cov(Y_i, h'(tS(V_i^c)))| dt + 2 Σ_{i∈B_n} E|Y_i||S(V_i)| (b_2 ∧ b_3|S(V_i)|)
    + b_2 E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))| + b_2 Σ_{i∈B_n} |Cov(Y_i, S(V_i^c))|.   (7.3.16)

Remark 7.4. For a random field (Y_i)_{i∈Z^d} of independent random variables such that sup_{i∈Z^d} EY_i⁴ < ∞, Proposition 7.5 applied with V_i = {i} implies that

  |E(h(S_n)) − Var S_n ∫_0^1 t E(h''(tS_n)) dt| ≤ 2 Σ_{i∈B_n} E|Y_i|² (b_2 ∧ b_3|Y_i|) + b_2 √(#B_n) sup_{i∈Z^d} ‖Y_i²‖_2.
Proof of Proposition 7.5. We have

  h(S_n) = S_n ∫_0^1 h'(tS_n) dt = ∫_0^1 Σ_{i∈B_n} Y_i h'(tS_n) dt
    = ∫_0^1 Σ_{i∈B_n} Y_i h'(tS(V_i^c)) dt
    + ∫_0^1 Σ_{i∈B_n} Y_i (h'(tS_n) − h'(tS(V_i^c)) − tS(V_i) h''(tS_n)) dt
    + Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i))) ∫_0^1 t h''(tS_n) dt
    + Σ_{i∈B_n} E(Y_i S(V_i)) ∫_0^1 t h''(tS_n) dt − Σ_{i∈B_n} E(Y_i S_n) ∫_0^1 t h''(tS_n) dt
    + Σ_{i∈B_n} E(Y_i S_n) ∫_0^1 t h''(tS_n) dt.   (7.3.17)
We take the expectation in the equality (7.3.17). The obtained formula, together
with the following estimations, proves Proposition 7.5:

  |h'(tS_n) − h'(tS(V_i^c)) − tS(V_i) h''(tS_n)|
    ≤ |h'(tS_n) − h'(tS(V_i^c)) − tS(V_i) h''(tS(V_i^c))| + |S(V_i)| |h''(tS_n) − h''(tS(V_i^c))|
    ≤ 2|S(V_i)| (b_2 ∧ b_3|S(V_i)|).

The purpose now is to control the right hand side of the bound (7.3.16) for a random field (Y_i)_{i∈Z^d} fulfilling the covariance inequality (7.3.2) and the requirements of Proposition 7.4.

Corollary 7.1. Let h be a fixed function of the set F(b_2, b_3). Let (B_n)_{n∈N} be a sequence of finite subsets of Z^d. For any i ∈ B_n, let V_i be the set defined by (7.3.15). Let (Y_i)_{i∈Z^d} be a real valued random field fulfilling the covariance inequality (7.3.2). Suppose that, for any i ∈ Z^d, EY_i = 0 and sup_{i∈Z^d} ‖Y_i‖_∞ < M. Let S_n = Σ_{i∈B_n} Y_i. Then, for any positive real number δ, there exists a positive constant C(δ, M, d) independent of n, such that

  sup_{h∈F(b_2,b_3)} |E(h(S_n)) − Var S_n ∫_0^1 t E(h''(tS_n)) dt|
    ≤ C(δ, M, d) { b_2 (#B_n)² e^{−δm_n} + b_3 m_n^d #B_n
    + b_2 m_n^d √(#B_n) ( Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)} )^{1/2} + b_2 √(#B_n) m_n^d ( Σ_{k=1}^{3m_n} e^{−δk} k^d )^{1/2} }.
Proof of Corollary 7.1. From now on, we denote by C a positive constant that may differ from line to line, independent of n and depending, possibly, on M, δ and d. We have V_i^c = {j ∈ Z^d / d(i, j) ≥ m_n} ∩ B_n. Hence d({i}, V_i^c) ≥ m_n. The last bound, together with (7.3.2), proves that

  Σ_{i∈B_n} |Cov(Y_i, h'(tS(V_i^c)))| ≤ C b_2 Σ_{i∈B_n} (#V_i^c + 1) e^{−δ d({i},V_i^c)} ≤ C b_2 (#B_n)² e^{−δm_n}.   (7.3.18)

In the same way, we prove that

  b_2 Σ_{i∈B_n} |Cov(Y_i, S(V_i^c))| ≤ C b_2 (#B_n)² e^{−δm_n}.   (7.3.19)
Since #V_i ≤ #B(0, m_n) ≤ C m_n^d and sup_{j∈Z^d} Σ_{k∈Z^d} |Cov(Y_j, Y_k)| < ∞, we deduce that

  Σ_{i∈B_n} E|Y_i||S(V_i)| (b_2 ∧ b_3|S(V_i)|) ≤ b_3 M #B_n sup_{i∈Z^d} E|S(V_i)|² ≤ C b_3 #B_n m_n^d Σ_{k∈Z^d} |Cov(Y_0, Y_k)|.   (7.3.20)
It remains to control E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))|. For this, we argue as in Bolthausen (1982) [24]. We have

  E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))|² = Var( Σ_{i∈B_n} Y_i S(V_i) ) = Σ_{i∈B_n} Σ_{j∈B_n} Cov(Y_i S(V_i), Y_j S(V_j)).

Hence

  E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))|² ≤ Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} |Cov(Y_i Y_{i'}, Y_j Y_{j'})|.   (7.3.21)
Next, we have

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| ≤ |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≥3m_n} + |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n}.   (7.3.22)

We begin with the first term. The covariance inequality (7.3.2), together with some elementary estimations, implies that

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≥3m_n} ≤ Σ_{k=3m_n}^∞ |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{k≤d(i,j)≤k+1}
    ≤ C Σ_{k=3m_n}^∞ e^{−δ d({i,i'},{j,j'})} 1_{k≤d(i,j)≤k+1}
    ≤ C Σ_{k=3m_n}^∞ e^{−δ(k−2m_n)} 1_{d(i,j)≤k+1}.
To obtain the last bound, note that for any i' ∈ B(i, m_n) and j' ∈ B(j, m_n), we have

  d({i,i'}, {j,j'}) + 2m_n ≥ d({i,i'}, {j,j'}) + d(i, i') + d(j, j') ≥ d(i, j).

Hence,

  Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≥3m_n}
    ≤ C m_n^{2d} Σ_{k=3m_n}^∞ Σ_{i∈B_n} Σ_{j∈B_n} e^{−δ(k−2m_n)} 1_{j∈B(i,k+1)}
    ≤ C #B_n m_n^{2d} Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)}.   (7.3.23)
We now control the second term in (7.3.22). Inequality (7.3.2), and the fact that d({i}, {i', j, j'}) ≤ d({i}, {i'}), imply that

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} ≤ |Cov(Y_i, Y_{i'} Y_j Y_{j'})| 1_{d(i,j)≤3m_n} + |Cov(Y_i, Y_{i'})| |Cov(Y_j, Y_{j'})| 1_{d(i,j)≤3m_n}
    ≤ C e^{−δ d({i},{i',j,j'})} 1_{d(i,j)≤3m_n}.

Using the last bound, we infer that

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} ≤ Σ_{k=1}^{3m_n} |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} 1_{k−1≤d({i},{i',j,j'})≤k}
    ≤ C Σ_{k=1}^{3m_n} e^{−δ(k−1)} 1_{d(i,j)≤3m_n} 1_{d({i},{i',j,j'})≤k}.   (7.3.24)
Next, we have

  1_{d({i},{i',j,j'})≤k} ≤ 1_{d(i,i')≤k} + 1_{d(i,j)≤k} + 1_{d(i,j')≤k},

and consequently

  Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} 1_{d(i,j)≤3m_n} 1_{d({i},{i',j,j'})≤k} ≤ C #B_n m_n^{2d} k^d.   (7.3.25)
Combining (7.3.24) and (7.3.25), we obtain that

  Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} ≤ C #B_n m_n^{2d} Σ_{k=1}^{3m_n} e^{−δk} k^d.   (7.3.26)
We collect the bounds (7.3.21), (7.3.23) and (7.3.26), and we obtain

  E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))| ≤ C √(#B_n) m_n^d { ( Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)} )^{1/2} + ( Σ_{k=1}^{3m_n} e^{−δk} k^d )^{1/2} }.   (7.3.27)
Finally, the bounds (7.3.18), (7.3.19), (7.3.20), (7.3.27), together with Proposition 7.5, prove Corollary 7.1.

End of the proof of Proposition 7.4. We apply Corollary 7.1 to the real and imaginary parts of the function x ↦ exp(iux/√(#B_n)) − 1. Those functions belong to the set F(b_2, b_3), with

  b_2 = u²/#B_n   and   b_3 = (|u|/√(#B_n))³.

Denoting by φ_n the characteristic function of the normalized sum S_n/√(#B_n), we obtain

  |φ_n(u) − 1 + (Var S_n/#B_n) u² ∫_0^1 t φ_n(tu) dt|
    ≤ C(δ, M, d, u) { #B_n e^{−δm_n} + m_n^d/√(#B_n)
    + (m_n^d/√(#B_n)) ( Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)} )^{1/2} + (m_n^d/√(#B_n)) ( Σ_{k=1}^{3m_n} e^{−δk} k^d )^{1/2} }
    ≤ C(δ, M, d, u) { #B_n e^{−δm_n} + m_n^d/√(#B_n) + (m_n^{3d/2}/√(#B_n)) e^{−δm_n/2} }.

For a suitable choice of the sequence (m_n) (for example we can take m_n = (2/δ) log #B_n), the right hand side of the last bound tends to 0 as n goes to infinity:

  lim_{n→∞} |φ_n(u) − 1 + (Var S_n/#B_n) u² ∫_0^1 t φ_n(tu) dt| = 0.   (7.3.28)
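With m_n = (2/δ) log #B_n, each term of the final bound indeed vanishes: the first equals 1/#B_n exactly, and the other two decay polylogarithmically over √(#B_n). A numerical sketch (δ = 0.5 and d = 2 are arbitrary illustrative values):

```python
import math

# The three terms of the bound, evaluated at m_n = (2/delta) * log(N),
# where N plays the role of #B_n.  delta and d are illustrative values.
delta, d = 0.5, 2

def bound_terms(N):
    m = (2.0 / delta) * math.log(N)
    return (N * math.exp(-delta * m),                       # = 1/N exactly
            m ** d / math.sqrt(N),
            m ** (1.5 * d) * math.exp(-delta * m / 2) / math.sqrt(N))

for N in (10 ** 2, 10 ** 4, 10 ** 6):
    print(N, bound_terms(N))
```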
In order to finish the proof of Proposition 7.4, we need the following lemma.
Lemma 7.2. Let σ² be a positive real number. Let (X_n) be a sequence of real valued random variables such that sup_{n∈N} EX_n² < +∞. Let φ_n be the characteristic function of X_n. Suppose that for any u ∈ R,

  lim_{n→∞} |φ_n(u) − 1 + σ² ∫_0^u t φ_n(t) dt| = 0.   (7.3.29)

Then, for any u ∈ R,

  lim_{n→∞} φ_n(u) = e^{−u²σ²/2}.
Proof. Lemma 7.2 is a variant of Lemma 2 in Bolthausen (1982) [24], which is an adaptation of Stein's method. The Markov inequality, together with the condition sup_{n∈N} EX_n² < ∞, implies that the sequence (μ_n)_{n∈N} of the laws of (X_n) is tight. Theorem 25.10 in Billingsley (1995) [21] proves the existence of a subsequence (μ_{n_k}) and a probability measure μ such that μ_{n_k} converges weakly to μ as k tends to infinity. Let φ denote the characteristic function of μ. We deduce from (7.3.29) that, for any u ∈ R,

  φ(u) − 1 + σ² ∫_0^u t φ(t) dt = 0,

or equivalently, for any u ∈ R, φ'(u) + σ²uφ(u) = 0. Integrating this differential equation, we obtain

  φ(u) = exp(−σ²u²/2)

for any u ∈ R. The proof of Lemma 7.2 is complete by the use of Theorem 25.10 in Billingsley (1995) [21] and its corollary. The proof of Proposition 7.4 follows from (7.3.14), (7.3.28) and Lemma 7.2.
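The differential equation solved in the proof is the Stein characterization of the normal law: φ'(u) + σ²uφ(u) = 0 with φ(0) = 1. One can check numerically that φ(u) = exp(−σ²u²/2) also satisfies the integrated form appearing in (7.3.29) (σ = 1.3 is an arbitrary value):

```python
import math

sigma = 1.3  # arbitrary illustrative value

def phi(t):
    return math.exp(-sigma ** 2 * t ** 2 / 2)

def stein_residual(u, steps=20000):
    # Trapezoidal approximation of phi(u) - 1 + sigma^2 * int_0^u t phi(t) dt,
    # which vanishes identically since phi'(t) = -sigma^2 t phi(t).
    h = u / steps
    integral = 0.0
    for k in range(steps):
        t1, t2 = k * h, (k + 1) * h
        integral += (t1 * phi(t1) + t2 * phi(t2)) * h / 2
    return phi(u) - 1 + sigma ** 2 * integral

print([stein_residual(u) for u in (0.5, 1.0, 2.0)])
```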
7.4 Conditional central limit theorem (causal)

Let S_n be the partial sums of a triangular array with stationary rows. In this section, we give necessary and sufficient conditions for S_n to satisfy the conditional central limit theorem. These conditions imply the weak convergence of n^{−1/2}S_n to a mixture of normal distributions, but they also imply stable convergence in the sense of Rényi (1963) [156] (see Section 7.5.1). The main result of this section (Theorem 7.5) is due to Dedecker and Merlevède (2002) [44].
Definition 7.1. Let (Ω, A, P) be a probability space, and T : Ω → Ω be a bijective bimeasurable transformation preserving the probability P. An element A of A is said to be invariant if T(A) = A. We denote by I the σ-algebra of all invariant sets. The probability P is ergodic if each element of I has measure 0 or 1. Finally, let H be the space of continuous real functions ϕ such that x ↦ |(1 + x²)^{−1} ϕ(x)| is bounded.

Theorem 7.5. For each positive integer n, let M_{0,n} be a σ-algebra of A satisfying M_{0,n} ⊆ T^{−1}(M_{0,n}). Define the nondecreasing filtration (M_{i,n})_{i∈Z} by M_{i,n} = T^{−i}(M_{0,n}), and M_{i,inf} = σ(∪_{n=1}^∞ ∩_{k=n}^∞ M_{i,k}). Let X_{0,n} be a M_{0,n}-measurable and square integrable random variable, and define the sequence (X_{i,n})_{i∈Z} by X_{i,n} = X_{0,n} ∘ T^i. Finally, for any t in [0, 1], write S_n(t) = X_{1,n} + ··· + X_{[nt],n}. Suppose that n^{−1/2}X_{0,n} converges in probability to zero as n tends to infinity. The following statements are equivalent:

S1 There exists a nonnegative M_{0,inf}-measurable random variable η such that, for any ϕ in H, any t in [0, 1] and any positive integer k,

  S1(ϕ):  lim_{n→∞} ‖ E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ‖_1 = 0,

where g is the density of a standard normal.

S2 (a) lim_{t→0} limsup_{n→∞} E[ (S_n²(t)/(nt)) (1 ∧ |S_n(t)|/√n) ] = 0.

   (b) lim_{t→0} limsup_{n→∞} (1/(t√n)) ‖E(S_n(t)|M_{0,n})‖_1 = 0.

   (c) There exists a nonnegative M_{0,inf}-measurable random variable η such that

  lim_{t→0} limsup_{n→∞} ‖ E( S_n²(t)/(nt) − η | M_{0,n} ) ‖_1 = 0.

Moreover the random variable η satisfies η = η ∘ T almost surely.

We now give the proof of this theorem. The fact that S1 implies S2 is obvious. In the next sections, we focus on the consequences of condition S2. We start with some preliminary results.
7.4.1 Definitions and preliminary lemmas

Definition 7.2. Let μ be a signed measure on a metric space (S, B(S)). Denote by |μ| the total variation measure of μ, and by ‖μ‖ = |μ|(S) its norm. We say that a family Π of signed measures on (S, B(S)) is tight if for every positive ε there exists a compact set K such that |μ|(K^c) < ε for any μ in Π. Denote by C(S) the set of continuous and bounded functions from S to R. We say that a sequence of signed measures (μ_n)_{n>0} converges weakly to a signed measure μ if, for any ϕ in C(S), μ_n(ϕ) tends to μ(ϕ) as n tends to infinity.

Lemma 7.3. Let (μ_n)_{n>0} be a sequence of signed measures on (R^d, B(R^d)), and set μ̂_n(t) = μ_n(exp(i⟨t, ·⟩)). Assume that the sequence (μ_n)_{n>0} is tight and that sup_{n>0} ‖μ_n‖ < ∞. The following statements are equivalent:

1. the sequence (μ_n)_{n>0} converges weakly to the null measure;
2. for any t in R^d, μ̂_n(t) tends to zero as n tends to infinity.

Proof of Lemma 7.3. 1 ⇒ 2 is obvious. It remains to prove that 2 ⇒ 1. We proceed in 3 steps.

Step 1. Let D(R^d) be the space of functions from R^d to C which are infinitely differentiable with compact support. Let ϕ be any element of D(R^d) and set ϕ̄(t) = ϕ̂(−t). From the Plancherel equality, we have μ_n(ϕ) = (2π)^{−d} ∫ μ̂_n(t) ϕ̄(t) dt. The function ϕ̄ being infinitely differentiable and rapidly decreasing, it belongs to L¹(λ). Since μ̂_n converges to zero everywhere and is bounded by sup_{n>0} ‖μ_n‖, the dominated convergence theorem implies that ∫ μ̂_n(t) ϕ̄(t) dt tends to zero as n tends to infinity. Consequently, for any ϕ in D(R^d), μ_n(ϕ) converges to zero as n tends to infinity.

Step 2. Let ϕ be any function from R^d to R, continuous and with compact support. For any positive ε, there exists ϕ_ε in D(R^d) such that ‖ϕ − ϕ_ε‖_∞ ≤ ε. Since furthermore sup_{n>0} ‖μ_n‖ is finite, we infer from Step 1 that μ_n(ϕ) tends to zero as n tends to infinity.

Step 3. For any positive integer k, let f_k be a positive and continuous function from R^d to R satisfying: ‖f_k‖_∞ ≤ 1, f_k(x) = 1 for any x in [−k, k]^d, and f_k(x) = 0 for any x in ([−k − 1, k + 1]^d)^c. For any continuous bounded function ϕ, write

  |μ_n(ϕ)| ≤ |μ_n(ϕ f_k)| + ‖ϕ‖_∞ |μ_n|(([−k, k]^d)^c).

From Step 2, the first term on the right hand side tends to zero as n tends to infinity. Since the sequence (μ_n)_{n>0} is tight, the second term on the right hand side is as small as we wish by choosing k large enough. This completes the proof of 1.

Definition 7.3. Define the set R(M_{k,n}) of Rademacher M_{k,n}-measurable random variables: R(M_{k,n}) = {2·1_A − 1 / A ∈ M_{k,n}}. Recall that g is the N(0, 1) density. For the random variable η introduced in Theorem 7.5 and any bounded random variable Z, let

1. ν_n[Z] be the image measure of Z.P by the variable n^{−1/2}S_n(t);
2. ν[Z] be the image measure of g.λ ⊗ Z.P by the variable φ from R × Ω to R defined by φ(x, ω) = x√(tη(ω)).

Lemma 7.4. Let μ_n[Z_n] = ν_n[Z_n] − ν[Z_n]. For any ϕ in H, the statement S1(ϕ) is equivalent to:

S3(ϕ): for any Z_n in R(M_{k,n}), the sequence μ_n[Z_n](ϕ) tends to zero as n tends to infinity.

Proof of Lemma 7.4. For Z_n in R(M_{k,n}) and ϕ in H, we have

  |μ_n[Z_n](ϕ)| = |E( Z_n ( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx ) )|
    ≤ ‖ E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ‖_1.

Consequently S1(ϕ) implies S3(ϕ). Now, to prove that S3(ϕ) implies S1(ϕ), choose

  A(n, ϕ) = { E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ≥ 0 },

and Z_n^ϕ = 2·1_{A(n,ϕ)} − 1. Obviously

  μ_n[Z_n^ϕ](ϕ) = ‖ E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ‖_1,

and S3(ϕ) implies S1(ϕ).
7.4.2 Invariance of the conditional variance
We first prove that if S2 holds, the random variable η satisfies η = η ∘ T almost surely (or, equivalently, that η is measurable with respect to the P-completion of I). From S2(c) and both the facts that (X_{i,n})_{i∈Z} is strictly stationary and M_{0,n} ⊆ M_{1,n}, we have for any t in ]0, 1],

  lim_{n→∞} ‖ E( η ∘ T − (S_n²(t) ∘ T)/(nt) | M_{0,n} ) ‖_1 = 0.   (7.4.1)

On the other hand, defining ψ(x) = x²(1 − (1 ∧ |x|)) and using the fact that T preserves P, we have

  ‖ S_n²(t)/(nt) − (S_n²(t) ∘ T)/(nt) ‖_1 ≤ 2 ‖ (S_n²(t)/(nt)) (1 ∧ |S_n(t)|/√n) ‖_1 + (1/t) ‖ ψ(S_n(t)/√n) − ψ((S_n(t) ∘ T)/√n) ‖_1.   (7.4.2)
To control the second term on the right hand side, note that the function ψ is 3-Lipschitz and bounded by 1. It follows that, for each positive ε,

  ‖ ψ(S_n(t)/√n) − ψ((S_n(t) ∘ T)/√n) ‖_1 ≤ 3ε + 2P(|X_{0,n} − X_{[nt],n}| > ε√n).

Using that n^{−1/2}X_{0,n} converges in probability to 0, we derive that

  lim_{n→∞} ‖ ψ(S_n(t)/√n) − ψ((S_n(t) ∘ T)/√n) ‖_1 = 0,

and the second term on the right hand side in (7.4.2) tends to 0 as n tends to infinity. This fact, together with inequality (7.4.2) and Condition S2(a), yields

  lim_{t→0} limsup_{n→∞} ‖ S_n²(t)/(nt) − (S_n²(t) ∘ T)/(nt) ‖_1 = 0,

which together with S2(c) implies that

  lim_{t→0} limsup_{n→∞} ‖ E( η − (S_n²(t) ∘ T)/(nt) | M_{0,n} ) ‖_1 = 0.   (7.4.3)

Combining (7.4.1) and (7.4.3), it follows that lim_{n→∞} ‖E(η − η ∘ T | M_{0,n})‖_1 = 0, which implies that

  lim_{n→∞} ‖ E( η − η ∘ T | ∩_{k≥n} M_{0,k} ) ‖_1 = 0.

Applying the martingale convergence theorem, we obtain

  lim_{n→∞} ‖ E( η − η ∘ T | ∩_{k≥n} M_{0,k} ) ‖_1 = ‖ E(η − η ∘ T | M_{0,inf}) ‖_1 = 0.   (7.4.4)

According to S2(c), the random variable η is M_{0,inf}-measurable. Therefore, (7.4.4) implies that E(η ∘ T | M_{0,inf}) = η. The fact that η ∘ T = η almost surely is a direct consequence of the following elementary result.

Lemma 7.5. Let (Ω, A, P) be a probability space, X an integrable random variable, and M a σ-algebra of A. If the random variable E(X|M) has the same law as X, then E(X|M) = X almost surely.

Proof of Lemma 7.5. For any real number m, consider A_1 = {X ≤ m}, A_2 = {E(X|M) ≤ m}, C_1 = A_1 ∩ A_2^c and C_2 = A_2 ∩ A_1^c. Since by assumption the random variables X and E(X|M) are identically distributed, it follows that P(A_1) = P(A_2), P(C_1) = P(C_2) and E(X1_{A_1}) = E(X1_{A_2}). This implies in
particular that E((X − m)1_{C_1}) = E((X − m)1_{C_2}). These terms having opposite signs, they are zero. Since X − m is positive on C_2, it follows that C_2, and consequently A_1ΔA_2, have probability zero (Δ denoting the symmetric difference). Now, it is easily seen that

  E((E(X|M))² 1_{A_1}) = E((E(X|M))² 1_{A_2}) = E(X² 1_{A_1})   (7.4.5)

and

  E(X E(X|M) 1_{A_1}) = E(X E(X|M) 1_{A_2}) = E((E(X|M))² 1_{A_2}).   (7.4.6)

According to (7.4.5) and (7.4.6), we obtain

  ‖(X − E(X|M)) 1_{A_1}‖_2² = ‖X 1_{A_1}‖_2² + ‖E(X|M) 1_{A_1}‖_2² − 2E(X E(X|M) 1_{A_1}) = 0.   (7.4.7)

Since (7.4.7) is true for any real m, it follows that X = E(X|M) almost surely, and Lemma 7.5 is proved.
7.4.3 End of the proof

First, note that we can restrict ourselves to bounded functions of H: if S2 implies S1(h) for any continuous and bounded function h, then we easily infer from S2(c) that n^{−1}S_n²(t) is uniformly integrable for any t in [0, 1], which implies that S1 extends to the whole space H.

Definition 7.4. Let B₁³(R) be the class of three times continuously differentiable functions from R to R such that max(‖h^{(i)}‖_∞, i ∈ {0, 1, 2, 3}) ≤ 1.

Suppose now that S1(h) holds for any h in B₁³(R). Applying Lemma 7.4, this is equivalent to saying that S3(h) holds for any h in B₁³(R), which obviously implies that S3(h) holds for h_t : x ↦ e^{itx}. Using that the probability ν_n[1] is tight (since it converges weakly to ν[1]) and that |μ_n[Z_n]| ≤ ν_n[1] + ν[1], we infer that (μ_n[Z_n]) is tight, and Lemma 7.3 implies that S3(h) (and therefore S1(h)) holds for any continuous bounded function h.

On the other hand, from the asymptotic negligibility of n^{−1/2}X_{0,n} we infer that, for any positive integer k, n^{−1/2}(S_n(t) − S_n(t) ∘ T^k) converges in probability to zero. Consequently, since any function h belonging to B₁³(R) is 1-Lipschitz and bounded, we have

  lim_{n→∞} ‖ h(n^{−1/2}S_n(t)) − h(n^{−1/2}S_n(t) ∘ T^k) ‖_1 = 0,

and S1(h) is equivalent to

  lim_{n→∞} ‖ E( h(n^{−1/2}S_n(t) ∘ T^k) − ∫ h(x√(tη)) g(x) dx | M_{k,n} ) ‖_1 = 0.
Now, since both η and P are invariant by T , we infer that Theorem 7.5 is a straightforward consequence of Proposition 7.6 below:
Proposition 7.6. Let X_{i,n} and M_{i,n} be defined as in Theorem 7.5. If S2 holds, then, for any h in B₁³(R) and any t in [0, 1],

  lim_{n→∞} ‖ E( h(n^{−1/2}S_n(t)) − ∫ h(x√(tη)) g(x) dx | M_{0,n} ) ‖_1 = 0,
where g is the density of a standard normal.

Proof of Proposition 7.6. We prove the result for S_n(1), the proof of the general case being unchanged. Without loss of generality, suppose that there exists a sequence (ε_i)_{i∈Z} of N(0, 1)-distributed and independent random variables, independent of M_{∞,∞} = σ(∪_{k,n} M_{k,n}).

Definition 7.5. Let i, p and n be three integers such that 1 ≤ i ≤ p ≤ n. Set q = [n/p] and define

  U_{i,n} = X_{iq−q+1,n} + ··· + X_{iq,n},
  V_{i,n} = (1/√n)(U_{1,n} + U_{2,n} + ··· + U_{i,n}),
  Δ_i = ε_{iq−q+1} + ··· + ε_{iq},
  Γ_i = √(η/n) (Δ_i + Δ_{i+1} + ··· + Δ_p).
Definition 7.6. Let g be any function from R to R. For k and l in [1, p] and any positive integer n ≥ p, set g_{k,l;n} = g(V_{k,n} + Γ_l), with the conventions g_{k,p+1;n} = g(V_{k,n}) and g_{0,l;n} = g(Γ_l). Afterwards, we shall apply this notation to the successive derivatives of the function h. For brevity, we shall omit the index n.

Let s_n = √η (ε_1 + ··· + ε_n). Since (ε_i)_{i∈Z} is independent of M_{∞,∞}, we have, integrating with respect to (ε_i)_{i∈Z},

  E( h(n^{−1/2}S_n(1)) − ∫ h(x√η) g(x) dx | M_{0,n} )
    = E(h(n^{−1/2}S_n(1)) − h(V_{p,n}) | M_{0,n}) + E(h(V_{p,n}) − h(Γ_1) | M_{0,n}) + E(h(Γ_1) − h(n^{−1/2}s_n) | M_{0,n}).   (7.4.8)

Here, note that |n^{−1/2}S_n(1) − V_{p,n}| ≤ n^{−1/2}(|X_{n−p+2,n}| + ··· + |X_{n,n}|). Using the asymptotic negligibility of n^{−1/2}X_{0,n}, we infer that n^{−1/2}S_n(1) − V_{p,n} converges in probability to zero. Since furthermore h is 1-Lipschitz and bounded, we conclude that

  lim_{n→∞} ‖h(n^{−1/2}S_n(1)) − h(V_{p,n})‖_1 = 0,   (7.4.9)

and the same arguments yield

  lim_{n→∞} ‖h(Γ_1) − h(n^{−1/2}s_n)‖_1 = 0.   (7.4.10)
In view of (7.4.9) and (7.4.10), it remains to control the second term on the right hand side of (7.4.8). To this end, we use Lindeberg's decomposition:

  h(V_{p,n}) − h(Γ_1) = Σ_{i=1}^p (h_{i,i+1} − h_{i−1,i+1}) + Σ_{i=1}^p (h_{i−1,i+1} − h_{i−1,i}).   (7.4.11)
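The two sums in (7.4.11) telescope: pairing the terms gives Σ_{i=1}^p (h_{i,i+1} − h_{i−1,i}), whose consecutive terms cancel, leaving h_{p,p+1} − h_{0,1} = h(V_p) − h(Γ_1). A numerical check of the identity, with h and the values of V_k and Γ_l chosen arbitrarily:

```python
import math

p = 6
h = math.tanh                                  # any test function works
V = [0.3, -0.1, 0.7, 0.2, -0.5, 0.4]           # stands in for V_1..V_p
G = [0.6, -0.2, 0.1, 0.9, -0.3, 0.5]           # stands in for Gamma_1..Gamma_p

def H(k, l):
    # h_{k,l} = h(V_k + Gamma_l), with h_{k,p+1} = h(V_k), h_{0,l} = h(Gamma_l)
    v = V[k - 1] if k >= 1 else 0.0
    g = G[l - 1] if l <= p else 0.0
    return h(v + g)

lhs = H(p, p + 1) - H(0, 1)                    # h(V_p) - h(Gamma_1)
rhs = (sum(H(i, i + 1) - H(i - 1, i + 1) for i in range(1, p + 1))
       + sum(H(i - 1, i + 1) - H(i - 1, i) for i in range(1, p + 1)))
print(lhs - rhs)
```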
Now, applying Taylor's integral formula, we get that

  h_{i,i+1} − h_{i−1,i+1} = (1/√n) U_{i,n} h'_{i−1,i+1} + (1/(2n)) U_{i,n}² h''_{i−1,i+1} + R_i,
  h_{i−1,i+1} − h_{i−1,i} = −√(η/n) Δ_i h'_{i−1,i+1} − (η/(2n)) Δ_i² h''_{i−1,i+1} + r_i,

where

  |R_i| ≤ (U_{i,n}²/n) (1 ∧ |U_{i,n}|/√n)   and   |r_i| ≤ (ηΔ_i²/n) (1 ∧ √η|Δ_i|/√n).   (7.4.12)

Since Δ_i is centered and independent of σ(M_{∞,∞} ∪ σ(h'_{i−1,i+1})), we have E(√η Δ_i h'_{i−1,i+1} | M_{0,n}) = E(Δ_i) E(√η h'_{i−1,i+1} | M_{0,n}) = 0. It follows that

  E(h(V_p) − h(Γ_1) | M_{0,n}) = D_1 + D_2 + D_3,   (7.4.13)

where

  D_1 = Σ_{i=1}^p E(n^{−1/2} U_{i,n} h'_{i−1,i+1} | M_{0,n}),
  D_2 = (1/2) Σ_{i=1}^p E(n^{−1}(U_{i,n}² − ηΔ_i²) h''_{i−1,i+1} | M_{0,n}),
  D_3 = Σ_{i=1}^p E(R_i + r_i | M_{0,n}).
Control of D_3. From (7.4.12) and the fact that T preserves P, we get

  Σ_{i=1}^p ‖R_i‖_1 ≤ E[ (S_n²(1/p)/(n/p)) (1 ∧ |S_n(1/p)|/√n) ],

and S2(a) implies that

  lim_{p→∞} limsup_{n→∞} Σ_{i=1}^p ‖R_i‖_1 = 0.   (7.4.14)

Moreover, since for t ∈ [0, 1] the sequence √(η/n)(ε_1 + ··· + ε_{[nt]}) obviously satisfies S2(a), the same argument applies to Σ_{i=1}^p ‖r_i‖_1. Finally

  lim_{p→∞} limsup_{n→∞} ‖D_3‖_1 = 0.   (7.4.15)
Control of D_1. Denote by E_ε the integration with respect to the sequence (ε_i)_{i∈Z}. Set l(i, n) = (i − 1)[n/p]. Bearing in mind the definition of h'_{i−1,i+1} and integrating with respect to (ε_i)_{i∈Z}, we deduce that the random variable E_ε(h'_{i−1,i+1}) is M_{l(i,n),n}-measurable and bounded by one. Now, since the σ-algebra M_{0,n} is included in M_{l(i,n),n}, we obtain

  ‖E(n^{−1/2} U_{i,n} h'_{i−1,i+1} | M_{0,n})‖_1 ≤ ‖E(n^{−1/2} U_{i,n} | M_{l(i,n),n})‖_1.

Using that T preserves P, the latter equals ‖E(n^{−1/2} S_n(1/p) | M_{0,n})‖_1. Consequently, ‖D_1‖_1 ≤ p n^{−1/2} ‖E(S_n(1/p) | M_{0,n})‖_1 and S2(b) implies that

  lim_{p→∞} limsup_{n→∞} ‖D_1‖_1 = 0.   (7.4.16)
Control of D_2. Integrating with respect to (ε_i)_{i∈Z}, we have

  ‖E((U_{i,n}² − ηΔ_i²) h''_{i−1,i+1} | M_{0,n})‖_1 = ‖E((U_{i,n}² − η[np^{−1}]) E_ε(h''_{i−1,i+1}) | M_{0,n})‖_1.

Arguing as for the control of D_1, we have

  ‖E((U_{i,n}² − η[np^{−1}]) E_ε(h''_{i−1,i+1}) | M_{0,n})‖_1 ≤ ‖E(U_{i,n}² − η[np^{−1}] | M_{l(i,n),n})‖_1.

Since both η and P are invariant by the transformation T, the latter equals ‖E(S_n²(1/p) − η[np^{−1}] | M_{0,n})‖_1. Consequently

  ‖D_2‖_1 ≤ ‖ E( S_n²(1/p)/(n/p) − η [n/p]/(n/p) | M_{0,n} ) ‖_1,

and S2(c) implies that

  lim_{p→∞} limsup_{n→∞} ‖D_2‖_1 = 0.   (7.4.17)
End of the proof of Proposition 7.6. From (7.4.15), (7.4.16) and (7.4.17) we infer that, for any h in B₁³(R),

  lim_{p→∞} limsup_{n→∞} ‖D_1 + D_2 + D_3‖_1 = 0.

This fact, together with (7.4.8), (7.4.9), (7.4.10) and (7.4.13), implies Proposition 7.6.
7.5 Applications

7.5.1 Stable convergence

Let (Ω, A, P) be a probability space. A sequence (X_n)_{n>0} of integrable random variables is said to converge weakly in L¹ to X if, for any bounded random variable Z, we have

  lim_{n→∞} E(ZX_n) = E(ZX).

Let X be some Polish space. A random probability μ on X is a function from B(X) × Ω to [0, 1] such that μ(·, ω) is a probability measure for any ω in Ω and μ(B, ·) is A-measurable for any B in B(X). Let μ be a random probability on X. We say that a sequence (X_n)_{n>0} with values in a Polish space X converges stably with respect to μ if the sequence (ϕ(X_n))_{n>0} converges weakly in L¹ to μ(ϕ) for any continuous bounded function ϕ. The equivalence of this definition with that of Rényi (1963) [156] is proved in Aldous and Eagleson (1978) [1]. Note that, if (X_n)_{n>0} converges stably with respect to μ, then it converges in distribution to the probability ν(A) = ∫ μ(A, ω) P(dω). Here is one consequence of stability.

Lemma 7.6. Let (X_n)_{n>0} and (Y_n)_{n>0} be two sequences of random variables with values in two Polish spaces X and Y. If (X_n)_{n>0} converges stably with respect to μ and (Y_n)_{n>0} converges in probability to Y, then (X_n, Y_n) converges in distribution to the probability ν(A × B) = ∫ μ(A, ω) 1_{Y(ω)∈B} P(dω).

Proof of Lemma 7.6. Let f and g be two bounded Lipschitz functions. We have

  |E(f(X_n)g(Y_n)) − ν(f ⊗ g)| = |E(f(X_n)g(Y_n)) − E(μ(f)g(Y))|.

Consequently

  |E(f(X_n)g(Y_n)) − ν(f ⊗ g)| ≤ |E(f(X_n)g(Y)) − E(μ(f)g(Y))| + ‖f‖_∞ ‖g(Y_n) − g(Y)‖_1.

The first term on the right hand side tends to zero by the weak-L¹ convergence, and the second term tends to zero because Y_n converges in probability to Y and g is bounded Lipschitz.

The next corollary shows that if the CCLT holds, then (n^{−1/2}S_n(t))_{n>0} converges stably.

Corollary 7.2. Let X_{i,n}, M_{i,n}, S_n(t) be as in Theorem 7.5. Suppose that the sequence (M_{0,n})_{n≥1} is nondecreasing. If Condition S2 is satisfied, then the sequence (n^{−1/2}S_n(t))_{n>0} converges stably with respect to the random probability defined by μ(ϕ) = ∫ ϕ(x√(tη)) g(x) dx.
Proof of Corollary 7.2. We shall prove that if S2 holds, then, for any bounded random variable Z, any t in [0, 1] and any ϕ in H,

  lim_{n→∞} E( Z ϕ(n^{−1/2}S_n(t)) ) = E( Z ∫ ϕ(x√(tη)) g(x) dx ).   (7.5.1)

Since n^{−1}S_n²(t) is uniformly integrable, we only need to prove (7.5.1) for continuous bounded functions. Recall that M_{∞,∞} = σ(∪_{k,n} M_{k,n}). Since both S_n(t) and η are M_{∞,∞}-measurable, we can and do suppose that so is Z. Set Z_{k,n} = E(Z | M_{k,n}), and use the decomposition

  E( Z ϕ(n^{−1/2}S_n(t)) ) − E( Z ∫ ϕ(x√(tη)) g(x) dx ) = T_1 + T_2 + T_3,

where

  T_1 = E( (Z − Z_{k,n}) ϕ(n^{−1/2}S_n(t)) ),
  T_2 = E( Z_{k,n} ( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx ) ),
  T_3 = E( (Z_{k,n} − Z) ∫ ϕ(x√(tη)) g(x) dx ).

By assumption, the array M_{k,n} is nondecreasing in k and n. Since the random variable Z is M_{∞,∞}-measurable, the martingale convergence theorem implies that lim_{k→∞} lim_{n→∞} ‖Z_{k,n} − Z‖_1 = 0. Consequently,

  lim_{k→∞} limsup_{n→∞} |T_1| = lim_{k→∞} limsup_{n→∞} |T_3| = 0.

On the other hand, Theorem 7.5 implies that T_2 tends to zero as n tends to infinity, which completes the proof of Corollary 7.2.

We end this section with an application of stable convergence to random normalization. The proof is straightforward, using Lemma 7.6 and Corollary 7.2.

Corollary 7.3. Let X_{i,n}, M_{i,n}, S_n(t) be as in Theorem 7.5. Suppose that the sequence (M_{0,n})_{n≥1} is nondecreasing and that P(η > 0) = 1. If Condition S2 is satisfied, and if (η_n)_{n>0} converges in probability to η, then, for any t in ]0, 1],

  n^{−1/2}S_n(t) / √(tη_n ∨ n^{−1})   converges in distribution to N(0, 1).

For more about stable convergence, we refer to the book by Castaing et al. (2004) [35].
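In the simplest i.i.d. setting (constant η = Var X_0, t = 1), Corollary 7.3 justifies Student-type normalization: replacing the unknown variance by a consistent estimator leaves the N(0, 1) limit unchanged. A minimal simulation sketch (all numerical choices are illustrative):

```python
import random
import statistics

random.seed(0)

def studentized(n):
    # i.i.d. uniform on [-1, 1]: mean 0, variance 1/3 (the "eta" here).
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    eta_n = statistics.pvariance(xs)      # consistent estimator of eta
    return sum(xs) / (n * eta_n) ** 0.5   # n^{-1/2} S_n / sqrt(eta_n)

sample = [studentized(200) for _ in range(2000)]
# If the random normalization is valid, these statistics are close to N(0, 1).
print(statistics.mean(sample), statistics.pstdev(sample))
```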
7.5.2 Sufficient conditions for stationary sequences

For strictly stationary sequences, Theorem 7.5 reads as follows.

Theorem 7.6. Let M_0 be a σ-algebra of A satisfying M_0 ⊆ T^{−1}(M_0) and define the nondecreasing filtration (M_i)_{i∈Z} by M_i = T^{−i}(M_0). Let X_0 be a M_0-measurable, square integrable and centered random variable. Define the sequence (X_i)_{i∈Z} by X_i = X_0 ∘ T^i, and S_n = X_1 + ··· + X_n. The following statements are equivalent:

S1 There exists a nonnegative M_0-measurable random variable η such that, for any ϕ in H and any positive integer k,

  lim_{n→∞} ‖ E( ϕ(n^{−1/2}S_n) − ∫ ϕ(x√η) g(x) dx | M_k ) ‖_1 = 0,

where g is the density of a standard normal.

S2 (a) the sequence (n^{−1}S_n²)_{n>0} is uniformly integrable;
   (b) ‖E(n^{−1/2}S_n | M_0)‖_1 tends to 0 as n tends to infinity;
   (c) there exists a nonnegative M_0-measurable random variable η such that ‖E(n^{−1}S_n² − η | M_0)‖_1 tends to 0 as n tends to infinity.

Moreover the random variable η satisfies η = η ∘ T almost surely.

In Propositions 7.7 and 7.8, we give two conditions implying S2. For the usual central limit theorem, Proposition 7.7 is due to Gordin (1969) [97], Corollary 7.4 is due to Heyde (1974) [103], and Proposition 7.8 is due to Dedecker and Rio (2000) [50].

Proposition 7.7. Let (M_i)_{i∈Z} and (X_i)_{i∈Z} be as in Theorem 7.6. Let H_i be the Hilbert space of M_i-measurable, centered and square integrable functions. For all integers j less than i, denote by H_i ⊖ H_j the orthogonal complement of H_j in H_i. Assume that there exists a random variable m in H_0 ⊖ H_{−1} such that

  lim_{n→∞} (1/√n) ‖ Σ_{i=1}^n (X_0 ∘ T^i − m ∘ T^i) ‖_2 = 0;   (7.5.2)
then S2 (hence S1) holds. Corollary 7.4. Let (Mi )i∈Z and (Xi )i∈Z be as in Theorem 7.6, and define Hi as in Proposition 7.7. Let Pi be the projection operator on Hi ! Hi−1 : for any function f in L2 , Pi (f ) = E(f |Mi ) − E(f |Mi−1 ). If n
P0 (Xi ) converges in L2 to m and lim n−1/2 Sn 2 = m2 ,
i=0
then (7.5.2) (hence S1) holds.
n→∞
(7.5.3)
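For a concrete instance of the martingale approximation (our illustration, not the book's), take the causal AR(1) process $X_i=\sum_{j\ge0}\rho^j\varepsilon_{i-j}$ with i.i.d. standard normal innovations. Then $P_0(X_i)=\rho^i\varepsilon_0$, so $m=\sum_{i\ge0}P_0(X_i)=\varepsilon_0/(1-\rho)$ and $\|m\|_2=1/(1-\rho)$, which is also the square root of the long-run variance $\sum_{k\in\mathbb{Z}}\mathrm{Cov}(X_0,X_k)$. The sketch below checks $n^{-1/2}\|S_n\|_2\to\|m\|_2$ by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.6, 4000, 400

def ar1_sum(n, rho):
    """Partial sum S_n of an AR(1) chain X_i = rho*X_{i-1} + eps_i."""
    x, s = 0.0, 0.0
    for e in rng.standard_normal(n):
        x = rho * x + e
        s += x
    return s

m_norm2 = 1.0 / (1.0 - rho)                 # ||m||_2 with sigma_eps = 1
sums = np.array([ar1_sum(n, rho) for _ in range(reps)])
est = np.sqrt(np.mean(sums ** 2) / n)       # empirical n^{-1/2} ||S_n||_2
print(est, m_norm2)                         # both close to 2.5
```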
7.5.
APPLICATIONS
185
Proof of Proposition 7.7. Let $W_n=m\circ T+\cdots+m\circ T^n$. Since $(m\circ T^i)_{i\in\mathbb{Z}}$ is a stationary sequence of martingale differences with respect to the filtration $(\mathcal{M}_i)_{i\in\mathbb{Z}}$, it satisfies S2. More precisely, $n^{-1}\mathbb{E}(W_n^2\mid\mathcal{M}_0)$ converges to $\eta=\mathbb{E}(m^2\mid\mathcal{I})$ in $L^1$. Now, we shall use (7.5.2) to see that the sequence $(X_i)_{i\in\mathbb{Z}}$ also satisfies S2.

Proof of S2(b). From (7.5.2) it is clear that $n^{-1/2}\|\mathbb{E}(S_n\mid\mathcal{M}_0)\|_2$ tends to zero as $n$ tends to infinity.

Proof of S2(c). To see that $n^{-1}\mathbb{E}(S_n^2\mid\mathcal{M}_0)$ converges to $\eta$ in $L^1$, write
$$\frac1n\big\|\mathbb{E}(S_n^2-W_n^2\mid\mathcal{M}_0)\big\|_1\le\frac1n\big\|S_n^2-W_n^2\big\|_1\le\frac{\|S_n+W_n\|_2}{\sqrt n}\cdot\frac{\|S_n-W_n\|_2}{\sqrt n}.\tag{7.5.4}$$
From (7.5.2) the latter tends to zero as $n$ tends to infinity, and therefore $(X_i)_{i\in\mathbb{Z}}$ satisfies S2(c) with $\eta=\mathbb{E}(m^2\mid\mathcal{I})$.

Proof of S2(a). Using both that $n^{-1}W_n^2$ is uniformly integrable and that the function $x\mapsto(1\wedge|x|)$ is 1-Lipschitz, we have, for any positive real $M$,
$$\lim_{n\to\infty}\mathbb{E}\Big(\frac{W_n^2}{n}\Big|\Big(1\wedge\frac{|S_n|}{M\sqrt n}\Big)-\Big(1\wedge\frac{|W_n|}{M\sqrt n}\Big)\Big|\Big)=0.\tag{7.5.5}$$
Since $|x^2(1\wedge|y|)-z^2(1\wedge|t|)|\le|x^2-z^2|+z^2|(1\wedge|y|)-(1\wedge|t|)|$, we infer from (7.5.4) and (7.5.5) that
$$\lim_{n\to\infty}\Big\|\frac{S_n^2}{n}\Big(1\wedge\frac{|S_n|}{M\sqrt n}\Big)-\frac{W_n^2}{n}\Big(1\wedge\frac{|W_n|}{M\sqrt n}\Big)\Big\|_1=0.$$
Now, the uniform integrability of $n^{-1}W_n^2$ yields
$$\lim_{M\to\infty}\limsup_{n\to\infty}\mathbb{E}\Big(\frac{S_n^2}{n}\Big(1\wedge\frac{|S_n|}{M\sqrt n}\Big)\Big)=0,$$
which means exactly that $n^{-1}S_n^2$ is uniformly integrable.

Proof of Corollary 7.4. By assumption, the random variable $m$ belongs to $H_0\ominus H_{-1}$. It remains to check (7.5.2). Let $m_i=m\circ T^i$ and $T_n=m_1+\cdots+m_n$. Clearly $\mathbb{E}((S_n-T_n)^2)=\mathbb{E}(S_n^2)+\mathbb{E}(T_n^2)-2\mathbb{E}(S_nT_n)$. By assumption $n^{-1}\mathbb{E}(S_n^2)$ converges to $\|m\|_2^2=n^{-1}\mathbb{E}(T_n^2)$. To prove (7.5.2), it suffices to prove that $n^{-1}\mathbb{E}(S_nT_n)$ converges to $\|m\|_2^2$. Now, by stationarity,
$$\frac1n\mathbb{E}(S_nT_n)=\frac1n\sum_{i=1}^n\sum_{j=1}^n\mathbb{E}(X_im_j)=\sum_{k=-n}^{n}\Big(1-\frac{|k|}{n}\Big)\mathbb{E}(mX_k).\tag{7.5.6}$$
By Kronecker's lemma, $n^{-1}\mathbb{E}(S_nT_n)$ converges to $\sum_{k\in\mathbb{Z}}\mathbb{E}(mX_k)$ as soon as the latter series converges. Now, since $m$ belongs to $H_0\ominus H_{-1}$, it follows that $\sum_{k\in\mathbb{Z}}\mathbb{E}(mX_k)=\sum_{k\ge0}\mathbb{E}(mP_0(X_k))$. Since $m=\sum_{k\ge0}P_0(X_k)$ in $L^2$, the result follows.

Proposition 7.8. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$, $(X_i)_{i\in\mathbb{Z}}$ and $S_n$ be as in Theorem 7.6. Consider the condition:
$$X_0\sum_{k=1}^n\mathbb{E}(X_k\mid\mathcal{M}_0)\ \text{converges in}\ L^1.\tag{7.5.7}$$
If (7.5.7) is satisfied, then S2 holds, and the sequence $\mathbb{E}(X_0^2\mid\mathcal{I})+2\mathbb{E}(X_0S_n\mid\mathcal{I})$ converges in $L^1$ to $\eta$.

Proof of Proposition 7.8. We first prove that $\mathbb{E}(X_0^2\mid\mathcal{I})+2\mathbb{E}(X_0S_n\mid\mathcal{I})$ converges in $L^1$. From assumption (7.5.7), the sequence $\mathbb{E}(X_0^2\mid\mathcal{M}_0)+2\mathbb{E}(X_0S_n\mid\mathcal{M}_0)$ converges in $L^1$. The result is then a consequence of part (b) of Lemma 7.7 below.

Lemma 7.7. We have: (a) both $\mathbb{E}(X_0X_k\mid\mathcal{I})$ and $\mathbb{E}(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I})$ are almost surely equal to $\mathcal{M}_0$-measurable random variables; (b) $\mathbb{E}(X_0X_k\mid\mathcal{I})=\mathbb{E}(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I})$ almost surely.

Lemma 7.7(b) is derived from Lemma 7.7(a) via the following elementary fact, whose proof is omitted.

Lemma 7.8. Let $Y$ be a random variable in $L^1(\mathbb{P})$ and $\mathcal{U},\mathcal{V}$ two $\sigma$-algebras of $(\Omega,\mathcal{A},\mathbb{P})$. Suppose that $\mathbb{E}(Y\mid\mathcal{U})$ and $\mathbb{E}(\mathbb{E}(Y\mid\mathcal{V})\mid\mathcal{U})$ are $\mathcal{V}$-measurable. Then $\mathbb{E}(Y\mid\mathcal{U})=\mathbb{E}(\mathbb{E}(Y\mid\mathcal{V})\mid\mathcal{U})$ almost surely.

Proof of Lemma 7.7(a). The fact that $\mathbb{E}(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I})$ is almost surely equal to some $\mathcal{M}_0$-measurable random variable follows from the $L^1$-ergodic theorem. Indeed, the variables $\mathbb{E}(X_iX_{k+i}\mid\mathcal{M}_0)$ are $\mathcal{M}_0$-measurable and
$$\mathbb{E}\big(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I}\big)=\lim_{n\to\infty}\frac1n\sum_{i=1}^n\mathbb{E}(X_iX_{k+i}\mid\mathcal{M}_0)\quad\text{in}\ L^1.$$
Next, from the stationarity of the sequence $(X_i)_{i\in\mathbb{Z}}$, we have
$$\Big\|\mathbb{E}(X_0X_k\mid\mathcal{I})-\frac1n\sum_{i=1}^nX_iX_{i+k}\Big\|_1=\Big\|\mathbb{E}(X_0X_k\mid\mathcal{I})-\frac1n\sum_{i=1-(n+k)}^{-k}X_iX_{i+k}\Big\|_1.$$
Both this equality and the $L^1$-ergodic theorem imply that $\mathbb{E}(X_0X_k\mid\mathcal{I})$ is the limit in $L^1$ of a sequence of $\mathcal{M}_0$-measurable random variables.
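Condition (7.5.7) can be checked in closed form for the AR(1) example discussed after Corollary 7.4 (again our illustration): with $\mathcal{M}_0=\sigma(\varepsilon_i,\,i\le0)$ one has $\mathbb{E}(X_k\mid\mathcal{M}_0)=\rho^kX_0$, so $X_0\sum_{k=1}^{n}\mathbb{E}(X_k\mid\mathcal{M}_0)=X_0^2(\rho+\cdots+\rho^n)$, which converges in $L^1$ to $X_0^2\,\rho/(1-\rho)$. A numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.6

# Stationary AR(1) marginal: X_0 ~ N(0, 1/(1-rho^2))
x0 = rng.standard_normal(100_000) / np.sqrt(1.0 - rho ** 2)

def partial(n):
    """L^1 norm of X_0 * sum_{k=1}^n E(X_k | M_0) = X_0^2 * sum_k rho^k."""
    geom = rho * (1.0 - rho ** n) / (1.0 - rho)
    return float(np.mean(np.abs(x0 ** 2 * geom)))

limit = float(np.mean(x0 ** 2)) * rho / (1.0 - rho)   # L^1 norm of the limit
print(partial(50), limit)                             # nearly identical
```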
Proof of S2(a). Let $\bar S_n=\max\{|S_1|,\dots,|S_n|\}$. In Chapter 8, we shall prove that $(n^{-1}\bar S_n^2)_{n>0}$ is uniformly integrable as soon as (7.5.7) holds.

Proof of S2(c). For any positive integer $N$, we introduce
$$\Lambda_N=[1,n]^2\cap\{(i,j)\in\mathbb{Z}^2 : |i-j|<N\},\qquad \bar\Lambda_N=[1,n]^2\setminus\Lambda_N,$$
and $\eta_N=\mathbb{E}(X_0^2\mid\mathcal{I})+2\big(\mathbb{E}(X_0X_1\mid\mathcal{I})+\cdots+\mathbb{E}(X_0X_{N-1}\mid\mathcal{I})\big)$. Since $\eta_N$ converges in $L^1$ to $\eta$, it suffices to prove that
$$\lim_{N\to\infty}\limsup_{n\to\infty}\Big\|\eta_N-\frac1n\mathbb{E}\Big(\sum_{i=1}^n\sum_{j=1}^n X_iX_j\,\Big|\,\mathcal{M}_0\Big)\Big\|_1=0.\tag{7.5.8}$$
Using the triangle inequality and the fact that $\eta_N=\mathbb{E}(\eta_N\mid\mathcal{M}_0)$ almost surely, we obtain
$$\Big\|\eta_N-\mathbb{E}\Big(\frac1n\sum_{i=1}^n\sum_{j=1}^n X_iX_j\,\Big|\,\mathcal{M}_0\Big)\Big\|_1\le\Big\|\eta_N-\frac1n\sum_{(i,j)\in\Lambda_N}X_iX_j\Big\|_1+\frac1n\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_iX_j\mid\mathcal{M}_0)\Big\|_1.$$
Applying the $L^1$-ergodic theorem, the first term on the right-hand side tends to 0 as $n$ tends to infinity. To control the second term, write
$$\frac1n\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_iX_j\mid\mathcal{M}_0)\Big\|_1\le\frac2n\sum_{i=1}^n\Big\|X_i\sum_{j=i+N}^{n}\mathbb{E}(X_j\mid\mathcal{M}_i)\Big\|_1\le\frac2n\sum_{i=1}^n\Big\|X_0\sum_{j=N}^{n-i}\mathbb{E}(X_j\mid\mathcal{M}_0)\Big\|_1.$$
By assumption (7.5.7), $\lim_{N\to\infty}\sup_{m\ge N}\big\|X_0\sum_{j=N}^{m}\mathbb{E}(X_j\mid\mathcal{M}_0)\big\|_1=0$, and consequently
$$\lim_{N\to\infty}\limsup_{n\to\infty}\frac2n\sum_{i=1}^n\Big\|X_0\sum_{j=N}^{n-i}\mathbb{E}(X_j\mid\mathcal{M}_0)\Big\|_1=0.$$
This completes the proof of S2(c).

Proof of S2(b). Let $\mathcal{M}_{-\infty}=\bigcap_{i\in\mathbb{Z}}\mathcal{M}_i$ and recall that $P_i$ has been defined in Corollary 7.4. We have the orthogonal decomposition
$$X_k=\mathbb{E}(X_k\mid\mathcal{M}_{-\infty})+\sum_{i=0}^{\infty}P_{k-i}(X_k).\tag{7.5.9}$$
Using the stationarity of $(X_i)_{i\in\mathbb{Z}}$ we infer from (7.5.9) that
$$\sum_{i=0}^{\infty}\|P_0(X_i)\|_2^2=\sum_{i=0}^{\infty}\|P_{-i}(X_0)\|_2^2\le\|X_0\|_2^2.\tag{7.5.10}$$
Now, from the decomposition
$$\frac{X_{-1}}{\sqrt n}\,\mathbb{E}(S_n\mid\mathcal{M}_0)=\frac{X_{-1}}{\sqrt n}\,\mathbb{E}(S_n\mid\mathcal{M}_{-1})+\frac{X_{-1}}{\sqrt n}\sum_{i=1}^nP_0(X_i),$$
we infer that
$$\frac1{\sqrt n}\|X_{-1}\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1\le\frac1{\sqrt n}\|X_0\mathbb{E}(S_n\circ T\mid\mathcal{M}_0)\|_1+\frac{\|X_0\|_2}{\sqrt n}\Big\|\sum_{i=1}^nP_0(X_i)\Big\|_2.\tag{7.5.11}$$
By (7.5.7), the first term on the right-hand side tends to zero as $n$ tends to infinity. On the other hand, we infer from (7.5.10) and the Cauchy–Schwarz inequality that $n^{-1/2}\|\sum_{i=1}^nP_0(X_i)\|_2$ vanishes as $n$ goes to infinity, and so does the left-hand side of (7.5.11). By induction, we can prove that for any positive $k$,
$$\lim_{n\to\infty}\frac1{\sqrt n}\|X_{-k}\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1=0.\tag{7.5.12}$$
Now
$$\frac1{\sqrt n}\big\|\mathbb{E}(|X_0|\mid\mathcal{I})\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1\le\frac1{\sqrt n}\Big\|\mathbb{E}(S_n\mid\mathcal{M}_0)\Big(\mathbb{E}(|X_0|\mid\mathcal{I})-\frac1k\sum_{i=1}^k|X_{-i}|\Big)\Big\|_1+\frac1{\sqrt n}\Big\|\mathbb{E}(S_n\mid\mathcal{M}_0)\,\frac1k\sum_{i=1}^k|X_{-i}|\Big\|_1.\tag{7.5.13}$$
From (7.5.12), the second term on the right-hand side tends to zero as $n$ tends to infinity. Applying first the Cauchy–Schwarz inequality and next the $L^2$-ergodic theorem, we easily deduce that the first term on the right-hand side is as small as we wish by choosing $k$ large enough. Therefore
$$\lim_{n\to\infty}\frac1{\sqrt n}\big\|\mathbb{E}(|X_0|\mid\mathcal{I})\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1=0.\tag{7.5.14}$$
Set $A=\{\mathbb{E}(|X_0|\mid\mathcal{I})>0\}$ and $B=A^c=\{\mathbb{E}(|X_0|\mid\mathcal{I})=0\}$. For any positive real $m$, we have
$$\frac1{\sqrt n}\|\mathbf{1}_A\,\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1\le\frac1{m\sqrt n}\big\|\mathbb{E}(|X_0|\mid\mathcal{I})\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1+\frac1{\sqrt n}\big\|\mathbf{1}_{0<\mathbb{E}(|X_0|\mid\mathcal{I})<m}\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1.\tag{7.5.15}$$
From (7.5.14), the first term on the right-hand side tends to zero as $n$ tends to infinity. Letting $m$ tend to zero, we infer that the second term on the right-hand side of (7.5.15) is as small as we wish. Consequently
$$\lim_{n\to\infty}\frac1{\sqrt n}\|\mathbf{1}_A\,\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1=0.\tag{7.5.16}$$
On the other hand, noting that $\mathbb{E}(|X_0|\mathbf{1}_B)=0$, we infer that $X_0$ is zero on the set $B$. Since $B$ is invariant by $T$, $X_k$ is zero on $B$ for any $k$ in $\mathbb{Z}$. Now, arguing as in Lemma 7.7(b), we obtain $\mathbb{E}(\mathbb{E}(|S_n|\mid\mathcal{M}_0)\mid\mathcal{I})=\mathbb{E}(|S_n|\mid\mathcal{I})$. These two facts lead to
$$\|\mathbf{1}_B\,\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1\le\mathbb{E}\big(\mathbf{1}_B\,\mathbb{E}(\mathbb{E}(|S_n|\mid\mathcal{M}_0)\mid\mathcal{I})\big)\le\mathbb{E}(|S_n|\mathbf{1}_B)=0.\tag{7.5.17}$$
Collecting (7.5.16) and (7.5.17), we conclude that $n^{-1/2}\|\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1$ tends to zero as $n$ tends to infinity. This completes the proof.
7.5.3 $\gamma$-dependent sequences
Applying Corollary 7.4 and Proposition 7.8, we derive sufficient conditions for the CCLT expressed in terms of the coefficients $\gamma_2$ and $\gamma_1$. Here, the coefficients are defined by
$$\gamma_1(k)=\gamma_1(\mathcal{M}_0,X_k)\quad\text{and}\quad\gamma_2(k)=\gamma_2(\mathcal{M}_0,X_k).$$

Corollary 7.5. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6 and define $\mathcal{M}_{-\infty}=\bigcap_{i\in\mathbb{Z}}\mathcal{M}_i$. Define the operators $P_i$ as in Corollary 7.4.

1. If $\mathbb{E}(X_0\mid\mathcal{M}_{-\infty})=0$ and $\sum_{i\ge0}\|P_0(X_i)\|_2<\infty$, then (7.5.3) (hence S1) holds. Moreover, $\eta$ is the same as in Proposition 7.8.

2. Assume that
$$\sum_{k>0}\frac{\gamma_2(k)}{\sqrt{k}}<\infty.\tag{7.5.18}$$
Then $\mathbb{E}(X_0\mid\mathcal{M}_{-\infty})=0$ and $\sum_{i\ge0}\|P_0(X_i)\|_2<\infty$, and S1 holds.

Proof of Corollary 7.5. Proof of 1. Starting from (7.5.9) with $\mathbb{E}(X_k\mid\mathcal{M}_{-\infty})=0$, we obtain
$$\mathbb{E}(X_0X_k)=\sum_{i\ge0}\mathbb{E}\big(P_{-i}(X_0)P_{-i}(X_k)\big)=\sum_{i\ge0}\mathbb{E}\big(P_0(X_i)P_0(X_{i+k})\big).$$
Hence, using Hölder's inequality,
$$\sum_{k=1}^{\infty}|\mathbb{E}(X_0X_k)|\le\sum_{k=1}^{\infty}\sum_{i\ge0}\|P_0(X_i)\|_2\,\|P_0(X_{i+k})\|_2\le\Big(\sum_{i\ge0}\|P_0(X_i)\|_2\Big)^2,\tag{7.5.19}$$
so that $\sum_{k\in\mathbb{Z}}|\mathbb{E}(X_0X_k)|$ is finite. Consequently
$$\lim_{n\to\infty}\frac1n\mathbb{E}(S_n^2)=\sum_{k=-\infty}^{\infty}\mathbb{E}(X_0X_k)=\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\mathbb{E}\big(P_0(X_i)P_0(X_j)\big).\tag{7.5.20}$$
On the other hand,
$$\mathbb{E}(m^2)=\mathbb{E}\Big(\Big(\sum_{i=0}^{\infty}P_0(X_i)\Big)^2\Big)=\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\mathbb{E}\big(P_0(X_i)P_0(X_j)\big).\tag{7.5.21}$$
Combining (7.5.20) and (7.5.21), we see that (7.5.3) holds. To compute $\eta$, note that we can in fact prove a stronger result than (7.5.19), namely
$$\sum_{k=1}^{\infty}\|\mathbb{E}(X_0X_k\mid\mathcal{I})\|_1\le\Big(\sum_{i\ge0}\|P_0(X_i)\|_2\Big)^2.$$
It follows that $n^{-1}\mathbb{E}(S_n^2\mid\mathcal{I})$ converges in $L^1$ to $\eta=\sum_{k\in\mathbb{Z}}\mathbb{E}(X_0X_k\mid\mathcal{I})$.

Proof of 2. Note first that (7.5.18) implies that $\mathbb{E}(X_k\mid\mathcal{M}_{-\infty})=0$. Let $(L_k)_{k>0}$ be a sequence of positive numbers such that $\sum_{i>0}\big(\sum_{k=1}^iL_k\big)^{-1}<\infty$. Starting from (7.5.9) and using the stationarity of $(X_i)_{i\in\mathbb{Z}}$, we obtain that
$$\sum_{k>0}L_k\|\mathbb{E}(X_k\mid\mathcal{M}_0)\|_2^2=\sum_{k>0}L_k\sum_{i\le0}\|P_i(X_k)\|_2^2=\sum_{i>0}\Big(\sum_{k=1}^iL_k\Big)\|P_0(X_i)\|_2^2.$$
Let $a_i=L_1+\cdots+L_i$. Applying Hölder's inequality in $\ell^2$, we obtain that
$$\sum_{i>0}\|P_0(X_i)\|_2\le\Big(\sum_{i>0}\frac1{a_i}\Big)^{1/2}\Big(\sum_{i>0}a_i\|P_0(X_i)\|_2^2\Big)^{1/2}\le\Big(\sum_{i>0}\frac1{a_i}\Big)^{1/2}\Big(\sum_{k>0}L_k\|\mathbb{E}(X_k\mid\mathcal{M}_0)\|_2^2\Big)^{1/2}.$$
Since (7.5.18) holds, one can take $L_k^{-1}=\sqrt k\,\|\mathbb{E}(X_k\mid\mathcal{M}_0)\|_2$. The result follows.
Corollary 7.6. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Let $G_{X_0}$ and $Q_{X_0}$ be as in Definition 5.1. If
$$\sum_{k=0}^{\infty}\int_0^{\gamma_1(k)}Q_{X_0}\circ G_{X_0}(u)\,du<\infty,\tag{7.5.22}$$
then (7.5.7) (hence S1) holds. Moreover, (7.5.22) holds as soon as one of the following conditions is satisfied:

1. $\mathbb{P}(|X_0|>x)\le(c/x)^r$ for some $r>2$, and $\sum_{i\ge0}(\gamma_1(i))^{(r-2)/(r-1)}<\infty$;
2. $\|X_0\|_r<\infty$ for some $r>2$, and $\sum_{i\ge0}i^{1/(r-2)}\gamma_1(i)<\infty$;
3. $\mathbb{E}\big(|X_0|^2\log(1+|X_0|)\big)<\infty$ and $\gamma_1(i)=O(a^i)$ for some $a<1$.

Proof of Corollary 7.6. Applying Inequality (5.2.1), we obtain that
$$\sum_{k\ge0}\|X_0\mathbb{E}(X_k\mid\mathcal{M}_0)\|_1\le\sum_{k\ge0}\int_0^{\gamma_1(k)}Q_{|X_0|}\circ G_{|X_0|}(u)\,du,$$
so that (7.5.22) implies (7.5.7).

Proof of 1. Since $\mathbb{P}(|X|>x)\le(c/x)^r$, the quantile function satisfies $Q_X(u)\le cu^{-1/r}$, so that
$$\int_0^xQ_X(u)\,du\le\frac{cr}{r-1}\,x^{(r-1)/r}\quad\text{and then}\quad G_X(u)\ge\Big(\frac{(r-1)u}{cr}\Big)^{r/(r-1)}.$$
Hence $Q_{X_0}\circ G_{X_0}(u)\le K_{c,r}\,u^{-1/(r-1)}$, with $K_{c,r}=c\big(cr/(r-1)\big)^{1/(r-1)}$. We obtain the bound
$$\sum_{i\ge0}\int_0^{\gamma_1(i)}Q_{X_0}\circ G_{X_0}(u)\,du\le K_{c,r}\sum_{i\ge0}\int_0^{\gamma_1(i)}u^{-1/(r-1)}\,du=\frac{K_{c,r}(r-1)}{r-2}\sum_{i\ge0}(\gamma_1(i))^{(r-2)/(r-1)}.$$
Proof of 2. Note first that
$$\int_0^{\|X_0\|_1}Q_{X_0}^{r-1}\circ G_{X_0}(u)\,du=\int_0^1Q_{X_0}^r(u)\,du=\mathbb{E}(|X_0|^r).$$
Define next
$$\gamma_1^{-1}(u)=\sum_{i\ge0}\mathbf{1}_{u<\gamma_1(i)}=\inf\{k\in\mathbb{N} : \gamma_1(k)\le u\}.$$
Applying Hölder's inequality, we obtain that
$$\sum_{i\ge0}\int_0^{\gamma_1(i)}Q_{X_0}\circ G_{X_0}(u)\,du=\int_0^{\|X_0\|_1}\gamma_1^{-1}(u)\,Q_{X_0}\circ G_{X_0}(u)\,du\le\|X_0\|_r^{r/(r-1)}\Big(\int_0^{\|X_0\|_1}\big(\gamma_1^{-1}(u)\big)^{(r-1)/(r-2)}du\Big)^{(r-2)/(r-1)}.$$
Here note that
$$\big(\gamma_1^{-1}(u)\big)^q=\sum_{j=0}^{\infty}\big((j+1)^q-j^q\big)\mathbf{1}_{u<\gamma_1(j)}.\tag{7.5.23}$$
Now, apply (7.5.23) with $q=(r-1)/(r-2)$. Noting that $(i+1)^q-i^q\le q(i+1)^{q-1}$, we infer that
$$\int_0^{\|X_0\|_1}\big(\gamma_1^{-1}(u)\big)^{(r-1)/(r-2)}du\le q\sum_{i\ge0}(i+1)^{1/(r-2)}\gamma_1(i).$$

Proof of 3. Let $\rho(i)=\gamma_1(i)/\|X_0\|_1$ and let $U$ be a random variable uniformly distributed over $[0,1]$. We have
$$\int_0^{\|X_0\|_1}\gamma_1^{-1}(u)\,Q_{X_0}\circ G_{X_0}(u)\,du=\|X_0\|_1\int_0^1\rho^{-1}(u)\,Q_{X_0}\circ G_{X_0}(u\|X_0\|_1)\,du=\|X_0\|_1\,\mathbb{E}\big(\rho^{-1}(U)\,Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\big).$$
Let $\phi$ be the function defined on $\mathbb{R}^+$ by $\phi(x)=x(\log(1+x))^{p-1}$, and denote by $\phi^*$ its Young transform. Applying Young's inequality, we have that
$$\mathbb{E}\big(\rho^{-1}(U)\,Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\big)\le2\,\big\|\rho^{-1}(U)\big\|_{\phi^*}\,\big\|Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\big\|_{\phi}.$$
Here, note that $\|Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\|_{\phi}$ is finite as soon as
$$\int_0^{\|X_0\|_1}Q_{X_0}\circ G_{X_0}(u)\big(\log(1+Q_{X_0}\circ G_{X_0}(u))\big)^{p-1}du<\infty.$$
Setting $z=G_{X_0}(u)$ and taking $p=2$, we obtain the condition
$$\int_0^1Q_{X_0}^2(u)\log\big(1+Q_{X_0}(u)\big)\,du<\infty.\tag{7.5.24}$$
Since $Q_{X_0}(U)$ has the same distribution as $|X_0|$, we infer that (7.5.24) holds as soon as $\mathbb{E}\big(|X_0|^2\log(1+|X_0|)\big)$ is finite. It remains to control $\|\rho^{-1}(U)\|_{\phi^*}$. Arguing as in Rio (2000) [161], page 17, we see that $\|\rho^{-1}(U)\|_{\phi^*}$ is finite as soon as there exists $c>0$ such that
$$\sum_{i\ge0}\rho(i)\,\phi^*\big((i+1)/c\big)<\infty.\tag{7.5.25}$$
Since $\phi^*$ has the same behavior as $x\mapsto e^x$ as $x$ goes to infinity (for $p=2$), we can always find $c>0$ such that (7.5.25) holds provided that $\gamma_1(i)=O(a^i)$ for some $a<1$.
7.5.4 $\tilde\alpha$- and $\tilde\phi$-dependent sequences

In this section, the coefficients are defined by
$$\tilde\phi_1(k)=\tilde\phi(\mathcal{M}_0,X_k)\quad\text{and}\quad\tilde\alpha_1(k)=\tilde\alpha(\mathcal{M}_0,X_k).$$
Let $X_i=X_0\circ T^i$, and let $S_n(f)=f(X_1)+\cdots+f(X_n)$ and $S_{n,0}(f)=S_n(f)-n\mathbb{E}(f(X_0))$. Let $\mathcal{C}(p,M,P_X)$ be the closed convex hull of the class of functions $g$ which are monotonic on an open interval and 0 elsewhere, and such that $\mathbb{E}(|g(X_0)|^p)<M$.

Corollary 7.7. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Assume that, for some $p\ge2$,
$$f\in\mathcal{C}(p,M,P_X)\quad\text{and}\quad\sum_{k>0}\frac{(\tilde\phi_1(k))^{(p-1)/p}}{\sqrt{k}}<\infty.\tag{7.5.26}$$
Then $S_{n,0}(f)$ satisfies S1 with
$$\eta=\sum_{k\in\mathbb{Z}}\mathbb{E}\big(f(X_0)\big(f(X_k)-\mathbb{E}(f(X_k))\big)\,\big|\,\mathcal{I}\big).\tag{7.5.27}$$

Remark 7.5. We can apply this result to the case of uniformly expanding maps (see Section 3.3). If $T$ is a uniformly expanding map of $[0,1]$ preserving a probability $\mu$, and if $f$ belongs to $\mathcal{C}(2,M,\mu)$, then $n^{-1/2}(f\circ T+\cdots+f\circ T^n-n\mu(f))$ converges weakly in the space $([0,1],\mu)$ to a Gaussian distribution with mean 0 and variance
$$\sigma^2(f)=\mu\big((f-\mu(f))^2\big)+2\sum_{k>0}\mu\big((f-\mu(f))\,f\circ T^k\big).$$

Let $\mathcal{C}(Q)$ be the closed convex hull of the class of functions $g$ which are monotonic on an open interval and 0 elsewhere, and such that $Q_{|g(X_0)|}\le Q$, where $Q$ is a given quantile function.

Corollary 7.8. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Assume that, for some quantile function $Q$,
$$f\in\mathcal{C}(Q)\quad\text{and}\quad\sum_{k\ge0}\int_0^{\tilde\alpha_1(k)}Q^2(u)\,du<\infty.\tag{7.5.28}$$
Then $S_{n,0}(f)$ satisfies S1 with $\eta$ given by (7.5.27).

Proof of Corollaries 7.7 and 7.8. Without loss of generality, it suffices to prove the results for $f=\sum_{i=1}^k\alpha_ig_i$, where $\sum_{i=1}^k\alpha_i=1$ and each $g_i$ is monotonic on an interval and 0 elsewhere, with $\mathbb{E}(|g_i(X_0)|^p)<M$ (resp. $Q_{|g_i(X_0)|}\le Q$). To prove Corollary 7.7 it suffices to see that the sequence $(f(X_i))_{i\in\mathbb{Z}}$ satisfies (7.5.18), that is,
$$\sum_{k>0}\frac{\|\mathbb{E}(f(X_k)\mid\mathcal{M}_0)-\mathbb{E}(f(X_k))\|_2}{\sqrt{k}}<\infty.\tag{7.5.29}$$
Clearly, it suffices to check (7.5.29) for a single function $g_i$. Note that by monotonicity $\tilde\phi(\mathcal{M}_0,g_i(X_k))\le2\tilde\phi(\mathcal{M}_0,X_k)$. Let $S_1(\mathcal{M}_0)$ be the set of all $\mathcal{M}_0$-measurable random variables $Z$ such that $\mathbb{E}(Z^2)=1$. Clearly,
$$\|\mathbb{E}(g_i(X_k)\mid\mathcal{M}_0)-\mathbb{E}(g_i(X_k))\|_2=\sup_{Z\in S_1(\mathcal{M}_0)}|\mathrm{Cov}(Z,g_i(X_k))|.\tag{7.5.30}$$
Applying (5.2.7), we have, for any conjugate exponents $p,q$,
$$|\mathrm{Cov}(Z,g_i(X_k))|\le4\|g_i(X_0)\|_p\,\|Z\|_q\,(\tilde\phi_1(k))^{1/q},$$
and the result easily follows. In the same way, to prove Corollary 7.8, it suffices to prove that
$$\sum_{k\ge0}\big\|g_i(X_0)\big(\mathbb{E}(g_i(X_k)\mid\mathcal{M}_0)-\mathbb{E}(g_i(X_k))\big)\big\|_1<\infty,$$
which follows from (5.2.5).
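Remark 7.5 can be illustrated numerically; the following sketch is ours, not the book's. We use the logistic map $T(x)=4x(1-x)$, which preserves the arcsine law and is conjugate to the (uniformly expanding) tent map. For $f(x)=x$ we have $\mu(f)=1/2$, the correlations $\mu((f-\mu(f))\,f\circ T^k)$ vanish for $k\ge1$, and hence $\sigma^2(f)=\mu((f-1/2)^2)=1/8$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 2000

def normalized_sum(n):
    """n^{-1/2} * sum_{k<n} (f(T^k x) - mu(f)) for f(x) = x, mu(f) = 1/2."""
    # draw x from the invariant arcsine law: x = sin^2(pi*u), u uniform
    x = np.sin(np.pi * rng.random()) ** 2
    s = 0.0
    for _ in range(n):
        s += x - 0.5
        x = 4.0 * x * (1.0 - x)     # logistic map step
    return s / np.sqrt(n)

z = np.array([normalized_sum(n) for _ in range(reps)])
print(z.mean(), z.var())            # variance close to 1/8
```

Floating-point orbits of a chaotic map are not true orbits, but their statistics track the invariant measure closely, which is all this check needs.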
7.5.5 Sufficient conditions for triangular arrays
The next condition is the natural extension of Condition (7.5.7):
$$\lim_{N\to\infty}\limsup_{n\to\infty}\sup_{N\le m\le n}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1=0.\tag{7.5.31}$$
If (7.5.31) is satisfied, define $R(N,X)$ and $N(X)$ as follows:
$$R(N,X)=\limsup_{n\to\infty}\sup_{N\le m\le n}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1,$$
and $N(X)=\inf\{N>0 : R(N,X)=0\}$ ($N(X)$ may be infinite).

Proposition 7.9. Let $X_{i,n}$ and $\mathcal{M}_{i,n}$ be as in Theorem 7.5. Assume that (7.5.31) and S2(b) are satisfied. Assume furthermore that, for each $0\le k<N(X)$, there exists an $\mathcal{M}_{0,\inf}$-measurable random variable $\lambda_k$ such that, for any $t$ in $]0,1]$,
$$\frac1{nt}\sum_{i=1}^{[nt]}X_{i+k,n}X_{i,n}\ \text{converges in}\ L^1\ \text{to}\ \lambda_k.\tag{7.5.32}$$
Then Condition S2 holds with $\eta=\lambda_0+2\sum_{k=1}^{N(X)-1}\lambda_k$.
Remark 7.6. Let us look at a particular case, for which $N(X)=1$. Conditions (7.5.31) and (7.5.32) are satisfied if Conditions R1 and R2 below are fulfilled:

R1. $\displaystyle\lim_{n\to\infty}\sum_{k=1}^n\|X_{0,n}\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\|_1=0$.

R2. For any $t$ in $]0,1]$, $\displaystyle\frac1{nt}\sum_{i=1}^{[nt]}X_{i,n}^2$ converges in $L^1$ to $\lambda$.

In the stationary case, these results extend classical results for triangular arrays of martingale differences (see for instance Hall and Heyde (1980) [100], Theorem 3.2), for which Condition R1 is satisfied. This particular case is sufficient to improve on many results in the context of kernel estimators.

We conclude with a simple result for $\tilde\phi$-mixing arrays. Here, the coefficients are defined by
$$\tilde\phi_1(k)=\sup_{n>0}\tilde\phi(\mathcal{M}_{0,n},X_{k,n}).$$
Corollary 7.9. Let $X_{i,n}$ and $\mathcal{M}_{i,n}$ be as in Theorem 7.5. If there exist two conjugate exponents $p\le q$ such that
$$\sup_{n>0}\|X_{0,n}\|_p\cdot\|X_{0,n}\|_q<\infty\quad\text{and}\quad\sum_{k=0}^{\infty}(\tilde\phi_1(k))^{1/p}<\infty,\tag{7.5.33}$$
then (7.5.31) holds. If furthermore, for any $k>0$, the sequence $\|X_{0,n}X_{k,n}\|_1$ converges to 0 as $n$ tends to infinity, then $N(X)=1$. If furthermore Lindeberg's condition holds, that is, for any $\epsilon>0$,
$$\lim_{n\to\infty}\mathbb{E}\big(X_{0,n}^2\mathbf{1}_{|X_{0,n}|>\epsilon\sqrt n}\big)=0,\tag{7.5.34}$$
then S2 holds as soon as $\mathbb{E}(X_{0,n}^2)$ converges, and $\eta=\lim_{n\to\infty}\mathbb{E}(X_{0,n}^2)$.
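Lindeberg's condition (7.5.34) says that the second moment carried by values larger than $\epsilon\sqrt n$ must vanish; in the stationary square-integrable case $X_{k,n}=X_k$ it holds by dominated convergence. A quick numerical sketch (ours, with an illustrative Student-$t_3$ marginal, whose truncated second moment decays slowly but does tend to 0):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_t(df=3, size=1_000_000)   # E X^2 = df/(df-2) = 3
eps = 0.1

def lindeberg(n):
    """E( X^2 1_{|X| > eps*sqrt(n)} ), estimated on the fixed sample."""
    thr = eps * np.sqrt(n)
    return float(np.mean(np.where(np.abs(x) > thr, x ** 2, 0.0)))

vals = [lindeberg(n) for n in (10, 100, 10_000, 1_000_000)]
print(vals)   # nonincreasing, tending to 0
```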
Proof of Proposition 7.9. Proof of S2(a). Write first
$$\mathbb{E}\Big(\frac{S_n^2(t)}{nt}\Big(1\wedge\frac{|S_n(t)|}{\sqrt n}\Big)\Big)\le\frac1{nt}\mathbb{E}\big(S_n^2(t)\mathbf{1}_{|S_n(t)|>2\epsilon\sqrt n}\big)+\frac{2\epsilon}{nt}\mathbb{E}(S_n^2(t))\le\frac4{nt}\mathbb{E}\big((|S_n(t)|-\epsilon\sqrt n)_+^2\big)+\frac{2\epsilon}{nt}\mathbb{E}(S_n^2(t)).\tag{7.5.35}$$
Since (7.5.31) holds, $(nt)^{-1}\mathbb{E}(S_n^2(t))$ is bounded, so that the second term on the right-hand side is as small as we wish. Consequently, we infer from (7.5.35) that S2(a) holds as soon as, for any positive $\epsilon$,
$$\lim_{t\to0}\limsup_{n\to\infty}\frac1{nt}\mathbb{E}\big((|S_n(t)|-\epsilon\sqrt n)_+^2\big)=0.\tag{7.5.36}$$
In fact, we shall prove that (7.5.36) holds with $\bar S_n(t)=\sup_{s\in[0,t]}|S_n(s)|$ instead of $|S_n(t)|$. Define $S_n^*(t)=\sup_{s\in[0,t]}(S_n(s))_+$ and $G(t,\epsilon,n)=\{S_n^*(t)>\epsilon\sqrt n\}$. From Proposition 5.8, we have, for any positive integer $N$,
$$\frac1{nt}\mathbb{E}\big((S_n^*(t)-\epsilon\sqrt n)_+^2\big)\le\frac8{nt}\mathbb{E}\Big(\mathbf{1}_{G(t,\epsilon,n)}\sum_{i=0}^{N-1}\sum_{k=1}^{[nt]}|X_{k,n}X_{k+i,n}|\Big)+8\sup_{N\le m\le[nt]}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1.\tag{7.5.37}$$
Since (7.5.31) holds, the second term on the right-hand side is as small as we wish by choosing $N$ large enough. To control the first term, note that by Proposition 5.8 and Markov's inequality, $\lim_{t\to0}\limsup_{n\to\infty}\mathbb{P}(G(t,\epsilon,n))=0$. The result follows by noting that $2|X_{k,n}X_{l,n}|\le X_{k,n}^2+X_{l,n}^2$ and by using (7.5.32) for $k=0$.

Proof of S2(c). For any finite integer $0\le N\le N(X)$, define the variable $\eta_N=\lambda_0+2(\lambda_1+\cdots+\lambda_{N-1})$ and the two sets
$$\Lambda_N=[1,[nt]]^2\cap\{(i,j)\in\mathbb{Z}^2 : |i-j|<N\}\quad\text{and}\quad\bar\Lambda_N=[1,[nt]]^2\cap\{(i,j)\in\mathbb{Z}^2 : j-i\ge N\},$$
so that
$$\Big\|\mathbb{E}\Big(\frac{S_n^2(t)}{nt}-\eta_N\,\Big|\,\mathcal{M}_{0,n}\Big)\Big\|_1\le\Big\|\eta_N-\frac1{nt}\sum_{(i,j)\in\Lambda_N}X_{i,n}X_{j,n}\Big\|_1+\frac2{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_{i,n}X_{j,n}\mid\mathcal{M}_{0,n})\Big\|_1.\tag{7.5.38}$$
Hence, it remains to show that
$$\lim_{N\to N(X)}\limsup_{n\to\infty}\frac1{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_{i,n}X_{j,n}\mid\mathcal{M}_{0,n})\Big\|_1=0.\tag{7.5.39}$$
Using first the inclusion $\mathcal{M}_{0,n}\subseteq\mathcal{M}_{i,n}$ for any positive $i$, and second the stationarity of the sequence, we obtain successively
$$\frac1{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_{i,n}X_{j,n}\mid\mathcal{M}_{0,n})\Big\|_1\le\frac1{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}X_{i,n}\mathbb{E}(X_{j,n}\mid\mathcal{M}_{i,n})\Big\|_1\le\sup_{N\le m\le n}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1,$$
and (7.5.39) follows from (7.5.31). This completes the proof.
Proof of Corollary 7.9. Using the stationarity and inequality (5.2.7), we infer that
$$\|X_{0,n}\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\|_1\le2\|X_{0,n}\|_p\|X_{0,n}\|_q(\tilde\phi_1(k))^{1/p},\tag{7.5.40}$$
so that (7.5.31) follows easily from (7.5.33). Moreover, if for each $k>0$ the sequence $\|X_{0,n}X_{k,n}\|_1$ converges to 0, the fact that $N(X)=1$ follows from (7.5.40) and the dominated convergence theorem. To prove the last point of this corollary, it suffices, by applying Proposition 7.9, to prove that for any $t$ in $]0,1]$,
$$\lim_{n\to\infty}\frac1{nt}\Big\|\sum_{k=1}^{[nt]}\big(X_{k,n}^2-\mathbb{E}(X_{0,n}^2)\big)\Big\|_1=0.\tag{7.5.41}$$
According to condition (7.5.34), it suffices to prove that
$$\lim_{\epsilon\to0}\limsup_{n\to\infty}\mathrm{Var}\Big(\frac1{nt}\sum_{k=1}^{[nt]}X_{k,n}^2\mathbf{1}_{|X_{k,n}|\le\epsilon\sqrt n}\Big)=0.\tag{7.5.42}$$
Setting $Y_{k,n}=X_{k,n}^2\mathbf{1}_{|X_{k,n}|\le\epsilon\sqrt n}$, we have the elementary inequality
$$\mathrm{Var}\Big(\frac1{nt}\sum_{k=1}^{[nt]}Y_{k,n}\Big)\le\frac2{nt}\sum_{k=0}^n|\mathrm{Cov}(Y_{0,n},Y_{k,n})|.\tag{7.5.43}$$
Now, bearing in mind the definition of $Y_{k,n}$ and applying (5.2.7), we have successively
$$|\mathrm{Cov}(Y_{0,n},Y_{k,n})|=|\mathrm{Cov}(Y_{0,n},X_{k,n}^2\mathbf{1}_{|X_{k,n}|\le\epsilon\sqrt n})|\le2(\tilde\phi_1(k))^{1/p}\big\|X_{0,n}^2\mathbf{1}_{|X_{0,n}|\le\epsilon\sqrt n}\big\|_p\big\|X_{0,n}^2\mathbf{1}_{|X_{0,n}|\le\epsilon\sqrt n}\big\|_q\le2\epsilon^2n(\tilde\phi_1(k))^{1/p}\|X_{0,n}\|_p\|X_{0,n}\|_q,$$
and (7.5.42) follows from (7.5.43) and (7.5.33).
Chapter 8

Donsker Principles

In this chapter we give sufficient conditions for the smoothed partial sum process of a sequence (resp. field) of real-valued random variables to converge to a Brownian motion (resp. Brownian sheet). In the non causal case, we show that the conditions on $\kappa(r)$ and $\lambda(r)$ (implied by $\eta$-dependence) given in Theorems 7.1 and 7.2 respectively are sufficient to obtain the weak invariance principle. In Section 8.2 we give a general result for $\eta$-dependent random fields having moments of order 4. For the causal case, we present the functional version of the conditional central limit theorem (CCLT) established in Section 7.4. We shall see in Section 8.4 that the sufficient conditions for the CCLT for $\gamma$-, $\tilde\alpha$- and $\tilde\phi$-dependent sequences are also sufficient for the conditional invariance principle.
8.1 Non causal stationary sequences

Let $(X_i)_{i\in\mathbb{Z}}$ be a stationary sequence of centered and square integrable random variables, and let $U_n$ be the Donsker line
$$U_n(t)=\frac1{\sqrt n}\sum_{i=1}^{[nt]}X_i+\frac{nt-[nt]}{\sqrt n}X_{[nt]+1}.$$
In this short section, we show that under the same assumptions as in Theorem 7.1 or Theorem 7.2, the weak invariance principle holds. This follows easily from the control of $\|S_n\|_{2+\delta}$ obtained at the end of Section 7.2.1.

Theorem 8.1. Assume that $(X_i)_{i\in\mathbb{Z}}$ satisfies the assumptions of Theorem 7.1 or the assumptions of Theorem 7.2. Then the process $U_n$ converges weakly in $(C([0,1]),\|\cdot\|_\infty)$ to a Wiener process with variance $\sigma^2=\sum_{k\in\mathbb{Z}}\mathrm{Cov}(X_0,X_k)$.
200
CHAPTER 8. DONSKER PRINCIPLES
Proof. The finite dimensional convergence of the process $U_n$ can be proved as in Section 7.2.1. It remains to prove the tightness. Recall that under the assumptions of Theorem 7.1 or of Theorem 7.2, there exist $\delta>0$ and $C>0$ such that
$$\mathbb{E}|S_p|^{2+\delta}\le Cp^{1+\delta/2}.$$
The tightness then follows from standard arguments. For instance, if $\lambda$ denotes the Lebesgue measure, we infer from the moment inequality above that $(U_n,\lambda)$ belongs to the class $\mathcal{C}(1+\delta/2,2+\delta)$ defined in Bickel and Wichura (1971) [19] (see inequality (3) of that paper). The tightness of the process $\{U_n(t),\,t\in[0,1]\}$ follows by applying Theorem 3 of the same paper.

Remark. The same result can be proved along the same lines for the Bernoulli shift models with dependent inputs of Theorem 7.3.
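A minimal construction of the Donsker line (our sketch, with an illustrative 1-dependent moving average): for $X_i=\varepsilon_i+\varepsilon_{i-1}$ with i.i.d. standard normal $\varepsilon_i$ one has $\sigma^2=\sum_k\mathrm{Cov}(X_0,X_k)=2+2\cdot1=4$, so $\mathrm{Var}\,U_n(1)$ should approach 4.

```python
import numpy as np

rng = np.random.default_rng(4)

def donsker_line(x, t):
    """U_n(t) = n^{-1/2} * (S_[nt] + (nt - [nt]) * X_{[nt]+1})."""
    n = len(x)
    k = int(np.floor(n * t))
    s = x[:k].sum()
    if k < n:
        s += (n * t - k) * x[k]   # linear interpolation between lattice points
    return s / np.sqrt(n)

n, reps = 2000, 500
vals = []
for _ in range(reps):
    eps = rng.standard_normal(n + 1)
    x = eps[1:] + eps[:-1]        # 1-dependent sequence
    vals.append(donsker_line(x, 1.0))
vals = np.array(vals)
print(vals.var())                 # close to sigma^2 = 4
```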
8.2 Non causal random fields

Here we establish the Donsker principle for non causal weakly dependent sequences and random fields. A block $B$ in $[0,1]^d$ is a subset of $[0,1]^d$ of the form $]s,t]=\prod_{p=1}^d\,]s_p,t_p]$. Let $(X_j)_{j\in\mathbb{Z}^d}$ be a random field and $B$ be a block in $[0,1]^d$. Let $nB=\{nx,\ x\in B\}$, and
$$S_n(B)=\frac1{n^{d/2}}\sum_{j\in nB\cap\mathbb{Z}^d}X_j.\tag{8.2.1}$$
For any $j\in\mathbb{Z}^d$, denote by $C_j$ the unit cube with upper corner $j$, and define the continuous process
$$U_n(B)=\frac1{n^{d/2}}\sum_{j\in\mathbb{Z}^d}\lambda(nB\cap C_j)X_j,$$
where $\lambda$ denotes the Lebesgue measure on $\mathbb{R}^d$. For $t\in[0,1]^d$, define $S_n(t)=S_n([0,t])$ and $U_n(t)=U_n([0,t])$.

Theorem 8.2. Let $(X_j)_{j\in\mathbb{Z}^d}$ be an $\eta$-weakly dependent stationary centered random field. Assume that $\mathbb{E}|X_0|^4=M^4<\infty$. If $\eta(r)\le cr^{-a}$ with $a>3d$, then the process $U_n$ converges weakly in $(C([0,1]^d),\|\cdot\|_\infty)$ to a Brownian sheet with variance $\sigma^2=\sum_{k\in\mathbb{Z}^d}\mathrm{Cov}(X_0,X_k)$.
8.2.1 Moment inequality
First we establish a bound for the fourth moment of a partial sum.

Lemma 8.1. Assume that the assumptions of Theorem 8.2 are satisfied. Then for any block $B$ in $[0,1]^d$, there exists $C>0$ such that
$$\mathbb{E}\big(S_n(B)^4\big)\le C\lambda(B)^2.$$

Proof. For a finite sequence $k=(k_1,\dots,k_q)$ of elements of $\mathbb{Z}^d$, define $\Pi_k=\prod_{i=1}^qX_{k_i}$. For any integer $q\ge1$, set
$$A_q(n)=\sum_{k\in(nB\cap\mathbb{Z}^d)^q}|\mathbb{E}(\Pi_k)|;\tag{8.2.2}$$
then
$$\mathbb{E}\big(S_n(B)^4\big)\le n^{-2d}A_4(n).\tag{8.2.3}$$
The gap of $k$ is defined as the largest integer $r$ such that the sequence may be split into two non-empty subsequences $k^1$ and $k^2\subset\mathbb{Z}^d$ whose mutual distance equals $r$ (that is, $d(k^1,k^2)=\min\{\|i-j\|_1 : i\in k^1,\,j\in k^2\}=r$). If the sequence is constant, its gap is 0. Define the set $G_r(q,n)=\{k\in(nB)^q : \text{the gap of }k\text{ is }r\}$. Sorting the sequences of indices by their gaps,
$$A_q(n)\le\sum_{k\in nB}\mathbb{E}|X_k|^q+\sum_{r=1}^{\infty}\sum_{k\in G_r(q,n)}|\mathrm{Cov}(\Pi_{k^1},\Pi_{k^2})|\tag{8.2.4}$$
$$\qquad\qquad+\sum_{r=1}^{\infty}\sum_{k\in G_r(q,n)}|\mathbb{E}(\Pi_{k^1})\,\mathbb{E}(\Pi_{k^2})|.\tag{8.2.5}$$
Define $V_q(n)$ as the sum of the right-hand side of (8.2.4). We get $A_4(n)\le V_4(n)+V_2(n)^2$. Denote by $N$ the cardinality of $nB\cap\mathbb{Z}^d$. To build a sequence $k$ belonging to $G_r(q,n)$, we first fix one of the $N$ points of $nB\cap\mathbb{Z}^d$. We choose a second point on the $\ell^1$-sphere of radius $r$ centered on the first point. The third point is in a ball of radius $r$ centered on one of the preceding points, and so on. We get
$$\#G_r(4,n)\le N\cdot2d(2r+1)^{d-1}\cdot2(2r+1)^d\cdot3(2r+1)^d\le12dN3^{3d}r^{3d-1},$$
$$\#G_r(2,n)\le N\cdot2d(2r+1)^{d-1}\le2dN3^{d-1}r^{d-1},$$
and
$$V_4(n)\le NM^4+12dN3^{3d}\sum_{r=1}^{\infty}r^{3d-1}\eta(r),\qquad V_2(n)\le NM^2+2dN3^{d-1}\sum_{r=1}^{\infty}r^{d-1}\eta(r),$$
so that $A_4(n)\le C(N+N^2)$. Because $N$ is an integer and $|N-n^d\lambda(B)|\le2dN/n$, Lemma 8.1 is proved.
8.2.2 Finite dimensional convergence
To prove the convergence of the finite dimensional distributions, we note that it is sufficient to prove it for the finite dimensional distributions of $S_n(B)$, because $U_n(B)-S_n(B)$ tends to zero in probability. Considering two disjoint blocks $B$ and $C$ of $[0,1]^d$, we check that the joint distribution of $(S_n(B),S_n(C))$ satisfies
$$\lim_{n\to\infty}\mathrm{Cov}\big(S_n(B),S_n(C)\big)=0.$$
Denote by $b^-$ and $b^+$ the lower and upper vertices of the block $B$. If the domains are non-intersecting, then for at least one coordinate (say the first) we have (say) $b_1^+\le c_1^-$. Splitting the sum over $j\in nC$ according to whether the first coordinate $j_1$ is within distance $n^\beta$ of the separating hyperplane or not, we get
$$|\mathrm{Cov}(S_n(B),S_n(C))|\le n^{-d}\sum_{i\in nB}\sum_{j\in nC}|\mathrm{Cov}(X_i,X_j)|\le n^{-d}\Big(n^{\beta+d-1}\sum_{r\in\mathbb{N}}r^{d-1}\eta(r)+n^d\sum_{r>n^\beta}r^{d-1}\eta(r)\Big)=O(n^{\beta-1})+O(n^{\beta(d-a)}).$$
Taking $0<\beta<1$ gives the result, since $a>d$.

Now, let $\nu$ and $\mu$ be two reals; we show that $\tilde S_n=\nu S_n(B)+\mu S_n(C)$ tends to a Gaussian distribution. Write
$$\tilde S_n=n^{-d/2}\sum_{i\in\{0,\dots,n\}^d}\alpha_{i,n}X_i,$$
where $\alpha_{i,n}=\nu+\mu$ if $i\in nB\cap nC$, $\alpha_{i,n}=\nu$ if $i\in nB\setminus nC$, $\alpha_{i,n}=\mu$ if $i\in nC\setminus nB$, and $\alpha_{i,n}=0$ elsewhere. We use the Bernstein (1939) [13] blocking technique. Let $p(n)$ and $q(n)$ be sequences of integers such that $p(n)=o(n)$ and $q(n)=o(p(n))$. Assume that the Euclidean division of $n$ by $p+q$ gives a quotient $k$ and a remainder $r$. Denote $\bar\jmath=(j,\dots,j)$. Define $K=\{1,\dots,k+1\}^d$ and order $K$ by the lexicographic order; for $i\in\{1,\dots,k\}^d$, define the blocks $P_i=[(p+q)(i-\bar1),\dots,(p+q)i-q\bar1]$. If $r>0$, for $i\in\{0,1\}^d\setminus\{\bar0\}$ define the blocks $P_{\bar k+i}=[(p+q)(\bar k+i),\dots,(p+q)(\bar k+i)+(r\vee p)\bar1]$. Denote by $Q$ the set of indices that are not in one of the $P_i$; note that the cardinality of $Q$ is less than $d(k+1)qp^{d-1}$. For each block $P_i$, and for $Q$, we define the partial sums
$$u_i=n^{-d/2}\sum_{j\in P_i}\alpha_{j,n}X_j,\qquad v=n^{-d/2}\sum_{j\in Q}\alpha_{j,n}X_j.$$
Recall Lemma 11 in Doukhan and Louhichi (1999) [67].

Lemma 8.2. Let $\tilde S_n=n^{-d/2}\sum_{j\in\{0,\dots,n\}^d}\alpha_{j,n}X_j$ be the sum of a centered triangular array, and set $\sigma_n^2=\mathrm{Var}\,\tilde S_n$. Assume that:
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\mathbb{E}v^2=0;\tag{8.2.6}$$
$$\lim_{n\to\infty}\sum_{j\in K}\Big|\mathrm{Cov}\Big(g\Big(\frac t{\sigma_n}\sum_{i\in K,\,i<j}u_i\Big),h\Big(\frac t{\sigma_n}u_j\Big)\Big)\Big|=0\quad\text{for all }t\in\mathbb{R},\tag{8.2.7}$$
where $h$ and $g$ are either the sine or the cosine function;
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\sum_{i\in K}\mathbb{E}\big(|u_i|^2\mathbf{1}_{\{|u_i|\ge\epsilon\sigma_n\}}\big)=0\quad\text{for all }\epsilon>0;\tag{8.2.8}$$
and
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\sum_{i\in K}\mathbb{E}|u_i|^2=1.\tag{8.2.9}$$
Then $\tilde S_n/\sigma_n$ converges in distribution to the $\mathcal{N}(0,1)$ distribution.

First note that
$$\sum_{j\in\mathbb{N}^d}|\mathrm{Cov}(\alpha_{0,n}X_0,\alpha_{j,n}X_j)|<\infty,\tag{8.2.10}$$
so that $\sigma_n^2$ tends to a constant. If this constant is zero, then the limit of $\tilde S_n$ is 0. If it is not, we check the conditions of the preceding lemma for the array $\alpha_{j,n}X_j$.

To check (8.2.6), note that
$$\mathbb{E}v^2\le(|\nu|+|\mu|)^2n^{-d}\sum_{i\in Q}\sum_{j\in Q}|\mathrm{Cov}(X_i,X_j)|\le2(|\nu|+|\mu|)^2\,\frac{d(k+1)qp^{d-1}}{n^d}\sum_{r=0}^{\infty}r^{d-1}\eta(r)=o(1).$$

Consider (8.2.7). Note that $g\big(\frac t{\sigma_n}\sum_{i\in K,\,i\ne j}u_i\big)$ is a function of at most $((k+1)p)^d$ variables $X_l$ and that its Lipschitz modulus is less than $t(|\nu|+|\mu|)/(n^{d/2}\sigma_n)$. Similarly, $h(tu_j/\sigma_n)$ is a function of at most $p^d$ variables and its Lipschitz modulus is less than $t(|\nu|+|\mu|)/(n^{d/2}\sigma_n)$. Using the weak dependence property, we get
$$\Big|\mathrm{Cov}\Big(g\Big(\frac t{\sigma_n}\sum_{i\in K,\,i\ne j}u_i\Big),h\Big(\frac t{\sigma_n}u_j\Big)\Big)\Big|\le p^d\big((k+1)^d+1\big)\,\frac{t(|\nu|+|\mu|)}{n^{d/2}\sigma_n}\,\eta(q),$$
and hence
$$\sum_{j\in K}\Big|\mathrm{Cov}\Big(g\Big(\frac t{\sigma_n}\sum_{i\in K,\,i<j}u_i\Big),h\Big(\frac t{\sigma_n}u_j\Big)\Big)\Big|\le p^d(k+1)^{d+1}\,\frac{t(|\nu|+|\mu|)}{n^{d/2}\sigma_n}\,\eta(q)=O\big(n^{3d/2}p^{-d}q^{-a}\big).$$
Taking $p=n^{5/6}$ and $q=n^{5d/6a}$ gives a bound tending to 0.

To prove (8.2.8), it is sufficient to show that $\mathbb{E}|u_i|^4=O(k^{-2d})$. But
$$\mathbb{E}\Big(\Big(\frac1{n^{d/2}}\sum_{j\in P_i}\alpha_{j,n}X_j\Big)^4\Big)\le\frac{(|\nu|+|\mu|)^4}{n^{2d}}\,\mathbb{E}\Big(\Big(\sum_{j\in P_i}X_j\Big)^4\Big)\le(|\nu|+|\mu|)^4\,\frac{p^{2d}}{n^{2d}}\,\mathbb{E}\big(S_p([0,\bar1])^4\big),$$
and we conclude with the moment inequality $\mathbb{E}\big(S_p([0,\bar1])^4\big)=O(1)$.

In order to prove (8.2.9), note that (8.2.6) implies that
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\mathrm{Var}\Big(\sum_{i\in K}u_i\Big)=1.$$
But
$$\Big|\mathrm{Var}\Big(\sum_{i\in K}u_i\Big)-\sum_{i\in K}\mathbb{E}|u_i|^2\Big|\le2\sum_{i,j\in K,\,i\ne j}|\mathrm{Cov}(u_i,u_j)|\le2(k+1)^d\sum_{j=q}^{\infty}\eta(j)=O\big(np^{-d}q^{-a+1}\big).$$
Taking $p=n^{5/6}$ and $q=n^{5d/6a}$ gives a bound tending to 0.
8.2.3 Tightness

Recall that $\lambda$ is the Lebesgue measure on $\mathbb{R}^d$. Applying Lemma 8.1, we infer that $(S_n,\lambda)$ and $(U_n,\lambda)$ belong to the class $\mathcal{C}(2,4)$ defined in Bickel and Wichura (1971) [19] (see inequality (3) of that paper). The tightness (relative compactness) of the process $\{U_n(t),\,t\in[0,1]^d\}$ follows by applying Theorem 3 of [19].
8.3 Conditional (causal) invariance principle

In this section we give the functional version of Theorem 7.5. Denote by $\mathcal{H}^*$ the space of continuous functions $\varphi$ from $(C([0,1]),\|\cdot\|_\infty)$ to $\mathbb{R}$ such that $x\mapsto|(1+\|x\|_\infty^2)^{-1}\varphi(x)|$ is bounded.

Theorem 8.3. Let $X_{i,n}$, $\mathcal{M}_{i,n}$ and $S_n(t)$ be as in Theorem 7.5. For any $t$ in $[0,1]$, let $U_n(t)=S_n(t)+(nt-[nt])X_{[nt]+1,n}$, and define $\bar S_n(t)=\sup_{0\le s\le t}|S_n(s)|$. The following statements are equivalent:

S1* There exists a nonnegative $\mathcal{M}_{0,\inf}$-measurable random variable $\eta$ such that, for any $\varphi$ in $\mathcal{H}^*$ and any positive integer $k$,
$$\mathrm{S1}^*(\varphi):\quad\lim_{n\to\infty}\Big\|\mathbb{E}\Big(\varphi(n^{-1/2}U_n)-\int\varphi(x\sqrt\eta)\,W(dx)\,\Big|\,\mathcal{M}_{k,n}\Big)\Big\|_1=0,$$
where $W$ is the distribution of a standard Wiener process.

S2* Properties S2(b) and S2(c) of Theorem 7.5 hold, and S2(a) is replaced by: (a*) the sequence $(n^{-1}(\bar S_n(1))^2)_{n>0}$ is uniformly integrable, and
$$\lim_{t\to0}\limsup_{n\to\infty}\mathbb{E}\Big(\frac{(\bar S_n(t))^2}{nt}\Big(1\wedge\frac{\bar S_n(t)}{\sqrt n}\Big)\Big)=0.$$

Moreover the random variable $\eta$ satisfies $\eta=\eta\circ T$ almost surely.

Remark 8.1. Let $X_{i,n}$, $\mathcal{M}_{i,n}$ and $U_n$ be as in Theorem 8.3. Suppose that the sequence $(\mathcal{M}_{0,n})_{n\ge1}$ is nondecreasing. If Condition S1* is satisfied, then the sequence $(n^{-1/2}U_n)_{n>0}$ converges stably with respect to the random probability measure defined by $\mu(\varphi)=\int\varphi(x\sqrt\eta)\,W(dx)$. The proof of this result is the same as that of Corollary 7.2.

We now give the proof of this theorem. Once again, it suffices to prove that S2* implies S1*.
8.3.1 Preliminaries
Definition 8.1. Recall that $R(\mathcal{M}_{k,n})$ is the set of Rademacher $\mathcal{M}_{k,n}$-measurable random variables: $R(\mathcal{M}_{k,n})=\{2\mathbf{1}_A-1 : A\in\mathcal{M}_{k,n}\}$. For the random variable $\eta$ introduced in Theorem 8.3 and any bounded random variable $Z$, let

1. $\nu_n^*[Z]$ be the image measure of $Z\cdot\mathbb{P}$ by the process $n^{-1/2}U_n$;
2. $\nu^*[Z]$ be the image measure of $W\otimes Z\cdot\mathbb{P}$ by the map $\phi$ from $C([0,1])\times\Omega$ to $C([0,1])$ defined by $\phi(x,\omega)=x\sqrt{\eta(\omega)}$.

We need the functional analogue of Lemma 7.4 (the proof is unchanged).

Lemma 8.3. Let $\mu_n^*[Z_n]=\nu_n^*[Z_n]-\nu^*[Z_n]$. For any $\varphi$ in $\mathcal{H}^*$ the statement S1*($\varphi$) is equivalent to S3*($\varphi$): for any $Z_n$ in $R(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n](\varphi)$ tends to zero as $n$ tends to infinity.

Suppose that S1*($\varphi$) holds for any bounded function $\varphi$ of $\mathcal{H}^*$. Since the sequence $(n^{-1}(S_n^*(1))^2)_{n>0}$ is uniformly integrable, S1*($\varphi$) obviously extends to the whole space $\mathcal{H}^*$. Consequently, we can restrict ourselves to the space of continuous bounded functions from $C([0,1])$ to $\mathbb{R}$. According to Lemma 8.3, the proof of Theorem 8.3 will be complete if we show that, for any $Z_n$ in $R(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n]$ converges weakly to the null measure as $n$ tends to infinity.

Definition 8.2. For $0\le t_1<\cdots<t_d\le1$, define the functions $\pi_{t_1\ldots t_d}$ and $Q_{t_1\ldots t_d}$ from $C([0,1])$ to $\mathbb{R}^d$ by the equalities $\pi_{t_1\ldots t_d}(x)=(x(t_1),\dots,x(t_d))$ and $Q_{t_1\ldots t_d}(x)=(x(t_1),x(t_2)-x(t_1),\dots,x(t_d)-x(t_{d-1}))$. For any signed measure $\mu$ on $(C([0,1]),\mathcal{B}(C([0,1])))$ and any function $f$ from $C([0,1])$ to $\mathbb{R}^d$, denote by $\mu f^{-1}$ the image measure of $\mu$ by $f$.

Let $\mu$ and $\nu$ be two signed measures on $(C([0,1]),\mathcal{B}(C([0,1])))$. Recall that if $\mu\pi_{t_1\ldots t_d}^{-1}=\nu\pi_{t_1\ldots t_d}^{-1}$ for any positive integer $d$ and any $d$-tuple such that $0\le t_1<\cdots<t_d\le1$, then $\mu=\nu$. Consequently, Theorem 8.3 is a straightforward consequence of the two following items:

1. relative compactness: for any $Z_n$ in $R(\mathcal{M}_{k,n})$, the family $(\mu_n^*[Z_n])_{n>0}$ is relatively compact with respect to the topology of weak convergence;
2. finite dimensional convergence: for any positive integer $d$, any $d$-tuple $0\le t_1<\cdots<t_d\le1$ and any $Z_n$ in $R(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n]\pi_{t_1\ldots t_d}^{-1}$ converges weakly to the null measure as $n$ tends to infinity.
8.3.2 Finite dimensional convergence
Clearly it is equivalent to take $Q_{t_1\cdots t_d}$ instead of $\pi_{t_1\cdots t_d}$ in item 2. The following lemma shows that finite dimensional convergence is a consequence of Condition S2 of Theorem 7.5. The stronger condition $S2^*$ is only required for tightness.

Lemma 8.4. For any $a$ in $\mathbb{R}^d$ define $f_a$ from $\mathbb{R}^d$ to $\mathbb{R}$ by $f_a(x) = \langle a, x\rangle$. If S2 holds then, for any $a$ in $\mathbb{R}^d$, any $d$-tuple $0 \le t_1 < \cdots < t_d \le 1$ and any $Z_n$ in $\mathcal{R}(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n](f_a\circ Q_{t_1\cdots t_d})^{-1}$ converges weakly to the null measure.

Write $\mu_n^*[Z_n](f_a\circ Q_{t_1\cdots t_d})^{-1}(\exp(i\,\cdot)) = \mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1}(\exp(i\langle a,\cdot\rangle))$. According to Lemma 8.4, the latter converges to zero as $n$ tends to infinity. Taking $Z_n = 1$, we infer that the probability measure $\nu_n^*[1]Q_{t_1\cdots t_d}^{-1}$ converges weakly to the probability measure $\nu^*[1]Q_{t_1\cdots t_d}^{-1}$ and hence is tight. Since $|\mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1}| \le \nu_n^*[1]Q_{t_1\cdots t_d}^{-1} + \nu^*[1]Q_{t_1\cdots t_d}^{-1}$, the sequence $(\mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1})_{n>0}$ is tight. Consequently we can apply Lemma 7.3 to conclude that $\mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1}$ converges weakly to the null measure.

Proof of Lemma 8.4. According to Lemma 8.3, we have to prove the property $S1^*(\varphi\circ f_a\circ Q_{t_1\cdots t_d})$ for any continuous bounded function $\varphi$. Arguing as in Section 7.4.3, we can restrict ourselves to the class of functions $B_3^1(\mathbb{R})$. Let $h$ be any element of $B_3^1(\mathbb{R})$ and write
$$h\circ f_a\circ Q_{t_1\cdots t_d}(n^{-1/2}U_n) - \int h\circ f_a\circ Q_{t_1\cdots t_d}(x\sqrt\eta)\,W(dx) = \sum_{\ell=1}^d \Big( h_\ell\Big(a_\ell\,\frac{U_n(t_\ell)-U_n(t_{\ell-1})}{\sqrt n}\Big) - \int h_\ell\big(a_\ell\, x\sqrt{(t_\ell-t_{\ell-1})\eta}\,\big)\, g(x)\,dx \Big),$$
where the random function $h_\ell$ is defined by
$$h_\ell(x) = \int h\Big( \sum_{i=1}^{\ell-1} a_i\,\frac{U_n(t_i)-U_n(t_{i-1})}{\sqrt n} + x + \sum_{i=\ell+1}^{d} a_i\, x_i\sqrt{(t_i-t_{i-1})\eta}\, \Big)\, \prod_{i=\ell+1}^{d} g(x_i)\,dx_i .$$
Note that for any $\omega$ in $\Omega$, the random function $h_\ell$ belongs to $B_3^1(\mathbb{R})$. To complete the proof of Lemma 8.4, it suffices to see that, for any positive integers $k$ and $\ell$, the sequence
$$\Big\| \mathbb{E}\Big( h_\ell\Big(a_\ell\,\frac{U_n(t_\ell)-U_n(t_{\ell-1})}{\sqrt n}\Big) - \int h_\ell\big(a_\ell\, x\sqrt{(t_\ell-t_{\ell-1})\eta}\,\big)\, g(x)\,dx \,\Big|\, \mathcal{M}_{k,n} \Big) \Big\|_1 \quad (8.3.1)$$
tends to zero as $n$ tends to infinity. Since $h_\ell$ is 1-Lipschitz and bounded, we infer from the asymptotic negligibility of $n^{-1/2}X_{0,n}$ that
$$\lim_{n\to\infty} \Big\| h_\ell\Big(a_\ell\,\frac{U_n(t_\ell)-U_n(t_{\ell-1})}{\sqrt n}\Big) - h_\ell\Big(a_\ell\,\frac{S_{[n(t_\ell-t_{\ell-1})]}\circ T^{[nt_{\ell-1}]+1}}{\sqrt n}\Big) \Big\|_1 = 0. \quad (8.3.2)$$
Denote by $g_\ell$ the random function $g_\ell = h_\ell\circ T^{-[nt_{\ell-1}]-1}$. Combining (8.3.1), (8.3.2) and the fact that $\mathcal{M}_{k-1-[nt_{\ell-1}],n} \subseteq \mathcal{M}_{k,n}$, we infer that it suffices to prove that, with $u = t_\ell - t_{\ell-1}$,
$$\lim_{n\to\infty} \Big\| \mathbb{E}\Big( g_\ell\big(a_\ell\, n^{-1/2}S_n(u)\big) - \int g_\ell\big(a_\ell\, x\sqrt{u\eta}\,\big)\, g(x)\,dx \,\Big|\, \mathcal{M}_{k,n} \Big) \Big\|_1 = 0. \quad (8.3.3)$$
Since the random function $g_\ell$ is $\mathcal{M}_{0,n}$-measurable, (8.3.3) can be proved exactly as property S1 of Theorem 7.5 (see Section 7.4.3). This completes the proof of Lemma 8.4.
8.3.3 Relative compactness
In this section, we shall prove that the sequence $(\mu_n^*[Z_n])_{n>0}$ is relatively compact with respect to the topology of weak convergence; that is, for any increasing function $f$ from $\mathbb{N}$ to $\mathbb{N}$, there exist an increasing function $g$ with values in $f(\mathbb{N})$ and a signed measure $\mu$ on $(C([0,1]),\mathcal{B}(C([0,1])))$ such that $(\mu_{g(n)}^*[Z_{g(n)}])_{n>0}$ converges weakly to $\mu$. Let $Z_n^+$ (resp. $Z_n^-$) be the positive (resp. negative) part of $Z_n$, and write
$$\mu_n^*[Z_n] = \mu_n^*[Z_n^+] - \mu_n^*[Z_n^-] = \nu_n^*[Z_n^+] - \nu_n^*[Z_n^-] - \nu^*[Z_n^+] + \nu^*[Z_n^-],$$
where $\nu_n^*[Z]$ and $\nu^*[Z]$ are defined in 1. and 2. of Definition 8.1. Obviously, it is enough to prove that each sequence of finite positive measures $(\nu_n^*[Z_n^+])_{n>0}$, $(\nu_n^*[Z_n^-])_{n>0}$, $(\nu^*[Z_n^+])_{n>0}$ and $(\nu^*[Z_n^-])_{n>0}$ is relatively compact. We prove the result for the sequence $(\nu_n^*[Z_n^+])_{n>0}$, the other cases being similar. Let $f$ be any increasing function from $\mathbb{N}$ to $\mathbb{N}$. Choose an increasing function $l$ with values in $f(\mathbb{N})$ such that
$$\lim_{n\to\infty} \mathbb{E}(Z_{l(n)}^+) = \liminf_{n\to\infty} \mathbb{E}(Z_{f(n)}^+).$$
We must distinguish two cases:

1. If $\mathbb{E}(Z_{l(n)}^+)$ converges to zero as $n$ tends to infinity, then, taking $g = l$, the sequence $(\nu_{g(n)}^*[Z_{g(n)}^+])_{n>0}$ converges weakly to the null measure.
2. If $\mathbb{E}(Z_{l(n)}^+)$ converges to a positive real number as $n$ tends to infinity, we introduce, for $n$ large enough, the probability measure $p_n$ defined by $p_n = (\mathbb{E}(Z_{l(n)}^+))^{-1}\,\nu_{l(n)}^*[Z_{l(n)}^+]$. Obviously, if $(p_n)_{n>0}$ is relatively compact with respect to the topology of weak convergence, then there exist an increasing function $g$ with values in $l(\mathbb{N})$ (and hence in $f(\mathbb{N})$) and a measure $\nu$ such that $(\nu_{g(n)}^*[Z_{g(n)}^+])_{n>0}$ converges weakly to $\nu$. Since $(p_n)_{n>0}$ is a family of probability measures, relative compactness is equivalent to tightness. Here we apply Theorem 8.2 in Billingsley (1968) [20]: to derive the tightness of the sequence $(p_n)_{n>0}$ it is enough to show that, for each positive $\varepsilon$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty} p_n\big( \{x : w(x,\delta) \ge \varepsilon\} \big) = 0, \quad (8.3.4)$$
where $w(x,\delta)$ is the modulus of continuity of the function $x$. According to the definition of $p_n$, we have
$$p_n\big( \{x : w(x,\delta) \ge \varepsilon\} \big) = \frac{1}{\mathbb{E}(Z_{l(n)}^+)}\, \mathbb{E}\Big( Z_{l(n)}^+\, \mathbf{1}_{w(U_{l(n)}/\sqrt{l(n)},\,\delta)\,\ge\,\varepsilon} \Big).$$
Since $\mathbb{E}(Z_{l(n)}^+)$ converges to a positive number and $Z_{l(n)}^+$ is bounded by one, we infer that (8.3.4) holds if
$$\lim_{\delta\to 0}\limsup_{n\to\infty} \mathbb{P}\Big( w\Big( \frac{U_{l(n)}}{\sqrt{l(n)}}, \delta \Big) \ge \varepsilon \Big) = 0. \quad (8.3.5)$$
From Theorem 8.3 and inequality (8.16) in Billingsley (1968) [20], it suffices to prove that, for any positive $\varepsilon$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty} \frac{1}{\delta}\, \mathbb{P}\Big( \frac{S^*_{[l(n)\delta]}}{\sqrt{l(n)\delta}} \ge \frac{\varepsilon}{\sqrt\delta} \Big) = 0. \quad (8.3.6)$$
We conclude by noting that (8.3.6) follows straightforwardly from S2(a$^*$) and Markov's inequality.

Conclusion. In both cases there exist an increasing function $g$ with values in $f(\mathbb{N})$ and a measure $\nu$ such that $(\nu_{g(n)}^*[Z_{g(n)}^+])_{n>0}$ converges weakly to $\nu$. Since this is true for any increasing function $f$ with values in $\mathbb{N}$, we conclude that the sequence $(\nu_n^*[Z_n^+])_{n>0}$ is relatively compact with respect to the topology of weak convergence. Of course, the same arguments apply to the sequences $(\nu_n^*[Z_n^-])_{n>0}$, $(\nu^*[Z_n^+])_{n>0}$ and $(\nu^*[Z_n^-])_{n>0}$, which implies the relative compactness of the sequence $(\mu_n^*[Z_n])_{n>0}$.
8.4 Applications

8.4.1 Sufficient conditions for stationary sequences

For strictly stationary sequences, Theorem 8.3 reads as follows.
Theorem 8.4. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Define $S_n = X_1 + \cdots + X_n$ and $U_n(t) = S_{[nt]} + (nt-[nt])X_{[nt]+1}$. The following statements are equivalent:

$S1^*$ There exists a nonnegative $\mathcal{M}_0$-measurable random variable $\eta$ such that, for any $\varphi$ in $\mathcal{H}^*$ and any positive integer $k$,
$$\lim_{n\to\infty} \Big\| \mathbb{E}\Big( \varphi(n^{-1/2}U_n) - \int \varphi(x\sqrt\eta)\,W(dx) \,\Big|\, \mathcal{M}_k \Big) \Big\|_1 = 0,$$
where $W$ is the distribution of a standard Wiener process.

$S2^*$ Properties S2(b) and (c) of Theorem 7.6 hold, and (a) is replaced by: (a$^*$) the sequence $(n^{-1}(\max_{1\le i\le n}|S_i|)^2)_{n>0}$ is uniformly integrable.

Under the conditions of Proposition 7.8, $S1^*$ holds:

Proposition 8.1. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6 and define $\mathcal{M}_{-\infty} = \bigcap_{i\in\mathbb{Z}}\mathcal{M}_i$. Define the operators $P_i$ as in Corollary 7.4.
1. If $\mathbb{E}(X_0|\mathcal{M}_{-\infty}) = 0$ and $\sum_{i\ge 0}\|P_0(X_i)\|_2 < \infty$, then $S1^*$ holds. Moreover, $\eta$ is the same as in Proposition 7.8.
2. If (7.5.7) is satisfied, then $S1^*$ holds and $\eta$ is the same as in Proposition 7.8.

Remark 8.2. As in Chapter 7, we deduce from Proposition 8.1 sufficient conditions for the functional CCLT in terms of the coefficients $\gamma_1$, $\gamma_2$, $\tilde\alpha_1$ and $\tilde\phi_1$. More precisely, $S1^*$ holds if either (7.5.18), (7.5.22), (7.5.26) or (7.5.28) is satisfied.

Proof of Proposition 8.1. In view of Corollary 7.5 and Proposition 7.8, it is enough to prove that S2(a$^*$) holds.

Proof of item 1. According to Proposition 5.9, for any two sequences of nonnegative numbers $(a_m)_{m\ge 0}$ and $(b_m)_{m\ge 0}$ such that $K = \sum_{m\ge 0} a_m^{-1}$ is finite and $\sum_{m\ge 0} b_m = 1$, we have
$$\frac 1n\, \mathbb{E}\big( (S_n^* - M\sqrt n)_+^2 \big) \le 4K \sum_{m=0}^{\infty} a_m\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big), \quad (8.4.1)$$
where $\Gamma(m,n,\lambda) = \Big\{ 0 \vee \max_{1\le k\le n} \sum_{\ell=1}^{k} P_{-m}(X_\ell) > \lambda \Big\}$. Here, we take $b_m = 2^{-m-1}$ and $a_m = (\|P_0(X_m)\|_2 + (m+1)^{-2})^{-1}$. By assumption, $\sum_{m\ge 0} a_m^{-1}$ is finite.
Since for all $m \ge 0$
$$a_m\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big) \le \frac{\|P_0(X_m)\|_2^2}{\|P_0(X_m)\|_2 + (m+1)^{-2}} \le \|P_0(X_m)\|_2,$$
we infer from (8.4.1) that for any $\varepsilon > 0$ there exists $N(\varepsilon)$ such that
$$\frac 1n\, \mathbb{E}\big( (S_n^* - M\sqrt n)_+^2 \big) \le \varepsilon + 4K \sum_{m=0}^{N(\varepsilon)} a_m\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big). \quad (8.4.2)$$
Now by Doob's maximal inequality,
$$\mathbb{P}\big( \Gamma(m,n,b_m M\sqrt n) \big) \le \frac{4\sum_{k=1}^n \|P_{k-m}(X_k)\|_2^2}{b_m^2 M^2 n} = \frac{4\|P_0(X_m)\|_2^2}{b_m^2 M^2},$$
and consequently
$$\lim_{M\to\infty} \sup_{n>0}\, \mathbb{P}\big( \Gamma(m,n,b_m M\sqrt n) \big) = 0. \quad (8.4.3)$$
Since $n^{-1}\sum_{k=1}^n P_{k-m}(X_k)^2$ converges in $\mathbb{L}^1$ (apply the ergodic theorem), we infer from (8.4.3) that
$$\lim_{M\to\infty}\limsup_{n\to\infty}\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big) = 0. \quad (8.4.4)$$
Combining (8.4.2) and (8.4.4), we conclude that
$$\lim_{M\to\infty}\limsup_{n\to\infty} \frac 1n\, \mathbb{E}\big( (S_n^* - M\sqrt n)_+^2 \big) = 0. \quad (8.4.5)$$
Of course, the same arguments apply to the sequence $(-X_k)_{k\in\mathbb{Z}}$, so that (8.4.5) holds with $\max_{1\le k\le n}|S_k|$ in place of $S_n^*$. This completes the proof.

Proof of item 2. Let $A_k(\lambda) = \{\max_{1\le i\le k}|S_i| > \lambda\}$. From Proposition 5.8 applied to the sequences $(X_i)_{i\in\mathbb{Z}}$ and $(-X_i)_{i\in\mathbb{Z}}$ we get that
$$\mathbb{E}\Big( \big( \max_{1\le i\le n}|S_i| - \lambda \big)_+^2 \Big) \le 8\sum_{k=1}^n \Big( \mathbb{E}\big( X_k^2\,\mathbf{1}_{A_k(\lambda)} \big) + 2\big\| \mathbf{1}_{A_k(\lambda)}\, X_k\, \mathbb{E}(S_n - S_k \,|\, \mathcal{F}_k) \big\|_1 \Big). \quad (8.4.6)$$
By assumption the sequence $(X_k^2)_{k>0}$ and the array $(X_k\,\mathbb{E}(S_n-S_k|\mathcal{F}_k))_{1\le k\le n}$ are uniformly integrable. It follows that the $\mathbb{L}^1$-norms of the above random variables are each bounded by some positive constant $K$. Hence, from (8.4.6) with $\lambda = 0$, we get $\mathbb{E}((\max_{1\le i\le n}|S_i|)^2) \le 24Kn$. It follows that
$$\mathbb{P}\big( A_k(M\sqrt n) \big) \le (nM^2)^{-1}\, \mathbb{E}\Big( \big( \max_{1\le i\le n}|S_i| \big)^2 \Big) \le 24KM^{-2}. \quad (8.4.7)$$
From the inequality (8.4.7) and the uniform integrability of both $(X_k^2)_{k>0}$ and $(X_k\,\mathbb{E}(S_n-S_k|\mathcal{F}_k))_{1\le k\le n}$, we infer that
$$\lim_{M\to\infty}\limsup_{n\to\infty} n^{-1}\, \mathbb{E}\Big( \big( \max_{1\le i\le n}|S_i| - M\sqrt n \big)_+^2 \Big) = 0.$$
This completes the proof.
8.4.2 Sufficient conditions for triangular arrays

Under the conditions of Proposition 7.9, $S1^*$ holds:

Proposition 8.2. Let $X_{i,n}$ and $\mathcal{M}_{i,n}$ be as in Proposition 7.9. If (7.5.31) and (7.5.32) hold, then $S1^*$ holds with the same $\eta$ as in Proposition 7.9.

Proof of Proposition 8.2. In view of Proposition 7.9, it is enough to prove that S2(a$^*$) holds. In fact, this follows from inequality (7.5.37).
Chapter 9

Law of the iterated logarithm (LIL)

In this chapter, we derive laws of the iterated logarithm. We first give a bounded law of the iterated logarithm in a non-causal setting. We then focus on $\tau$-dependent sequences, for which we derive a causal strong invariance principle. The main tool to prove it is the Fuk–Nagaev type inequality given in Theorem 5.3 of Chapter 5.
9.1 Bounded LIL under a non-causal condition

In this section, we derive a bounded law of the iterated logarithm under a non-causal condition detailed in the assumptions of Theorem 4.5 of Chapter 4. We get the following theorem:
Theorem 9.1. Suppose that $(X_n)_{n\in\mathbb{Z}}$ is a stationary process satisfying the assumptions of Theorem 4.5 of Chapter 4. Set $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^n X_i)$ and assume that $\sigma^2 = \lim_{n\to\infty}\sigma_n^2/n > 0$. Then we have
$$\limsup_{n\to\infty} \frac{1}{\sigma\sqrt{2n\log\log n}}\,|S_n| \le 1 \quad\text{a.s.} \quad (9.1.1)$$
Proof of Theorem 9.1. Let $a > 1$. We define the subsequence $(n_k)_{k\in\mathbb{N}}$ by $n_k = [a^k]$. We obtain from Theorem 4.5 that, for any $n_k \le n < n_{k+1}$ and any
fixed $c$,
$$\mathbb{P}\big( |S_n| > c\sigma\sqrt{2n\log\log n_k} \big) \le 2\exp\Big( -c^2\log\log n_k\, \frac{n\sigma^2}{\sigma_n^2}\,(1+o(1)) \Big) = 2\exp\big( -c^2\log\log n_k\,(1+o(1)) \big) = O\big( k^{-c^2(1+o(1))} \big).$$
This implies, by the maximal inequality given in Theorem 2.2 in Móricz et al. (1982) [133], that
$$\mathbb{P}\Big( \max_{n_k\le n< n_{k+1}} |S_n| > c\sigma\sqrt{2n_k\log\log n_k} \Big) \le C\,k^{-c^2}, \quad (9.1.2)$$
which is summable over $k$ as soon as $c > 1$. By the Borel–Cantelli lemma, and letting $a$ decrease to 1, we obtain that for any $c > 1$,
$$\limsup_{n\to\infty} \frac{1}{\sigma\sqrt{2n\log\log n}}\,|S_n| \le c \quad\text{a.s.}$$
This implies (9.1.1).
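The bounded LIL of Theorem 9.1 can be visualized numerically. The sketch below uses i.i.d. standard normal increments (a special case of the assumptions, with $\sigma = 1$) and checks that the normalized path $|S_n|/\sqrt{2n\log\log n}$ stays of order 1 instead of growing like $\sqrt n$; the function name and the choice of sample size are ours, purely for illustration.

```python
import numpy as np

def lil_ratio(increments):
    """Return |S_n| / sqrt(2 n log log n) along one path, for n >= 3.

    Assumes centered increments with unit variance (so sigma = 1);
    the index starts at n = 3 so that log log n > 0.
    """
    s = np.cumsum(increments)
    n = np.arange(1, len(increments) + 1)
    mask = n >= 3
    norm = np.sqrt(2.0 * n[mask] * np.log(np.log(n[mask])))
    return np.abs(s[mask]) / norm

rng = np.random.default_rng(0)
ratios = lil_ratio(rng.standard_normal(100_000))
# Theorem 9.1 predicts limsup of this ratio is at most 1 a.s.;
# on a finite path the running maximum stays bounded.
print(ratios.max())
```

The printed maximum is typically close to 1; pushing the path length further does not make it drift upward, which is the content of the bounded LIL.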
9.2 Causal strong invariance principle
In this section, we present a strong invariance principle for partial sums of $\tau_{1,\infty}$-dependent sequences. Let $(X_n)_{n\in\mathbb{Z}}$ be a stationary sequence of zero-mean square integrable real valued random variables. Let $\mathcal{M}_i = \sigma(X_j,\ j\le i)$. Define
$$S_n = X_1 + \cdots + X_n \quad\text{and}\quad S_n(t) = S_{[nt]} + (nt-[nt])X_{[nt]+1}.$$
We assume that $\sigma_n^2/n = \mathrm{Var}(S_n)/n$ converges to some constant $\sigma^2$ as $n$ tends to infinity (this will always be true under any of the conditions used hereafter). For $\sigma > 0$, we study the almost sure behavior of the partial sum process
$$\big\{ \sigma^{-1}(2n\log\log n)^{-1/2}\, S_n(t),\ t\in[0,1] \big\}. \quad (9.2.1)$$
Before stating the main result, let us recall existing results in the i.i.d. case and in other frameworks of dependence. Let $\mathcal{S}$ be the subset of $C([0,1])$ consisting of all functions $h$ that are absolutely continuous with respect to the Lebesgue measure and such that
$$h(0) = 0 \quad\text{and}\quad \int_0^1 (h'(t))^2\,dt \le 1.$$
In 1964, Strassen [180] proved that if the sequence $(X_n)_{n\in\mathbb{Z}}$ is i.i.d., then the process defined in (9.2.1) is relatively compact with a.s. limit set $\mathcal{S}$. This result is known as the functional law of the iterated logarithm (FLIL for short). Heyde and Scott (1973) [104] extended the FLIL to the case where $\mathbb{E}(X_1|\mathcal{M}_0) = 0$ and the sequence is ergodic. Starting from this result and from a coboundary decomposition due to Gordin (1969) [97], Heyde (1975) [105] proved that the FLIL holds if $\mathbb{E}(S_n|\mathcal{M}_0)$ converges in $\mathbb{L}^2$ and the sequence is ergodic. Heyde's condition holds as soon as
$$\sum_{k=1}^{\infty} k \int_0^{\gamma_1(k)/2} Q\circ G(u)\,du < \infty, \quad (9.2.2)$$
where the functions $Q = Q_{|X_0|}$ and $G = G_{|X_0|}$ have been defined in Chapter 5 and $\gamma_1(k) = \|\mathbb{E}(X_k|\mathcal{M}_0)\|_1$ is the coefficient introduced in Section 2.2.4 of Chapter 2. Other types of dependence have since been considered for the FLIL (see for instance the review paper by Philipp (1986) [150]). For $\rho$- and $\phi$-mixing sequences, a strong invariance principle is given in Shao (1993) [174]. The case of strongly ($\alpha$-)mixing sequences has been considered by Oodaira and Yoshihara (1971) [137], Dehling and Philipp (1982) [54], and Bradley (1983) [29], among others. In 1995, Rio [159] proved a FLIL (and even a strong invariance principle) for the process defined in (9.2.1) as soon as the DMR (Doukhan, Massart and Rio (1994) [70]) condition (9.2.3) is satisfied:
$$\sum_{k=1}^{\infty} \int_0^{2\alpha_X(k)} Q^2(u)\,du < \infty, \quad (9.2.3)$$
where $\alpha_X(k)$ has been defined in Section 1.2 of Chapter 1. Considering Corollary 7.6, which gives the central limit theorem for $\gamma$-dependent sequences, we think that a reasonable condition for the FLIL is condition (9.2.2) without the $k$ in front of the integral. Actually, we can only prove this conjecture with $\tau_{1,\infty}(k)$ instead of $\gamma_1(k)$; that is, the FLIL holds as soon as
$$\sum_{k=1}^{\infty} \int_0^{\tau_{1,\infty}(k)/2} Q\circ G(u)\,du < \infty. \quad (9.2.4)$$
Theorem 9.2. Let $(X_n)_{n\in\mathbb{Z}}$ be a strictly stationary sequence of centered and square integrable random variables satisfying (9.2.4). Then $\sigma_n^2/n$ converges to $\sigma^2$, and there exists a sequence $(Y_n)_{n\in\mathbb{N}}$ of independent $\mathcal{N}(0,\sigma^2)$-distributed random variables (possibly degenerate) such that
$$\sum_{i=1}^n (X_i - Y_i) = o\big( \sqrt{n\log\log n} \big) \quad\text{a.s.}$$
Such a result is known as a strong invariance principle. If $\sigma > 0$, Theorem 9.2 and Strassen's FLIL for the Brownian motion yield the FLIL for the process (9.2.1). As in Corollary 7.6, we obtain simple sufficient conditions for the FLIL to hold:

Corollary 9.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a strictly stationary sequence of centered and square integrable random variables. Any of the following conditions implies (9.2.4), and hence the FLIL:
1. $\mathbb{P}(|X_0| > x) \le (c/x)^r$ for some $r > 2$, and $\sum_{i\ge 0} (\tau_{1,\infty}(i))^{(r-2)/(r-1)} < \infty$.
2. $\|X_0\|_r < \infty$ for some $r > 2$, and $\sum_{i\ge 0} i^{1/(r-2)}\,\tau_{1,\infty}(i) < \infty$.
3. $\mathbb{E}(|X_0|^2\log(1+|X_0|)) < \infty$ and $\tau_{1,\infty}(i) = O(a^i)$ for some $a < 1$.

Condition (9.2.4) is essentially optimal, as shown in Corollary 9.2 below, derived from the examples given in Doukhan, Massart and Rio (1994) [70]:

Corollary 9.2. For any $r > 2$, there is a stationary Markov chain $(X_n)_{n\in\mathbb{Z}}$ such that
1. $\mathbb{E}(X_0) = 0$ and, for any nonnegative real $x$, $\mathbb{P}(|X_0| > x) = \min(1, x^{-r})$.
2. The sequence $(\tau_{1,\infty}(i))_{i\ge 0}$ satisfies $\sup_{i\ge 0} i^{(r-1)/(r-2)}\,\tau_{1,\infty}(i) < \infty$.
3. $\limsup_{n\to\infty} (n\log\log n)^{-1/2}|S_n| = +\infty$ almost surely.
This corollary follows easily from Proposition 3 in Doukhan, Massart and Rio (1994) [69]. Let us now write the proof of Theorem 9.2.

Proof of Theorem 9.2. We first need to introduce some precise notations.

Notations 9.1. Define the set
$$\Psi = \Big\{ \psi : \mathbb{N}\to\mathbb{N},\ \psi \text{ increasing},\ \frac{\psi(n)}{n} \xrightarrow[n\to\infty]{} \infty,\ \psi(n) = o\big( n\sqrt{LLn} \big) \Big\}.$$
If $\psi$ is some function of $\Psi$, let $M_1 = 0$ and
$$M_n = \sum_{k=1}^{n-1} (\psi(k)+k), \quad\text{for } n \ge 2.$$
For $n \ge 1$, define the random variables
$$U_n = \sum_{i=M_n+1}^{M_n+\psi(n)} X_i, \qquad V_n = \sum_{i=M_{n+1}+1-n}^{M_{n+1}} X_i, \qquad\text{and}\qquad \tilde U_n = \sum_{i=M_n+1}^{M_{n+1}} |X_i|.$$
If $Lx = \max(1,\log x)$, define the truncated random variables
$$\bar U_n = \max\Big( \min\Big( U_n,\ \frac{n}{\sqrt{LLn}} \Big),\ \frac{-n}{\sqrt{LLn}} \Big).$$
Theorem 9.2 is a consequence of the following proposition.

Proposition 9.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a strictly stationary sequence of centered and square integrable random variables satisfying condition (9.2.4). Then $\sigma_n^2/n$ converges to $\sigma^2$, and there exist a function $\psi\in\Psi$ and a sequence $(W_n)_{n\in\mathbb{N}}$ of independent $\mathcal{N}(0,\psi(n)\sigma^2)$-distributed random variables (possibly degenerate) such that

(a) $\sum_{i=1}^n (W_i - \bar U_i) = o\big( \sqrt{M_n\,LLn} \big)$ a.s.

(b) $\sum_{n=1}^{\infty} \dfrac{\mathbb{E}(|U_n - \bar U_n|)}{n\sqrt{LLn}} < \infty$

(c) $\tilde U_n = o\big( n\sqrt{LLn} \big)$ a.s.
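The block construction of Notations 9.1 can be sketched numerically: long blocks of length $\psi(n)$ carry the sums $U_n$ and short blocks of length $n$ carry the sums $V_n$, and together they partition the integers up to $M_{N+1}$. The concrete choice of $\psi$ below is ours (any element of $\Psi$ would do) and is purely illustrative.

```python
import math

def psi(n):
    """An illustrative member of the class Psi: psi is increasing,
    psi(n)/n tends to infinity (very slowly, via (LL n)^{1/4}),
    while psi(n) = o(n * sqrt(LL n))."""
    ll = max(1.0, math.log(max(1.0, math.log(n + 2))))
    return n * math.ceil(ll ** 0.25 + 1)

def blocks(N):
    """Return the long blocks (indices of U_n) and short blocks
    (indices of V_n) for n = 1..N, as in Notations 9.1:
    block n covers M_n+1 .. M_{n+1} with M_{n+1}-M_n = psi(n)+n."""
    M = 0  # M_1 = 0
    long_blocks, short_blocks = [], []
    for n in range(1, N + 1):
        long_blocks.append(range(M + 1, M + psi(n) + 1))
        short_blocks.append(range(M + psi(n) + 1, M + psi(n) + n + 1))
        M += psi(n) + n  # M_{n+1}
    return long_blocks, short_blocks

long_b, short_b = blocks(5)
for n, (lb, sb) in enumerate(zip(long_b, short_b), start=1):
    print(n, len(lb), len(sb))  # lengths psi(n) and n
```

The point of the construction is that the short blocks separate the long ones by a gap growing like $n$, which is what makes the coupling argument in the proof of (a) work.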
Proof of Proposition 9.1. It is adapted from the proof of Proposition 2 in Rio (1995) [159].

Proof of (b). Note first that $\mathbb{E}|U_n - \bar U_n| = \mathbb{E}\big( (|U_n| - n/\sqrt{LLn})_+ \big)$, so that
$$\mathbb{E}|U_n - \bar U_n| = \int_{n/\sqrt{LLn}}^{+\infty} \mathbb{P}(|U_n| > t)\,dt. \quad (9.2.5)$$
In the following we write $Q$ instead of $Q_{|X_0|}$. Since $U_n$ is distributed as $S_{\psi(n)}$, we infer from Theorem 5.3 that
$$\mathbb{P}(|U_n| > t) \le 4\Big( 1 + \frac{t^2}{25\,r\,s_{\psi(n)}^2} \Big)^{-r/2} + \frac{20\,\psi(n)}{t} \int_0^{S(t/5r)} Q(u)\,du. \quad (9.2.6)$$
Consider the two terms
$$A_{1,n} = \frac{4}{n\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} \Big( 1 + \frac{t^2}{25\,r\,s_{\psi(n)}^2} \Big)^{-r/2} dt \quad\text{and}\quad A_{2,n} = \frac{20\,\psi(n)}{n\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^{S(t/5r)} Q(u)\,du\,dt.$$
From (9.2.5) and (9.2.6), we infer that
$$\frac{\mathbb{E}|U_n - \bar U_n|}{n\sqrt{LLn}} \le A_{1,n} + A_{2,n}. \quad (9.2.7)$$
Study of $A_{1,n}$. Since the sequence $(X_n)_{n\in\mathbb{N}}$ satisfies (9.2.4), $s_{\psi(n)}^2/\psi(n)$ converges to some positive constant. Let $C_r$ denote some constant depending only on $r$, which may vary from line to line. We have that
$$A_{1,n} \le \frac{4}{n\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} C_r\, s_{\psi(n)}^r\, t^{-r}\,dt \le C_r\, s_{\psi(n)}^r\, n^{-r}\, (LLn)^{(r-2)/2}.$$
We infer that $A_{1,n} = O\big( \psi(n)^{r/2}\, n^{-r}\, (LLn)^{(r-2)/2} \big)$ as $n$ tends to infinity. Since $\psi\in\Psi$ and $r > 2$, we infer that $\sum_{n\ge 1} A_{1,n}$ is finite.
Study of $A_{2,n}$. We use the following elementary result: if $(a_i)_{i\ge 1}$ is a sequence of positive numbers, then there exists a sequence of positive numbers $(b_i)_{i\ge 1}$ such that $b_i \to \infty$ and $\sum_{i\ge 1} a_i b_i < \infty$ if and only if $\sum_{i\ge 1} a_i < \infty$ (note that $b_n^2 = (\sum_{i=n}^{\infty} a_i)^{-1}$ works). Consequently $\sum_{n\ge 1} A_{2,n}$ is finite for some $\psi\in\Psi$ if and only if
$$\sum_{n\ge 1} \frac{1}{\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^{S(t/5r)} Q(u)\,du\,dt < +\infty. \quad (9.2.8)$$
Recall that $S = R^{-1}$, with the notations of Theorem 5.3. To prove (9.2.8), write
$$\int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^{S(t/5r)} Q(u)\,du\,dt = \int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^1 \mathbf{1}_{R(u)\ge t/5r}\, Q(u)\,du\,dt = \int_0^1 Q(u) \int_{n/\sqrt{LLn}}^{5rR(u)} \frac{dt}{t}\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du = \int_0^1 Q(u)\, \log\Big( \frac{5rR(u)\sqrt{LLn}}{n} \Big)\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du.$$
Consequently (9.2.8) holds if and only if
$$\int_0^1 Q(u) \sum_{n\ge 1} \frac{1}{\sqrt{LLn}}\, \log\Big( \frac{5rR(u)\sqrt{LLn}}{n} \Big)\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du < +\infty. \quad (9.2.9)$$
To see that (9.2.9) holds, we shall prove the following result: if $f$ is any increasing function such that $f(0) = 0$ and $f(1) = 1$, then for any positive $R$ we have that
$$\sum_{n\ge 1} \log\Big( \frac{R}{f(n)} \Big)\, (f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \le (R-1)\vee 0 \le R. \quad (9.2.10)$$
Applying this result to $f(x) = x(LLx)^{-1/2}$ and $R = 5rR(u)$, and noting that $(LLn)^{-1/2} \le C(f(n)-f(n-1))$ for some constant $C > 1$, we infer that
$$\int_0^1 Q(u) \sum_{n\ge 1} \frac{1}{\sqrt{LLn}}\, \log\Big( \frac{5rR(u)\sqrt{LLn}}{n} \Big)\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du \le 5Cr \int_0^1 Q(u)R(u)\,du,$$
which is finite as soon as (9.2.4) holds. It remains to prove (9.2.10). If $R \le 1$ the result is clear. Now, for $R > 1$, let $x_R$ be the largest integer such that $f(x_R) \le R$, and write $R^* = f(x_R)$. Note first that
$$\sum_{n\ge 1} (\log R)\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \le R^*\log R. \quad (9.2.11)$$
On the other hand, we have that
$$\sum_{n\ge 1} \log(f(n))\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} = \sum_{n=1}^{x_R} \log(f(n))\,(f(n)-f(n-1)).$$
It follows that
$$\sum_{n\ge 1} \log(f(n))\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \ge \int_1^{R^*} \log x\,dx = R^*\log R^* - R^* + 1. \quad (9.2.12)$$
Using (9.2.11) and (9.2.12) we get that
$$\sum_{n\ge 1} \log\Big( \frac{R}{f(n)} \Big)\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \le R^* - 1 + R^*(\log R - \log R^*). \quad (9.2.13)$$
Using Taylor's inequality, we have that $R^*(\log R - \log R^*) \le R - R^*$, and (9.2.10) follows. The proof of (b) is complete.
Proof of (c). Let $T_n = \sum_{i=M_n+1}^{M_{n+1}} (|X_i| - \mathbb{E}|X_i|)$. We easily see that
$$\tilde U_n = (\psi(n)+n)\,\mathbb{E}|X_1| + T_n. \quad (9.2.14)$$
By definition of $\Psi$, we have $\psi(n) = o(n\sqrt{LLn})$. Now note that
$$T_n \le \frac{n}{\sqrt{LLn}} + 0\vee\Big( T_n - \frac{n}{\sqrt{LLn}} \Big). \quad (9.2.15)$$
Using the same arguments as for the proof of (b), we obtain that
$$\sum_{n\ge 1} \frac{\mathbb{E}\big( 0\vee(T_n - n/\sqrt{LLn}) \big)}{n\sqrt{LLn}} < +\infty, \quad\text{so that}\quad \sum_{n\ge 1} \frac{0\vee(T_n - n/\sqrt{LLn})}{n\sqrt{LLn}} < +\infty \ \text{a.s.}$$
Consequently $\max(0,\, T_n - n(LLn)^{-1/2}) = o(n\sqrt{LLn})$ almost surely, and the result follows from (9.2.14) and (9.2.15).

Proof of (a). In the following, $(\delta_n)_{n\ge 1}$ and $(\eta_n)_{n\ge 1}$ denote independent sequences of independent random variables with uniform distribution over $[0,1]$, independent of $(X_n)_{n\ge 1}$. Since $\bar U_n$ is a 1-Lipschitz function of $U_n$, $\tau(\sigma(U_i,\ i\le n-1),\, \bar U_n) \le \psi(n)\tau(n)$. Using Lemma 5.2 and arguing as in the proof of Theorem 5.2, we get the existence of a sequence $(\bar U_n^*)_{n\ge 1}$ of independent random variables with the same distribution as the random variables $\bar U_n$, such that $\bar U_n^*$ is a measurable function of $(\bar U_l, \delta_l)_{l\le n}$ and
$$\mathbb{E}|\bar U_n - \bar U_n^*| \le \psi(n)\,\tau(n).$$
Since (9.2.4) holds, we have that
$$\sum_{n\ge 1} \frac{\mathbb{E}|\bar U_n - \bar U_n^*|}{\sqrt{M_n\,LLn}} < \infty, \quad\text{so that}\quad \sum_{n\ge 1} \frac{|\bar U_n - \bar U_n^*|}{\sqrt{M_n\,LLn}} < +\infty \ \text{a.s.}$$
Applying Kronecker's lemma, we obtain that
$$\sum_{i=1}^n (\bar U_i - \bar U_i^*) = o\big( \sqrt{M_n\,LLn} \big) \ \text{a.s.} \quad (9.2.16)$$
We infer from (9.2.4) and from Dedecker and Doukhan (2003) [43] that
$$\psi(n)^{-1}\,\mathrm{Var}(U_n) \xrightarrow[n\to\infty]{} \sigma^2 \quad\text{and}\quad \psi(n)^{-1/2}\,U_n \xrightarrow[n\to\infty]{\ \mathcal{D}\ } \mathcal{N}(0,\sigma^2).$$
Hence the sequence $(U_n^2/\psi(n))_{n\ge 1}$ is uniformly integrable (Theorem 5.4 in Billingsley (1968) [20]). Consequently, since the random variables $\bar U_n^*$ have the same distribution as the random variables $\bar U_n$, we deduce from the above limit results, from Strassen's representation theorem (see Dudley (1968) [79]), and from Skorohod's lemma (1976) [179] that one can construct some sequence
$(W_n)_{n\ge 1}$ of $\sigma(\bar U_n^*, \eta_n)$-measurable random variables with respective distributions $\mathcal{N}(0,\psi(n)\sigma^2)$ such that
$$\mathbb{E}\big( (\bar U_n^* - W_n)^2 \big) = o(\psi(n)) \quad\text{as } n\to+\infty, \quad (9.2.17)$$
which is exactly equation (5.17) of the proof of Proposition 2(c) in Rio (1995) [159]. The end of the proof is the same as that of Rio.

Proof of Theorem 9.2. By Skorohod's lemma (1976) [179], there exists a sequence $(Y_i)_{i\ge 1}$ of independent $\mathcal{N}(0,\sigma^2)$-distributed random variables satisfying $W_n = \sum_{i=M_n+1}^{M_n+\psi(n)} Y_i$ for all positive $n$. Define the random variable
$$\tilde V_n = \sum_{i=M_{n+1}+1-n}^{M_{n+1}} Y_i.$$
Let $n(k) := \sup\{n\ge 0 : M_n \le k\}$, and note that by definition of $M_n$ we have $n(k) = o(\sqrt k)$. Applying Proposition 9.1(c), we see that
$$\Big| \sum_{i=1}^k X_i - \sum_{i=1}^{n(k)} (U_i + V_i) \Big| \le \tilde U_{n(k)} = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.18)$$
From (5.26) in Rio (1995) [159], we infer that
$$\sum_{i=1}^{n(k)} V_i = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad\text{and}\quad \sum_{i=1}^{n(k)} \tilde V_i = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.19)$$
Gathering (9.2.18), (9.2.19) and Proposition 9.1(a) and (b), we obtain that
$$\sum_{i=1}^k X_i - \sum_{i=1}^{n(k)} (W_i + \tilde V_i) = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.20)$$
Clearly $\sum_{i=1}^k Y_i - \sum_{i=1}^{n(k)} (W_i + \tilde V_i)$ is normally distributed with variance smaller than $\psi(n(k)) + n(k)$. Since $n(k) = o(\sqrt k)$, we have $\psi(n(k)) + n(k) = o(\sqrt{k\,LLk})$ by definition of $\psi$. An elementary calculation on Gaussian random variables shows that
$$\sum_{i=1}^k Y_i - \sum_{i=1}^{n(k)} (W_i + \tilde V_i) = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.21)$$
Theorem 9.2 follows from (9.2.20) and (9.2.21).
Chapter 10
The empirical process

In this chapter, we prove central limit theorems for the empirical distribution function of weakly dependent stationary sequences (or fields). Except in the last section (Section 10.6), where the oscillations of the empirical distribution of weakly dependent random fields are studied, all the results are mainly based on the tightness criterion given in Proposition 4.2. In Section 10.1, we give a sufficient condition for tightness, based on the control of the covariances between indicators of half-lines. In Section 10.2 we prove an empirical central limit theorem for $\eta$-dependent sequences by assuming an exponential decay of the coefficients. In Sections 10.3 and 10.4, we give sufficient conditions in terms of the coefficients $\tilde\alpha$, $\tilde\beta$, $\tilde\phi$, $\theta$ and $\tau$. In Section 10.5 we present applications of such results to the empirical copula process.

Definition 10.1 (Empirical process). We recall the definition of the empirical process in the different cases considered:

• Stationary real valued or multivariate sequence: Let $(X_n)_{n\in\mathbb{Z}}$ be a sequence of $\mathbb{R}^d$-valued random variables. Define on $\mathbb{R}^d$ the partial order $s \le t$ if and only if $s_i \le t_i$ for $i = 1,\ldots,d$. The empirical process of $X$ is defined by
$$F_n(t) = \frac 1n \sum_{i=1}^n \mathbf{1}_{X_i\le t}. \quad (10.0.1)$$

• Stationary real valued random field: For $N$ in $\mathbb{N}$, let $B_N$ be the closed ball of radius $N$ for the $\ell^\infty$-norm on $T = \mathbb{Z}^d$ and $n = \#B_N = (2N+1)^d$. Let $(X_t)_{t\in T}$ be a real valued random field. We define the empirical process
$$F_n(t) = \frac 1n \sum_{k\in B_N} \mathbf{1}_{X_k\le t}. \quad (10.0.2)$$
In either case, if the variables are identically distributed with common distribution function $F$, we define the normalized empirical process by
$$U_n(t) = \sqrt n\,(F_n(t) - F(t)).$$
We also need to define the limit process appearing in the following central limit theorems: $(B(t))_{t\in[0,1]^d}$ is a zero-mean Gaussian process with covariance function
$$\Gamma(t,s) = \sum_{k\in T} \mathrm{Cov}\big( \mathbf{1}_{X_0\le t},\, \mathbf{1}_{X_k\le s} \big), \quad (10.0.3)$$
where $T = \mathbb{Z}$ in the case of random sequences and $T = \mathbb{Z}^d$ in the case of random fields.
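In the real valued sequence case ($d = 1$) with uniform marginals, so that $F(t) = t$, the normalized empirical process of a sample is straightforward to compute on a grid. The sketch below is illustrative; the function and variable names are ours.

```python
import numpy as np

def empirical_process(x, t_grid):
    """U_n(t) = sqrt(n) * (F_n(t) - F(t)) for a sample x with uniform
    marginal distribution on [0, 1] (so F(t) = t), on a grid of t values."""
    x = np.asarray(x)
    n = len(x)
    f_n = np.mean(x[None, :] <= t_grid[:, None], axis=1)  # F_n on the grid
    return np.sqrt(n) * (f_n - t_grid)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
u = empirical_process(rng.uniform(size=1000), t)
# u fluctuates on the scale of a Brownian bridge and vanishes at t = 1.
```

For a weakly dependent sample, the same computation applies; what changes is the limit covariance $\Gamma(t,s)$ in (10.0.3), which contains the cross terms $k \ne 0$ absent in the i.i.d. case.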
10.1 A simple condition for tightness

We consider a stationary real valued sequence $(X_n)_{n\in\mathbb{Z}}$ with continuous common distribution function $F$. We assume without loss of generality that the marginal distribution of this sequence is the uniform law on $[0,1]$. Assume that the sequence $(X_n)_{n\in\mathbb{Z}}$ satisfies the following weak dependence condition. Let $\mathcal{F} = \{x\mapsto \mathbf{1}_{s<x\le t} : s,t\in[0,1]\}$. We assume that for any $m\in\{1,2,3\}$ and any $0\le t_1\le t_2\le t_3\le t_4$,
$$\sup_{f\in\mathcal{F}} \Big| \mathrm{Cov}\Big( \prod_{i=1}^m f(X_{t_i}),\ \prod_{i=m+1}^4 f(X_{t_i}) \Big) \Big| \le \varepsilon(r), \quad (10.1.1)$$
where $r = t_{m+1}-t_m$ and $\varepsilon(r)$ depends only on $r$ (in this case a weak dependence condition holds for a class of functions $\mathbb{R}^u\to\mathbb{R}$ involving only the values $u = 1$, 2 or 3).

Proposition 10.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a real valued stationary sequence fulfilling (10.1.1) with
$$\varepsilon(r) = O(r^{-5/2-\nu}), \quad\text{for some } \nu > 0. \quad (10.1.2)$$
Then the process $U_n$ is tight in $D([0,1])$.

Proof of Proposition 10.1. The moment inequality (4.3.17), together with conditions (10.1.1) and (10.1.2), yields the existence of a positive constant $C$ such
that for any $s, t$ in $[0,1]$,
$$\|U_n(t) - U_n(s)\|_4 \le C\Big( \sum_{r=0}^{n-1} \big( r^{-a}\wedge|t-s| \big) \Big)^{1/2} + C\Big( \frac 1n \sum_{r=0}^{n-1} (r+1)^2\,\varepsilon(r) \Big)^{1/4} \le C\Big( \sum_{r\ge |t-s|^{-1/a}} r^{-a} + \sum_{r<|t-s|^{-1/a}} |t-s| \Big)^{1/2} + C\,n^{\frac{2-a}{4}} \le C\big\{ |t-s|^{\frac{a-1}{2a}} + n^{\frac{2-a}{4}} \big\}.$$
The last bound, together with the tightness criterion given in Proposition 4.2, proves that the sequence $\{U_n(t),\ t\in[0,1]\}$ is tight.

Remark 10.1. Stationary associated sequences (see Section 1.4) satisfy the requirement of Proposition 10.1 if
$$\sup_{|k|\ge r}\ \sup_{x,y\in\mathbb{R}} \mathrm{Cov}\big( \mathbf{1}_{X_0>x},\, \mathbf{1}_{X_k>y} \big) = O(r^{-5/2-\nu}).$$
Using the inequality
$$\sup_{|k|\ge r}\ \sup_{x,y\in\mathbb{R}} \mathrm{Cov}\big( \mathbf{1}_{X_0>x},\, \mathbf{1}_{X_k>y} \big) \le C \sup_{|k|\ge r} \mathrm{Cov}^{1/3}(X_0, X_k),$$
for a universal constant $C$, Yu (1993) [195] proves tightness under the condition $\mathrm{Cov}(X_0, X_r) = O(r^{-a})$ for $a > 15/2$. However, in this case, the paper by Louhichi (2000) [124] proves tightness under the condition $\mathrm{Cov}(X_0, X_r) = O(r^{-a})$ for $a > 4$.
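As a quick numerical companion to Remark 10.1: an AR(1) chain with nonnegative coefficient is a standard example of an associated sequence, and its covariances $\mathrm{Cov}(X_0, X_r) = \rho^r/(1-\rho^2)$ decay geometrically, so the polynomial condition $\mathrm{Cov}(X_0, X_r) = O(r^{-a})$ holds for every $a$. The sketch below (our construction, for illustration only) compares empirical covariances with these theoretical values.

```python
import numpy as np

def ar1_path(rho, n, seed=0):
    """Simulate a stationary AR(1) path X_t = rho*X_{t-1} + eps_t with
    standard normal innovations; for rho >= 0 the sequence is associated."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - rho ** 2)  # stationary initial draw
    for t in range(1, n):
        x[t] = rho * x[t - 1] + eps[t]
    return x

def empirical_cov(x, r):
    """Empirical estimate of Cov(X_0, X_r) from one path (r >= 1)."""
    a, b = x[:-r], x[r:]
    return float(np.mean((a - a.mean()) * (b - b.mean())))

x = ar1_path(0.6, 50_000)
covs = [empirical_cov(x, r) for r in (1, 2, 4)]
# theoretical values rho**r / (1 - rho**2): 0.9375, 0.5625, 0.2025
```

The empirical values decrease geometrically in $r$, matching the theory up to sampling noise of order $n^{-1/2}$.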
10.2 η-dependent sequences

In this section, $Y$ is a stationary $\eta$-dependent sequence in $[0,1]^d$ with uniform marginal distributions.

Theorem 10.1. Assume that $(Y_i)_{i\in\mathbb{Z}}$ is a stationary $\eta$-dependent zero-mean process in $[0,1]^d$ with uniform marginal distributions. Assume that there exist constants $C > 0$ and $a > 4d+2$ such that $\eta(r) \le Cr^{-a}$. Then the process $U_n$ converges in distribution in $D([0,1]^d)$ to the Gaussian process $B$.

The following lemma is used to prove Theorem 10.1.

Lemma 10.1. Assume that $(Y_i)_{i\in\mathbb{Z}^p}$ is a stationary multivariate random field with values in $[0,1]^d$ and that the densities of the marginal distributions of the vectors
$Y_i$ are bounded by $C_Y$. For $s \le t$ in $[0,1]^d$, denote $g_{t,s}(x) = \mathbf{1}_{x\le t} - \mathbf{1}_{x\le s}$. Let $\mathbf{i} = (i_1,\ldots,i_u)$ in $(\mathbb{Z}^p)^u$ and $\mathbf{j} = (j_1,\ldots,j_v)$ in $(\mathbb{Z}^p)^v$ be two sets of indices that are $r$-distant in $L^1$-distance. Let $G$ and $H$ be two bounded Lipschitz functions on $\mathbb{R}^u$ and $\mathbb{R}^v$ respectively. Denote $Y_{\mathbf{i}} = (Y_{i_1},\ldots,Y_{i_u})$. Then
$$\big| \mathrm{Cov}\big( G(g_{t,s}(Y_{\mathbf{i}})),\, H(g_{t,s}(Y_{\mathbf{j}})) \big) \big| \le \psi(G,H)\,\epsilon_r, \quad (10.2.1)$$
where, setting $\phi(G,H) = d_G\,\|H\|_\infty\,\mathrm{Lip}(G)$ (with $d_G$ the number of arguments of $G$), we define:

• if $Y$ is $\eta$-dependent, $\epsilon_r = \eta_r^{1/2}$ and
$$\psi(G,H) = 4(C_Y d)^{1/2}\big( \phi(G,H) + \phi(H,G) \big); \quad (10.2.2)$$

• if $Y$ is $\kappa$-dependent, $\epsilon_r = \kappa_r^{1/3}$ and
$$\psi(G,H) = 2(4C_Y d)^{2/3}\big( \phi(G,H)+\phi(H,G) \big)^{2/3}\big( \phi(G,H)\phi(H,G) \big)^{1/3}. \quad (10.2.3)$$

Proof of Lemma 10.1. For $\delta \ge 0$, define the $\delta$-approximations of $\mathbf{1}_{x\ge t}$ by
$$h_{\delta,t}(x) = \prod_{p=1}^d \Big( \frac{x^{(p)}-t^{(p)}+\delta}{\delta}\, \mathbf{1}_{t^{(p)}-\delta < x^{(p)}\le t^{(p)}} + \mathbf{1}_{x^{(p)} > t^{(p)}} \Big).$$
Define $g_{\delta,t,s} = h_{\delta,t} - h_{\delta,s}$. Then its Lipschitz modulus is equal to $\delta^{-1}$, where the distance on $\mathbb{R}^d$ is $d_1(x,y) = \sum_{p=1}^d |x^{(p)}-y^{(p)}|$, and
$$\mathbb{E}\big| g_{t,s}(Y_0) - g_{\delta,t,s}(Y_0) \big| \le 2dC_Y\delta,$$
because the density of the variable $Y_i$ is bounded by $C_Y$ and the two functions are equal except on $2d$ slabs of width $\delta$. Define $G_0(Y_{\mathbf{i}}) = G(g_{t,s}(Y_{\mathbf{i}}))$ and $G_\delta(Y_{\mathbf{i}}) = G(g_{\delta,t,s}(Y_{\mathbf{i}}))$. Then
$$\big| \mathrm{Cov}(G_0(Y_{\mathbf{i}}), H_0(Y_{\mathbf{j}})) - \mathrm{Cov}(G_\delta(Y_{\mathbf{i}}), H_\delta(Y_{\mathbf{j}})) \big| \le \big| \mathbb{E}(G_0 H_0) - \mathbb{E}(G_\delta H_\delta) \big| + \big| \mathbb{E}(G_0)\mathbb{E}(H_0) - \mathbb{E}(G_\delta)\mathbb{E}(H_\delta) \big|.$$
After substituting the variables one by one, the first term of the right hand side is bounded by
$$\big( u\|H\|_\infty\,\mathrm{Lip}(G) + v\|G\|_\infty\,\mathrm{Lip}(H) \big)\; \mathbb{E}\big| g_{t,s}(Y_0) - g_{\delta,t,s}(Y_0) \big|,$$
so that $|\mathbb{E}(G_0H_0) - \mathbb{E}(G_\delta H_\delta)| \le 2C_Y d\delta\,(\phi(G,H)+\phi(H,G))$. The bound for the second term is the same. Now, if $Y$ is $\eta$-dependent,
$$\big| \mathrm{Cov}(G_\delta(Y_{\mathbf{i}}), H_\delta(Y_{\mathbf{j}})) \big| \le \big( \phi(G,H)+\phi(H,G) \big)\, \frac{\eta_r}{\delta}.$$
Hence
$$\big| \mathrm{Cov}(G_0(Y_{\mathbf{i}}), H_0(Y_{\mathbf{j}})) \big| \le \big( \phi(G,H)+\phi(H,G) \big)\Big( 4C_Y d\delta + \frac{\eta_r}{\delta} \Big).$$
If $Y$ is $\kappa$-dependent,
$$\big| \mathrm{Cov}(G_\delta(Y_{\mathbf{i}}), H_\delta(Y_{\mathbf{j}})) \big| \le \phi(G,H)\phi(H,G)\, \frac{\kappa_r}{\delta^2},$$
hence
$$\big| \mathrm{Cov}(G_0(Y_{\mathbf{i}}), H_0(Y_{\mathbf{j}})) \big| \le 4C_Y d\big( \phi(G,H)+\phi(H,G) \big)\delta + \phi(G,H)\phi(H,G)\, \frac{\kappa_r}{\delta^2}.$$
Choosing the optimal $\delta$, relation (10.2.1) is proved.

We apply this lemma to products of indicator functions: for $s \le t$ in $\mathbb{R}^d$ and any multi-index $\mathbf{k}$ of $\mathbb{Z}^u$, we define
$$\Pi_{\mathbf{k}} = \prod_{j=1}^u \big( g_{t,s}(Y_{k_j}) - F(t) + F(s) \big).$$
We note that $\Pi_{\mathbf{k}} = G(g_{t,s}(Y_{\mathbf{k}}))$ where the function $G$ is defined by $G(x_1,\ldots,x_u) = \prod_{j=1}^u (x_j - c)$ with $c = F(t) - F(s)$. Here $\|G\|_\infty = \mathrm{Lip}(G) = 1$.

Corollary 10.1. Assume that $(Y_i)_{i\in\mathbb{Z}}$ is a stationary $\eta$-dependent zero-mean process in $[0,1]^d$ with uniform marginal distributions. Then for any sequences $\mathbf{i} = (i_1,\ldots,i_u)$ and $\mathbf{j} = (j_1,\ldots,j_v)$ such that $r \le j_1 - i_u$,
$$\big| \mathrm{Cov}(\Pi_{\mathbf{i}}, \Pi_{\mathbf{j}}) \big| \le (u+v)\,\epsilon(r), \quad (10.2.4)$$
with $\epsilon(r) = 4\sqrt{d\,\eta(r)}$.

Next, we prove a Rosenthal-type inequality.

Proposition 10.2. Assume that $Y$ is a stationary $\eta$-dependent zero-mean process in $[0,1]^d$ with uniform marginal distributions. Assume moreover that condition (10.2.4) is satisfied with $\epsilon(r) = Cr^{-a}$. Then, for $l < (a+1)/2$ and $(s,t)$ such that $\mathbb{E}|x_0(s,t)| < C$, we have
$$\mathbb{E}\big( U_n(t)-U_n(s) \big)^{2l} \le \frac{(4l-2)!\,3^{2l}}{(2l-1)!} \Big( \frac{k_l}{3}\Big( \frac{\mathbb{E}|x_0(s,t)|}{C} \Big)^{1-1/a} \Big)^l + \big( (2l)! \big)^2\, 3^{2l}\, \frac{n^{1-l}\,k_l}{3} \Big( \frac{\mathbb{E}|x_0(s,t)|}{C} \Big)^{1+(1-2l)/a}, \quad (10.2.5)$$
where $k_l = C + \dfrac{C\,2^a}{a-2l+1}$.
Proof of Proposition 10.2. Let $s \le t$ be in $\mathbb{R}^d$. As $|x_i(s,t)| \le 1$, we get for any sequences $\mathbf{i}\in\mathbb{Z}^u$, $\mathbf{j}\in\mathbb{Z}^v$,
$$\big| \mathrm{Cov}(\Pi_{\mathbf{i}}, \Pi_{\mathbf{j}}) \big| \le 2\,\mathbb{E}|x_0(s,t)|. \quad (10.2.6)$$
For any integer $q \ge 1$, set
$$A_q(n) = \sum_{\mathbf{k}\in\{1,\ldots,n\}^q} \big| \mathbb{E}(\Pi_{\mathbf{k}}) \big|; \quad (10.2.7)$$
then
$$\mathbb{E}\big( U_n(s) - U_n(t) \big)^{2l} \le (2l)!\, n^{-l}\, A_{2l}(n). \quad (10.2.8)$$
Let $q \ge 2$. For a finite sequence $\mathbf{k} = (k_1,\ldots,k_q)$ of elements of $\mathbb{Z}$, let $(k_{(1)},\ldots,k_{(q)})$ be the same sequence ordered from the smallest to the largest. The gap $r(\mathbf{k})$ of the sequence is defined as the maximum of the integers $k_{(i+1)} - k_{(i)}$, $i = 1,\ldots,q-1$. Choose any index $j < q$ such that $k_{(j+1)} - k_{(j)} = r$, and define the two nonempty subsequences $\mathbf{k}^1 = (k_{(1)},\ldots,k_{(j)})$ and $\mathbf{k}^2 = (k_{(j+1)},\ldots,k_{(q)})$. Define $G_r(q,n) = \{\mathbf{k}\in\{1,\ldots,n\}^q : r(\mathbf{k}) = r\}$. Sorting the sequences of indices by their gaps, we get
$$A_q(n) \le \sum_{k=1}^n \mathbb{E}|x_0(s,t)|^q + \sum_{r=1}^{n-1} \sum_{\mathbf{k}\in G_r(q,n)} \big| \mathrm{Cov}(\Pi_{\mathbf{k}^1}, \Pi_{\mathbf{k}^2}) \big| \quad (10.2.9)$$
$$\qquad + \sum_{r=1}^{n-1} \sum_{\mathbf{k}\in G_r(q,n)} \big| \mathbb{E}(\Pi_{\mathbf{k}^1}) \big|\, \big| \mathbb{E}(\Pi_{\mathbf{k}^2}) \big|. \quad (10.2.10)$$
Define
$$B_q(n) = \sum_{k=1}^n \mathbb{E}|x_0(s,t)|^q + \sum_{r=1}^{n-1} \sum_{\mathbf{k}\in G_r(q,n)} \big| \mathrm{Cov}(\Pi_{\mathbf{k}^1}, \Pi_{\mathbf{k}^2}) \big|.$$
In order to prove that the expression (10.2.10) is bounded by $\sum_{m=1}^{q-1} A_m(n)A_{q-m}(n)$, we make a first summation over the $\mathbf{k}^1$'s with $\#\mathbf{k}^1 = m$. Hence
$$A_q(n) \le B_q(n) + \sum_{m=1}^{q-1} A_m(n)\,A_{q-m}(n). \quad (10.2.11)$$
Now we give a bound for $B_q(n)$. To build a sequence $k$ belonging to $G_r(q,n)$, we first fix one of the $n$ points of $\{1,\dots,n\}$. We choose a second point among the two points that are at distance $r$ from the first point. The $i$-th point lies in an interval of radius $r$ centered at one of the $i-1$ preceding points. Thus for $r\in\mathbb N^*$, we have
$$\#G_r(q,n) \le n\cdot 2\cdot 2(2r+1)\cdots(q-1)(2r+1) \le 2n(q-1)!\,(3r)^{q-2}.$$
We use conditions (10.2.4) and (10.2.6) to deduce:
$$B_q(n) \le n\,\mathbb E|x_0(s,t)| + 2n(q-1)!\sum_{r=1}^{n-1}(3r)^{q-2}\min\bigl(q\,\varepsilon(r),\,2\,\mathbb E|x_0(s,t)|\bigr) \le n\,\mathbb E|x_0(s,t)| + n\,3^{q-1}q!\sum_{r=1}^{n-1}r^{q-2}\min\bigl(\varepsilon(r),\,\mathbb E|x_0(s,t)|\bigr).$$
Denote by $R$ the integer such that $R < (\mathbb E|x_0(s,t)|/C)^{-1/a} \le R+1$. For any $2\le q\le 2l$:
$$B_q(n) \le n\,\mathbb E|x_0(s,t)| + 3^{q-1}nq!\left(\mathbb E|x_0(s,t)|\sum_{r=1}^{R-1}r^{q-2} + C\sum_{r=R}^{\infty}r^{q-2-a}\right) \le 3^{q-1}nq!\left(\frac{\mathbb E|x_0(s,t)|}{q-1}\,R^{q-1} + \frac{C}{a-q+1}\,R^{q-1-a}\right) \le 3^{q-1}nq!\,\bigl(\mathbb E|x_0(s,t)|/C\bigr)^{-(q-1)/a}\left(\frac{\mathbb E|x_0(s,t)|}{q-1} + \frac{C\,R^{-a}}{a-q+1}\right).$$
But $R\ge 1$, so that $(\mathbb E|x_0(s,t)|/C)^{-1/a} \le 2R$, hence $R^{-a}\le 2^a\,\mathbb E|x_0(s,t)|/C$ and
$$B_q(n) \le 3^{q-1}nq!\,\bigl(\mathbb E|x_0(s,t)|/C\bigr)^{1-(q-1)/a}\left(C + \frac{C\,2^a}{a-2l+1}\right).$$
We find that
$$B_q(n) \le 3^q q!\left(\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{-1/a}\right)^{q}\,\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+1/a}, \qquad(10.2.12)$$
so $B_q(n)$ is bounded by a function $M^qV_q$ that satisfies condition (4.3.24) and gives
$$A_{2l}(n) \le \frac{(4l-2)!}{(2l)!(2l-1)!}\left(3\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{-1/a}\right)^{2l}\left(\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+1/a}\right)^{l} + (2l)!\left(3\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{-1/a}\right)^{2l}\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+1/a}$$
$$\le \frac{(4l-2)!\,3^{2l}}{(2l)!(2l-1)!}\left(\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1-1/a}\right)^{l} + (2l)!\,3^{2l}\,\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+(1-2l)/a},$$
and (10.2.5) is proved.

Proof of Theorem 10.1. CLT for the finite dimensional distributions of $U_n$. Let $(s_1,\dots,s_m)$ be a fixed sequence of elements of $[0,1]^d$. Denote by $B_n$ the vector-valued process $B_n=(U_n(s_1),\dots,U_n(s_m))$. Proving a CLT for the vector $B_n$ is equivalent to proving the Gaussian convergence of any linear combination of its coordinates. Let $(\alpha_1,\dots,\alpha_m)$ be a real vector such that $\sum_{j=1}^m \alpha_j \neq 0$. Define $Z_i=\sum_{j=1}^m \alpha_j\bigl(\mathbf 1\{Y_i\le s_j\}-\mathbb P(Y_i\le s_j)\bigr)$. Define also
$$S_n = \frac{1}{\sqrt n}\sum_{1\le i\le n} Z_i = \sum_{1\le j\le m}\alpha_j\,U_n(s_j).$$
We use the Bernstein blocking technique, as in Chapter 8. Let $p(n)$ and $q(n)$ be sequences of integers such that $p(n)=o(n)$ and $q(n)=o(p(n))$. Assume that the Euclidean division of $n$ by $(p+q)$ gives a quotient $k$ and a remainder $r$. For $i=1,\dots,k$, we define the interval $P_i=\{(p+q)(i-1)+1,\dots,(p+q)i-q\}$ and, if $r\neq 0$, $P_{k+1}=\{(p+q)k+1,\dots,(p+q)k+r\wedge p\}$. Denote by $Q$ the set of indices that are not in one of the $P_i$; the cardinality of $Q$ is less than $(k+1)q$. For each block $P_i$ ($1\le i\le k+1$) and for $Q$, we define the partial sums
$$u_i=\frac{1}{\sqrt n}\sum_{j\in P_i}Z_j, \qquad v=\frac{1}{\sqrt n}\sum_{j\in Q}Z_j.$$
We use Lemma 8.2 and check its conditions for the sequence $(Z_j)$. To check (8.2.6), note that $\sigma_n^2 \le \sum_{r=0}^{n-1}\varepsilon(r)$ and that $\mathbb E v^2 \le \frac{(k+1)q}{n}\sum_{r=0}^{n-1}\varepsilon(r)$. Let us check (8.2.7). Using (10.2.4) with $\mathrm{Lip}\,G = \mathrm{Lip}\,H = t\max_j|\alpha_j|/(\sqrt n\,\sigma_n)$, $d_G=mpk$ and $d_H=mp$, we get
$$\left|\mathrm{Cov}\left(g\Bigl(\frac{t}{\sigma_n}\sum_{i=1}^{j-1}u_i\Bigr),\,h\Bigl(\frac{t}{\sigma_n}u_j\Bigr)\right)\right| \le mp\,(k+1)\,\varepsilon(q)$$
and
$$\sum_{j=2}^{k+1}\left|\mathrm{Cov}\left(g\Bigl(\frac{t}{\sigma_n}\sum_{i=1}^{j-1}u_i\Bigr),\,h\Bigl(\frac{t}{\sigma_n}u_j\Bigr)\right)\right| \le mp\,(k+1)^2\,\varepsilon(q) = O\bigl(n^{3/2}p^{-1}q^{-a}\bigr).$$
Taking $p=n^{5/6}$ and $q=n^{5/(6a)}$ gives a bound tending to 0. To prove (8.2.8), it is sufficient to show that $\mathbb E|u_i|^4=O(k^{-2})$. But
$$\mathbb E\Bigl(\sum_{j\in P_i}Z_j\Bigr)^4 = \frac{p^2}{n^2}\,\mathbb E\Bigl(\sum_{i=1}^m \alpha_i B_p(s_i)\Bigr)^4 \le \frac{p^2}{n^2}\,m^3\sum_{i=1}^m \alpha_i^4\,\mathbb E\bigl(B_p(s_i)-B_p(0)\bigr)^4,$$
and we conclude by applying Proposition 10.2 with $l=2$ to the couples $(0,s_i)$. In order to prove (8.2.9), note that (8.2.6) implies that
$$\lim_{n\to\infty}\frac{1}{\sigma_n^2}\,\mathrm{Var}\Bigl(\sum_{i=1}^{k+1}u_i\Bigr)=1.$$
But
$$\left|\mathrm{Var}\Bigl(\sum_{i=1}^{k+1}u_i\Bigr)-\sum_{i=1}^{k+1}\mathbb E|u_i|^2\right| \le 2\sum_{1\le i\neq j\le k}\bigl|\mathrm{Cov}(u_i,u_j)\bigr| \le \frac{4(k+1)p}{n}\sum_{j=q}^{\infty}\varepsilon(j) = O(q^{-a+1}) = o(1).$$
Taking $p=n^{5/6}$ and $q=n^{5/(6a)}$ gives a bound tending to 0.

Tightness of $U_n$. We use the criterion of Proposition 4.2. Define
$$\mathcal F = \bigl\{g_{s,t} \,/\, g_{s,t}(x)=\mathbf 1\{x\le t\}-\mathbf 1\{x\le s\}-F(t)+F(s);\ s,t\in[0,1]^d\bigr\}.$$
By definition $U_n(t)-U_n(s)=Z_n(g_{s,t})$ and $\|g_{s,t}\|_{P_Y,1}=\mathbb E|x_0(s,t)|$. Recalling that $a>2d+1$, from the Rosenthal inequality (10.2.5) for $l=d+1$ we get
$$\|Z_n(g_{s,t})\|_p \le C\bigl(\|g_{s,t}\|_{P_Y,1}^{1/r} + n^{1/q-1/2}\bigr),$$
with $p=q=2l=2d+2$ and $r=2a/(a-1)$. For the class of functions considered, the covering number $N_{P_Y,1}(x,\mathcal F)$ is $O(x^{-d})$, so that
$$\int_0^1 x^{(1-r)/r}\bigl(N_{P_Y,1}(x,\mathcal F)\bigr)^{1/p}\,dx \le C\int_0^1 x^{-\frac{1}{2}\left(\frac{a+1}{a}+\frac{d}{d+1}\right)}\,dx.$$
Because $a>d+1$, the exponent is greater than $-1$, and the integral is finite. As $p=q=2d+2$, the last condition holds directly.
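The Bernstein blocking scheme used in the proof above is easy to visualize numerically. The following Python sketch (an illustration only; `bernstein_blocks` is a hypothetical helper, and the parameter choices $p=n^{5/6}$, $q=n^{5/(6a)}$ mirror those made in the proof) partitions $\{1,\dots,n\}$ into big blocks $P_i$ of length $p$ separated by small gaps of length $q$ whose union $Q$ is negligible:

```python
# Sketch of the Bernstein blocking of {1,...,n}: big blocks of length p
# separated by gaps of length q; the indices in the gaps form the set Q.

def bernstein_blocks(n, p, q):
    """Return (blocks, gaps): big blocks of length p and the gap indices Q."""
    blocks, gaps = [], []
    i = 1
    while i + p - 1 <= n:
        blocks.append(list(range(i, i + p)))              # a big block P_i
        gaps.extend(range(i + p, min(i + p + q, n + 1)))  # the separating band
        i += p + q
    gaps.extend(range(i, n + 1))                          # leftover indices join Q
    return blocks, gaps

n, a = 1000, 3.0
p = int(n ** (5 / 6))                # p = n^{5/6}, as in the proof
q = max(1, int(n ** (5 / (6 * a))))  # q = n^{5/(6a)}
blocks, gaps = bernstein_blocks(n, p, q)
covered = sorted([i for b in blocks for i in b] + gaps)
assert covered == list(range(1, n + 1))  # each index is used exactly once
```

The sums over the big blocks give the asymptotically independent variables $u_i$, while the sum over `gaps` is the remainder $v$, whose variance is of order $q/p$.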
10.3 $\tilde\alpha$, $\tilde\beta$ and $\tilde\phi$-dependent sequences

Consider the three following conditions:

(C$_1$) There exists $\varepsilon>0$ such that $\tilde\alpha_2(k)=O\bigl(k^{-7/3-\varepsilon}\bigr)$ if $d=1$, and $\tilde\alpha_2(k)=O\bigl(k^{-2d-\varepsilon}\bigr)$ if $d>1$.

(C$_2$) There exists $\varepsilon>0$ such that $\tilde\beta_2(k)=O\bigl(k^{-2d-\varepsilon}\bigr)$.

(C$_3$) There exists $\varepsilon>0$ such that $\tilde\phi_2(k)=O\bigl(k^{-1-\varepsilon}\bigr)$.
Theorem 10.2. If one of the conditions (C$_i$), $i=1,\dots,3$ holds, then the process $U_n$ converges in distribution in $D(\mathbb R^d)$ to the Gaussian process $B$.

Remark 10.2. For $d=1$, the condition (C$_1$) is better than the condition $\alpha_X(k)=O(k^{-1-\sqrt 2-\varepsilon})$ given in Shao and Yu (1996) [175] for strongly mixing sequences (recall that $\alpha_X(k)$ has been defined in Section 1.2 of Chapter 1). In fact, in Theorem 7.3 of his book, Rio (2000) [161] has shown that the rate $\alpha_X(k)=O(k^{-1-\varepsilon})$ is sufficient for the weak convergence of the $d$-dimensional distribution function.

Proof of Theorem 10.2. We keep the notations of Chapter 4. The finite dimensional convergence of $U_n$ can be proved as before. Let us prove the tightness of $U_n$. Let $\mathcal F=\{x\mapsto \mathbf 1_{x\le t},\ t\in\mathbb R^d\}$, and let $\mathcal G=\{f-h,\ f,h\in\mathcal F\}$. We have to prove that the process $\{Z_n(f),\,f\in\mathcal F\}$ is asymptotically tight, that is, there exists a semi-metric $\rho$ on $\mathcal F$ such that $(\mathcal F,\rho)$ is totally bounded and, for every $\varepsilon>0$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty}\ \mathbb P\Bigl(\sup_{\rho(f,g)\le\delta,\ f,g\in\mathcal F}|Z_n(f)-Z_n(g)|>\varepsilon\Bigr)=0. \qquad(10.3.1)$$
Since $N_{Q,1}(x,\mathcal F)=O(x^{-d})$ for any finite measure $Q$ on $\mathbb R^d$, the set $(\mathcal F,\|\cdot\|_{Q,1})$ is totally bounded. Consequently, the property (10.3.1) follows from (4.5.2) by applying Markov's inequality.

Let us prove that condition (C$_1$) implies (4.5.2). For any $s,t$ in $\mathbb R^d$, let $f_{s,t}(x)=\mathbf 1_{x\le t}-\mathbf 1_{x\le s}$ and $\tilde f_{s,t}(x)=f_{s,t}(x)-\int f_{s,t}(x)\,P(dx)$. With Proposition 5.6 applied to $(\tilde f_{s,t}(X_i))_{i\in\mathbb Z}$, for any $p\ge 1$ the quantity $\|Z_n(\tilde f_{s,t})\|_p$ is bounded by
$$\sqrt{pV_\infty} + n^{1/3-1/2}\Bigl(3p^2\bigl(\|\tilde f_{s,t}(X_0)^3\|_{p/3} + M_1(p)+M_2(p)+M_3(p)\bigr)\Bigr)^{1/3}, \qquad(10.3.2)$$
where $V_\infty$, $M_1(p)$, $M_2(p)$ and $M_3(p)$ are defined in Proposition 5.6. Let $P$ be the law of $X_0$. Using inequality (5.2.5), for any $r>2$ and $g\in\mathcal G$,
$$\|Z_n(g)\|_p \le 4\bigl(p\,\|g\|_{P,1}\bigr)^{1/r}\Bigl(\sum_{k\ge 0}\tilde\alpha_1(k)\Bigr)^{(r-2)/r} + n^{1/3-1/2}\Bigl(3p^2\Bigl(1+10\sum_{k=1}^{+\infty}k\,\tilde\alpha_2(k)^{3/p}\Bigr)\Bigr)^{1/3}. \qquad(10.3.3)$$
We then apply Proposition 4.2 with $q=3$. Recall that $N_{P,1}(x,\mathcal F)=O(x^{-d})$. If $d=1$, we can take $r=7/2$ and $p>7/2$ such that (4.5.2) holds under (C$_1$). If
$d>1$, we can take $r=3$ and $p>3d$ such that (4.5.2) holds under (C$_1$).

Let us prove that condition (C$_2$) implies (4.5.2). Define the measure $Q$ on $\mathbb R^d$ by
$$Q(dx)=B(x)\,P(dx)=\Bigl(1+4\sum_{k=1}^{+\infty}b_k(x)\Bigr)P(dx), \qquad(10.3.4)$$
where $b_k(x)$ is the function from $\mathbb R^d$ to $[0,1]$ such that $b(\sigma(X_0),X_k)=b_k(X_0)$, and $P$ is the law of $X_0$. Note that $Q$ is finite as soon as $\sum_{k=1}^{+\infty}\tilde\beta_1(k)$ is finite. Applying the inequality (5.2.6), we obtain that for any $g$ in $\mathcal G$,
$$\|Z_n(g)\|_p \le \bigl(p\,\|g\|_{Q,1}\bigr)^{1/2} + n^{1/3-1/2}\Bigl(3p^2\Bigl(1+10\sum_{k=1}^{+\infty}k\,\tilde\beta_2(k)^{3/p}\Bigr)\Bigr)^{1/3}.$$
We then apply Proposition 4.2 with $r=2$ and $q=3$. Since $N_{Q,1}(x,\mathcal F)=O(x^{-d})$, we can take $p>3d$ such that (4.5.2) holds under (C$_2$).

Let us prove that condition (C$_3$) implies (4.5.2). Applying Proposition 5.7 to the sequence $(\tilde f_{s,t}(X_i))_{i\in\mathbb Z}$, we obtain, for any $p\ge 1$,
$$\|Z_n(\tilde f_{s,t})\|_p \le \bigl(p(V_\infty+2M_0(p))\bigr)^{1/2} + n^{1/3-1/2}\Bigl(3p^2\bigl(\|\tilde f_{s,t}(X_0)^3\|_{p/3}+\tilde M_1(p)+\tilde M_2(p)+M_3(p)\bigr)\Bigr)^{1/3},$$
where $V_\infty$, $M_0(p)$, $\tilde M_1(p)$, $\tilde M_2(p)$ and $M_3(p)$ are defined in Proposition 5.7. Applying the inequality (5.2.7), we get that for any $g$ in $\mathcal G$,
$$\|Z_n(g)\|_p \le \bigl(p\,\|g\|_{Q,1}\bigr)^{1/2} + \Bigl(2p\sum_{k=N}^{+\infty}\tilde\phi_2(k)\Bigr)^{1/2} + n^{1/3-1/2}\Bigl(3p^2\Bigl(1+2\sum_{k=1}^{N}k\,\tilde\phi_2(k)+4\sum_{k=1}^{\infty}\tilde\phi_2(k)(k\wedge N)+4\sum_{k=1}^{+\infty}\tilde\phi_2(k)\Bigr)\Bigr)^{1/3}. \qquad(10.3.5)$$
We now take $N=n^\alpha$ with $\alpha=1/(2+\varepsilon)$. If (C$_3$) holds, we infer from (10.3.5) that there exists some positive constant $C$ such that, for any $g$ in $\mathcal G$,
$$\|Z_n(g)\|_p \le C\,\|g\|_{Q,1}^{1/2} + C\,n^{-\varepsilon/(4+2\varepsilon)}.$$
To conclude, we apply Proposition 4.2 with $r=2$ and $q=2+\varepsilon$. Since $N_{Q,1}(x,\mathcal F)=O(x^{-d})$, (4.5.2) holds under (C$_3$).

10.4 $\theta$ and $\tau$-dependent sequences

Consider the three following conditions:
(C$_4$) Each component of $X_1$ has a bounded density and there exists $\varepsilon>0$ such that $\theta_{1,2}(k)=O\bigl(k^{-14/3-\varepsilon}\bigr)$ if $d=1$, and $\theta_{1,2}(k)=O\bigl(k^{-4d-\varepsilon}\bigr)$ if $d>1$.

(C$_5$) Each component of $X_1$ has a bounded density and there exists $\varepsilon>0$ such that $\tau_{1,2}(k)=O\bigl(k^{-4d-\varepsilon}\bigr)$.

(C$_6$) Each component of $X_1$ has a bounded density and there exists $\varepsilon>0$ such that $\tau_{\infty,2}(k)=O\bigl(k^{-2-\varepsilon}\bigr)$.

Theorem 10.3. If one of the conditions (C$_i$), $i=4,\dots,6$ holds, then the process $U_n$ converges in distribution in $D(\mathbb R^d)$ to the Gaussian process $B$.

Proof of Theorem 10.3. The result is immediate from Theorem 10.2 and Lemma 5.1.
10.5 Empirical copula processes

Copulas describe the dependence structure of random vectors. They were introduced long ago by Sklar (1959) [178] and have recently been rediscovered, especially for their applications in finance and biostatistics. Briefly, a $d$-dimensional copula is a distribution function on $[0,1]^d$ whose marginal distributions are uniform and which summarizes the dependence "structure" independently of the specification of the marginal distributions. To be specific, consider a random vector $X=(X_1,\dots,X_d)$ in $\mathbb R^d$ whose joint distribution function is $F$ and whose marginal distribution functions are denoted by $F_j$, $j=1,\dots,d$. Then there exists a unique copula $C$, defined on the product of the sets of values taken by the r.v.'s $F_j(X_j)$, such that
$$C(F_1(x_1),\dots,F_d(x_d)) = F(x_1,\dots,x_d),$$
for any $x=(x_1,\dots,x_d)\in\mathbb R^d$. $C$ is called the copula associated with $X$. When $F$ is continuous, it is defined on $[0,1]^d$, with an obvious extension to $\overline{\mathbb R}^d$. When $F$ is discontinuous, there are several ways to extend $C$ to the whole of $[0,1]^d$ (see Nelsen (1999) [134] for a complete theory). The natural empirical counterpart of $C$ is the so-called empirical copula, defined by
$$C_n(u) = F_n\bigl(F_{n,1}^{-1}(u_1),\dots,F_{n,d}^{-1}(u_d)\bigr),$$
for every $u_1,\dots,u_d$ in $[0,1]$, where $F_n$ denotes the empirical process as in Definition 10.0.1 and $F_{n,i}$ the empirical process of the $i$-th marginal distribution. We use the usual "generalized inverse" notation: for every $j=1,\dots,d$, $F_j^{-1}(u)=\inf\{t \,/\, F_j(t)\ge u\}$. Empirical copulas were introduced by Deheuvels (1979, 1981a, 1981b) [51], [52], [53] in an i.i.d. framework. This author studied the consistency of $C_n$
and the limiting behavior of $n^{1/2}(C_n-C)$ under the strong assumption of independence between margins. Fermanian et al. (2002) [86] proved a functional CLT for this empirical copula process in a more general framework and provided some extensions. Note that the results of [86] are available under the sup-norm and outer expectation assumptions, as in van der Vaart and Wellner (1996) [183].

Assume that the process $(Y_i)_{i\in\mathbb Z}$, $Y=(F_1(X_1),\dots,F_d(X_d))$, is weakly dependent. Note that the covariance structure of the limit process $B$ depends not only on the copula $C$ (via the term associated with $i=0$, e.g.), but also on the joint law of $X_0$ and $X_i$, for every $i$. This differs from the i.i.d. case, where $B$ becomes a Brownian bridge whose covariance structure is a function of $C$ only. Actually, the covariances of $B$ depend here on all the successive copulas of the random vectors $(X_0,X_i)$. We can state:

Theorem 10.4. If $(Y_i)_{i\in\mathbb Z}$ is weakly dependent, if the empirical process of $(Y_i)_{i\in\mathbb Z}$ converges in distribution in $D([0,1]^d)$ to a Gaussian process $B$, and if $C$ has continuous first partial derivatives, then the process $n^{1/2}(C_n-C)$ tends weakly to a Gaussian process $G$ in $D([0,1]^d)$. Moreover, this process has continuous sample paths and can be written as
$$G(u) = B(u) - \sum_{j=1}^{d}\partial_j C(u)\,B(1,\dots,1,u_j,1,\dots,1), \qquad(10.5.1)$$
for every $u\in[0,1]^d$, where the $j$-th summand involves $B$ evaluated at the point whose $j$-th coordinate is $u_j$ and whose other coordinates equal 1.

Note that the covariance structure of $n^{1/2}(C_n-C)$ is involved, because of both (10.0.3) and (10.5.1).

Proof of Theorem 10.4. The proof is directly adapted from Fermanian et al. (2002) [86]. Briefly, we can assume that the law of $X$ is compactly supported on $[0,1]^d$, if necessary by working with $Y=(F_1(X_1),\dots,F_d(X_d))$. Indeed, it can be proved that the empirical copulas associated with $Y$ and $X$ are equal at all the points $(i_1/n,\dots,i_d/n)$, $i_1,\dots,i_d$ in $\{0,\dots,n\}$ (Lemma 3 in [86]), and thus on the whole of $[0,1]^d$. Consider the usual norm on $l^\infty([0,1])$ and the Skorohod metric $\delta$ on $D([0,1]^d)$. Define the mappings
$$\phi_1:\ (D([0,1]^d),\delta)\ \to\ (D([0,1]^d),\delta)\times(D([0,1]),\delta)^{\otimes d},\qquad F\mapsto (F,F_1,\dots,F_d),$$
$$\phi_2:\ (D([0,1]^d),\delta)\times(D([0,1]),\delta)^{\otimes d}\ \to\ l^\infty([0,1]^d)\times(l^\infty([0,1]))^{\otimes d},\qquad (F,F_1,\dots,F_d)\mapsto (F,F_1^{-1},\dots,F_d^{-1}),$$
$$\phi_3:\ l^\infty([0,1]^d)\times(l^\infty([0,1]))^{\otimes d}\ \to\ l^\infty([0,1]^d),\qquad (F,G_1,\dots,G_d)\mapsto F(G_1,\dots,G_d).$$
Clearly, $\phi_1$ is Hadamard-differentiable because it is linear. Moreover, $\phi_2$ is Hadamard-differentiable tangentially to the corresponding product of spaces of continuous functions, by applying Theorem 3.9.23 in van der Vaart and Wellner (1996) [183]. Note that, for any function $h\in C([0,1])$, the convergence of a sequence $h_n$ towards $h$ in $(D([0,1]),\delta)$ is equivalent to the convergence in $(D([0,1]),\|\cdot\|_\infty)$; thus, working with the Skorohod metric is not a hurdle here. Finally, $\phi_3$ is Hadamard-differentiable by applying Theorem 3.9.27 in [183]. Thus the chain rule applies: $\phi=\phi_3\circ\phi_2\circ\phi_1$ is Hadamard-differentiable tangentially to $C([0,1]^d)$. The result follows by applying the functional $\Delta$-method to the empirical process of $Y$ and to the function $\phi$ (see Theorem 3.9.4 in [183]).
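To fix ideas, the empirical copula $C_n$ can be evaluated directly from order statistics. The following Python sketch (a minimal illustration; the function name and the toy data are ours, not taken from [86]) computes $C_n(u)=F_n(F_{n,1}^{-1}(u_1),\dots,F_{n,d}^{-1}(u_d))$ for a $d$-dimensional sample:

```python
import math

def empirical_copula(sample, u):
    """Empirical copula C_n(u) = F_n(F_{n,1}^{-1}(u_1), ..., F_{n,d}^{-1}(u_d)).

    sample: list of d-tuples of observations; u: point of [0,1]^d.
    """
    n, d = len(sample), len(u)
    # F_{n,j}^{-1}(u_j) = inf{t : F_{n,j}(t) >= u_j} is the ceil(n*u_j)-th
    # order statistic of the j-th coordinate (for u_j > 0).
    thr = []
    for j in range(d):
        xs = sorted(row[j] for row in sample)
        k = max(1, math.ceil(n * u[j]))
        thr.append(xs[k - 1])
    # joint empirical distribution function evaluated at the marginal quantiles
    count = sum(all(row[j] <= thr[j] for j in range(d)) for row in sample)
    return count / n

# toy bivariate sample with comonotone coordinates: C_n(u) equals min(u1, u2) here
data = [(i, 2 * i) for i in range(1, 11)]
print(empirical_copula(data, (0.5, 0.7)))  # → 0.5 for this sample
```

Since $C_n$ depends on the data only through the ranks, the same computation applies whether one works with $X$ or with $Y=(F_1(X_1),\dots,F_d(X_d))$, in line with Lemma 3 of [86].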
10.6 Random fields

In this section, we give rates of convergence in the central limit theorem for an $\eta$- or $\kappa$-weakly dependent field $\xi$. Assume that the density of the variable $\xi_i$ is bounded by $C_\xi$. Following Lemma 4.1, we get $(\mathcal I,c)$-weak dependence with

• for $\eta$-weak dependence: $\varepsilon(r)=\sqrt{\eta(r)}$ and $c(d_f,d_g)=2\sqrt{8C_\xi}\,(d_f+d_g)$;

• for $\kappa$-weak dependence: $\varepsilon(r)=(\kappa(r))^{1/3}$ and $c(d_f,d_g)=2(8C_\xi)^{2/3}(d_f+d_g)^{4/3}$.

For the sake of simplicity, we shall assume that the process takes its values in $[0,1]$; this may be achieved by using the quantile transform. Let $(S,|\cdot|_S)$ be the space of càdlàg functions $D([0,1])$ with the Skorohod metric. Let $\pi$ denote the Prohorov distance between distribution functions on $(S,|\cdot|_S)$. If $X$ and $Y$ are two processes on $S$, we also denote by $\pi(X,Y)$ the Prohorov distance between their distributions.

Central limit theorem for the empirical process. Let $U_n$ be the normalized version of the empirical process defined by (10.0.2) with respect to the closed ball $B_N$ of radius $N$ and cardinality $n=\#B_N=(2N+1)^d$. Denote $g_{s,t}(u)=\mathbf 1_{s<u\le t}-(t-s)$.

Theorem 10.5. If there exist constants $C>0$ and $b>0$ such that $\varepsilon(r)\le Ce^{-br}$, then
$$\pi(U_n,X) = O\bigl(n^{-\alpha_0}\log(n)^{\beta_0}\bigr), \quad\text{where}\quad \alpha_0=\frac{1}{8d+24}, \qquad \beta_0=\frac{10d^2+39d+28}{8d+24}.$$
The proof is based on the following well-known result on the Prohorov distance.

Lemma 10.2. Let $\delta$ be a positive real and $D$ a finite subset $\{x_1,\dots,x_m\}\subset[0,1]$ such that every $x$ in $[0,1]$ is in a $\delta$-neighborhood of some $x_i$. Let $X$ and $Y$ be two distributions on $S$. Define the $\delta$-oscillation of $X$,
$$w_X(\delta)=\sup_{|x-y|<\delta}\bigl(X(x)-X(y)\bigr),$$
and $\varepsilon_X(\delta)=\inf\{\varepsilon\in\mathbb R \,/\, \mathbb P(w_X(\delta)>\varepsilon)\le\varepsilon\}$. The Prohorov distance between $X$ and $Y$ is bounded by
$$\pi(X,Y) \le \varepsilon_X(\delta)+\varepsilon_Y(\delta)+\pi(X_D,Y_D),$$
where $X_D$ is the finite dimensional distribution of $X$ on the subset $D$.

We need to compute bounds for the $\delta$-oscillations and for the distance between the finite dimensional laws. These are based on moment inequalities for the process $U_n$.

Proposition 10.3. Assume that $\varepsilon(r)\le Ce^{-br}$ with $b>0$. For $(s,t)$ such that $|t-s| < C_\xi e^{-4b}/C$ and $|t-s| < Ce^{3b}/C_\xi$:
$$\mathbb E\bigl(U_n(t)-U_n(s)\bigr)^{2l} \le \frac{(4l-2)!}{(2l-1)!}\bigl(6(1\vee b^{-2})\log(1/|t-s|)\bigr)^{2dl}\Bigl(\bigl(2(2d)!\,C_\xi|t-s|\bigr)^{l} + (2l)!(2ld)!\,n^{1-l}\,C_\xi|t-s|\Bigr). \qquad(10.6.1)$$
Note that $U_n(t)-U_n(s)=\frac{1}{\sqrt n}\sum_{k\in B_N}g_{s,t}(\xi_k)$. The proposition is a consequence of the bound on the covariances of products of the functions $g_{s,t}(\xi_k)$.

Proof of Proposition 10.3. We adapt the proof of Proposition 10.2 to the family $(g_{s,t}(\xi_k))_{k\in B_N}$. For a sequence $k=(k_1,\dots,k_q)$ of elements of $T$, define $\xi_k=(\xi_{k_1},\dots,\xi_{k_q})$ and, when $s$ and $t$ are fixed, $\Pi_k=\prod_{i=1}^{q}g_{s,t}(\xi_{k_i})$. For any integer $q\ge 1$, set
$$A_q(N)=\sum_{k\in B_N^q}\bigl|\mathbb E\,\Pi_k\bigr|, \qquad(10.6.2)$$
then
$$\bigl|\mathbb E\bigl(U_n(s)-U_n(t)\bigr)^{2l}\bigr| \le (2l)!\,n^{-l}A_{2l}(N). \qquad(10.6.3)$$
Let $q\ge 2$. For a finite sequence $k=(k_1,\dots,k_q)$ of elements of $T$, the gap is defined as the largest integer $r$ such that the sequence may be split into two non-empty subsequences $k^1$ and $k^2\subset\mathbb Z^d$ whose mutual distance equals $r$ ($d(k^1,k^2)=\min\{\|i-j\|_1 \,/\, i\in k^1,\ j\in k^2\}=r$). If the sequence is constant, its gap is 0. Define the set $G_r(q,N)=\{k\in B_N^q \,/\, \text{the gap of }k\text{ is }r\}$. Sorting the sequences of indices by their gap:
$$A_q(N) \le \sum_{k_1\in B_N}\mathbb E|g_{s,t}(\xi_{k_1})|^q + \sum_{r=1}^{2N}\sum_{k\in G_r(q,N)}\bigl|\mathrm{Cov}(\Pi_{k^1},\Pi_{k^2})\bigr| + \sum_{r=1}^{2N}\sum_{k\in G_r(q,N)}\bigl|\mathbb E\,\Pi_{k^1}\bigr|\,\bigl|\mathbb E\,\Pi_{k^2}\bigr|. \qquad(10.6.4)$$
Define
$$B_q(N) = \sum_{k_1\in B_N}\mathbb E|g_{s,t}(\xi_{k_1})|^q + \sum_{r=1}^{2N}\sum_{k\in G_r(q,N)}\bigl|\mathrm{Cov}(\Pi_{k^1},\Pi_{k^2})\bigr|.$$
We get
$$A_q(N) \le B_q(N) + \sum_{m=1}^{q-1}A_m(N)A_{q-m}(N).$$
To build a sequence $k$ belonging to $G_r(q,N)$, we first fix one of the $n$ points of $B_N$. We choose a second point on the $\ell^1$-sphere of radius $r$ centered at the first point. The third point lies in a ball of radius $r$ centered at one of the preceding points, and so on. Thus
$$\#G_r(q,N) \le n\cdot 2d(2r+1)^{d-1}\cdot 2(2r+1)^d\cdots(q-1)(2r+1)^d \le n\,d\,q!\,(3r)^{d(q-1)-1}.$$
We use Lemma 4.1 to deduce:
$$B_q(N) \le n\Bigl(C_\xi|t-s| + d\,q!\sum_{r=1}^{2N}(3r)^{d(q-1)-1}\min\bigl(\varepsilon(r),\,C_\xi|t-s|\bigr)\Bigr).$$
Let $R$ be an integer to be chosen later. Then
$$B_q(N) \le n\,d\,3^{d(q-1)}q!\Bigl(C_\xi|t-s|\sum_{r=0}^{R-1}r^{d(q-1)-1} + C\sum_{r=R}^{\infty}r^{d(q-1)-1}e^{-br}\Bigr).$$
Comparing the sums with integrals, we get
$$B_q(N) \le n\bigl(3(1\vee b^{-1})\bigr)^{d(q-1)}q!\,(d(q-1))!\;R^{d(q-1)}\,C_\xi|t-s|\Bigl(1+\frac{Ce^{4b}e^{-b(R+1)}}{C_\xi|t-s|}\Bigr).$$
Choose $R$ as the integer part of $\frac{1}{b}\log\bigl(Ce^{4b}/(C_\xi|t-s|)\bigr)$ and assume that $(s,t)\in T$ are such that $|t-s|\le e^{-4b}C_\xi/C$ and $|t-s|\le e^{3b}C/C_\xi$. Then $R\ge 1$ and
$$B_q(N) \le \bigl(6(1\vee b^{-2})\log(1/|t-s|)\bigr)^{dq}\,n\,C_\xi|t-s|\;q!\,(dq)!, \qquad(10.6.5)$$
so that $B_q(N)$ is bounded by a function $M^qV_q$ that satisfies condition (4.3.24). Then
$$A_{2l}(N) \le \frac{(4l-2)!}{(2l)!(2l-1)!}\bigl(6(1\vee b^{-2})\log(1/|t-s|)\bigr)^{2dl}\Bigl(\bigl(2(2d)!\,nC_\xi|t-s|\bigr)^{l} + (2l)!(2ld)!\,nC_\xi|t-s|\Bigr),$$
and (10.6.1) is proved.

Oscillations of the empirical process. Using Proposition 10.3, we give a bound for the modulus of continuity of $U_n$.

Proposition 10.4. If $\varepsilon(r)\le Ce^{-br}$ with $b>0$, then for $\delta\ge 1/n$:
$$\varepsilon_{U_n}(\delta) \le K_1(C_\xi,b,d)\,\delta^{1/2}\log^{d+1}(1/\delta), \qquad(10.6.6)$$
where $K_1(C_\xi,b,d)=8\bigl(6(1\vee b^{-2})\bigr)^{d}\bigl(2(2d)!\,C_\xi\bigr)^{1/2}$.

Proof of Proposition 10.4. We show that exponential moments of $U_n(t)-U_n(s)$ are finite and use Stroock's method to bound the oscillation.

Lemma 10.3. Let $f(u)=|u|^{1/2}\log^d(1/u)$. Assume that $\varepsilon(r)\le Ce^{-br}$ with $b>0$, and that $\delta\ge 1/n$. Then there exists a constant $c_0$ such that for every $c<c_0$ and every $(s,t)$ such that $|t-s|\le\delta$:
$$\mathbb E\exp\Bigl(c\,\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr) \le B(c) < +\infty.$$
Proof of Lemma 10.3. Using the moment inequality (10.6.1) and Stirling's formula:
$$\mathbb E\Bigl(\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr)^{2p} \le p^{2p}\Bigl(8e^2C_\xi(2d)!\bigl(6(1\vee b^{-2})\bigr)^{2d}\Bigr)^{p}.$$
Hence
$$\mathbb E\exp\Bigl(c\,\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr) \le \sum_{k=0}^{\infty}\frac{c^k}{k!}\Bigl(\mathbb E\Bigl(\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr)^{2k}\Bigr)^{1/2} \le \sum_{k=0}^{\infty}\frac{c^k}{k!}\,k^k\Bigl(8e^2C_\xi(2d)!\bigl(6(1\vee b^{-2})\bigr)^{2d}\Bigr)^{k/2}.$$
For $c_0=\bigl(6(1\vee b^{-2})\bigr)^{-d}\bigl(8e^2C_\xi(2d)!\bigr)^{-1/2}$, the series converges and the lemma is true.

We apply a lemma of Garsia (1965) [90]. Let $c=c_0/2$, $B(c)$ as in Lemma 10.3, $\psi(u)=e^{cu}-1$ and $0<\delta<e^{-2d}$. The lemma says that if
$$Y(\omega)=\int_0^\delta\!\!\int_0^\delta \psi\Bigl(\frac{|U_n(t,\omega)-U_n(s,\omega)|}{f(|t-s|)}\Bigr)\,ds\,dt < \infty,$$
then $|U_n(t,\omega)-U_n(s,\omega)| \le 8\,\psi^{-1}\bigl(4Y(\omega)/\delta^2\bigr)\,f(\delta)$. Using the Markov inequality and the fact that $\mathbb E(Y)\le B(c)\delta^2$:
$$\mathbb P\bigl(|U_n(t)-U_n(s)|\ge\lambda\bigr) \le \mathbb P\Bigl(Y\ge\frac{\delta^2}{4}\,\psi\Bigl(\frac{\lambda}{8f(\delta)}\Bigr)\Bigr) \le \frac{4B(c)}{\psi\bigl(\lambda/(8f(\delta))\bigr)}.$$
For $\lambda=(4/c)\,f(\delta)\log(1/\delta)$,
$$\mathbb P\bigl(|U_n(t)-U_n(s)|\ge\lambda\bigr) \le \frac{4B(c)\,\delta^{1/2}}{1-\delta^{1/2}}.$$
For $\delta$ sufficiently small, this term is less than $\lambda$, so that $\varepsilon_{U_n}(\delta)\le K_1\delta^{1/2}\log^{d+1}(1/\delta)$ with $K_1=4/c$.

Oscillations of the limit process.

Proposition 10.5. If $\varepsilon(r)\le Ce^{-br}$ with $b>0$, then, setting $K_3(C_\xi,b,d)=2\bigl(C_\xi\,d!\,(2d+7)\bigr)^{1/2}\bigl(1\vee b^{-1}\bigr)^{d/2}$,
$$\varepsilon_X(\delta) \le K_3(C_\xi,b,d)\,\delta^{1/2}\log^{(d+1)/2}(1/\delta). \qquad(10.6.7)$$
Proof of Proposition 10.5. We use a chaining argument to bound the $\delta$-oscillations of the non-stationary Gaussian limit process $X$. Define a semi-metric $\bar\rho$ on $[0,1]$ by $\bar\rho(s,t)^2=\mathrm{Var}(X(t)-X(s))$. As a Gaussian process, $X$ satisfies the exponential inequality
$$\mathbb P\bigl(|X(t)-X(s)|\ge\lambda\,\bar\rho(s,t)\bigr) \le \exp\{-\lambda^2/2\}.$$
By definition, $\bar\rho(s,t)^2=\sum_{j\in T}\mathrm{Cov}\bigl(x_0(t)-x_0(s),\,x_j(t)-x_j(s)\bigr)$. Using the Cauchy-Schwarz inequality,
$$\bigl|\mathrm{Cov}\bigl(x_0(t)-x_0(s),\,x_r(t)-x_r(s)\bigr)\bigr| \le \mathrm{Var}\bigl(x_0(t)-x_0(s)\bigr) \le C_\xi|t-s|,$$
and, by definition of $\varepsilon(r)$, $\bigl|\mathrm{Cov}\bigl(x_0(t)-x_0(s),\,x_r(t)-x_r(s)\bigr)\bigr| \le \varepsilon(r)$. For a metric $\nu$ over $[0,1]$, we denote by $N_\nu(\varepsilon)$ the covering number (see Pollard (1984) [151], p. 143): it is the cardinality of the smallest set $S$ of points of $[0,1]$ such that, for any $t$, $\nu(t,S)\le\varepsilon$. The corresponding covering integral is
$$J_\nu(\varepsilon) = \int_0^\varepsilon\bigl(2\log\bigl(N_\nu(u)^2/u\bigr)\bigr)^{1/2}\,du.$$
Computing as in (10.6.5), it is easy to show that
$$\bar\rho(s,t)^2 = \sum_{r=0}^{\infty}(2r+1)^{d-1}\bigl(\varepsilon(r)\wedge C_\xi|t-s|\bigr) \le K^2\,|t-s|\log^d(1/|t-s|),$$
where $K=2\sqrt{C_\xi\,d!\,(1\vee b^{-1})^d}$. Let $\varepsilon>0$. The inclusion of the balls of the two metrics implies that $N_{\bar\rho}\bigl(K\varepsilon^{1/2}\log^{d/2}(1/\varepsilon)\bigr)\le N_{|\cdot|}(\varepsilon)$. Thus
$$N_{\bar\rho}(u) \le 2^{d+1}(K/u)^2\log^{d+1}(K/u) \le 2^d(K/u)^{d+2}.$$
The corresponding covering integral satisfies
$$J_{\bar\rho}(\delta) \le \int_0^\delta\Bigl(4\log\bigl(2^{d+1}(K/u)^{d+3}\bigr)+2\log(1/u)\Bigr)^{1/2}\,du \le 2\delta\log^{1/2}\bigl(2^{d+1}K^{d+3}\bigr) + (4d+14)^{1/2}\int_0^\delta \log^{1/2}(1/u)\,du \le 2\delta\log^{1/2}\bigl(2^{d+1}K^{d+3}\bigr) + (4d+14)^{1/2}\,\delta\log^{1/2}(1/\delta).$$
Taking $\varepsilon=K\delta^{1/2}\log^{d/2}(1/\delta)$:
$$\mathbb P\Bigl(\max_{|t-s|\le\delta}|X(t)-X(s)| \ge 26\,J_{\bar\rho}\bigl(K\delta^{1/2}\log^{d/2}(1/\delta)\bigr)\Bigr) \le K\delta^{1/2}\log^{d/2}(1/\delta).$$
For a sufficiently small $\delta$,
$$\varepsilon_X(\delta) \le J_{\bar\rho}\bigl(K\delta^{1/2}\log^{d/2}(1/\delta)\bigr) \le K(2d+7)^{1/2}\,\delta^{1/2}\log^{(d+1)/2}(1/\delta).$$

Distance between the finite dimensional laws. Let $m\in\mathbb N$ and let $D=\{t_1,\dots,t_m\}\subset[0,1]$. We denote by $z^i$ the $m$-vector $(x_i(t_1),\dots,x_i(t_m))$. Define the partial sum $s_n$:
$$s_n=\frac{1}{\sqrt n}\sum_{i\in B_N}z^i.$$
Let $Y$ be a centered Gaussian $m$-vector whose covariance matrix is $\Sigma_D=(\Sigma_{s,t})_{s,t\in D}$. We bound the Prohorov distance between $s_n$ and $Y$. For a real $\varepsilon$ such that $0\le\varepsilon\le 1$, we define the class of functions $\mathcal F_\varepsilon$ by
$$\mathcal F_\varepsilon = \bigl\{f\in C_b^3(\mathbb R^m) \,/\, 0\le f\le 1,\ \|f^{(i)}\|_\infty\le 2m^{-1/2}\varepsilon^{-i}\ \text{for}\ i=1,2,3\bigr\}, \qquad(10.6.8)$$
where $f^{(i)}$ is the differential of $i$-th order of $f$ and the norm $\|\cdot\|_\infty$ is the operator sup-norm w.r.t. the norm $\|\cdot\|_1$. A bound for the Prohorov distance between $s_n$ and $Y$ is given by
$$\pi(s_n,Y) \le 4m^{1/2}\varepsilon\bigl(1+\log^{1/2}(1/\varepsilon)\bigr) + 2\sup_{\mathcal F_\varepsilon}\bigl|\mathbb E\bigl(f(s_n)-f(Y)\bigr)\bigr|. \qquad(10.6.9)$$
Proposition 10.6. Define $m$, $p$ and $q$ as functions of $n$ which tend to infinity with $n$ and are negligible with respect to $n$, with $q$ negligible with respect to $p$, and define $\varepsilon$ as a function of $n$ which tends to zero as $n$ tends to infinity. For sufficiently large $n$, in the case of $\eta$-dependence:
$$\begin{aligned} \pi(s_n,Y) \le\ & 4m^{1/2}\varepsilon\bigl(1+\log^{1/2}(1/\varepsilon)\bigr) &(10.6.10)\\ &+ K_1\,\varepsilon^{-1}m^{1/2}q^{1/2}p^{-1/2} &(10.6.11)\\ &+ 2K_5\,m^{1/2}\varepsilon^{-2}n^{3/2}\,\varepsilon(q) &(10.6.12)\\ &+ 2K_6\,m^{3/2}\varepsilon^{-3}n\,\varepsilon(q) &(10.6.13)\\ &+ 2K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2} &(10.6.14)\\ &+ 2K_8\,m^{3/2}\varepsilon^{-2}p^{-1}q. &(10.6.15) \end{aligned}$$
In the case of $\kappa$-dependence, the terms involving $K_5$ and $K_6$ are replaced by
$$+\,2K_5\,m^{1/3}p^{d/3}\varepsilon^{-7/3}n^{4/3}\,\varepsilon(q) + 2K_6\,m^{4/3}p^{d}\varepsilon^{-11/3}n\,\varepsilon(q).$$
The values of the constants $K_i$ are given in the proof.

Proof of Proposition 10.6. We use the Bernstein blocking technique (Bernstein, 1939, [13]). Assume that the Euclidean division of $n$ by $(p+q)$ gives a quotient $a$ and a remainder $r$. Denote $\mathbf j=(j,\dots,j)$. Define $\mathcal K=\{-a-1,\dots,a+1\}^d$; for $i\in\{-a,\dots,a\}^d$, we define the blocks $P_i=[(p+q)(i-\mathbf 1),\dots,(p+q)i-q\mathbf 1]$. These blocks are separated by bands of width $q$. We complete the construction with at most $4a+4$ incomplete blocks on the boundary, also separated by bands of width $q$, and associate each of them with a corresponding index on the boundary of $\mathcal K$. Denote by $Q$ the set of indices that lie in the separating bands; the cardinality of $Q$ is less than $d(2a+1)qp^{d-1}$. We order the set of blocks $P$ by the lexicographic order of their index in $\mathcal K$. We define the variables $(u^i)_{i=1,\dots,(2a+1)^d}$ and $v$ exactly as in Section 8.2.2. Consider an independent sequence of centered Gaussian vectors $(y^i)_{i=1,\dots,k}$ such that each $y^i$ has the same covariance matrix as $u^i$. Let $\varepsilon>0$ and $f\in\mathcal F_\varepsilon$ defined by (10.6.8). We decompose
$$f(s_n)-f(Y) = \Bigl(f(s_n)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr) \qquad(10.6.16)$$
$$+\ \Bigl(f\Bigl(\sum_{i=1}^k u^i\Bigr)-f\Bigl(\sum_{i=1}^k y^i\Bigr)\Bigr) \qquad(10.6.17)$$
$$+\ \Bigl(f\Bigl(\sum_{i=1}^k y^i\Bigr)-f(Y)\Bigr). \qquad(10.6.18)$$
Consider the term (10.6.16). We have
$$\Bigl|\mathbb E\Bigl(f(s_n)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr)\Bigr| = \Bigl|\mathbb E\Bigl(f\Bigl(\sum_{i=1}^k u^i+v\Bigr)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr)\Bigr| \le 2m^{-1/2}\varepsilon^{-1}\sum_{s=1}^m \mathbb E(|v_s|) \le 2m^{1/2}\varepsilon^{-1}\,\mathbb E(|v_1|^2)^{1/2}.$$
Because $\varepsilon(r)$ is decreasing,
$$\mathbb E(|v_1|^2) = \frac{1}{n}\sum_{i,j\in Q}\bigl|\mathrm{Cov}\bigl(x_i(t_1),x_j(t_1)\bigr)\bigr| \le \frac{1}{n}\sum_{i,j\in Q}\varepsilon\bigl(\|i-j\|\bigr) \le \frac{\#Q}{n}\sum_{r=0}^{N}(2r+1)^{d-1}\varepsilon(r) \le K_1\,\frac{\#Q}{n} \le K_1\,\frac{q}{p},$$
where $K_1$ is a constant depending on the sequence $\varepsilon(r)$. We get
$$\Bigl|\mathbb E\Bigl(f(s_n)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr)\Bigr| \le K_1\,(mq/p)^{1/2}\,\varepsilon^{-1}. \qquad(10.6.19)$$
Now we apply Lemma 7.1 to the difference (10.6.17). We first give a bound for $T_1$. Assume that $\xi$ is $\eta$-dependent. Define $G=\partial f(u^1+\dots+u^{i-1})/\partial x_s$ and $H=u^i_s$. We apply Lemma 10.1 with $d_G\le p^d i$, $d_H=p^d$, $\mathrm{Lip}(G)\le 2m^{-1/2}\varepsilon^{-2}$, $\mathrm{Lip}(H)\le n^{-1/2}$, $\|G\|_\infty\le 2m^{-1/2}\varepsilon^{-1}$ and $\|H\|_\infty\le p^d n^{-1/2}$. Because of the respective orders of the parameters, we simplify (10.2.2):
$$\Bigl|\mathrm{Cov}\Bigl(\frac{\partial f(u^1+\dots+u^{j-1})}{\partial x_s},\,u^j_s\Bigr)\Bigr| \le 2\phi(G,H)\,\varepsilon(q) \le 4m^{-1/2}\varepsilon^{-2}p^{2d}j\,n^{-1/2}\,\varepsilon(q).$$
Summing over $j$ and $s$:
$$|T_1| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(1)}(u^1+\dots+u^{j-1})\cdot u^j\bigr)\bigr| \le 4m^{1/2}\varepsilon^{-2}\,\varepsilon(q)\,n^{3/2}.$$
Now we give a bound for $T_2$. Define $G=\partial^2 f(u^1+\dots+u^{i-1})/(\partial x_s\partial x_t)$ and $H=u^i_su^i_t$. We apply Lemma 10.1 with $d_G\le p^d i$, $d_H=2p^d$, $\mathrm{Lip}(G)\le 2m^{-1/2}\varepsilon^{-3}$, $\mathrm{Lip}(H)\le p^d n^{-1}$, $\|G\|_\infty\le 2m^{-1/2}\varepsilon^{-2}$ and $\|H\|_\infty\le p^{2d}n^{-1}$. Keeping the larger exponents in each term:
$$\Bigl|\mathrm{Cov}\Bigl(\frac{\partial^2 f(u^1+\dots+u^{j-1})}{\partial x_s\partial x_t},\,u^j_su^j_t\Bigr)\Bigr| \le 4m^{-1/2}\varepsilon^{-3}p^{2d}j\,n^{-1}\,\varepsilon(q).$$
Summing over $j$, $s$ and $t$:
$$|T_2| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(2)}(u^1+\dots+u^{j-1})\cdot(u^j,u^j)\bigr)\bigr| \le 2m^{3/2}\varepsilon^{-3}\,\varepsilon(q)\,n.$$
Assume now that $\xi$ is $\kappa$-dependent. Using the same bounds for the functions $G$ and $H$ and relation (10.2.3):
$$|T_1| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(1)}(u^1+\dots+u^{j-1})\cdot u^j\bigr)\bigr| \le 4m^{1/3}p^{d/3}\varepsilon^{-7/3}n^{4/3}\,\varepsilon(q),$$
$$|T_2| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(2)}(u^1+\dots+u^{j-1})\cdot(u^j,u^j)\bigr)\bigr| \le 4m^{4/3}p^{d}\varepsilon^{-11/3}n\,\varepsilon(q).$$
The third-order term is bounded using the third-order moment. Using the Jensen inequality:
$$\mathbb E|u^1|_2^3 \le m^{1/2}\sum_{s=1}^m\bigl(\mathbb E|u^1_s|^4\bigr)^{3/4}.$$
Substituting 1 for the bound $C_\xi|t-s|$ and choosing $R=1\vee(4+b^{-1}\log C)$, Equation (10.6.5) becomes $V_4(N)\le p^d\bigl(3R(1\vee b^{-2})\bigr)^{4d}\,4!\,(4d)!$. Then $\bigl(\mathbb E|u^1_s|^4\bigr)^{3/4}\le K_7\,p^{3d/2}n^{-3/2}$, so that the last term is less than
$$\|f^{(3)}\|_\infty\,\mathbb E|u^1|_2^3 \le K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2}. \qquad(10.6.20)$$
We conclude that, if $\xi$ is $\eta$-dependent:
$$\Bigl|\mathbb E\Bigl(f\Bigl(\sum_{j=1}^k u^j\Bigr)-f\Bigl(\sum_{j=1}^k y^j\Bigr)\Bigr)\Bigr| \le K_5\,m^{1/2}\varepsilon^{-2}n^{3/2}\,\varepsilon(q) + K_6\,m^{3/2}\varepsilon^{-3}n\,\varepsilon(q) + K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2}, \qquad(10.6.21)$$
where $K_5=4$, $K_6=2$ and $K_7$ is the constant in (10.6.20). If $\xi$ is $\kappa$-dependent:
$$\Bigl|\mathbb E\Bigl(f\Bigl(\sum_{j=1}^k u^j\Bigr)-f\Bigl(\sum_{j=1}^k y^j\Bigr)\Bigr)\Bigr| \le K_5\,m^{1/3}p^{d/3}\varepsilon^{-7/3}n^{4/3}\,\varepsilon(q) + K_6\,m^{4/3}p^{d}\varepsilon^{-11/3}n\,\varepsilon(q) + K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2}. \qquad(10.6.22)$$
Now we have to bound the difference (10.6.18). We use the following lemma.
Lemma 10.4. Let $X$ and $Y$ be two centered Gaussian vectors of length $m$, with respective covariance matrices $M$ and $N$. Let $0<\varepsilon<1$ and $f\in\mathcal F_\varepsilon$. Then
$$\bigl|\mathbb E\bigl(f(X)-f(Y)\bigr)\bigr| \le m^{-1/2}\varepsilon^{-2}\,\|M-N\|_1, \quad\text{where}\quad \|A\|_1=\sum_{i,j}|A_{i,j}|.$$
Proof of Lemma 10.4. Because $X$ and $Y$ are Gaussian,
$$\mathbb E\bigl(f(X)-f(Y)\bigr) = \sum_{k=1}^{n}\mathbb E\bigl(f(Z_k+X_{k,n}/\sqrt n)-f(Z_k+Y_{k,n}/\sqrt n)\bigr),$$
where the $X_{k,n}$ and $Y_{k,n}$ are independent copies of $X$ and $Y$, and $Z_k=(X_{1,n}+\dots+X_{k-1,n}+Y_{k+1,n}+\dots+Y_{n,n})/\sqrt n$. Using the Taylor expansion:
$$\mathbb E\bigl(f(X)-f(Y)\bigr) = \frac{1}{2n}\sum_{k=1}^{n}\mathbb E\bigl(f^{(2)}(0)\cdot(X_{k,n},X_{k,n})-f^{(2)}(0)\cdot(Y_{k,n},Y_{k,n})\bigr) + \frac{1}{6n^{3/2}}\,\mathbb E\bigl(f^{(3)}(V_{k,n})\cdot(X_{k,n},X_{k,n},X_{k,n})-f^{(3)}(W_{k,n})\cdot(Y_{k,n},Y_{k,n},Y_{k,n})\bigr),$$
$V_{k,n}$, $W_{k,n}$ being random vectors. The term involving $f^{(3)}$ tends to 0, and
$$\mathbb E\bigl(f^{(2)}(0)\cdot(X_{k,n},X_{k,n})-f^{(2)}(0)\cdot(Y_{k,n},Y_{k,n})\bigr) = \sum_{i,j=1}^{m}\frac{\partial^2 f(0)}{\partial x_i\partial x_j}\bigl(M_{i,j}-N_{i,j}\bigr),$$
so that $\bigl|\mathbb E\bigl(f(X)-f(Y)\bigr)\bigr| \le m^{-1/2}\varepsilon^{-2}\,\|M-N\|_1$.

Apply this lemma to the Gaussian vectors $Y$ and $\sum_i y^i$. The covariance matrices of $\sum_i y^i$ and $Y$ are respectively $\frac{kp^d}{n}\Sigma_{D,p}$ and $\Sigma_D$, where
$$\Sigma_{D,p}(s,t) = \sum_{|j|<p} h(j)\,\mathrm{Cov}\bigl(x_0(s),x_j(t)\bigr) \quad\text{and}\quad h(j)=\prod_{l=1}^{d}\bigl(1-j_l/p\bigr).$$
Define the matrix $M$ by $M(s,t)=\sum_{|j|<p}\mathrm{Cov}\bigl(x_0(s),x_j(t)\bigr)$. We have that
$$\Bigl\|\frac{kp^d}{n}\Sigma_{D,p}-\Sigma_D\Bigr\|_1 \le m^2\,\frac{kqp^{d-1}}{n}\,\|\Sigma_{D,p}\|_\infty + m^2\,\|\Sigma_{D,p}-M\|_\infty + m^2\,\|M-\Sigma_D\|_\infty.$$
Because of the convergence of the series defining $\Sigma$, $\|\Sigma_{D,p}\|_\infty$ is uniformly bounded in $p$, and the first term gives the main contribution. Because $1-h(j)\le d|j|/p$ and $|j|<p$, the term $\|\Sigma_{D,p}-M\|_\infty$ is controlled similarly, while $\|M-\Sigma_D\|_\infty$ is the remainder of a geometric series and is smaller. We conclude that the difference (10.6.18) satisfies
$$\Bigl|\mathbb E\Bigl(f\Bigl(\sum_{i=1}^k y^i\Bigr)-f(Y)\Bigr)\Bigr| \le K_8\,m^{3/2}\varepsilon^{-2}qp^{-1}. \qquad(10.6.23)$$
Thus, collecting the bounds (10.6.19), (10.6.21) or (10.6.22), and (10.6.23), we infer that the distance between the finite dimensional distributions of size $m$ satisfies the inequalities of Proposition 10.6.

Proof of Theorem 10.5. Let $D$ be the set of reals $(i/m)_{i=1,\dots,m}$; the corresponding $\delta$ is $m^{-1}$. We collect the results of Proposition 10.4 (oscillation of the empirical process), Proposition 10.5 (oscillation of the Gaussian limit process) and Proposition 10.6 (distance between the finite dimensional distributions), and use Lemma 10.2 to conclude. The $(1/m)$-oscillation of $U_n$ is less than $K_{10}\,m^{-1/2}\log^{2d+1}(m)$. The $(1/m)$-oscillation of $X$ is negligible with respect to $m^{-1/2}\log^{2d+1}(m)$. We choose $q=\log(n)$; the terms (10.6.11), (10.6.12) and (10.6.13) are then negligible. The minimum rate is obtained for parameters such that $m^{-1/2}\log^{2d+1}(n)$, $m^{1/2}\varepsilon\log^{1/2}(n)$, $m^{5/2}\varepsilon^{-3}p^{d/2}n^{-1/2}$ and $m^{3/2}\varepsilon^{-2}p^{-1}\log(n)$ are of the same order with respect to $n$. The solution is
$$m = n^{\frac{1}{4d+12}}\,(\log n)^{\frac{6d^2+17d+5}{4d+12}}, \qquad p = n^{\frac{1}{d+3}}\,(\log n)^{\frac{-2d+2}{d+3}}, \qquad \varepsilon = n^{\frac{-1}{4d+12}}\,(\log n)^{-\frac{2d^2+9d+1}{4d+12}}.$$
For this choice, the rate of convergence is $n^{-\frac{1}{8d+24}}\,(\log n)^{\frac{10d^2+39d+29}{8d+24}}$.
Chapter 11

Functional estimation

In this chapter we consider some methods of functional estimation. We study the estimation of the marginal density of a weakly dependent sequence $(X_t)_{t\in\mathbb Z}$, as well as the estimation of the regression function in a two-dimensional model $Z_t=(X_t,Y_t)_{t\in\mathbb Z}$. We show that the CLT and uniform a.s. convergence results hold under general non-causal weak dependence. We also establish sharp results on the MISE of these estimators under more restrictive causal dependence conditions. Our principal goal is to extend to less restrictive notions of weak dependence results already known for mixing sequences. We end the chapter with an overview of the different methods of nonparametric estimation, in order to extend the field of application of the previous results.
11.1 Some non-parametric problems

For a stationary two-dimensional process $(Z_t)_{t\in\mathbb Z}$ with $Z_t=(X_t,Y_t)$, an important quantity is the regression function $r(x)=\mathbb E(Y_0|X_0=x)$. Various methods to fit such a function have been developed. Nadaraya-Watson kernel estimates are very popular; see Rosenblatt (1991) [169], Prakasa-Rao (1983) [153], or Robinson (1983) [163], for instance.

Volatility. Among other problems, one may wish to estimate the volatility of financial time series, $v(x)=\mathrm{Var}(X_t|X_{t-1}=x)$. The question enters into the regression framework with both $Y_i=X_{i-1}$ and $Y_i=X_{i-1}^2$, since $v(x)=v_2(x)-v_1^2(x)$, where $v_j(x)=\mathbb E(X_1^j|X_0=x)$.

Density. Another important problem of econometric interest is to estimate the marginal density $f$ of a stationary sample. Density kernel estimators built from a kernel function $K$ are usually defined; derivatives of the density and regression functions can also be estimated by analogous procedures. Here we simply set $Y_i\equiv 1$.

Quantiles. Conditional quantiles are linked to the conditional distribution by the relation $F(y\,|\,x)=\mathbb P(X_1\le y|X_0=x)$. More precisely, we denote by $q(t|x)=\inf\{y \,/\, F(y\,|\,x)>t\}$ the generalized (right-continuous, with left limits) inverse of the monotone function $y\mapsto F(y\,|\,x)$. Set $Y_t(y)=\mathbf 1_{\{X_{t+1}\le y\}}$. Consistent estimators of the conditional regression $\mathbb E(Y_t(y)|X_t=x)$ provide information on the above conditional quantiles.

Derivatives of a density. One may also estimate the derivative $f^{(\nu)}$ of $f$, where $\nu=(\nu_1,\dots,\nu_d)\in\mathbb N^d$ and $f^{(\nu)}=\dfrac{\partial^{\nu_1+\dots+\nu_d}f}{\partial x_1^{\nu_1}\cdots\partial x_d^{\nu_d}}$. The kernel estimator $\hat f$ of $f$ (see below for a precise statement) yields an estimator $\hat f^{(\nu)}$ of $f^{(\nu)}$:
$$\frac{\partial^{\nu_1+\dots+\nu_d}\hat f}{\partial x_1^{\nu_1}\cdots\partial x_d^{\nu_d}}(x) = \frac{m^{1+\nu_1+\dots+\nu_d}}{n}\sum_{i=1}^{n}\frac{\partial^{\nu_1+\dots+\nu_d}K}{\partial x_1^{\nu_1}\cdots\partial x_d^{\nu_d}}\bigl(m(x-X_i)\bigr).$$
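The regression representation of conditional quantiles described above can be turned into a naive estimator: estimate $F(y|x)$ by a kernel regression of the indicators $Y_t(y)=\mathbf 1\{X_{t+1}\le y\}$ on $X_t$, then invert. The Python sketch below is only an illustration; the triangular kernel, the bandwidth `h` and the grid search are hypothetical choices, not prescribed by the text:

```python
# Sketch: conditional quantile q(t|x) via the indicator-regression device above.

def cond_cdf(xs, x, y, h):
    """Kernel estimate of F(y|x) = P(X_{t+1} <= y | X_t = x) along a path xs."""
    num = den = 0.0
    for x_t, x_next in zip(xs, xs[1:]):
        w = max(0.0, 1.0 - abs((x - x_t) / h))    # triangular kernel weight at X_t
        num += w * (1.0 if x_next <= y else 0.0)  # indicator Y_t(y) = 1{X_{t+1} <= y}
        den += w
    return num / den if den > 0 else 0.0

def cond_quantile(xs, x, t, h, grid):
    """q(t|x) = inf{y : F(y|x) > t}, searched over an increasing grid of y values."""
    for y in grid:
        if cond_cdf(xs, x, y, h) > t:
            return y
    return grid[-1]
```

The same two-regression device gives the volatility of the preceding paragraph: estimate $v_1$ and $v_2$ by kernel regressions of $X_{t+1}$ and $X_{t+1}^2$ on $X_t$ and take $v = v_2 - v_1^2$.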
11.2 Kernel regression estimates
We now consider a stationary process $(Z_t)_{t\in\mathbb Z}$ with $Z_t=(X_t,Y_t)$, where $X_t,Y_t\in\mathbb R$. The quantity of interest is the regression function $r(x)=\mathbb E(Y_0|X_0=x)$. Let $K$ be a kernel function integrating to 1, Lipschitz and compactly supported. The kernel estimators are defined by
$$\hat f(x)=\hat f_{n,h}(x)=\frac1{nh}\sum_{t=1}^nK\Big(\frac{x-X_t}{h}\Big),\qquad \hat g(x)=\hat g_{n,h}(x)=\frac1{nh}\sum_{t=1}^nY_t\,K\Big(\frac{x-X_t}{h}\Big),$$
$$\hat r(x)=\hat r_{n,h}(x)=\frac{\hat g_{n,h}(x)}{\hat f_{n,h}(x)}\ \text{ if }\hat f_{n,h}(x)\neq0;\qquad \hat r(x)=0\ \text{ otherwise.}$$
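These estimators can be transcribed directly; a minimal sketch follows (the triangular kernel $K(u)=(1-|u|)_+$, which integrates to one, is Lipschitz and compactly supported, is an illustrative choice):

```python
import numpy as np

def kernel_estimates(x, X, Y, h):
    """Return f_{n,h}(x), g_{n,h}(x) and the ratio estimate r_{n,h}(x)."""
    n = len(X)
    K = np.maximum(1.0 - np.abs((x - X) / h), 0.0)  # triangular kernel
    f_hat = float(K.sum()) / (n * h)
    g_hat = float((Y * K).sum()) / (n * h)
    r_hat = g_hat / f_hat if f_hat != 0 else 0.0
    return f_hat, g_hat, r_hat

# On an evenly spaced design over [-1, 1] (density 1/2), f_hat(0) is close
# to 0.5 and, with Y = X^2, r_hat(0) is close to E(Y | X = 0) = 0.
X = np.linspace(-1.0, 1.0, 2001)
f_hat0, g_hat0, r_hat0 = kernel_estimates(0.0, X, X ** 2, 0.1)
```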
Here $h=(h_n)_{n\in\mathbb N}$ is a sequence of positive real numbers. We always assume that $h_n\to0$ and $nh_n\to\infty$ as $n\to\infty$.

Definition 11.1. Let $\rho=a+b$ with $(a,b)\in\mathbb N\times]0,1]$. Denote by $\mathcal C^a$ the set of $a$-times continuously differentiable functions. The set of $\rho$-regular functions $\mathcal C_\rho$ is defined as
$$\mathcal C_\rho=\Big\{u:\mathbb R\to\mathbb R\ \Big|\ u\in\mathcal C^a\ \text{and}\ \forall K,\ \exists A\ge0:\ |x|,|y|\le K\Rightarrow|u^{(a)}(x)-u^{(a)}(y)|\le A|x-y|^b\Big\}.$$
Assuming $g\in\mathcal C_\rho$, one can choose a kernel function $K$ of order $\rho$ (where $\rho$ is not necessarily a nonnegative integer) such that the bias $b_h$ satisfies $b_h(x)=\mathbb E(\hat g(x))-g(x)=O(h^\rho)$, uniformly on any compact subset of $\mathbb R$; see e.g. Rosenblatt (1991) [169]∗. If, moreover, $\rho$ is an integer ($b=1$, $a=\rho-1$), then with an appropriately chosen kernel $K$ of order $\rho$,
$$b_h(x)\sim h^\rho\,g^{(\rho)}(x)\int s^\rho K(s)\,ds\Big/\rho!,$$
uniformly on any compact interval. In view of the asymptotic analysis we assume that the marginal density $f(\cdot)$ of $X_0$ exists and is continuous, that $f(x)>0$ at any point $x$ of interest, and that the regression function $r(\cdot)=\mathbb E(Y_0|X_0=\cdot)$ exists and is continuous. Finally, for some $p\ge1$, $x\mapsto g_p(x)=f(x)\mathbb E(|Y_0|^p\,|\,X_0=x)$ exists and is continuous. We set $g=fr$ with obvious shorthand notation. Moreover, we impose one of the following moment conditions:
$$\mathbb E|Y_0|^S<\infty,\quad\text{for some }S\ge p,\tag{11.2.1}$$
$$\mathbb E e^{a|Y_0|}<\infty,\quad\text{for some }a>0.\tag{11.2.2}$$

11.2.1 Second order and CLT results
We first consider the properties of $\hat g(x)$. The following conditionally centered analogue of $g_2$ appears in the asymptotic variance of the estimator $\hat r$:
$$G_2(x)=f(x)\,\mathrm{Var}(Y_0|X_0=x)=g_2(x)-f(x)r^2(x).$$
Assume that the densities $f_{(k)}$ of the pairs $(X_0,X_k)$, $k\in\mathbb Z^+$, exist and are uniformly bounded: $\sup_{k>0}\|f_{(k)}\|_\infty<\infty$. Moreover, uniformly over $k\in\mathbb Z^+$, the functions
$$r_{(k)}(x,x')=\mathbb E\big(|Y_0Y_k|\,\big|\,X_0=x,\ X_k=x'\big)\tag{11.2.3}$$
are continuous. Under these assumptions, the functions $g_{(k)}=f_{(k)}r_{(k)}$ are locally bounded.

∗ Such kernels can be constructed as $K=pd$ for some known compactly supported (continuous) density $d(\cdot)$ and a polynomial $p$ with $\deg p<\rho$. If $d(x)\neq0$ on some infinite set, then the system of equations $\int t^jK(t)\,dt=a_j$ ($0\le j\le p$) admits a unique solution for all choices of the $a_j$'s, because the quadratic form $p\mapsto\int p^2(x)d(x)\,dx$ is positive definite.
Theorem 11.1. Suppose that the stationary sequence $(Z_t)_{t\in\mathbb Z}$ satisfies conditions (11.2.1) with $p=2$ and (11.2.3). Suppose that $n^\delta h\to\infty$ for some $\delta\in]0,1[$. Then
$$\sqrt{nh}\,\big(\hat g(x)-\mathbb E\hat g(x)\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N\Big(0,\ g_2(x)\int K^2(u)\,du\Big)$$
and, setting $\tilde g(x)=\hat g(x)-r(x)\hat f(x)$,
$$\sqrt{nh}\,\big(\tilde g(x)-\mathbb E\tilde g(x)\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N\Big(0,\ G_2(x)\int K^2(u)\,du\Big)$$
under any of the weak dependence conditions formulated below. For the sake of clarity we do not make the assumptions precise here, but the result holds if the decay of the sequence is faster than any Riemannian decay, i.e. $\theta(n)=O(n^{-a})$ as $n\to\infty$ for each $a>0$. The same holds for $\eta(n)$, $\kappa(n)$ or $\lambda(n)$.

In order to derive asymptotics for the ratio estimator $\hat r$ we use a method, already used by Collomb (1984) [38], which consists in studying higher order asymptotics. This is the topic of the next subsection.

Theorem 11.2. Suppose that the stationary sequence $(Z_t)_{t\in\mathbb Z}$ satisfies conditions (11.2.1) with $p=2$ and (11.2.3). Consider a positive kernel $K$. Let $f,g\in\mathcal C_\rho$ for some $\rho\in]0,2]$, and $nh^{1+2\rho}\to0$. Then, if $f(x)\neq0$,
$$\sqrt{nh}\,\big(\hat r(x)-r(x)\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N\Big(0,\ \frac{G_2(x)}{f^2(x)}\int K^2(u)\,du\Big)$$
under any of the weak dependence conditions formulated below.

Assuming that the sequence $(Z_t)_{t\in\mathbb Z}$ is $\theta$-weakly dependent with rate $O(r^{-a})$ and $a>3$, Ango Nze, Bühlmann and Doukhan (2002) [6] prove that, uniformly in $x$ belonging to any compact subset of $\mathbb R$,
$$\mathrm{Var}(\hat g(x))=\frac1{nh}\,g_2(x)\int K^2(u)\,du+o\Big(\frac1{nh}\Big),$$
and
$$\mathrm{Var}\big(\hat g(x)-r(x)\hat f(x)\big)=\frac1{nh}\,G_2(x)\int K^2(u)\,du+o\Big(\frac1{nh}\Big).$$
The exponential moment assumption can be relaxed. Suppose that the stationary sequence $(Z_t)_{t\in\mathbb Z}$ satisfies conditions (11.2.1) and (11.2.3) with $p=2$, $S>2$. The former results then hold if the sequence $(Z_t)_{t\in\mathbb Z}$ is weakly dependent with $\eta(r)=O(r^{-a})$ and $a>(3S-4)/(S-2)+2\delta/(S-2)$, or $\lambda(r)=O(r^{-a})$ and $a>(4S-4)/(S-2)+2\delta/(S-2)$.
The CLT of Theorem 11.1 holds, under conditions (11.2.1) with $p=2$ and (11.2.3), if the stationary sequence $(Z_t)_{t\in\mathbb Z}$ is $\eta$- or $\kappa$-weakly dependent with rate $O(r^{-a})$ and
$$a>\alpha_j(\delta)=\min\Big\{\max\big(2+j,\ 3(2+j)\delta\big),\ \max\Big(2+j+\frac1\delta,\ \frac{2+2(2+j)\delta}{1+\delta}\Big)\Big\},$$
where $j=1$ or $j=2$ according to the $\eta$- or $\kappa$-dependence assumption, respectively. These results extend Doukhan and Louhichi (2001) [68], valid for the case of the density function $\hat f$, to the estimator $\hat g$ under weak dependence. The first right hand side term is obtained by Bernstein's blocking technique; the second right hand side term results from the application of the Lindeberg method (see Rio (2000) [161]).

The CLT of Theorem 11.2 relies on the expansion
$$u^{-1}=\sum_{i=0}^p(-1)^i\frac{(u-u_0)^i}{u_0^{i+1}}+(-1)^{p+1}\frac{(u-u_0)^{p+1}}{u\,u_0^{p+1}},\tag{11.2.4}$$
where $p=2$, $u=b_n$, $u_0=\mathbb Eb_n=1$, and $\hat r(x)=a_n/b_n$ (if $b_n\neq0$) with
$$a_n=\sum_{i=1}^nY_iK\Big(\frac{x-X_i}{h_n}\Big)\Big/\,n\,\mathbb EK\Big(\frac{x-X_0}{h_n}\Big),\qquad b_n=\sum_{i=1}^nK\Big(\frac{x-X_i}{h_n}\Big)\Big/\,n\,\mathbb EK\Big(\frac{x-X_0}{h_n}\Big).$$
Using the Rosenthal inequalities described in § 4.3 and the aforementioned CLT, we obtain the CLT of Theorem 11.2 for the regression function, under conditions (11.2.1) with $p=2$ and (11.2.3), if the stationary sequence $(Z_t)_{t\in\mathbb Z}$ is either $\eta$- or $\kappa$-weakly dependent with rate $O(r^{-a})$, where $a>\alpha_j(\delta)$ and $a>3\vee\frac{9(2+j)}{7-4\delta}$ ($j=1$ under $\eta$- and $j=2$ under $\kappa$-dependence).

The results stated in Theorems 11.1 and 11.2 also hold for finite dimensional convergence. The components are asymptotically jointly independent, much in the same way as for i.i.d. sequences.

A brief sketch of the proof of Theorem 11.1. We proceed as in Rio (2000) [161], and more specifically as in Coulon-Prieur and Doukhan (2000) [40] for density estimation in the causal case. The case of non-causal coefficients is handled with Bernstein blocks as in Doukhan and Louhichi (2001) [68]. Here, for $0\le t\le n$, we consider Lipschitz truncations at level $M=M(n)$:
$$T_t(x)=\big(Y_t\mathbf 1_{|Y_t|\le M}+M\mathbf 1_{Y_t>M}-M\mathbf 1_{Y_t<-M}\big)\,K\Big(\frac{x-X_t}{h}\Big).$$
Then, for a suitable constant $\sigma_n$, $\sigma_n\sum_{t=1}^nT_t(x)$ approximates the expression $\sqrt{nh}\,\big(\hat g(x)-r(x)\hat f(x)\big)$ (after centering).
Now a CLT may be proved for $\xi_t=\sigma_n\sqrt{nh}\,(T_t(x)-\mathbb ET_t(x))$. The second assertion is a consequence of the first one, with $Y_t$ replaced by $Y_t-r(x)$. See Ango Nze et al. (2002) [6] and Ango Nze and Doukhan (2004) [7].

Remarks
• An easier and more efficient way to obtain such results is through Lemma 7.1; see Bardet et al. (2006) [10] for density estimation, while a paper by Nicolas Ragache will clarify the regression estimation case.
• Finite dimensional distributions. Let $\hat H$ be an estimator of a function $H$ (among the previously cited ones). Then a central limit result for
$$Z_n(x)=\sqrt{nh}\,\big(\hat H(x)-\mathbb E\hat H(x)\big)\to\mathcal N(0,s^2(x))$$
extends to a multivariate central limit theorem $(Z_n(x_1),\dots,Z_n(x_k))\to\mathcal N_k(0,\Sigma)$, where $\Sigma$ denotes the diagonal matrix with entries $(s^2(x_1),\dots,s^2(x_k))$. The previous process is not tight in $C[0,1]$ at points for which $s^2(x)=0$.
11.2.2 Almost sure convergence properties
Theorem 11.3. Let $(Z_t)_{t\in\mathbb Z}$ be a stationary sequence satisfying conditions (11.2.1) with $p=2$ and (11.2.3). Then, under the forthcoming conditions:

(i) There exists a sequence $(\ell_n)_{n\in\mathbb N}$ with $nh/(\ell_n\log n)\to\infty$ as $n\to\infty$ such that, for any $M>0$,
$$\sup_{|x|\le M}|\hat g(x)-\mathbb E\hat g(x)|=O\bigg(\sqrt{\frac{\ell_n\log n}{nh}}\bigg),\quad\text{almost surely.}$$

(ii) Assume now that $\inf_{|x|\le M}f(x)>0$. If $f,g\in\mathcal C_\rho$ for some $\rho\in]0,\infty[$ and $h\sim(\ell_n\log n/n)^{1/(1+2\rho)}$, then
$$\sup_{|x|\le M}|\hat r(x)-r(x)|=O\bigg(\Big(\frac{\ell_n\log n}{n}\Big)^{\rho/(1+2\rho)}\bigg),\quad\text{almost surely.}$$

Remark. Under the conditions of (ii) of Theorem 11.3, but assuming only the weaker bandwidth condition $n^\delta h\to\infty$ for some $\delta\in]0,1[$, we obtain
$$\sup_{|x|\le M}|\hat r(x)-r(x)|=o(1),\quad\text{almost surely.}$$
For the sake of simplicity, we shall only consider the geometrically dependent case; we refer the reader to Ango Nze, Bühlmann and Doukhan (2002) [6] for Riemannian decays.
Theorem 11.4. Let $(Z_t)_{t\in\mathbb Z}$ be a stationary sequence satisfying conditions (11.2.1) with $p=2$ and (11.2.3), and either $\eta$- or $\kappa$-weakly dependent with a geometric decay rate.

(i) If $nh/\log^4 n\to\infty$, then for any $M>0$, almost surely,
$$\sup_{|x|\le M}|\hat g(x)-\mathbb E\hat g(x)|=O\bigg(\frac{\log^2n}{\sqrt{nh}}\bigg).$$

(ii) For any $M>0$, if $f,g\in\mathcal C_\rho$ for some $\rho\in]0,\infty[$, $h\sim(\log^4n/n)^{1/(1+2\rho)}$ and $\inf_{|x|\le M}f(x)>0$, then, almost surely,
$$\sup_{|x|\le M}|\hat r(x)-r(x)|=O\bigg(\Big(\frac{\log^4n}{n}\Big)^{\rho/(1+2\rho)}\bigg).$$
Proof of Theorem 11.3. We keep the usual notations and denote by $C$ a universal constant (whose value can change from one place to another). Assume that $\mathbb E(\exp(a|Y_0|))<\infty$. Then
$$\mathbb P\Big(\sup_{|x|\le M}|\hat g(x)-\tilde g(x)|>0\Big)\le n\,\mathbb P(|Y_0|\ge M_0\log n)\le Cn^{1-M_0},$$
and, by the Cauchy-Schwarz inequality,
$$\sup_{|x|\le M}\mathbb E|\hat g(x)-\tilde g(x)|\le\frac1h\,\mathbb E\Big[|Y_0|\mathbf 1_{\{|Y_0|\ge M_0\log n\}}\Big|K\Big(\frac{x-X_0}{h}\Big)\Big|\Big]\le\frac{h^{1/3}}{h}\,n^{-M_0}.$$
We can now reduce the computations to the case of a density estimator, as in Doukhan and Louhichi (2001) [68]. Assume that the interval $[-M,M]$ is covered by $L\nu$ intervals with diameter $1/\nu$ (here $\nu=\nu(n)$ depends on $n$; we denote by $I_j$ the $j$-th interval and by $x_j$ its center). Assume that $h\nu\to\infty$ as $n\to\infty$, and that the compactly supported kernel $K$ vanishes if $|t|>R_0$. Liebscher (1996) [121] exhibits another kernel-type density estimate $\tilde g$ based on an even, continuous kernel, decreasing on $[0,\infty[$, constant on $[0,2R_0]$ and taking the value 0 at $t=3R_0$ (hence with compact support). Then he proves that
$$\sup_{x\in I_j}|\tilde g(x)-\mathbb E\tilde g(x)|\le|\tilde g(x_j)-\mathbb E\tilde g(x_j)|+\frac{C}{h\nu}\big(|\tilde g(x_j)-\mathbb E\tilde g(x_j)|+2|\mathbb E\tilde g(x_j)|\big).$$
Therefore, for any $\lambda>0$,
$$\mathbb P\Big(\sup_{x\in[-M,M]}|\hat g(x)-\mathbb E\hat g(x)|\ge\frac{2\lambda\log n}{\sqrt{nh}}+h^{1/3}n^{-M_0}+\frac{C}{h\nu}\Big)\le Cn^{1-M_0}+L\nu\cdot\sup_{x_1}\mathbb P\Big(|\tilde g(x_1)-\mathbb E\tilde g(x_1)|\ge\frac{\lambda\log n}{\sqrt{nh}}\Big)+L\nu\cdot\sup_{x_1}\mathbb P\Big(|\tilde g(x_1)-\mathbb E\tilde g(x_1)|\ge\frac{\lambda}{\sqrt{nh}}\Big).$$
The exponential inequality (4.3.31) completes the proof of assertion (i). The proof of assertion (ii) is standard; we refer to [6] for a proof.
11.3 MISE for β̃-dependent sequences

We now consider the problem of estimating the unknown marginal density $f$ from the observations $(X_1,\dots,X_n)$ of a stationary sequence $(X_i)_{i\ge0}$. In this context, Viennet (1997) [185] obtained optimal results for the MISE under the condition $\sum_{k>0}\beta(\sigma(X_0),\sigma(X_k))<\infty$ for a $\beta$-mixing sequence $(X_n)$. We wish to extend Viennet's results to sequences satisfying only
$$\sum_{k>0}\tilde\beta(\sigma(X_0),X_k)<\infty.\tag{11.3.1}$$
For kernel density estimators, this can be done by assuming only that the kernel $K$ is BV (of bounded variation) and Lebesgue integrable. For projection estimators, it works only if the basis is well localized, because our variance inequality is less precise than that of Viennet. Note that Condition (11.3.1) is much less restrictive than Viennet's, for it covers many non-mixing examples. In particular, since $f$ is supposed to be square integrable with respect to the Lebesgue measure, the distribution function $F$ of $X_0$ is $1/2$-Hölder. Hence we infer from Lemma 5.1, point 3.i), that (11.3.1) holds as soon as $\sum_{k>0}(\tau(\sigma(X_0),X_k))^{1/3}<\infty$. If $f$ is bounded, (11.3.1) holds as soon as $\sum_{k>0}(\tau(\sigma(X_0),X_k))^{1/2}<\infty$.
Variance inequalities

We set $\tilde\beta(i)=\tilde\beta(\sigma(X_0),X_i)$. The main results of this section are the following upper bounds (compare with Theorems 1.2 and 1.3(a) in Rio (2000) [161] for the mixing coefficients $\alpha(\sigma(X_0),\sigma(X_i))$).

Proposition 11.1. Let $K$ be any BV function such that $\int|K(x)|\,dx$ is finite. Let $(X_i)_{i\ge0}$ be a stationary sequence, and define
$$Y_{k,n}=h^{-1}K\big(h^{-1}(x-X_k)\big)\quad\text{and}\quad f_n(x)=\frac1n\sum_{k=1}^nY_{k,n}.\tag{11.3.2}$$
The following inequality holds:
$$nh\int\mathrm{Var}(f_n(x))\,dx\le\int K^2(x)\,dx+2\bigg(\sum_{k=1}^{n-1}\tilde\beta(k)\bigg)\|dK\|\int|K(x)|\,dx.$$
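The bound involves only three constants of the kernel — $\int K^2$, the total variation $\|dK\|$, and $\int|K|$ — plus the sum of the $\tilde\beta$ coefficients. A quick numerical sketch (the geometric coefficients are a toy assumption; for the naive kernel $K=\tfrac12\mathbf 1_{]-1,1]}$ one has $\int K^2=1/2$, $\|dK\|=1$ and $\int|K|=1$):

```python
import numpy as np

def prop_11_1_bound(beta_tilde, K_sq_int, K_tv, K_abs_int):
    """Upper bound of Proposition 11.1 on nh * integral of Var(f_n):
    int K^2 + 2 * (sum of beta~(k)) * ||dK|| * int |K|."""
    return K_sq_int + 2.0 * K_tv * K_abs_int * float(np.sum(beta_tilde))

# toy geometrically beta~-dependent coefficients beta~(k) = 2^{-k}
beta_tilde = 0.5 ** np.arange(1, 100)
bound = prop_11_1_bound(beta_tilde, K_sq_int=0.5, K_tv=1.0, K_abs_int=1.0)
```

Since the geometric series sums (essentially) to one, the bound here is close to $1/2+2=5/2$.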
We now come to the case of projection estimators, defined in more detail in the next section.

Proposition 11.2. Let $(\varphi_i)_{1\le i\le n}$ be an orthonormal system of $L^2(\mathbb R,\lambda)$ ($\lambda$ is the Lebesgue measure) and assume that each $\varphi_i$ is BV. Let $(X_i)_{i\ge0}$ be a stationary sequence, and define
$$Y_{j,n}=\frac1n\sum_{k=1}^n\varphi_j(X_k)\quad\text{and}\quad f_n=\sum_{j=1}^mY_{j,n}\varphi_j.\tag{11.3.3}$$
The following inequality holds:
$$n\int\mathrm{Var}(f_n(x))\,dx\le\sup_{x\in\mathbb R}\sum_{j=1}^m\varphi_j^2(x)+2\bigg(\sum_{k=1}^{n-1}\tilde\beta(k)\bigg)\sup_{x\in\mathbb R}\sum_{j=1}^m\|d\varphi_j\|\,|\varphi_j(x)|.$$
Remark 4. Since $\tilde\beta(\mathcal M,X)\le\tilde\phi(\mathcal M,X)$, Propositions 11.1 and 11.2 apply to dynamical systems satisfying (3.3.1), with $2\sum_{k=1}^{n-1}a_k$ instead of $\sum_{k=1}^{n-1}\tilde\beta(k)$. For kernel estimators this can also be deduced from a variance estimate given in Prieur (2001) [154].

Proof of Proposition 11.1. We start from the elementary inequality
$$\mathrm{Var}(f_n(x))\le\frac1n\|Y_{0,n}\|_2^2+\frac2n\sum_{i=1}^{n-1}|\mathrm{Cov}(Y_{0,n},Y_{i,n})|.$$
Now $h\int\|Y_{0,n}\|_2^2(x)\,dx=\int(K(x))^2\,dx$. To complete the proof, we apply Proposition 5.3:
$$h\int|\mathrm{Cov}(Y_{0,n},Y_{i,n})|(x)\,dx\le\|dK\|\,\mathbb E\Big(b(\sigma(X_0),X_i)\int|Y_{0,n}(x)|\,dx\Big)\le\tilde\beta(i)\,\|dK\|\int|K(x)|\,dx.$$

Proof of Proposition 11.2. Since $(\varphi_i)_{1\le i\le n}$ is an orthonormal system of $L^2(\mathbb R,\lambda)$, we have
$$\int\mathrm{Var}(f_n(x))\,dx=\sum_{j=1}^m\mathrm{Var}(Y_{j,n}).$$
Applying Proposition 5.3, we obtain
$$\mathrm{Var}(Y_{j,n})\le\frac1n\|\varphi_j(X_0)\|_2^2+\frac2n\sum_{k=1}^{n-1}|\mathrm{Cov}(\varphi_j(X_0),\varphi_j(X_k))|\le\frac1n\|\varphi_j(X_0)\|_2^2+\frac2n\sum_{k=1}^{n-1}\|d\varphi_j\|\,\mathbb E\big(|\varphi_j(X_0)|\,b(\sigma(X_0),X_k)\big).$$
To complete the proof we sum over $j$:
$$n\int\mathrm{Var}(f_n(x))\,dx\le\sum_{j=1}^m\mathbb E\varphi_j^2(X_0)+2\sum_{k=1}^{n-1}\mathbb E\Big(b(\sigma(X_0),X_k)\sum_{j=1}^m\|d\varphi_j\|\,|\varphi_j(X_0)|\Big).$$
Some function spaces

In this section we recall the definition of the spaces $\mathrm{Lip}^*(s,2,I)$, where $I$ is either $\mathbb R$ or some compact interval $[a,b]$ (see DeVore and Lorentz (1993) [59], Chapter 2). Let $I_{rh}=\mathbb R$ if $I=\mathbb R$ and $I_{rh}=[a,b-rh]$ otherwise. For any $h\ge0$, let $T_h$ be the translation operator $T_h(f,x)=f(x+h)$ and $\Delta_h=T_h-T_0$ the difference operator. By induction, define the operators $\Delta_h^r=\Delta_h\circ\Delta_h^{r-1}$. Let $\lambda$ be the Lebesgue measure on $I$ and $\|\cdot\|_{2,\lambda}$ the usual norm on $L^2(I,\lambda)$. The modulus of smoothness of order $r$ of a function $f$ in $L^2(I,\lambda)$ is defined by
$$\omega_r(f,t)_2=\sup_{0\le h\le t}\|\Delta_h^r(f,\cdot)\mathbf 1_{I_{rh}}\|_{2,\lambda}.$$
For $s>0$, $\mathrm{Lip}^*(s,2,I)$ is the space of functions $f$ in $L^2(I,\lambda)$ such that
$$\|f\|_{s,2,I}=\|f\|_{2,\lambda}+\sup_{t>0}\frac{\omega_{[s]+1}(f,t)_2}{t^s}<\infty.$$
These spaces are Banach spaces for the norm $\|\cdot\|_{s,2,I}$. Recall that $\mathrm{Lip}^*(s,2,I)$ is a particular case of a Besov space (precisely $\mathrm{Lip}^*(s,2,I)=B_{s,2,\infty}(I)$) and that it contains the Sobolev space $W_s(I)=B_{s,2,2}(I)$.
Application to kernel estimators

If $f_n$ is defined by (11.3.2), set $f_h=\mathbb E(f_n)$. Let $r$ be some positive integer, and assume that the kernel $K$ is such that, for any $f$ belonging to the Sobolev space $W_r(\mathbb R)$,
$$\int(f(x)-f_h(x))^2\,dx\le M_1h^{2r}\|f^{(r)}\|_2^2,\tag{11.3.4}$$
for some constant $M_1$ depending only on $r$. From (11.3.4) and Theorem 5.2, page 217, in DeVore and Lorentz (1993) [59], we infer that, for any $f$ in $L^2(\mathbb R,\lambda)$,
$$\int(f(x)-f_h(x))^2\,dx\le M_2(\omega_r(f,h)_2)^2,$$
for some constant $M_2$ depending only on $r$. This last inequality implies that, if $f$ belongs to $\mathrm{Lip}^*(s,2,\mathbb R)$ for $r-1\le s<r$, then
$$\int(f(x)-f_h(x))^2\,dx\le M_2h^{2s}\|f\|_{s,2,\mathbb R}^2.$$
This evaluation of the bias, together with Proposition 11.1, leads to the following corollary.

Corollary 11.1. Let $r$ be some positive integer. Let $(X_i)_{i\ge1}$ be a stationary sequence with common marginal density $f$ belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ with $r-1\le s<r$, or to $W_s(\mathbb R)$ with $s=r$. Let $K$ be a BV function satisfying (11.3.4) and such that $\int|K(x)|\,dx$ is finite. Let $f_n$ be defined by (11.3.2) with $h=n^{-1/(2s+1)}$. If (11.3.1) holds, then there exists a constant $C$ such that
$$\mathbb E\int(f_n(x)-f(x))^2\,dx\le Cn^{-2s/(2s+1)}.$$

Here are two well-known classes of kernels satisfying (11.3.4).

Example 1. One says that $K$ is a kernel of order $k$ if

1. $\int K(x)\,dx=1$, $\int K^2(x)\,dx<\infty$ and $\int|x|^{k+1}|K(x)|\,dx<\infty$;

2. $\int x^jK(x)\,dx=0$ for $1\le j\le k$.

If $K$ is a kernel of order $k$, then it satisfies (11.3.4) for any $r\le k+1$. For instance, the naive kernel $K=(1/2)\mathbf 1_{]-1,1]}$ is BV and of order 1. Consequently, Corollary 11.1 applies to functions belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ for $s<2$, or to $W_2(\mathbb R)$. The footnote on page 249 proves the existence of such kernels.

Example 2. Assume that the Fourier transform $K^*$ of $K$ satisfies $|1-K^*(x)|\le M|x|^r$ for some positive constant $M$. Then $K$ satisfies (11.3.4) for this $r$. For instance, $K(x)=\sin(x)/(\pi x)$ satisfies (11.3.4) for any positive integer $r$; unfortunately, it is neither BV nor integrable. Another function satisfying (11.3.4) for any positive integer $r$ is the analogue of the de la Vallée-Poussin kernel, $V(x)=(\cos(x)-\cos(2x))/(\pi x^2)$. This function is BV and integrable, so that Corollary 11.1 applies to any function belonging to $\mathrm{Lip}^*(s,2,I)$ for $s>0$.
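The moment conditions defining the order of a kernel can be checked numerically. A midpoint-rule sketch for the naive kernel (the grid sizes and tolerances are illustrative):

```python
import numpy as np

def kernel_moment(K, j, a=-1.5, b=1.5, n=30000):
    """Midpoint-rule approximation of int_a^b x^j K(x) dx."""
    dx = (b - a) / n
    x = a + dx * (np.arange(n) + 0.5)   # midpoints of the n subintervals
    return float(np.sum(x ** j * K(x)) * dx)

naive = lambda x: 0.5 * ((x > -1) & (x <= 1))

m0 = kernel_moment(naive, 0)   # close to 1: K integrates to one
m1 = kernel_moment(naive, 1)   # close to 0: the first moment vanishes
m2 = kernel_moment(naive, 2)   # close to 1/3: nonzero, so the order is exactly 1
```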
Application to unconditional systems. Proposition 11.2 is of special interest for orthonormal systems $(\varphi_i)_{i\ge1}$ satisfying the two conditions:

P1. There exists $C_1$ independent of $m$ such that $\max_{1\le i\le m}\|d\varphi_i\|\le C_1\sqrt m$.

P2. There exists $C_2$ independent of $m$ such that $\sup_{x\in\mathbb R}\sum_{j=1}^m|\varphi_j(x)|\le C_2\sqrt m$.

An orthonormal system satisfying P2 is called unconditional. For such systems, we obtain from Proposition 11.2 that
$$n\int\mathrm{Var}(f_n(x))\,dx\le m\Big(C_2^2+2C_1C_2\sum_{k=1}^{n-1}\tilde\beta(k)\Big).\tag{11.3.5}$$
Example 1 (piecewise polynomials). Let $(Q_i)_{1\le i\le r+1}$ be an orthonormal basis of the space of polynomials of order $r$ on $[0,1]$, and define the function $R_i$ on $\mathbb R$ by $R_i(x)=Q_i(x)$ if $x$ belongs to $]0,1]$ and $0$ otherwise. We consider the regular partition of $]0,1]$ into $k$ intervals $\big(](j-1)/k,j/k]\big)_{1\le j\le k}$, and define the functions $R_{i,j}(x)=\sqrt k\,R_i(kx-(j-1))$. Clearly the family $(R_{i,j})_{1\le i\le r+1}$ is an orthonormal basis of the space of polynomials of order $r$ on the interval $[(j-1)/k,j/k]$. Let $m=k(r+1)$ and let $(\varphi_i)_{i\ge1}$ be any family such that
$$\big\{\varphi_i\,\big|\,1\le i\le m\big\}=\big\{R_{i,j}\,\big|\,1\le j\le k,\ 1\le i\le r+1\big\}.\tag{11.3.6}$$
The orthonormal system $(\varphi_i)_{i\ge1}$ satisfies P1 and P2 with
$$C_1=(r+1)^{-1/2}\max_{1\le i\le r+1}\|dR_i\|\quad\text{and}\quad C_2=(r+1)^{-1/2}\sup_{x\in[0,1]}\sum_{i=1}^{r+1}|R_i(x)|.$$
The case of histograms corresponds to $r=0$. In that case $\varphi_j=\sqrt k\,\mathbf 1_{](j-1)/k,j/k]}$; clearly $C_2=1$ and $\|d\varphi_j\|=2\sqrt k$, so that $C_1=2$.

Assume now that $X_0$ has a density $f$ such that $f\mathbf 1_{[0,1]}$ belongs to $\mathrm{Lip}^*(s,2,[0,1])$. Suppose that $r>s-1$, and denote by $\bar f$ the orthogonal projection of $f$ on the subspace generated by $(\varphi_i)_{1\le i\le m}$. From Lemma 12 in Barron et al. (1999) [12] we know that there exists a constant $K$ depending only on $s$ such that
$$\int_0^1(f(x)-\bar f(x))^2\,dx\le Km^{-2s}.\tag{11.3.7}$$
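In the histogram case $r=0$ the projection estimate $f_n=\sum_jY_{j,n}\varphi_j$ has a closed form; the following minimal sketch (function names are illustrative) computes its constant value on each of the $k$ cells:

```python
import numpy as np

def histogram_projection(X, k):
    """Histogram projection estimate on ]0,1] with k regular cells:
    f_n = sum_j Y_{j,n} phi_j, with phi_j = sqrt(k) 1_{](j-1)/k, j/k]}.
    Returns the (constant) value of f_n on each of the k cells."""
    inside = (X > 0) & (X <= 1)
    idx = np.ceil(X[inside] * k).astype(int) - 1   # cell index in 0..k-1
    counts = np.bincount(idx, minlength=k)
    Y = counts / len(X) * np.sqrt(k)               # Y_{j,n} = (1/n) sum_i phi_j(X_i)
    return Y * np.sqrt(k)                          # f_n restricted to each cell

# For uniform data the estimate is close to the true density 1 on each cell,
# and it integrates to one when all observations lie in ]0,1].
X = np.random.RandomState(0).uniform(size=100000)
f_cells = histogram_projection(X, k=16)
```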
Since f¯ = E(fn ), we obtain from (11.3.5) and (11.3.7) the following corollary.
Corollary 11.2. Let $(X_i)_{i\ge1}$ be a stationary sequence with common marginal density $f$ such that $f\mathbf 1_{[0,1]}$ belongs to $\mathrm{Lip}^*(s,2,[0,1])$. Let $r$ be any nonnegative integer such that $r>s-1$, and $k=[n^{1/(2s+1)}]$. Let $(\varphi_i)_{1\le i\le m}$ be defined by (11.3.6) and $f_n$ be defined by (11.3.3). If (11.3.1) holds, then there exists a constant $C$ such that
$$\mathbb E\int_0^1(f_n(x)-f(x))^2\,dx\le Cn^{-2s/(2s+1)}.$$

Example 2 (wavelet basis). Let $\{e_{j,k},\ j\ge0,\ k\in\mathbb Z\}$ be an orthonormal wavelet basis with the following convention: the $e_{0,k}$ are translates of the father wavelet and, for $j\ge1$, $e_{j,k}(x)=2^{j/2}\psi(2^jx-k)$, where $\psi$ is the mother wavelet. Assume that these wavelets are compactly supported with continuous derivatives up to order $r$ (if $r=0$, the wavelets are supposed to be BV). Let $g$ be some function with support in $[-A,A]$. Changing the indexation of the basis if necessary, we can write $g=\sum_{j\ge0}\sum_{k=1}^{2^jM}a_{j,k}e_{j,k}$, where $M\ge1$ is some finite integer depending on $A$ and on the size of the wavelet supports. Let $m=\sum_{j=0}^J2^jM$ and let $(\varphi_i)_{i\ge1}$ be any family such that
$$\big\{\varphi_i\,\big|\,1\le i\le m\big\}=\big\{e_{j,k}\,\big|\,0\le j\le J,\ 1\le k\le2^jM\big\}.\tag{11.3.8}$$
The orthonormal system $(\varphi_i)_{i\ge1}$ satisfies P1 and P2. Assume now that $X_0$ has a density $f$ belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ with compact support in $[-A,A]$. Denote by $\bar f$ the orthogonal projection of $f$ on the subspace generated by $(\varphi_i)_{1\le i\le m}$. From Lemma 12 in Barron et al. (1999) [12] we know that there exists a constant $K$ depending only on $s$ such that
$$\int(f(x)-\bar f(x))^2\,dx\le K2^{-2Js}.\tag{11.3.9}$$
Since $\bar f=\mathbb E(f_n)$, we obtain the following corollary from (11.3.5) and (11.3.9).

Corollary 11.3. Let $(X_i)_{i\ge1}$ be a stationary sequence with common marginal density $f$ belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ and with compact support in $[-A,A]$. Let $r$ be any nonnegative integer such that $r>s-1$, and let $J=[\log_2(n^{1/(2s+1)})]$. Let $(\varphi_i)_{1\le i\le m}$ be defined by (11.3.8) and $f_n$ be defined by (11.3.3). If (11.3.1) holds, then there exists a constant $C$ such that
$$\mathbb E\int(f_n(x)-f(x))^2\,dx\le Cn^{-2s/(2s+1)}.$$
Remark 5. More generally, if $\sum_{i=1}^n\tilde\beta(\sigma(X_0),X_i)=O(n^a)$ for some $a$ in $[0,1[$, we obtain the rate $n^{-2s(1-a)/(2s+1)}$ for the MISE in Corollaries 11.1, 11.2 and 11.3. Note that if (11.3.1) holds, the rate $n^{-2s/(2s+1)}$ is known to be optimal for i.i.d. observations.
11.4 General Kernels
We now aim to describe other (linear) estimation procedures, obtained by replacing the kernel $K_n(x,y)=h_n^{-1}K((y-x)/h_n)$ by other kernels. As a meta-theorem, we claim that Theorems 11.1 and 11.3 remain valid if the convolution kernels are replaced by any of the forthcoming ones; moreover, those results extend to the estimation questions formulated in § 11.1. The needed assumptions are still a fast enough Riemannian decay of the weak dependence coefficient sequence; the more precise Theorem 11.4 requires geometric decay rates. Even if we do not write definitive convergence results, we provide below all the bounds needed to extend the results of § 11.2 to other types of estimators.

Projections. Our discussion is made of three steps: we consider successively finite order polynomial functions (Tchebichev), infinite order kernels (Dirichlet) and summation methods (Cesàro, developed in the case of de la Vallée-Poussin and Fejér on the one hand, and Abel-Poisson together with Mehler on the other hand), and finally generating functions. For the sake of completeness several classical results are recalled here, and we mostly work on $\mathbb R$ rather than $\mathbb R^d$, the generalization being straightforward.

Orthogonal polynomials. Consider an interval $I$ of $\mathbb R$ and a real Hilbert space $L^2(I,\mu)$ with $\mu$ dominated by the Lebesgue measure. The Schmidt orthogonalization of the family of monomial functions $\{1,x,x^2,\dots\}$ with respect to the scalar product of $L^2(I,\mu)$ leads to the Tchebichev polynomials $P_j$. Such polynomials satisfy $\deg(P_j)=j$ for all $j$. Denoting by $\kappa_n$ the leading coefficient of $P_n(x)$, the orthogonality of the family implies, for suitable constants $B_{n+2}$, a three-term recurrence relation
$$P_{n+2}(x)=\Big(\frac{\kappa_{n+2}}{\kappa_{n+1}}\,x+B_{n+2}\Big)P_{n+1}(x)-\frac{\kappa_{n+2}\kappa_n}{\kappa_{n+1}^2}\,P_n(x).$$
Examples. If $I=(-1,1)$ and $\mu(dt)=dt$ this gives the Legendre polynomials; if $I=(-\infty,\infty)$ and $\mu(dt)=e^{-t^2}dt$ we obtain the Hermite polynomials; and if $I=(0,\infty)$ and $\mu(dt)=e^{-t}dt$ this leads to the Laguerre polynomials.
If $f$ is in $L^2(I,\mu)$, a natural way to estimate $f$ is by projection on an orthonormal polynomial basis, through the estimator that naturally arises:
$$\hat f_n(x)=\frac1n\sum_{i=1}^n\sum_{j=1}^{m(n)}P_j(X_i)P_j(x).$$
The kernel we have to consider is thus given by
$$K_{m(n)}(X_i,x)=\sum_{j=1}^{m(n)}P_j(X_i)P_j(x).$$
The application to functional results uses $u_m(X_i,x)=\sqrt m\,K_m(X_i,x)$ as usual. Indeed, the Christoffel-Darboux formula (a three-term recurrence relation; see e.g. Szegő [182], p. 43) implies that the polynomial functions yield the kernel
$$K_m(x,y)=\frac{\kappa_m}{\kappa_{m+1}}\,\frac{P_{m+1}(x)P_m(y)-P_m(x)P_{m+1}(y)}{x-y},$$
where $\kappa_m$ is the leading coefficient of $P_m(x)$. One remarks that the division by $x-y$ does not change the polynomial character of $K_m$, because $x=y$ is a root of the numerator. Thereby one can check that $\|\sqrt m\,K_m(X_i,\cdot)\|_\infty\lesssim m^{1/2}$ and $\int\sqrt m\,|K_m(X_i,t)|\,dt\lesssim m^{-1/2}$ on every compact subdomain of $I$. An example on $L^2([-1,1],dt)$ is given by the Legendre polynomials, also given by the formula
$$L_n(x)=\frac{(-1)^n}{2^n\,n!}\,\frac{d^n}{dx^n}(1-x^2)^n.$$

Dirichlet kernel. Consider now the family of trigonometric polynomial functions $\{\cos(nx),\sin(nx)\}_{n\in\mathbb N}$. By the Weierstrass density theorem, this family spans a dense subset of the set of $2\pi$-periodic continuous functions, denoted $\mathcal C([0,2\pi])$. If we consider the projection $S_n$ on the trigonometric subspace $T_n=\mathrm{span}\{\cos kx,\sin kx\,|\,k\le n\}$, then the natural associated kernel is the Dirichlet kernel
$$D_m(x,y)=\sum_{k=-m}^me^{ik(x-y)}=1+2\sum_{k=1}^m\cos k(x-y)=\frac{\sin\frac{(2m+1)}2(x-y)}{\sin\frac12(x-y)}.$$
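The closed form of the Dirichlet kernel can be checked against the cosine sum directly; a small numerical sketch:

```python
import numpy as np

def dirichlet_sum(m, t):
    """D_m(t) = 1 + 2 * sum_{k=1}^m cos(k t)."""
    return 1.0 + 2.0 * float(np.sum(np.cos(np.arange(1, m + 1) * t)))

def dirichlet_closed(m, t):
    """Closed form sin((2m+1)t/2) / sin(t/2), for t not a multiple of 2 pi."""
    return float(np.sin((2 * m + 1) * t / 2.0) / np.sin(t / 2.0))
```

At $t=0$ both expressions equal $2m+1$, the dimension of the trigonometric subspace.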
The Dirichlet kernel is also of infinite order, but it is not positive; thereby one cannot use the previous monotone convergence argument. Consider a function $f$ in $\mathcal C^\alpha([0,2\pi])$, the space of $\alpha$-times differentiable, $2\pi$-periodic functions on $[0,2\pi]$. A theorem of Jackson (cf. Doukhan and Sifre [74], p. 217) asserts that
$$R_m(f)=\sup_x\Big|\int D_m(y-x)\big(f(y)-f(x)\big)\,dy\Big|=O(m^{-\alpha}\log m).$$
This implies that the optimal bandwidth condition for Fourier approximation is in this case not defined as a power of $n$. It furthermore implies that whenever $f$ is in $\mathcal C^\alpha([0,2\pi],\mathbb R)$ we cannot use an optimal window to control both the bias and the covariance of the estimate. In that case we need to erase the bias by taking a suboptimal window, i.e. a window whose order is such that $R_m(f)=o\big(n^{-\frac1\alpha}\big)$.

Summation methods. The purpose of summation methods is to transform
the natural kernels into nonnegative, more regular kernels, thereby circumventing the above problems. We first define summation methods by means of a weight sequence $\{a_{m,j}\,/\,m\in\mathbb N,\ 0\le j\le m\}$ such that $a_{m,j}\to0$ as $m\to\infty$ for each fixed $j\le m$, and $\sum_{j=0}^ma_{m,j}=1$. The weighted kernel is defined as a generalized Cesàro mean
$$K_m^a(x,y)=\sum_{j=0}^ma_{m,j}K_j(x,y)=\sum_{j=0}^ma_{m,j}\sum_{i=0}^jP_i(x)P_i(y).$$
In the sequel we omit the superscript $a$. Well-chosen summation methods lead to improved results on the bias. Consider now the main classical examples. The de la Vallée-Poussin kernel is obtained by considering means of the truncated Fourier series:
$$S_{m,n}=\frac1{m-n}\,(S_{n+1}+\cdots+S_m).$$
The kernel associated with this summation of projections is given by
$$D_{m,n}(t)=\frac1{m-n}\,\frac{\sin\frac{(m+n)t}2\,\sin\frac{(m-n)t}2}{\sin^2\frac t2}.$$
When $n=0$, this kernel specializes into the Fejér kernel $F_m$ of order $m$, which is also the Cesàro mean of the Dirichlet kernels, obtained by setting $a_{m,j}=m^{-1}\mathbf 1_{\{1\le j\le m\}}$; up to normalization, $F_m(t)\propto m^{-1}\big(\sin\frac{mt}2\big/\sin\frac t2\big)^2$. The Fejér kernel is nonnegative, so for all $f\in\mathcal C^1([0,2\pi])$ an equivalent of the bias is easily found:
$$R_{n,m}(f)(x)=\mathbb E\Big(\frac1n\sum_{i=1}^nm^{1/2}F_m(X_i-x)\Big)-f(x)=m^{1/2}\int_{[0,2\pi]}F_m(y-x)f(y)\,dy-f(x)=m^{1/2}\int_{[0,2\pi]}F_m(u)\big(f(x+u)-f(x)\big)\,du,$$
using the normalization $m^{1/2}\int_{[0,2\pi]}F_m(u)\,du=1$. Now let $t=mu$ and expand $f$ to first order according to Taylor's formula; the positivity of the kernel implies $R_{n,m}(f)(x)=O(m^{-3/2})$, so that the optimal bandwidth condition is $m=O(n^{1/6})$ and we may again adapt the CLT. For $f\in\mathcal C^\beta([0,2\pi])$ with $\beta>1$, use the Jackson kernel $J_{p,m}=q_{p,m}^{-1}D_{m-1}^{2p}$, where $q_{p,m}$ normalizes $J_{p,m}$.
Generating function and Abel summation. Another way of adding regularity to kernels is through analytic extension of the generating functions; note that this is close to the principle of Abel summation. For instance, the example of the Hermite polynomial functions leads to a kernel known as the Mehler kernel; see [182]. The Hermite polynomial functions $H_k(x)$ are defined by
$$H_k(x)=\frac{(-1)^k}{\sqrt{2^k\,k!\,\pi^{1/2}}}\,\frac{d^k}{d\nu^k}\big(e^{-\nu^2}\big)(x).$$
The double generating function of this family is of the form
$$K_t(X_i,x)=\sum_{k=0}^\infty t^kH_k(X_i)H_k(x)=\frac1{\sqrt{\pi(1-t^2)}}\,\exp\Big(-\frac{(X_i^2+x^2)(t^2+1)-4tX_ix}{2(1-t^2)}\Big).$$
It admits an analytic continuation at $t=1$. The Mehler kernel $M_m(x)$ is defined by setting $t=1-1/m(n)$ and studying the behavior as $m(n)\to\infty$. Then we once more obtain a result.

Wavelet bases. Another important class of projection estimates is given by wavelets. Consider a wavelet basis derived from a scaling function $\phi(x)$. Then the projection $\tilde f$ of order $m$ of a density $f$ over $\mathrm{span}\{\phi_{m,k}=m^{d/2}\phi(m^dx-k)\,/\,k\in\mathbb Z\}$ is
$$\tilde f(x)=\sum_{k\in\mathbb Z}\alpha_{m,k}\phi_{m,k}(x),\quad\text{where}\quad\alpha_{m,k}=\int f(t)\phi_{m,k}(t)\,dt.$$
The empirical coefficients are $\hat\alpha_{n,m,k}=n^{-1}\sum_{i=1}^n\phi_{m,k}(X_i)$. Thus the projection estimate $\hat f$ writes as
$$\hat f_{n,m}(x)=\sum_{k=-\infty}^\infty\hat\alpha_{n,m,k}\phi_{m,k}(x)=n^{-1}\sum_{k=-\infty}^\infty\sum_{i=1}^n\phi_{m,k}(X_i)\phi_{m,k}(x).$$
We define the wavelet kernel $K_M$ as
$$K_M(X_i,x)=\sum_{k=-\infty}^\infty\phi_{M,k}(X_i)\phi_{M,k}(x).$$
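For the Haar scaling function $\phi=\mathbf 1_{[0,1[}$ (compactly supported, BV) and $d=1$, the wavelet kernel has an explicit form: $K_m(x,y)=m$ when $x$ and $y$ fall in the same cell $[k/m,(k+1)/m[$ and $0$ otherwise, so the projection estimate is a histogram at resolution $1/m$. A minimal sketch (illustrative names, not the book's notation):

```python
import numpy as np

def haar_kernel(m, x, y):
    """K_m(x, y) = sum_k phi_{m,k}(x) phi_{m,k}(y) for phi = 1_{[0,1[}:
    equals m iff x and y share the cell [k/m, (k+1)/m[, else 0."""
    return float(m) if np.floor(m * x) == np.floor(m * y) else 0.0

def haar_density_estimate(x, X, m):
    """Projection estimate f_hat_{n,m}(x) = n^{-1} sum_i K_m(X_i, x)."""
    return float(m) * float(np.mean(np.floor(m * X) == np.floor(m * x)))
```

For uniform data on $[0,1[$ the estimate is close to the true density 1 at interior points.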
The number of terms in the summation is finite for finite $M$ whenever the function $\phi$ is assumed to have compact support. This hypothesis is not necessary, and we only assume here that the wavelets are regular enough (cf. Daubechies (1988) [41]): there exist $C>0$ and $\alpha>1$ such that $|\phi(x)|\le C/(1+|x|)^\alpha$ for all $x\in\mathbb R$. This yields a fast
decay of the kernel:
$$\int|K_M(X_i,x)|\,dx\le C\int\frac{|\phi(M^dx)|}{(1+M^d|x|)^{\alpha-1}}\,dx=O(M^{-d}M^{d(\alpha-1)})=O(M^{d(\alpha-2)}),$$
and, almost surely, if $x\neq0$ and $\mathbb P(X_i=0)=0$,
$$\int|K_M(X_i,x)|\,dx=O(M^dM^{-2d\alpha})=O(M^{d(1-2\alpha)}).$$
We now set $m=M^{2d(2\alpha-2)}$. It is remarkable that for $x=0$ (and only for this point, if we assume regularity conditions) one must set $m=M^{2d(\alpha-2)}$, hence the speed of convergence is lower; $\delta=1/(2d(1-2\alpha))$ holds in our theorem. We now write $K_m$ instead of $K_M$. Then $K_m(x,y)=m^dK(mx,my)$ for $K(x,y)=\sum_{k=-\infty}^\infty\phi(y-k)\phi(x-k)$, and the bias writes
$$E_m(f)(x)=\mathbb E(\hat f_{n,m}(x))-f(x)\le\int K_m(y,x)f(y)\,dy-f(x)=\int m^dK(m^dy,m^dx)\big(f(y)-f(x)\big)\,dy=\int K(m^dx+t,m^dx)\Big(f\Big(x+\frac t{m^d}\Big)-f(x)\Big)\,dt.$$
Let now the kernel $K$ have regularity $r$, i.e. $\int K(u,u-t)t^v\,dt=0$ if $0<v<r$ and $\int K(u,u-t)t^r\,dt\neq0$. If now $f\in\mathcal C^r([a,b],\mathbb R)$, then, by the Taylor formula with integral remainder,
$$f\Big(x+\frac tm\Big)=\sum_{j=0}^{r-1}\frac{t^j}{m^jj!}f^{(j)}(x)+\int_x^{x+t/m}\frac{(x+t/m-u)^{r-1}}{(r-1)!}f^{(r)}(u)\,du,$$
so that
$$E_m(f)(x)=\int K\Big(x,x+\frac tm\Big)\int_x^{x+t/m}\frac{(x+t/m-u)^{r-1}}{(r-1)!}f^{(r)}(u)\,du\,dt.$$
If $f^{(r)}$ is bounded, the inner integral is $O((t/m)^r)$, and so is $E_m(f)(x)$.

Remarks.
• $E_m(f)(x)\sim h(mx)\,m^{-r}$, where $h$ is a bounded function as $m$ goes to infinity. The precise characteristics of $h$ depend on the wavelet considered; in general $h$ is a pseudo-periodic function.
• Now consider the quadratic deviation. By positivity of the summed functions, the second moment $V_m(f)(x)$ satisfies
$$V_m(f)(x)=m^{2d}\int K^2(m^dx+t,m^dx)\Big(f\Big(x+\frac tm\Big)-f(x)\Big)\,dt\sim m\int K^2(u,u+t)\,dt.$$
Chapter 12

Spectral estimation

Parametric estimation from a sample of a stationary time series is an important statistical problem, both for theoretical research and for practical applications to real data. Whittle's approximate likelihood estimate is particularly attractive for numerous models (ARMA, linear processes, etc.), mainly for two reasons: first, Whittle's contrast does not depend on the marginal law of the time series but only on its spectral density; second, its computation time is smaller than that of other parametric estimation methods such as exact likelihood. Numerous papers have been written on this estimation method after Whittle's seminal paper; in particular, Hannan (1973) [102], Rosenblatt (1985) [167] and Giraitis and Robinson (2001) [94] established asymptotic normality for Gaussian, causal linear and strong mixing processes, respectively, as well as for ARCH(∞) processes.

This chapter is organized as follows. A first section details the expression of the spectral densities of some standard models. A second section addresses the asymptotic behavior of the empirical periodogram: this is not a consistent estimator of the spectral density, but it yields a consistent parametric procedure called Whittle estimation. A last section derives the properties of a bandwidth estimate of the spectral density, obtained by convolving the periodogram with an approximate Dirac measure. Its second order properties are easily derived through a simple diagram formula, but the higher order asymptotics needed to obtain a.s. convergence properties require a dependence tool introduced in § 12.3.2.
12.1 Spectral densities
The following models were already described, and their weak dependence properties checked, in chapter 3. Nevertheless, it is important to
provide precise expressions of their spectral properties.

Non-causal (two-sided) linear processes. Let $X$ be a zero mean stationary non-causal (two-sided) linear time series satisfying
$$X_k = \sum_{j=-\infty}^{\infty} a_j\,\xi_{k-j}\qquad\text{for } k\in\mathbb Z,\tag{12.1.1}$$
with $(a_k)_{k\in\mathbb Z}\in\mathbb R^{\mathbb Z}$ and $(\xi_k)_{k\in\mathbb Z}$ a sequence of zero mean i.i.d. random variables such that $E(\xi_0^2)=\sigma^2<\infty$. We set $\tilde a_k = a_{-k}$ and $\tilde a = (\tilde a_k)_{k\in\mathbb Z}$. Then the spectral density of $X$ exists and satisfies
$$f(\lambda) = \frac{\sigma^2}{2\pi}\Big|\sum_{k=-\infty}^{\infty} a_k e^{-ik\lambda}\Big|^2 = \frac{\sigma^2}{2\pi}\sum_{k=-\infty}^{\infty}\Big(\sum_{j=-\infty}^{\infty} a_{j+k}a_j\Big)e^{-ik\lambda} = \frac{\sigma^2}{2\pi}\sum_{k=-\infty}^{\infty}(a*\tilde a)_k\,e^{-ik\lambda}.\tag{12.1.2}$$
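Formula (12.1.2) lends itself to a quick numerical check. The sketch below is only an illustration (the coefficient values, lag range and evaluation points are arbitrary choices, not from the text): it computes the spectral density once through the transfer function $|\sum_k a_ke^{-ik\lambda}|^2$ and once through the coefficient autocorrelation $(a*\tilde a)_k$, and the two expressions agree.

```python
import numpy as np

# Two-sided moving-average coefficients a_k for k = -2..2 (arbitrary values).
lags = np.arange(-2, 3)
a = np.array([0.3, -0.5, 1.0, 0.4, 0.2])
sigma2 = 1.0  # innovation variance sigma^2

def f_transfer(lam):
    """f(lambda) = sigma^2/(2 pi) |sum_k a_k e^{-i k lambda}|^2."""
    A = np.sum(a * np.exp(-1j * lags * lam))
    return sigma2 / (2 * np.pi) * np.abs(A) ** 2

def f_autocov(lam):
    """f(lambda) = sigma^2/(2 pi) sum_k (a * a~)_k e^{-i k lambda},
    where (a * a~)_k = sum_j a_{j+k} a_j is the coefficient autocorrelation."""
    conv = np.correlate(a, a, mode="full")   # (a * a~)_k, k = -4..4
    klags = np.arange(-(len(a) - 1), len(a))
    return sigma2 / (2 * np.pi) * np.sum(conv * np.exp(-1j * klags * lam)).real

for lam in (0.0, 0.7, 2.0):
    print(lam, f_transfer(lam), f_autocov(lam))  # the two columns coincide
```

Note that the autocorrelation of the coefficient sequence is invariant under index translation, so the indexing convention used by `np.correlate` is immaterial here.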
There exist very few explicit results in the case of two-sided linear processes.

Causal GARCH and ARCH(∞) processes. The famous, by now classical, GARCH($q'$, $q$) model, introduced by Engle (1982) [84] and Bollerslev (1986) [23], is given by the relations
$$X_k = \rho_k\xi_k\qquad\text{with}\qquad \rho_k^2 = a_0 + \sum_{j=1}^{q'} a_jX_{k-j}^2 + \sum_{j=1}^{q} c_j\rho_{k-j}^2,\tag{12.1.3}$$
where $(q',q)\in\mathbb N^2$, $a_0>0$, $a_j\ge 0$ and $c_j\ge 0$ for $j\in\mathbb N$, and $(\xi_k)_{k\in\mathbb Z}$ are i.i.d. random variables with zero mean (for an excellent survey of ARCH modelling, see Giraitis et al. (2005) [93]). Under some additional conditions, the GARCH model can be written as a particular case of the ARCH(∞) model, introduced in Robinson (1991) [165], which satisfies
$$X_k = \rho_k\xi_k\qquad\text{with}\qquad \rho_k^2 = b_0 + \sum_{j=1}^{\infty} b_jX_{k-j}^2,\tag{12.1.4}$$
with a sequence $(b_j)_j$ depending on the families $(a_j)$ and $(c_j)$. Different sufficient conditions can be given for obtaining an $m$-order stationary solution to (12.1.3)
or (12.1.4). Notice that for both models (12.1.3) and (12.1.4) the spectral density of $X$ is constant. As a consequence, one rather studies the periodogram of the squared process; in the GARCH case (see Bollerslev (1986) [23]) it is based on the ARMA representation satisfied by $(X_k^2)_{k\in\mathbb Z}$. Indeed, if $(X_k)$ is a solution of (12.1.3) or (12.1.4), then $(X_k^2)$ can be written as a solution of a particular case of the bilinear equation (see § 3.4.2):
$$X_k^2 = \varepsilon_k\Big(\gamma b_0 + \gamma\sum_{j=1}^{\infty}b_jX_{k-j}^2\Big) + \lambda_1 b_0 + \lambda_1\sum_{j=1}^{\infty}b_jX_{k-j}^2\qquad\text{for } k\in\mathbb Z,$$
with $\varepsilon_k = (\xi_k^2-\lambda_1)/\gamma$ for $k\in\mathbb Z$, $\lambda_1 = E\xi_0^2$ and $\gamma^2 = \mathrm{Var}(\xi_0^2)$. Moreover, the time series $(Y_k)_{k\in\mathbb Z}$ defined by $Y_k = X_k^2 - \lambda_1 b_0\big(1-\lambda_1\sum_{j=1}^\infty b_j\big)^{-1}$ for $k\in\mathbb Z$ satisfies the bilinear equation with parameter $c_0 = 0$. Hence, a sufficient condition for the stationarity of $(X_k^2)_{k\in\mathbb Z}$ with $\|X_0^2\|_m < \infty$ is
$$\big(\|\varepsilon_0\|_m + 1\big)\cdot\sum_{j=1}^{\infty}|b_j| < 1\quad\Longleftrightarrow\quad\Big(\Big\|\frac{\xi_0^2-\lambda_1}{\gamma}\Big\|_m + 1\Big)\cdot\sum_{j=1}^{\infty}|b_j| < 1.$$
Setting $\sigma^2 = E(X_0^2-\rho_0^2)^2$, the spectral density of $(X_k^2)_{k\in\mathbb Z}$ is
$$f(\lambda) = \frac{\sigma^2}{2\pi}\Big|1-\sum_{j=1}^{\infty}b_j\,e^{ij\lambda}\Big|^{-2}.$$
The method developed in Giraitis and Robinson (2001) [94] for establishing the central limit theorem satisfied by the periodogram is essentially ad hoc and cannot be used for non-causal or nonlinear time series. The recent book of Straumann (2005) [181] also provides an up-to-date and complete overview of this question. Chapter 8 of that book is devoted to the results of Mikosch and Straumann (2002) [131], who studied the case of intermediate moment conditions of order $> 4$ and $< 8$ for the special case of GARCH(1,1) processes.

Causal bilinear processes. Assume now that $X = (X_k)_{k\in\mathbb Z}$ is a bilinear process (see the seminal paper of Giraitis and Surgailis (2002) [95]) satisfying the equation
$$X_k = \xi_k\Big(a_0 + \sum_{j=1}^{\infty}a_jX_{k-j}\Big) + c_0 + \sum_{j=1}^{\infty}c_jX_{k-j}\qquad\text{for } k\in\mathbb Z,\tag{12.1.5}$$
where $(\xi_k)_{k\in\mathbb Z}$ are i.i.d. random variables with zero mean and such that $\|\xi_0\|_p < \infty$ with $p\ge 1$, and $a_j, c_j$, $j\in\mathbb N$, are real coefficients. Assume $c_0 = 0$ and define
the generating functions
$$A(z) = \sum_{j=1}^{\infty}a_jz^j,\qquad C(z) = \sum_{j=1}^{\infty}c_jz^j,\qquad G(z) = \big(1-C(z)\big)^{-1} = \sum_{j=0}^{\infty}g_jz^j,\qquad H(z) = A(z)G(z) = \sum_{j=1}^{\infty}h_jz^j.$$
If $\|\xi_0\|_p\sum_{j=1}^{\infty}|h_j| < \infty$ — for instance when $\|\xi_0\|_p\cdot\big(\sum_{j=1}^{\infty}|a_j| + \sum_{j=1}^{\infty}|c_j|\big) < 1$ — there exists a unique zero mean stationary and ergodic solution $X$ of equation (12.1.5) in $\mathbb L^p(\Omega,\mathcal A,\mathbb P)$. For $p\ge 2$, the covariogram of $X$ is
$$R(k) = a_0^2\|\xi_0\|_2^2\Big(1-\sum_{j=1}^{\infty}h_j^2\Big)^{-1}\sum_{j=0}^{\infty}g_jg_{j+k}$$
and $\sum_k|R(k)| < \infty$. The spectral density of $X$ exists and satisfies
$$f(\lambda) = \frac{a_0^2\sigma^2}{2\pi\big(1-\sum_{j=1}^{\infty}h_j^2\big)}\sum_{k=-\infty}^{\infty}\sum_{j=0}^{\infty}g_jg_{j+k}\,e^{-ik\lambda},$$
with $\sigma^2 = \|\xi_0\|_2^2$.

Non-causal (two-sided) bilinear and LARCH(∞) processes. The bilinear process $X = (X_k)_{k\in\mathbb Z}$ satisfies the equation
$$X_k = \xi_k\Big(a_0 + \sum_{j\in\mathbb Z^*}a_jX_{k-j}\Big)\qquad\text{for } k\in\mathbb Z,\tag{12.1.6}$$
where $(\xi_k)_{k\in\mathbb Z}$ are i.i.d. bounded random variables and $(a_k)_{k\in\mathbb Z}$ is a sequence of real numbers such that $\lambda = \|\xi_0\|_\infty\cdot\sum_{j\neq 0}|a_j| < 1$. Then the spectral density of $X$ exists and is defined by
$$f(\lambda) = \frac{\sigma^2}{2\pi}\Big|1-\sum_{j=1}^{\infty}b_j\,e^{ij\lambda}\Big|^{-2}.$$
In the previous expression the coefficients $b_j$ are not written explicitly as functions of the initial parameters $(a_i)_{i\in\mathbb Z}$ and $a_0$. In the same way as in the causal case, assume now that $Y = (Y_k)_{k\in\mathbb Z}$ satisfies the relation
$$Y_k = \xi_k\Big(a_0 + \sum_{j\neq 0}a_jY_{k-j}\Big)\qquad\text{for } k\in\mathbb Z,\tag{12.1.7}$$
with the same assumptions on $(\xi_k)_{k\in\mathbb Z}$ and $(a_k)_{k\in\mathbb Z}$. Then the time series $(Y_k^2)_{k\in\mathbb Z}$ satisfies a relation of the type (12.1.6) and is a stationary process; $Y$ is then a stationary process, the so-called two-sided LARCH(∞) process. The condition that $(\xi_k)_{k\in\mathbb Z}$ be i.i.d. and bounded is restrictive. However, while it is only a sufficient condition for the existence of a non-causal LARCH(∞) process, it seems to be very close to being necessary as well; see Doukhan and Wintenberger (2005) [77].
Non-causal linear processes with dependent innovations. Let $X$ be a zero mean stationary non-causal (two-sided) linear time series satisfying eqn. (12.1.1), now with a dependent and centered stationary innovation process:
$$E\xi_0 = 0,\qquad E\xi_0^2 = \sigma^2.$$
Then, denoting by $f_\xi$ the spectral density of the process $(\xi_t)$, we have $f_\xi(0) = \sigma^2/(2\pi)$; moreover the spectral density of $X$ exists and satisfies
$$f(\lambda) = f_\xi(\lambda)\Big|\sum_{k=-\infty}^{\infty}a_ke^{-ik\lambda}\Big|^2 = f_\xi(\lambda)\sum_{k=-\infty}^{\infty}(a*\tilde a)_k\,e^{-ik\lambda}.\tag{12.1.8}$$
Examples of interest are orthogonal series, like mean zero LARCH(∞) processes; in this case no additional information about $\xi$ may be obtained from the expression of this spectral density. Another important case is thus that of a process $\xi$ with a non-constant spectral density; we recalled above the precise expression of this spectral density for the bilinear processes introduced by Giraitis and Surgailis (2002) [95].
12.2 Periodogram
Let $X = (X_k)_{k\in\mathbb Z}$ be a zero mean, fourth-order stationary, real valued time series. Denote by $(R(i))_i$ the covariogram of $X$, and by $(\kappa_4(i,j,k))_{i,j,k}$ the fourth cumulants of $X$:
$$R(i) = \mathrm{Cov}(X_0,X_i) = E(X_0X_i),$$
$$\kappa_4(i,j,k) = EX_0X_iX_jX_k - EX_0X_i\,EX_jX_k - EX_0X_j\,EX_iX_k - EX_0X_k\,EX_iX_j,$$
for $(i,j,k)\in\mathbb Z^3$. We will use the following assumption on $X$:

Assumption M. $X$ is such that:
$$\gamma = \sum_{\ell\in\mathbb Z}R(\ell)^2 < \infty\qquad\text{and}\qquad \kappa_4 = \sum_{i,j,k\in\mathbb Z}|\kappa_4(i,j,k)| < \infty.\tag{12.2.1}$$
This assumption is closely linked to weak dependence: see the comments following Definition 4.1, and use the notation (4.4.9) together with Lemma 4.11 and Propositions 2.1 and 2.2. Assumption M ensures the existence of $X$'s spectral density $f\in\mathbb L^2([-\pi,\pi[)$:
$$f(\lambda) = \frac{1}{2\pi}\sum_{k\in\mathbb Z}R(k)\,e^{ik\lambda}\qquad\text{for }\lambda\in[-\pi,\pi[.$$
The periodogram of $X$ is defined as
$$I_n(\lambda) = \frac{1}{2\pi n}\Big|\sum_{k=1}^{n}X_ke^{-ik\lambda}\Big|^2,\qquad\text{for }\lambda\in[-\pi,\pi[.$$
We now rewrite
$$I_n(\lambda) = \frac1{2\pi}\sum_{|k|<n}\widehat R_n(k)\,e^{-ik\lambda},\qquad \widehat R_n(k) = \frac1n\sum_{j=1\vee(1-k)}^{(n-k)\wedge n}X_jX_{j+k}.$$
Here $\widehat R_n(k)$ is a biased estimate of $R(k)$. The periodogram $I_n(\lambda)$ is thus a natural estimator of the spectral density; unfortunately it is not a consistent one, as its variance does not tend to zero as $n$ tends to infinity. However, once integrated against an $\mathbb L^2$ function, its behavior becomes much smoother and can provide an estimation of the spectral density. Now let $g:[-\pi,\pi[\to\mathbb R$ be a $2\pi$-periodic function such that $g\in\mathbb L^2([-\pi,\pi[)$, and define
$$J_n(g) = \int_{-\pi}^{\pi}g(\lambda)I_n(\lambda)\,d\lambda,\ \text{the integrated periodogram of }X,\qquad\text{and}\qquad J(g) = \int_{-\pi}^{\pi}g(\lambda)f(\lambda)\,d\lambda.$$
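As a numerical aside (not from the text; the sample size and the white-noise input are arbitrary choices), the simplest instance of the integrated periodogram, $J_n(g)$ with $g\equiv 1$, can be checked exactly: evaluating $I_n$ at the Fourier frequencies, its Riemann sum recovers $\widehat R_n(0) = \frac1n\sum_kX_k^2$, a discrete Parseval identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
X = rng.standard_normal(n)   # i.i.d. input, spectral density sigma^2/(2 pi)

def periodogram(X, lam):
    """I_n(lambda) = |sum_{k=1}^n X_k e^{-i k lambda}|^2 / (2 pi n)."""
    k = np.arange(1, len(X) + 1)
    return np.abs(np.sum(X * np.exp(-1j * k * lam))) ** 2 / (2 * np.pi * len(X))

# J_n(1): Riemann sum of I_n over the Fourier frequencies 2 pi j / n.
freqs = 2 * np.pi * np.arange(n) / n
Jn = (2 * np.pi / n) * sum(periodogram(X, lam) for lam in freqs)

print(Jn, np.mean(X ** 2))   # equal up to rounding (discrete Parseval)
```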
Theorem 12.1 (SLLN). Let $c > 0$. Assume that the function $g(\lambda) = \sum_{\ell\in\mathbb Z}g_\ell e^{i\ell\lambda}$ satisfies $\sum_{\ell\in\mathbb Z}(1+|\ell|)^sg_\ell^2\le c$ for some $s > 1$. If $X$ satisfies Assumption M then, uniformly with respect to such functions $g$,
$$J_n(g)\to J(g)\qquad\text{a.s.}$$
This theorem will be proved after two lemmas of independent interest.

Lemma 12.1. If $X$ satisfies Assumption M, then:
$$n\max_{\ell\ge 0}\mathrm{Var}\,\widehat R_n(\ell)\le\kappa_4+2\gamma.$$
Proof of Lemma 12.1. Denote $Y_{j,\ell} = X_jX_{j+\ell} - R(\ell)$, use the identity
$$\mathrm{Cov}(Y_{0,\ell},Y_{j,\ell}) = \kappa_4(\ell,j,j+\ell) + R(j)^2 + R(j+\ell)R(j-\ell),$$
and deduce, from the stationarity of $(Y_{j,\ell})_{j\in\mathbb Z}$ for a fixed integer $\ell$,
$$n\,\mathrm{Var}\,\widehat R_n(\ell)\le\frac1n\sum_{j=1\vee(1-\ell)}^{(n-\ell)\wedge n}\ \sum_{j'=1\vee(1-\ell)}^{(n-\ell)\wedge n}\big|\mathrm{Cov}(Y_{j,\ell},Y_{j',\ell})\big|\le\sum_{j\in\mathbb Z}\big|\mathrm{Cov}(Y_{0,\ell},Y_{j,\ell})\big|\le\sum_{j\in\mathbb Z}\big(|\kappa_4(\ell,j,j+\ell)|+2R(j)^2\big)\le\kappa_4+2\gamma,$$
by using the Cauchy–Schwarz inequality for $\ell^2$-sequences. □
Lemma 12.2. Denote $c_s = \sum_{\ell\in\mathbb Z}(1+|\ell|)^{-s}$. If $X$ satisfies the assumptions of Theorem 12.1, then:
$$E|J_n(g)-J(g)|^2\le\frac{3c\big(\gamma + c_s(\kappa_4+2\gamma)\big)}{n}.$$
Proof of Lemma 12.2. As in Doukhan and León (1989) [66], we use the decomposition $J_n(g)-J(g) = -T_1(g)-T_2(g)+T_3(g)$, with
$$T_1(g) = \sum_{|\ell|\ge n}R(\ell)g_\ell,\qquad T_2(g) = \frac1n\sum_{|\ell|<n}|\ell|R(\ell)g_\ell,\qquad T_3(g) = \sum_{|\ell|<n}\big(\widehat R_n(\ell)-E\widehat R_n(\ell)\big)g_\ell.\tag{12.2.2}$$
Remark that $T_3(g) = J_n(g)-EJ_n(g)$. Thus we obtain the inequality
$$E|J_n(g)-J(g)|^2\le 3\big(|T_1(g)|^2+|T_2(g)|^2+E|T_3(g)|^2\big).$$
The Cauchy–Schwarz inequality yields
$$|T_1(g)|^2\le c\sum_{|\ell|\ge n}(1+|\ell|)^{-s}R(\ell)^2\le\frac cn\sum_{|\ell|\ge n}R(\ell)^2,\qquad |T_2(g)|^2\le\frac{c}{n^2}\sum_{|\ell|<n}|\ell|^2(1+|\ell|)^{-s}R(\ell)^2\le\frac cn\sum_{|\ell|<n}R(\ell)^2.$$
Hence $|T_1(g)|^2+|T_2(g)|^2\le c\gamma/n$. Lemma 12.1 entails
$$E|T_3(g)|^2\le c\sum_{|\ell|<n}(1+|\ell|)^{-s}\,\mathrm{Var}\big(\widehat R_n(\ell)\big)\le\frac cn\sum_{|\ell|<n}(1+|\ell|)^{-s}(\kappa_4+2\gamma)\le c\,c_s\cdot\frac{\kappa_4+2\gamma}{n}.$$
We combine these bounds to deduce Lemma 12.2. □

Proof of Theorem 12.1. We derive the strong law of large numbers from a weak $\mathbb L^2$-LLN and from Lemma 12.2. The proof follows the scheme of the proof of the standard strong LLN. Fix $t>0$. First, for all random variables $X$ and $Y$ we have $P(|X+Y|\ge 2t)\le P(|X|\ge t)+P(|Y|\ge t)$. Thus
$$P\Big(\max_{k^2\le n<(k+1)^2}|J_n(g)-J(g)|\ge 2t\Big)\le P\big(|J_{k^2}(g)-J(g)|\ge t\big) + P\Big(\max_{k^2\le n<(k+1)^2}|J_n(g)-J_{k^2}(g)|\ge t\Big)$$
and
$$P\Big(\max_{n\ge N}|J_n(g)-J(g)|\ge 2t\Big)\le \sum_{k=[\sqrt N]}^{\infty}P\big(|J_{k^2}(g)-J(g)|\ge t\big) + \sum_{k=[\sqrt N]}^{\infty}P\Big(\max_{k^2\le n<(k+1)^2}|J_n(g)-J_{k^2}(g)|\ge t\Big) =: A_N + B_N.\tag{12.2.3}$$
From the Bienaymé–Tchebychev (or Markov) inequality, Lemma 12.2 implies
$$A_N\le \frac{C_1}{t^2}\sum_{k\ge\sqrt N}\frac1{k^2},\tag{12.2.4}$$
with $C_1\in\mathbb R^+$.
Now set $\widetilde R_n(\ell) = \widehat R_n(\ell) - E\widehat R_n(\ell)$. The fluctuation term $B_N$ is more involved, and its bound is based on the same type of decomposition as (12.2.2), because for $k^2 < n$:
$$J_n(g) - J_{k^2}(g) = -T_1(g) + T_2(g) - T_3(g),$$
with now
$$T_1(g) = \sum_{k^2\le|\ell|<n}R(\ell)g_\ell,\qquad T_2(g) = \frac1n\sum_{k^2\le|\ell|<n}|\ell|R(\ell)g_\ell,\qquad T_3(g) = \sum_{|\ell|<n}\widetilde R_n(\ell)g_\ell - \sum_{|\ell|<k^2}\widetilde R_{k^2}(\ell)g_\ell.$$
As previously,
$$|T_1(g)|^2 + |T_2(g)|^2 \le c\cdot\frac{\gamma}{k^2}.$$
Set $L_k = \max_{k^2\le n<(k+1)^2}|J_n(g)-J_{k^2}(g)|$ and $T_k^*(g) = \max_{k^2\le n<(k+1)^2}|T_3(g)|$; then
$$B_N \le \sum_{k\ge\sqrt N}b_k,\qquad\text{with}\qquad b_k = P(L_k\ge t)\le\frac{E(L_k^2)}{t^2}.$$
Now
$$E(L_k^2)\le 3\big(|T_1(g)|^2+|T_2(g)|^2+E|T_k^*(g)|^2\big)\le\frac{3c\gamma}{k^2} + 3E|T_k^*(g)|^2.$$
Then, for $k^2\le n<(k+1)^2$ and $\ell\in\mathbb Z$,
$$\widetilde R_n(\ell) = \frac{k^2}{n}\widetilde R_{k^2}(\ell) + \Delta_{\ell,n,k},\qquad \Delta_{\ell,n,k} = \frac1n\sum_{h=(k^2\wedge(k^2-\ell))+1}^{n\wedge(n-\ell)}Y_{h,\ell}.$$
Remark that $\widetilde R_{k^2}(\ell) = 0$ if $k^2\le|\ell|\le n$, and thus $\widetilde R_n(\ell) = \Delta_{\ell,n,k}$ in such a case. Also note that
$$\Delta^*_{\ell,k} = \max_{k^2\le n<(k+1)^2}|\Delta_{\ell,n,k}|\le\frac1{k^2}\sum_{h=(k^2\wedge(k^2-\ell))+1}^{(k^2+2k)\wedge((k^2+2k)-\ell)}|Y_{h,\ell}|,$$
and thus
$$E(\Delta^*_{\ell,k})^2\le\frac{1}{k^4}(2k)^2\max_{(h,\ell)\in\mathbb Z^2}E(|Y_{h,\ell}|^2)\le\frac{4}{k^2}E(|X_0|^4).$$
Write
$$T_3(g) = \sum_{|\ell|<k^2}\widetilde R_{k^2}(\ell)\Big(\frac{k^2}{n}-1\Big)g_\ell + \sum_{|\ell|<n}\Delta_{\ell,n,k}\,g_\ell,\qquad\text{so that}\qquad |T_k^*(g)|\le\frac2k\sum_{|\ell|<k^2}\big|\widetilde R_{k^2}(\ell)g_\ell\big| + \sum_{|\ell|<(k+1)^2}\Delta^*_{\ell,k}|g_\ell|,$$
and we thus deduce, for a constant $C_1>0$,
$$E|T_k^*(g)|^2\le 2C_1\Big(\sup_{\ell\in\mathbb Z}\mathrm{Var}\big(\widehat R_{k^2}(\ell)\big) + \sup_{\ell\in\mathbb Z}E(\Delta^*_{\ell,k})^2\Big)\le\frac{C_1cA}{k^2}$$
for a constant $A>0$ depending on $E|X_0|^4$, $\kappa_4$ and $\gamma$ only. Hence $b_k\le 3c(\gamma+AC_1)/(k^2t^2)$ is the general term of a convergent series and, with $C_2>0$,
$$B_N\le\frac{C_2}{t^2}\sum_{k\ge\sqrt N}\frac1{k^2}.\tag{12.2.5}$$
Then (12.2.3), (12.2.4) and (12.2.5) imply $\sup_{n\ge N}|J_n(g)-J(g)|\to 0$ in probability, so that $J_n(g)\to J(g)$ a.s. □

Two frames of weak dependence are considered here: the θ-weak dependence property and the non-causal η-weak dependence.
12.2.1 Whittle estimation
The previous examples give essentially explicit representations of the spectral density of some commonly used time series. When the coefficients are functions of an unknown finite dimensional parameter $\beta$, a way to estimate this parameter is to use the contrast $J(g_\beta^{-1})$, where $f(\lambda) = \sigma^2g_\beta(\lambda)$ denotes the spectral density of the model for the value $\beta$ of the parameter, with moreover $g_\beta(0) = 1$. We thus exhibit two parameters, $\sigma^2$ and $\beta$. Let $(X_1,\dots,X_n)$ be a sample from $X$, and define the Whittle maximum likelihood estimators of $\beta^*$ and $\sigma^{*2}$:
$$\widehat\beta_n = \mathrm{Argmin}_{\beta\in K}J_n(g_\beta^{-1}) = \mathrm{Argmin}_{\beta\in K}\int_{-\pi}^{\pi}\frac{I_n(\lambda)}{g_\beta(\lambda)}\,d\lambda\qquad\text{and}\qquad \widehat\sigma_n^2 = \frac1{2\pi}J_n\big(g_{\widehat\beta_n}^{-1}\big).$$
In Bardet, Doukhan and León (2005) [11] it is shown that strong consistency of the estimators $\widehat\beta_n$ and $\widehat\sigma_n^2$ may be proved by using extensions of the previous tools. First, a SLLN can be deduced from the one of the previous section, and
secondly the CLT is derived by using the previous representation of the periodogram and the results in § 7.1. To our knowledge, the known results on the asymptotic behavior of Whittle parametric estimation for non-Gaussian linear processes essentially concern one-sided (causal) linear processes (see for instance Hannan (1973) [102], Hall and Heyde (1980) [100], Rosenblatt (1985) [168], Brockwell and Davis (1988) [28]); there exist very few results in the case of two-sided linear processes. In Rosenblatt ((1985), p. 52 [168]) a condition for the strong mixing property of two-sided linear processes was given, but some restrictive conditions on the process were also required for obtaining a central limit theorem for Whittle estimators: the distribution of the random variables $\xi_k$ has to be absolutely continuous with respect to the Lebesgue measure with a bounded variation density, moments of order $m > 4+2\delta$ with $\delta > 0$ are required, and the central limit theorem is obtained for a tapered periodogram, under the additional assumption $\sum_{m=1}^{\infty}\alpha_{4,\infty}(m)^{\delta/(2+\delta)} < \infty$, where $\alpha_{4,\infty}(m)\ge\alpha_m$ denotes a strong mixing coefficient defined with four points in the future instead of two for $\alpha_m$. The case of strongly dependent two-sided linear processes was also treated by Giraitis and Surgailis (1990) [95] or Horváth and Shao (1999) [108]. In the case of causal linear processes, it is well known that
$$\sqrt n\big(\widehat\beta_n-\beta^*\big)\to\mathcal N_p\big(0,2\pi(W^*)^{-1}\big),$$
that $\widehat\sigma_n^2$ is a consistent estimate of $\sigma^{*2}$ with $\sqrt n\big(\widehat\sigma_n^2-\sigma^{*2}\big)\to\mathcal N\big(0,\sigma^{*4}\gamma_4\big)$, where $\gamma_4$ is the fourth cumulant of the $(\xi_k)_{k\in\mathbb Z}$, and that $\sqrt n(\widehat\beta_n-\beta^*)$ and $\sqrt n(\widehat\sigma_n^2-\sigma^{*2})$ are asymptotically normal and independent.
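A minimal numerical sketch of Whittle estimation (not from the text: the AR(1) model, its parameter $0.6$, the grid and the sample size are arbitrary illustrative choices, and the normalization $g_\beta(0)=1$ is ignored since it only affects the $\sigma^2$ factor). The contrast $J_n(g_\beta^{-1})$ is evaluated at the Fourier frequencies and minimized over a grid:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_true = 2000, 0.6

# Simulate a causal AR(1): X_k = beta X_{k-1} + xi_k (with a burn-in).
xi = rng.standard_normal(n + 200)
X = np.zeros(n + 200)
for k in range(1, n + 200):
    X[k] = beta_true * X[k - 1] + xi[k]
X = X[200:]

# Periodogram at the Fourier frequencies, computed with the FFT (lambda = 0 omitted).
lam = 2 * np.pi * np.arange(1, n) / n
I = np.abs(np.fft.fft(X)[1:]) ** 2 / (2 * np.pi * n)

def whittle_contrast(beta):
    """J_n(g_beta^{-1}) with g_beta(lambda) = |1 - beta e^{i lambda}|^{-2} (AR(1) shape)."""
    g_inv = np.abs(1.0 - beta * np.exp(1j * lam)) ** 2
    return (2 * np.pi / n) * np.sum(I * g_inv)

betas = np.linspace(-0.95, 0.95, 381)
beta_hat = betas[np.argmin([whittle_contrast(b) for b in betas])]
print(beta_hat)   # should land near beta_true
```

For the AR(1) shape the minimizer essentially reproduces the lag-one sample autocorrelation (the Yule–Walker estimate), which is a classical sanity check for a Whittle implementation.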
12.3 Spectral density estimation
In this section, $X = (X_n)_{n\in\mathbb Z}$ denotes a vector valued stationary sequence (with values in $\mathbb R^D$). The spectral matrix density of $X$ writes
$$f_X(\lambda) = \sum_{t\in\mathbb Z}\mathrm{Cov}(X_0,X_t)\,e^{it\lambda};$$
here $R_t = \mathrm{Cov}(X_0,X_t) = EX_0^TX_t - EX_0^T\,EX_t$ is a $D\times D$ matrix. This matrix valued function is estimated by
$$\widehat f_X(\lambda) = F_m*I_n(\lambda) \equiv \int_{-\pi}^{\pi}F_m(\mu)I_n(\lambda-\mu)\,\frac{d\mu}{2\pi}\tag{12.3.1}$$
$$= \frac1n\sum_{k,l=1}^{n}\Big(1-\frac{|k-l|}{m}\Big)^+X_k^TX_l\,e^{i(k-l)\lambda}\tag{12.3.2}$$
$$= \sum_{|s|\le m}\Big(1-\frac{|s|}{m}\Big)\widehat R_n(s)\,e^{-is\lambda},\tag{12.3.3}$$
$$\widehat R_n(s) = \frac1n\sum_{j=1\vee(1-s)}^{(n-s)\wedge n}X_j^TX_{j+s},\tag{12.3.4}$$
where $F_m(\lambda) = \sum_{|s|<m}\big(1-\frac{|s|}{m}\big)e^{is\lambda} = \frac1m\Big(\frac{\sin(m\lambda/2)}{\sin(\lambda/2)}\Big)^2$ is the Fejér kernel, and the matrix periodogram is defined as
$$I_n(\lambda) = \frac1n\sum_{k,l=1}^{n}X_k^TX_l\,e^{i(l-k)\lambda}$$
if the sequence $X_t$ is centered at expectation. Note that in equation (12.3.4) the summation contains $n-|s|$ terms, hence this estimate of $R_s$ is biased. The following relation links the spectral density to the limit variance $\Sigma$ in the CLT for $X = (X_n)_{n\in\mathbb Z}$:
$$f_X(0) = \sum_{s\in\mathbb Z}\mathrm{Cov}(X_0,X_s).$$
Assume $\sum_s|s|^\rho\,\|R_s\| < \infty$, where $\|\cdot\|$ is any matrix norm; then
$$\mathrm{bias}(\lambda) = E\widehat f_X(\lambda) - f_X(\lambda) = \sum_{|s|<m}\Big(\Big(1-\frac{|s|}{m}\Big)\Big(1-\frac{|s|}{n}\Big)-1\Big)R_s\,e^{is\lambda} - \sum_{|s|\ge m}R_s\,e^{is\lambda},$$
so that
$$\|\mathrm{bias}(\lambda)\|\le\frac1n\sum_s|s|\,\|R_s\| + \frac1{m^\rho}\sum_s|s|^\rho\,\|R_s\| = O\Big(\frac1n+\frac1{m^\rho}\Big).$$
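The lag-window estimate (12.3.3)–(12.3.4) and its bias–variance trade-off can be illustrated numerically. The sketch below (the AR(1) model, its parameter and the sample size are arbitrary illustrative choices, not from the text) implements the scalar case of (12.3.3) with Bartlett weights and the bandwidth $m = n^{1/(2\rho+1)}$ for $\rho = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 4000, 0.5

# AR(1) sample; in this section's convention (no 1/2pi factor) its spectral
# density is f_X(lambda) = 1/|1 - beta e^{i lambda}|^2, so f_X(0) = 4.
xi = rng.standard_normal(n + 100)
x = np.zeros(n + 100)
for k in range(1, n + 100):
    x[k] = beta * x[k - 1] + xi[k]
x = x[100:]

def lag_window_estimate(x, lam, m):
    """hat f_X(lambda) = sum_{|s|<=m} (1 - |s|/m) hat R_n(s) e^{-i s lambda}."""
    n = len(x)
    est = 0.0
    for s in range(-m, m + 1):
        Rs = np.sum(x[:n - abs(s)] * x[abs(s):]) / n   # biased estimate (12.3.4)
        est += (1 - abs(s) / m) * Rs * np.exp(-1j * s * lam)
    return est.real

m = int(n ** (1 / 3))   # bandwidth m = n^{1/(2 rho + 1)} with rho = 1
print(lag_window_estimate(x, 0.0, m))   # estimates the target value f_X(0) = 4
```

Larger $m$ reduces the $O(m^{-\rho})$ smoothing bias but inflates the $O(m/n)$ variance, exactly the trade-off quantified above.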
12.3.1 Second order estimate
First recall that the 4th order cumulant of a centered random vector $(x,y,z,t)\in\mathbb R^{4D}$ writes, for $1\le a,b,c,d\le D$,
$$\kappa_4^{(a,b,c,d)}(x,y,z,t) = Ex^{(a)}y^{(b)}z^{(c)}t^{(d)} - Ex^{(a)}y^{(b)}\,Ez^{(c)}t^{(d)} - Ex^{(a)}z^{(c)}\,Ey^{(b)}t^{(d)} - Ex^{(a)}t^{(d)}\,Ey^{(b)}z^{(c)}.$$
Set now
$$\kappa^{(a,b,c,d)}(i,j,k) = \kappa_4\big(X_0^{(a)},X_i^{(b)},X_j^{(c)},X_k^{(d)}\big),\qquad r^{(a,b)}(i) = EX_0^{(a)}X_i^{(b)}.\tag{12.3.5}$$
Assume that, for $1\le a,b,c,d\le D$,
$$\sum_{j\in\mathbb Z}|j|^\rho\,\big|r^{(a,b)}(j)\big| = r_\rho^{(a,b)} < \infty,\tag{12.3.6}$$
$$\sum_{i,j,k\in\mathbb Z}\big|\kappa^{(a,b,c,d)}(i,j,k)\big| = \kappa^{(a,b,c,d)} < \infty.\tag{12.3.7}$$
Then we may rewrite
$$\widehat f(\lambda) - E\widehat f(\lambda) = \sum_{k=1}^{n}Z_{k,n}(\lambda),\tag{12.3.8}$$
$$Z_{k,n}(\lambda) = \sum_{|s|<m}\delta_{n,k,s}(\lambda)\,Y_{k,s},\qquad\text{with}\tag{12.3.9}$$
$$Y_{k,s} = X_k^TX_{k+s} - R(s),\tag{12.3.10}$$
$$\delta_{n,k,s}(\lambda) = \frac1n\Big(1-\frac{|s|}{m}\Big)\mathbf 1_{\{1\le k+s\le n\}}\,e^{-is\lambda},\tag{12.3.11}$$
hence, setting $\widehat f(\lambda) = \big(\widehat f^{(a,b)}(\lambda)\big)_{1\le a,b\le D}$, each coordinate writes, for $1\le a,b,c,d\le D$,
$$\mathrm{Var}\,\widehat f(\lambda)^{(a,b,c,d)} = \mathrm{Cov}\big(\widehat f^{(a,b)}(\lambda),\widehat f^{(c,d)}(\lambda)\big) = \frac1n\sum_{|j|<n}\Big(1-\frac{|j|}{n}\Big)C_{j,n}^{(a,b,c,d)}(\lambda).$$
For a $D\times D$ matrix $u$ and a 4th order tensor $\alpha$ we use the norms
$$\|u\| = \max_{1\le a,b\le D}|u_{a,b}|,\tag{12.3.12}$$
$$\|\alpha\| = \max_{1\le a,b\le D}\sum_{1\le c,d\le D}|\alpha_{a,b,c,d}|\le D^2\max_{1\le a,b,c,d\le D}|\alpha_{a,b,c,d}|.\tag{12.3.13}$$
The variance of this estimate is a 4th order tensor which writes (using notation (12.3.13))
$$\mathrm{Var}\,\widehat f(\lambda)^{(a,b,c,d)}\le\frac1n\sum_{|j|<n}\sum_{|s|,|t|<m}\big|\kappa^{(a,b,c,d)}(s,j,j+t)\big| + \frac1n\sum_{|j|<n}\sum_{|s|,|t|<m}\big|r^{(a,c)}(j)\big|\big|r^{(b,d)}(j+t-s)\big| + \frac1n\sum_{|j|<n}\sum_{|s|,|t|<m}\big|r^{(a,d)}(j+t)\big|\big|r^{(b,c)}(j-s)\big|.$$
Assumption (12.3.7) implies that $\frac{2m+1}{n}\kappa^{(a,b,c,d)}$ bounds the first sum; after an interchange of the summations w.r.t. $t$ and $s$, assumption (12.3.6) shows that the two other terms are bounded above by $\frac{2m+1}{n}r_0^{(a,c)}r_1^{(b,d)}$ and $\frac{2m+1}{n}r_0^{(a,d)}r_1^{(b,c)}$, respectively. This entails
$$\big\|\mathrm{Var}\,\widehat f(\lambda)\big\|\le\frac{2m+1}{n}\Big(\max_{1\le a,b\le D}\sum_{c,d=1}^{D}\kappa^{(a,b,c,d)} + 2\max_{1\le a,b\le D}\sum_{c=1}^{D}r_0^{(a,c)}\sum_{d=1}^{D}r_1^{(b,d)}\Big);$$
we thus obtain

Lemma 12.3. Assume that the vector valued stationary sequence $(X_n)_{n\in\mathbb Z}$ satisfies conditions (12.3.7) and (12.3.6) for some $\rho\ge 1$; then (using notation (12.3.12))
$$E\big\|\widehat f(\lambda)-f(\lambda)\big\|^2\le C\Big(\frac1{n^2}+\frac1{m^{2\rho}}+\frac mn\Big).$$
This expression is optimized as $O\big(n^{-2\rho/(2\rho+1)}\big)$ by setting $m = n^{1/(2\rho+1)}$.
12.3.2 Dependence coefficients
This section aims to prove that the bounds on higher order moments needed for deriving a.s. convergence of spectral density estimates are analogous to those computed for density estimates in chapter 11; see § 11.2.2. Spectral estimates are written as second order polynomials of the initial process; we thus need a translation table that relates the properties of the initial process to the required asymptotics. In connection with § 4.3 and § 4.4.1 we now make use of the decorrelation coefficients (2.2.1). The following lemma is a first rough bound relating those coefficients to the ones built upon the sequence $Z = (Z_k)_{k\in\mathbb Z}$, defined in an analogous way to (12.3.9) for some fixed complex valued sequence $\delta_{k,s}\in\mathbb C$ such that $\sup_{k,s}|\delta_{k,s}|\le 1$ (here the dependence with respect to $\lambda$ is omitted). In order to obtain a suitable bound of this coefficient, it seems unavoidable to make use of the previous diagram formula. Set
$$c_{Z,q}(r) = \max\Big|\mathrm{Cov}\Big(Z_{k_1}^{(a_1,b_1)}\cdots Z_{k_l}^{(a_l,b_l)},\ Z_{k_{l+1}}^{(a_{l+1},b_{l+1})}\cdots Z_{k_q}^{(a_q,b_q)}\Big)\Big|,$$
the maximum being taken over $k_1\le\cdots\le k_q$ with $k_{l+1}-k_l = r$ and $1\le a_1,\dots,a_q\le D$, $1\le b_1,\dots,b_q\le D$. Those coefficients were already defined for vector valued sequences in (4.4.7). Set $C = \mathrm{Cov}\big(Z_{k_1}^{(a_1,b_1)}\cdots Z_{k_l}^{(a_l,b_l)},\ Z_{k_{l+1}}^{(a_{l+1},b_{l+1})}\cdots Z_{k_q}^{(a_q,b_q)}\big)$ with $k_{l+1}-k_l = r$; then (in a condensed notation)
$$C = \sum_{|s_1|,\dots,|s_q|\le m}\mathrm{Cov}\Big(Y_{k_1,s_1}^{(a_1,b_1)}\cdots Y_{k_l,s_l}^{(a_l,b_l)},\ Y_{k_{l+1},s_{l+1}}^{(a_{l+1},b_{l+1})}\cdots Y_{k_q,s_q}^{(a_q,b_q)}\Big)\le\sum_{u=1}^{q}\sum_{\mu_1,\dots,\mu_u}\ \sum_{|s_1|,\dots,|s_q|\le m}\ \prod_{i=1}^{u}\big|\kappa_{\mu_i(k,s)}(X^{(a,b)})\big|,$$
where the sum extends over the indecomposable diagrams such that some $\mu_i$ is contained neither entirely in the past nor entirely in the future of the table
$$\text{Past}\ \begin{cases}(1,1),&(1,2)\\ \ \vdots&\ \vdots\\ (l,1),&(l,2)\end{cases}\qquad\qquad \text{Future}\ \begin{cases}(l+1,1),&(l+1,2)\\ \ \vdots&\ \vdots\\ (q,1),&(q,2)\end{cases}$$
The number of such regular diagrams is thus $q!$ (each one is defined by one term in the past and one term in the future). In the general case, set $\lambda_i = \#\mu_i$ for $1\le i\le u$; if $\mu_i$ contains $v_i$ terms $(j_{i,1},2),\dots,(j_{i,v_i},2)$ from the second column of the table, set
$$C_{\mu_1,\dots,\mu_u} = \sum_{|s_1|,\dots,|s_q|\le m}\ \prod_{i=1}^{u}\kappa_{\mu_i(k,s)}(X^{(a,b)}),\qquad\text{so that}\qquad |C_{\mu_1,\dots,\mu_u}|\le\prod_{i=1}^{u}\ \sum_{|s_{j_{i,1}}|,\dots,|s_{j_{i,v_i}}|\le m}\big|\kappa_{\mu_i(k,s)}(X^{(a,b)})\big|.$$
For the general case we need to consider sums of cumulants indexed by $\nu(s) = (h_1+s_1,\dots,h_i+s_i,h'_1,\dots,h'_j)$:
$$\sum_{|s_1|,\dots,|s_i|\le m}\kappa_{\mu(s)}(X).$$
Using stationarity, it is easy to check that this sum is bounded by $(2m+1)^w\kappa_{i+j}$ (with $i+j = \#\mu$), where $w = 0$ or $1$ according as $j\neq 0$ or $j = 0$. Hence $|C_{\mu_1,\dots,\mu_u}|\le(2m+1)^w\prod_{i=1}^{u}\kappa_{\#\mu_i}$, where $w$ is now the number of those $\mu_i$ entirely contained in the second column of the table, hence $w\le q/2$. In fact, the non-Gaussian partitions (those with $\#\mu_i > 2$ for some $i$) have a contribution of lower order, $O(m^{[q/2]-1})$. Taking into account that in each partition one term has factors both in the Past and in the Future, together with Lemma 4.11, we thus obtain

Theorem 12.2. If the sums of cumulants of $X$ satisfy $\kappa_p < \infty$ for each $p\le 4q$ (which holds if $\sum_r(r+1)^{p-2}c^*_{X,p}(r) < \infty$, from Lemma 4.12), then there is a constant $K_q > 0$ such that
$$c_{Z,q}(r)\le K_q(2m+1)^{[q/2]}\,c^*_{X,2q}(r-2m).$$
Using inequality (4.4.12) we thus derive the main result of this section.

Corollary 12.1. Let $2p\ge 2$ be an even integer such that $\kappa_q < \infty$ for $q\le 4p$; then
$$E\big\|\widehat f(\lambda)-E\widehat f(\lambda)\big\|^{2p} = O\Big(\Big(\frac mn\Big)^{p}\Big).$$
• Note that Lemma 4.12 proves that the previous conditions may also be expressed in terms of the weak dependence coefficients (4.4.8).
• Moreover, the condition $\sum_n\big(m(n)/n\big)^q < \infty$ implies a.s. convergence of such spectral density estimates, as was done in § 11.2.2 for regression estimates.
Chapter 13
Econometric applications and resampling

Few rigorous results are stated in this chapter; our aim here is to show how weak dependence may be applied in standard applications. This last chapter is mainly intended to provide reasonable directions for further investigations of time series. The chapter is organized as follows. In order to provide deep econometric motivations, Section 13.1 includes several situations where various weak dependence conditions arise. After some generic examples including bootstrapping, we consider specific problems, including unit root problems and parametric or semiparametric problems, in § 13.1.1, 13.1.2 and 13.1.3. The following Section 13.2 reviews the question of the bootstrap; some model based bootstraps are considered first (see also § 13.2.4). We consider the block bootstrap in § 13.2.1, and § 13.2.2 addresses GMM estimation, for which weak dependence provides a complete proof of the results in Hall and Horowitz (1996) [101]. We also mention the conditional bootstrap in § 13.2.3 and the sieve bootstrap in § 13.2.4. Finally, in Section 13.3 we study more completely the problem of estimating the limiting variance (in the central limit theorem) under η-weak dependence.
13.1 Econometrics
Time series analysis is a major part of econometrics. Here we provide several examples of interest in which it is essential to consider dependent structures instead of simple independence. In some situations, such as the bootstrap, mixing notions seem useless (see § 13.2.2). One application concerns linearity tests in time series analysis. Rios (1996) [162] considers a stationary functional autoregressive model (13.2.1), where $r = L + C$ is the decomposition of the autoregression
function into a sum of linear ($L$) and nonlinear ($C$) components. Local linearity of $r$ is then tested via the null hypothesis
$$H_0:\ \int\big(r''(x)\big)^2w(x)\,dx = 0,$$
where the weight function $w$ has compact support. Rios (1996) [162] proves that a local linearity test can be handled in the strong mixing case. The function $r$ is assumed to be $\rho$-regular. Then the plug-in estimator $\widehat T = \int\widehat r''^2(x)w(x)\,dx$ converges to $T = \int r''^2(x)w(x)\,dx$ if $\alpha_n = O(n^{-a})$ with $a > 2+3/\rho$, under the bandwidth condition $h_n\in[n^{-\frac1{10}}, n^{-\frac1{2\rho-4}}]$. This result may be extended to weak dependence as in Ango Nze et al. (2002) [6]. Still another problem of interest is to test the independence of the innovations $(\xi_n)_{n\in\mathbb Z}$ in a regression model $X_n = aY_n + \xi_n$. This can be performed using the Durbin–Watson statistic, which is a non-correlation test; the latter can be written as a continuous functional of the Donsker line of the sequence $(\xi_n)_{n\in\mathbb Z}$. Some other applications are detailed in the forthcoming subsections.
13.1.1 Unit root tests
Consider a stationary autoregressive sequence $(X_n)_{n\in\mathbb Z}$ generated by an i.i.d. sequence $(\xi_n)_{n\in\mathbb Z}$: $X_n = aX_{n-1} + \xi_n$. A classical problem is to test whether there is a unit root (that is, $a = 1$). In the specific context of aggregated time series, the assumption of white noise innovations seems rather strong. Phillips (1987) [145] develops unit root tests for mixing and heterogeneously distributed innovations. The ordinary least squares estimate $\widehat a$ is shown to be a continuous functional of the Donsker line of the sequence $(\xi_n)_{n\in\mathbb Z}$. As an application of the functional central limit theorem, Phillips shows that a unit root test can be based on the fact that, under the null hypothesis $H_0: a = 1$,
$$n(\widehat a - 1)\xrightarrow[n\to\infty]{\mathcal D}\frac{\frac12\big(W_1^2 - \sigma_\xi^2/\sigma^2\big)}{\int_0^1W_t^2\,dt},$$
where $W$ denotes the standard Brownian motion and $\sigma^2 = \sum_{k=-\infty}^{\infty}\mathrm{Cov}(\xi_0,\xi_k)$; in the initial i.i.d. frame $\sigma_\xi^2/\sigma^2 = 1$, but this is no longer the case for dependent innovations. The author works with stationary strong mixing sequences, and conditions under which the functional CLT holds were given before. This example, as the author suggests, can be generalized to error sequences $(\xi_n)_{n\in\mathbb Z}$ that allow for heteroskedasticity.
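The behaviour of the OLS estimate under the null $a = 1$ is easy to reproduce by simulation. The sketch below uses arbitrary choices of seed and sample size (not from the text); with i.i.d. innovations, $\sigma_\xi^2/\sigma^2 = 1$ in the limit law above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Under H0 the series is a random walk: X_k = X_{k-1} + xi_k, i.i.d. innovations.
X = np.cumsum(rng.standard_normal(n + 1))

# OLS estimate of a in X_k = a X_{k-1} + xi_k.
a_hat = np.sum(X[1:] * X[:-1]) / np.sum(X[:-1] ** 2)
stat = n * (a_hat - 1.0)   # the non-normalized unit root statistic; it stays O(1)
print(a_hat, stat)
```

Repeating this over many replications reproduces the (non-Gaussian, left-skewed) limit distribution against which the unit root test is calibrated.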
13.1.2 Parametric problems
Generalized method of moments (GMM) estimation procedures involve an estimate $\widehat\theta_n$ which solves the arg-min problem $J_n(\widehat\theta_n) = \min_{\theta\in\Theta}J_n(\theta)$, where
$$J_n(\theta) = \Big(\frac1n\sum_{i=1}^{n}g(X_i,\theta)\Big)'\,\Omega\,\Big(\frac1n\sum_{i=1}^{n}g(X_i,\theta)\Big).\tag{13.1.1}$$
Here $\Theta\subset\mathbb R^d$ is a finite dimensional parameter set, and $g(\cdot,\cdot)$ is a given function such that $E_{\theta_0}g(X_1,\theta_0) = 0$, where $\theta_0$ is the true parameter point. In the time series context, the positive semi-definite matrix $\Omega$ is often replaced (see Hall and Horowitz, 1996, equation (3.2) [101]) by an asymptotically optimal weight matrix estimate
$$\Omega_n(\theta) = \frac1n\sum_{i=1}^{n}\Big(g(X_i,\theta)g(X_i,\theta)' + \sum_{j=1}^{\kappa}H(X_i,X_{i+j},\theta)\Big),\qquad H(x,y,\theta) = g(x,\theta)g(y,\theta)' + g(y,\theta)g(x,\theta)',$$
where $\kappa$ is such that $Eg(X_i,\theta)g(X_j,\theta)' = 0$ if $|i-j| > \kappa$. The statistic to test $H_0:\ \theta = \theta_0$ is $J_n(\widehat\theta_n) = K_n(\widehat\theta_n)'K_n(\widehat\theta_n)$, where $K_n(\theta) = \frac1{\sqrt n}\Omega_n(\theta)^{-\frac12}\sum_{i=1}^{n}g(X_i,\theta)$ (the square root of a symmetric positive matrix is uniquely defined). The following CLT holds under standard mixing assumptions:
$$T_n(\theta) = \sqrt n\,\Sigma_n^{-1}\big(\widehat\theta_n-\theta\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N_d(0,I_d),\tag{13.1.2}$$
where the diagonal matrix $\Sigma_n$ has $d$ entries. GMM techniques naturally involve an unknown covariance matrix; in order to estimate such limiting distributions it will be natural to use the bootstrap techniques described in § 13.2.1.
13.1.3 A semi-parametric estimation problem
We follow the presentation in Robinson (1989) [164]. He considers an economic variable observable at time $n$, which is an $R\times 1$ vector of r.v.'s $(W_n)_{n\in\mathbb Z}$. We observe $W_n$ at times $n = 1-P, 2-P,\dots,T$, where $P$ is nonnegative and $T$ is large. Hypotheses of economic interest often involve a subset $X_n = B(W_n',\dots,W_{n-P}')'$ of the array $W_n,\dots,W_{n-P}$; for this, $B$ is a $J\times(PR)$ matrix formed from the $PR$-rowed identity matrix $I_{PR}$ by omitting $PR-J$ rows (which means that in $B$, $PR-J$ elements of $W_n,\dots,W_{n-P}$ are deleted). Thus $X_n$ can have elements in common with $X_{n+P-1},\dots,X_{n+1},X_{n-1},\dots,X_{n-P}$. Let $X_n = (Y_n',Z_n')'$, where $Y_n$ and $Z_n$ are $K\times 1$ and $L\times 1$ vectors ($K+L = J$). The problem of interest is to test the hypothesis $E(Y_n|Z_n) = 0$ against the
alternative $E(Y_n|Z_n)\neq 0$. This null hypothesis is written in the form
$$\tau = \int_{\mathbb R^L}H(z,z)f^2(z)\,dz = 0\quad\text{for } M=0,\qquad\text{and}\qquad \tau = \int_{\mathbb R^L}H(z,z)\big(f(z),f^{(1)}(z)',\dots,f^{(M)}(z)'\big)'f(z)\,dz = 0$$
for some $M>0$, where the function $H(z_1,z_2)$ is defined as
$$H(z_1,z_2) = \int_{\mathbb R^K\times\mathbb R^K}G(x_1,x_2)\,dF(y_1|z_1)\,dF(y_2|z_2)$$
for some convenient function $G$, with $F(A|z) = P(Y_n\in A\,|\,Z_n = z)$ for any Borel set $A$ of $\mathbb R^K$ and $z\in\mathbb R^L$, and $x_1 = (y_1',z_1')'$, $x_2 = (y_2',z_2')'$. Here $f^{(j)}(z)$ denotes the vector of $j$-partial derivatives of $f$. An example of this framework is given by $X_n = (Y_n',Z_n')'$, where $Y_n = (t_n,s_n')'$ and $Z_n = v_n$. The regression model
$$t_n = \beta'(s_n - E_ns_n) + \gamma'E_ns_n + u_n\tag{13.1.3}$$
is of common use in econometrics. Here $t_n$ is scalar, while $s_n$ and $v_n$ are respectively $p\times 1$ and $q\times 1$; they are observable random sequences, while the innovation process $(u_n)$ is centered and unobservable, so that $E(u_n|s_n,t_n) = 0$; we denote $E_n(\cdot) = E(\cdot|v_n)$. In the case of a weakly dependent and stationary innovation process, Robinson (1989) [164] considers the hypothesis $H_0:\ \beta = 0$. In this case the hypothesis can be written as before, and Robinson calculates $\beta = \tau$, where $K = p+1$, $L = 1$, $M = 0$ and $G(x_1,x_2) = (t_1-t_2)s_1\varphi(v_1)$ for some function $\varphi:\mathbb R^q\to\mathbb R$ (usually $\varphi\equiv 1$). Robinson considers the statistic $\widehat\lambda = n\,\widehat\tau\,'\widehat\Omega^{-1}\widehat\tau$ constructed from the $n$-sample $(X_1,\dots,X_n)$. Here
$$\widehat\tau = \frac{1}{n^2h^L}\sum_{i,j=1}^{n}G(X_i,X_j)\,k\big((Z_i-Z_j)/h\big)$$
is a $U$-statistic, and $\widehat\Omega$ is the natural estimator of the covariance matrix of $\widehat\tau$. One such estimate is $\widehat\Omega = \frac1n\sum_{i=1}^{n}c_ic_i'$; tapered versions might be preferred (see formula (2.21) of Robinson, 1989 [164]). Here $c_i = \sum_{j=1}^{n}\frac{d_{i,j}+d_{j,i}}{n}$, with $d_{i,j} = G(X_i,X_j)\,\bar k\big((Z_i-Z_j)/h\big)$ and $\bar k(z) = \big(h^{-L}k(z), h^{-1}k^{(1)}(z)',\dots,h^{-M}k^{(M)}(z)'\big)'$. Under $\beta$-mixing assumptions, Robinson proves that the above estimates are $\sqrt n$-consistent and satisfy a CLT. Under a natural $\beta$-mixing condition, he proves in fact that the statistic $\widehat\lambda$ asymptotically has a $\chi^2$-distribution if $\beta_j = O(j^{-b})$ with $b > \mu/(\mu-2)$, under the moment assumption $\sup_{i,j}E|G(X_i,X_j)|^\mu < \infty$. The $\beta$-mixing assumption allows one to compare the joint distribution of the initial sequence with that of a sequence of r.v.'s with independent blocks. This
reconstruction is due to Berbee's coupling lemma, no matter how big the size of the blocks may be. Yoshihara (1976) [193] derives a covariance inequality that fits $U$-statistics. A way to get rid of $\beta$-mixing conditions is to consider an independent realization $\widetilde X_1,\dots,\widetilde X_n$ of the trajectory $X_1,\dots,X_n$. A simpler estimator of $\tau$ is then given by
$$\widetilde\tau = \frac{1}{n^2h^L}\sum_{i,j=1}^{n}G\big(X_i,\widetilde X_j\big)\,k\Big(\frac{Z_i-\widetilde Z_j}{h}\Big).$$
The asymptotic behavior of this expression is easy to derive, under alternative weak dependence conditions, by using our results: indeed $\widetilde\tau$ is the numerator of a Nadaraya–Watson kernel estimator for the regression problem $E\big(s_1(t_1-t)\,\big|\,v_1 = z\big)$ in the special case of the previous example. In fact this trick avoids the corresponding coupling construction for $U$-statistics.
13.2 Bootstrap
Consider the following example concerning the bootstrap: let a stationary autoregressive sequence be generated by an independent and identically distributed (i.i.d.) centered sequence $(\xi_n)_{n\in\mathbb Z}$:
$$X_n = r(X_{n-1}) + \xi_n.\tag{13.2.1}$$
Standard nonparametric estimation techniques provide an estimate of the autoregression function $r$; let $\widehat r$ be a convenient estimator of $r$. Given data $(X_1,\dots,X_n)$ from the sequence (13.2.1), another autoregressive process can be defined by
$$\widetilde X_n = \widehat r\big(\widetilde X_{n-1}\big) + \xi_n^*.\tag{13.2.2}$$
The innovations $(\xi_n^*)$ are i.i.d., drawn according to the centered empirical measure of the estimated residuals
$$\widehat\xi_i = \xi_i - \frac1n\sum_{j=1}^{n}\xi_j,\qquad \xi_i = X_i - \widehat r(X_{i-1}),\qquad 1\le i\le n.$$
CHAPTER 13. ECONOMETRICS AND RESAMPLING
that paper. The above-mentioned weak dependence conditions yield standard results concerning convergence in distribution with a $\sqrt n$-normalization. If now the process $X_k = H(\xi_k, \xi_{k-1}, \dots)$ is defined as a Bernoulli shift, a suitable form of the resampled version $X_k^*$ of $X_k$ is $\widehat H(\xi_k^*, \xi_{k-1}^*, \dots, \xi_{k-l+1}^*, 0, \dots)$. Here $\widehat H$ is an estimate of $H$, and the $\xi_k^*$ are i.i.d. random variables drawn with the distribution $F_n$, the centered version of the residuals' distribution obtained through filtering. In the simple case of a linear process, $H(z_0, z_1, \dots) = \sum_k a_k z_k$ (see Section 13.2.2); in the general setting, one needs to develop additional estimation procedures. In order to describe the asymptotic properties of such processes, one needs to know the limiting behaviour of Bernoulli shifts. Unfortunately such representations are not always known, and additional bootstrap procedures have been considered.
13.2.1 Block bootstrap
We describe here the block-bootstrap procedure, which is adapted to the time series $(X_i)_{i\in\mathbb N}$. Let $b = b(n)$ and $l = l(n)$ denote the number and the length of the blocks. Assume $b \cdot l = n$ and consider the $b$ blocks $(X_{(j-1)l+1}, \dots, X_{jl})$ for $1 \le j \le b$. Blocks $(\tilde X_{(j-1)l+1}, \dots, \tilde X_{jl})$ are now randomly drawn (uniformly) among those $b$ blocks. A trajectory of the resampled process is obtained by concatenation of those blocks; however, a problem clearly appears in connecting them (see Künsch, 1989 [113]).
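A minimal sketch of this non-overlapping block resampling (function names are assumptions; the naive concatenation below is precisely the join problem mentioned above, which Künsch's moving-block scheme addresses):

```python
import numpy as np

def block_bootstrap(x, l, rng=None):
    """Resample a trajectory: draw b = n // l non-overlapping blocks of
    length l uniformly with replacement, then concatenate them."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    b = len(x) // l                   # number of blocks (assumes b * l = n)
    blocks = x[: b * l].reshape(b, l)
    idx = rng.integers(0, b, size=b)
    return blocks[idx].reshape(-1)

x = np.arange(12.0)
x_star = block_bootstrap(x, l=3, rng=0)   # concatenation of 4 resampled blocks
```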
13.2.2 Bootstrapping GMM estimators
Using the notations of § 13.1.2, let $(\tilde X_i^*)_{1\le i\le n}$ denote a block-bootstrap sample and let $g^*(x, \theta) = g(x, \theta) - E^* g(x, \hat\theta_n)$, where the expectation is taken with respect to the bootstrap distribution. The GMM estimate $\hat\theta^*$ solves the arg-min problem
$$J_n^*(\theta) = \Big( \frac{1}{n} \sum_{i=1}^{n} g^*(X_i^*, \theta) \Big)'\, \Omega\, \Big( \frac{1}{n} \sum_{i=1}^{n} g^*(X_i^*, \theta) \Big) \qquad (13.2.3)$$
when the matrix $\Omega$ is known. In order to prove the consistency of this bootstrap procedure, Hall and Horowitz (1996) [101] give an incomplete proof; weak dependence, however, allows us to prove this consistency rigorously. More precisely, if $X_n = h(\varepsilon_n, \varepsilon_{n-1}, \dots)$ for some i.i.d. sequence $(\varepsilon_i)_{i\in\mathbb Z}$, their Assumption 1 is*
$$E\big\| h(\varepsilon_n, \varepsilon_{n-1}, \dots) - h(\varepsilon_n, \varepsilon_{n-1}, \dots, \varepsilon_{n-m}, 0, 0, \dots) \big\| \le \frac{e^{-dm}}{d}.$$

* The function $h : \mathbb R^{\mathbb N} \to B$, with $B$ a Banach space with norm $\|\cdot\|$.
This condition holds for linear processes, and it is claimed in [101] to imply geometric strong mixing. Andrews's simple example (1984) [2] proves that this does not hold in general. It implies, however, θ-weak dependence with geometric decay, yielding the following useful tail inequality for sums of functions of the sequence $\xi_n = f(X_n, \theta)$†. This is the main tool to prove the validity of the bootstrap in this dependent setting. A rigorous version of Lemma 1 in Hall and Horowitz (1996) [101] thus follows.

Lemma 13.1 (Ango Nze & Doukhan, 2004 [7]). Let $(\xi_n)$ be a stationary η-weakly dependent sequence with $E\xi_n = 0$, such that $\eta(r) = O(e^{-ar})$ as $r \uparrow \infty$ for some $a > 0$, and $P(|\xi_1| \ge z) = O(|z|^{-33})$ as $|z| \to \infty$. Then $R_n = n^{-1} \sum_{i=1}^{n} \xi_i$ satisfies, for $\varepsilon > 0$ small enough,
$$\lim_{n\to\infty} n\, P\big( |R_n| > n^{-\frac{2+\varepsilon}{5}} \big) = 0.$$

Proof. Set $\xi_{i,n} = \xi_i 1_{\{|\xi_i| \le n^{1/16}\}} - E\xi_i 1_{\{|\xi_i| \le n^{1/16}\}}$ and let $\tilde R_n = \frac{1}{n} \sum_{i=1}^{n} \xi_{i,n}$. Then
$$P\big( |R_n| > 2 n^{-\frac{2+\varepsilon}{5}} \big) \le P\big( |\tilde R_n| > n^{-\frac{2+\varepsilon}{5}} \big) + 2 n^{\frac{2+\varepsilon}{5}}\, E\big( |\xi_1| 1_{\{|\xi_1| > n^{1/16}\}} \big)$$
$$\le n^{\frac{32(2+\varepsilon)}{5}}\, E|\tilde R_n|^{32} + 2 n^{\frac{2+\varepsilon}{5}}\, \|\xi_1\|_4\, P^{3/4}\big( |\xi_1| > n^{1/16} \big) = n^{-1}\, O\Big( n^{\frac{-11+32\varepsilon}{5}} + n^{\frac{\varepsilon}{5} - \frac{47}{320}} \Big) = o(n^{-1}).$$

Following precisely the same steps as in Hall and Horowitz (1996) [101], we thus prove, by only replacing their Lemma 1 by our Lemma 13.1, that bootstrapping critical values for GMM estimators is an asymptotically valid procedure.

Remark 13.1. Theorems 1, 2 and 3 in Hall and Horowitz (1996) [101] now seem to be rigorously proved. Analogous comments apply to a more recent paper by Andrews (2002) [4]. The exponent 33 in the previous lemma is unnatural, and it will be improved in a forthcoming paper.

The above procedure can be used for testing the null hypothesis $H_0: \theta = \theta_0$ against the bilateral alternative. Under $H_0$ the studentized statistic $T_n(\theta)$ described in (13.1.2) satisfies, with the critical value $Q_\alpha$,
$$P(|T_n(\theta)| > Q_\alpha) = \alpha + O(1/n).$$

† In this equation, θ is really the parameter to be estimated; in order to avoid confusion with the θ-dependence coefficients, we prove below a result under the weaker η-weak dependence.
As in Götze & Hipp (1978) [99], Hall and Horowitz seem to prove that $T_n(\theta)$ and the bootstrap studentized statistic
$$T_n^*(\theta) = \sqrt n\, \Sigma_n^{*-1} \big( \hat\theta_n^* - \hat\theta_n \big)$$
have close distributions, in the sense that
$$P\Big( \sup_{z\in\mathbb R} \big| P^*(T_n^*(\theta) \le z) - P(T_n(\theta) \le z) \big| > n^{-a} \Big) = o(n^{-a}), \qquad (13.2.4)$$
for a relevant integer $2a$ with $a \ge 1 + \xi$, where the range of $\xi \in [0, 1]$ is determined according to the dependence assumptions prescribed. This relation comes from an Edgeworth expansion. It yields an improved acceptance rule for the test of $H_0$:
$$P(|T_n^*(\theta)| > Q_\alpha^*) = \alpha + O\big( n^{-1-\xi} \big).$$
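In practice, the bootstrap critical value $Q_\alpha^*$ is taken as the empirical $(1-\alpha)$-quantile of the absolute resampled studentized statistics. A generic sketch (the replicates `t_star` would come from the resampling scheme above; here they are simulated as a placeholder):

```python
import numpy as np

def bootstrap_critical_value(t_star, alpha):
    """Empirical (1 - alpha)-quantile of |T*| over the bootstrap replicates,
    used as the cutoff of the bilateral test of H0: theta = theta0."""
    return np.quantile(np.abs(np.asarray(t_star)), 1.0 - alpha)

# sanity check: with standard normal replicates the 5% cutoff is close to 1.96
rng = np.random.default_rng(0)
q = bootstrap_critical_value(rng.standard_normal(200_000), alpha=0.05)
```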
13.2.3 Conditional bootstrap
A simple local conditional bootstrap is investigated by Ango Nze et al. (2002) [6]. Consider a stationary process $(X_t, Y_t)_{t\in\mathbb Z}$. The local bootstrap for nonparametric regression is defined as follows: the empirical distribution of $Y_t$ given $X_t = x$ writes
$$\widehat F(y|x) = \frac{1}{nb} \sum_{t=1}^{n} 1_{\{Y_t \le y\}}\, K\Big( \frac{x - X_t}{b} \Big) \Big/ \widehat f_{n,b}(x),$$
for a kernel density estimator $\widehat f_{n,b}$ with bandwidth $b = b(n)$. The bootstrap sample is now defined as $(X_t, Y_t^*)_{1\le t\le n}$, where $Y_t^* \sim \widehat F(\,\cdot\,|X_t)$ is independent of $(Y_s^*)_{s\ne t}$ conditionally on the data, for $1 \le t \le n$. In this article, it is shown that the asymptotic properties of the local regression bootstrap estimator $\widehat r^*_{n,h}$ with bandwidth $h = h(n)$, constructed from this sample, are analogous to those of the regression estimator $\widehat r_{n,h}$ constructed from the data $(X_t, Y_t)_{1\le t\le n}$:
$$\sup_{u\in\mathbb R} \Big| P^*\Big( \sqrt{nh}\big( \widehat r^*_{n,h}(x) - E^* \widehat r^*_{n,h}(x) \big) \le u \Big) - P\Big( \sqrt{nh}\big( \widehat r_{n,h}(x) - E \widehat r_{n,h}(x) \big) \le u \Big) \Big| \to_P 0 \quad (n\to\infty),$$
under suitable weak dependence assumptions (see Theorem 4 in [6]).
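The local bootstrap can be sketched as follows: for each $t$, $Y_t^*$ is drawn from the $Y_s$ with probabilities proportional to the kernel weights $K((X_t - X_s)/b)$. The Gaussian kernel and all names are illustrative assumptions:

```python
import numpy as np

def local_bootstrap(x, y, b, rng=None):
    """Draw Y*_t ~ F_hat(. | X_t): resample Y_s with probability proportional
    to K((X_t - X_s)/b), independently across t given the data."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_star = np.empty(len(x))
    for t in range(len(x)):
        w = np.exp(-0.5 * ((x[t] - x) / b) ** 2)   # Gaussian kernel weights
        y_star[t] = rng.choice(y, p=w / w.sum())
    return y_star

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 300)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(300)
y_star = local_bootstrap(x, y, b=0.05, rng=1)
```

Each $Y_t^*$ is by construction one of the observed responses, picked among those whose covariate lies near $X_t$.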
13.2.4 Sieve bootstrap
Bickel and Bühlmann (1999) [18] tackle the problem of the 'sieve bootstrap' for a one-sided linear process
$$X_n - \mu = \xi_n + \sum_{t=1}^{\infty} a_t \xi_{n-t}, \qquad (13.2.5)$$
where $(\xi_n)$ is a sequence of i.i.d. random variables (r.v.'s) with $E\xi_0 = 0$ and density function $f_\xi$, and where $\sum_{t=1}^{\infty} |a_t| < \infty$ and $\mu = EX_n$.

Under the assumption that the function $\Psi(z) = 1 + \sum_{t=1}^{\infty} a_t z^t$ has no root in the closed unit disk, the process (13.2.5) admits an AR(∞) representation
$$(X_n - \mu) + \sum_{t=1}^{\infty} b_t (X_{n-t} - \mu) = \xi_n, \qquad \text{with } \sum_{t=1}^{\infty} |b_t| < \infty. \qquad (13.2.6)$$
The latter process (13.2.6) is fitted with an autoregressive process of finite order p(n) (p(n)/n → 0, p(n) → ∞). Using estimated residuals, the resampling (i.i.d.) innovation process (ξn∗ )n∈Z is constructed by smoothing the empirical process based on those residuals by a kernel density estimate of the density fξ . Finally, the smoothed sieve bootstrap sample (Xn∗ )n∈Z is defined by resampling the AR(p(n)) process from innovations (ξn∗ )n∈Z :
p(n)
(Xn∗
¯ + − X)
Dbt (X ∗ − X) ¯ = ξn∗ n−t
(13.2.7)
t=1
The purpose of [18] was to carry over a weak dependence property (here strong mixing) of the initial sequence (Xn )n∈Z to the sieve processes (Xn∗ )n∈Z (a classic and a smoothed version were examined in the paper). The goal is unrealistic for the classic bootstrap sample, since the distribution of the bootstrapped innovations is discrete. Proving a mixing property for the smoothed sieve bootstrap sample eludes the efforts of the authors. In the latter case, it nevertheless appears that limit theorems can be proven by another method. It consists in using the following property: |Cov (g1 (X−d1 +1 , . . . , X0 ) , g2 (Xk , . . . , Xk+d2 −1 )) | ≤ 4 g1 ∞ g2 ∞ ν k; C d1 , C d2 ,
(13.2.8)
with d1 , d2 ∈ N and for smooth functions g1 , g2 belonging to the classes C d1 and C d2 (see equation (3.1) for the definition of the class C d and some examples). The new dependence coefficient ν is less than the strongly mixing coefficient. Bickel and B¨ uhlmann (1999) [18] cannot prove that the sieve sequence (Xn∗ ) is strongly mixing. A weak dependence condition is now defined by the ν coefficient. The authors prove that it is satisfied by both this sequence and a smooth version of the resampled innovations. For instance, Bickel and B¨ uhlmann prove that if the sequence (Xn )n∈Z satisfies some regularity conditions ensuring that αk = O (k −γ ) (recall that νk ≤ αk ), then the sieve bootstrap process (Xn∗ )n∈Z satisfies a ν mixing condition with a polynomial rate ν ∗ (k; C d1 , Dd2 ) = O k −Lγ for relevant classes C d1 , Dd2 and a positive constant L. See Theorem 3.2 on page 422 in Bickel and B¨ uhlmann (1999) [18] for more details.
292
13.3
CHAPTER 13. ECONOMETRICS AND RESAMPLING
Limit variance estimates
The end of this chapter is aimed at providing a very important resampling tool for weakly dependent time series. In this monograph (and elsewhere) extensions of the central limit theorem have been proved for times series. The limiting variance takes a complicated form; the results write as 1 d √ Xk →n→∞ N (0, σ 2 ), n n
where
σ2 =
k=1
∞
EX0 Xk .
k=−∞
Functional vector valued versions of such results also arise, 1 √ Xj → Z(t) − Z(s), n
(13.3.1)
ns<j≤nt
with Z(t) ∈ RD , the D-dimensional centered Gaussian random process such that ∞ Cov(X0 , Xk ). Cov(Z(s), Z(t)) = (t ∧ s) · Σ, Σ= k=−∞
For statistical use, one needs to provide self-normalized versions of such results. The expression reduces to σ 2 = EX02 for iid sequences and it may be directly n 1 estimated by n k=1 Xk2 using the method of moments. The first and natural way to achieve this for dependent sequences is to replace σ 2 by some convergent estimator. We shall assume η−weak dependence; several ways of estimating this quantity are reasonable. • Recalling that σ 2 is only the value at origin of Xt ’s spectral density gives a first approach; spectral density estimates from chapter 12 yield as in § 12.3 estimators of σ 2 , we defer a reader to Bardet et al. (2005) [11] for this approach. • Another way to estimate this quantity is considered here: we mimick an argument by Carlstein (1986) [34], see also Peligrad and Shao (1994) [141]. This estimation is based on the Donsker invariance principle and a subsampling argument described in section 13.3.2. We provide below a.s. convergence properties of those estimates as well as a CLT; modifications of the previous CLT will make it suitable for applications. To this aim, subsection 13.3.1 relates moments of sums with the cumulants of stationary sequences; this is a tool of an independent interest for several applications, like extensions to multispectra of the results in chapter 12. We are involved here in a vector valued version of the estimation of σ 2 . The main motivation for this is to derive a dependent version of Kolmogorov-Smirnov test. Empirical CLT (for the empirical cdf) are derived in chapter 10: n d 1 √ 1(Xk ≤x) − F (x) →n→∞ B(x), n k=1
13.3. LIMIT VARIANCE ESTIMATES
293
where B(x) denotes the centered Gaussian process such that EB(x)B(y) =
∞
Cov 1(X0 ≤x) , 1(Xk ≤y) ,
k=−∞
A direct extension of the Kolmogorov-Smirnov test is not possible for such dependent sequences because limits are not distribution free. After a convenient discretization, Doukhan & Lang (2002) [63] prove that confidence bounds for such statistics, which are not anymore distribution-free, may however be estimated with the present sharp estimates of the multivariate extension of σ 2 . The following subsection is aimed to derive useful tools in order to precise asymptotics for the previous estimation procedure.
13.3.1
Moments, cumulants and weak dependence
We thus consider a stationary vector valued sequence (Xn )n∈Z with values in RD ; we equip RD with the norm defined as x = |x(1) | + · · · + |x(D) | for x = (x(1) , . . . , x(D) ) ∈ RD . We assume below that there exists some b > 2 such 1 u that X0 b = EX0 b b < ∞. For each u ≥ 1 we identify the sets RD and RD·u . We use the coefficients cX,q (r) from (4.4.7). We define nonincreasing coefficients, for further convenience, (d) t cX,q (r) = max cX,l (r)μq−l , with μt = max E X0 . (13.3.2) 1≤l≤q
1≤d≤D
Those coefficients (4.4.7) are now linked to the weak dependence coefficients: Proposition 13.1. Assume that the stationary and vector valued sequence (a) (Xn )n∈Z is η-dependent and satisfies μ = max1≤a≤D X0 b < ∞ for some b > q then the coefficients defined in eqn. (4.4.7) satisfy b
cX,q (r) ≤ q4 b−1 μ
(q−1)b b−1
b−q
η(r) b−1 .
Proof of proposition 13.1. Consider integers 1 ≤ < q, 1 ≤ a1 , . . . , aq ≤ D and t1 ≤ · · · ≤ tq such that t+1 − t ≥ r, we need to bound (uniformly wrt , 1 ≤ a1 , . . . , aq ≤ D and t1 ≤ · · · ≤ tq ) the expression (a ) (a+1 ) (a ) (a ) c = Cov Xt1 1 · · · Xt , Xt+1 · · · Xtq q = |Cov(A, B)| (13.3.3) In order to bound (13.3.3), for some M > 0 depending on r [to be defined later] (1)
(D)
we now set X j = (X j , . . . , X j
(a)
) with X j
(a)
= Xj
∨ (−M ) ∧ M for each
294
CHAPTER 13. ECONOMETRICS AND RESAMPLING (1)
(D)
1 ≤ a ≤ D if Xj = (Xj , . . . , Xj c
). Then we also have,
≤ |Cov(A, B)| + |Cov(A − A, B)| + |Cov(A, B − B)|, (a1 )
A = X t1
(a )
(a+1 )
· · · X t ,
(aq )
· · · X tq .
B = X t+1
Now we note that |(A − A)B| ≤ |A(B − B)| ≤
(ai )
(a )
Yi |Xti i − X ti |,
i=1 q
(ai )
(a )
Yi |Xti i − X ti |
i=+1 (aj )
(a )
where Yi writes as the product of q − 1 factors Zi,j = |Xtj j | or |X tj | for 1 1 ≤ j ≤ q and j = i. It is thus clear that for q−1 b + p = 1 an analogue representation of the centering terms yields |Cov(A−A, B)|+|Cov(A, B−B)| ≤ 2
q i=1
(a)
(a)
(a)
max X0 q−1 max X0 −X 0 p . b
1≤a≤D
1≤a≤D
Set h(x) = x ∨ (−M ) ∧ M thenLip h = 1 and h ∞ = M thus f (x1 , . . . , x ) = (a+1 ) (a ) (a1 ) (a ) h(x1 )···h(x ) is such that A =f Xt1 , . . . , Xt , . . . , Xtq q ,B =fq− Xt+1 and f ∞ ≤ M , Lip f ≤ M −1 ; from η-dependence we derive, |Cov(A, B)| ≤ (Lip f fq− ∞ + (q − )Lip fq− f ∞ ) η(r) ≤ qM q−1 η(r). (a)
Now for each real valued random variable X1 (a)
E|X0
(a)
− X 0 |p
=
(a)
E|X0
and 1 ≤ a ≤ D, (a)
− X 0 |p 1|X (a) |≥M 0
(a) E|X0 |p
≤
2
≤
(a) 2p E|X0 |b M p−b .
p
1|X (a) |≥M 0
Hence, (a)
(a)
(a)
b
X0 − X 0 p ≤ 2X0 bp M 1− p , b
(a)
thus setting μ = max1≤a≤D X0 b , we obtain with the previous inequalities, |Cov(A − A, B)| |Cov(A, B)|
+ |Cov(A, B − B)| ≤ 4qμq−1+ p M 1− p , b b ≤ q 4μq−1+ p M 1− p + M q−1 η(r) ≤ q 4μb M q−b + M q−1 η(r) . b
b
13.3. LIMIT VARIANCE ESTIMATES
295
The previous expression has an order optimized with 4μb M 1−b = η(r) yielding b
c = |Cov(A, B)| ≤ q4 b−1 μ
(q−1)b b−1
b−q
η(r) b−1 .
As a first application of this relation, it seems useful to state the following moment inequality, they also entail laws of large numbers. Corollary 13.1 (D = 1). Let (Xt )t∈Z be a real valued and stationary η-weakly dependent times series. Assume that E|X0 |b < ∞ for some b > q, EX0 = 0, and ∞ b−q (r + 1)q−2 η(r) b−1 < ∞, r=0
then there exists a constant C > 0 only depending on q and on the previous series such that ⎞q ⎛ n E⎝ Xj ⎠ ≤ Cn[q/2] . j=1
Proof. We note, that
(r + 1)q−2 cX,q (r) < ∞ and we use theorem 4.2 together
r≥0
with proposition 13.1 to conclude.
13.3.2
Estimation of the limit variance
Now let N be some fixed integer and n = N m, ˜ we rewrite the previous limit ˜ ˜ i,m (13.3.1) as Δ ˜ → Δi (which have the same distribution) where we set i+m ˜ √ i i 1 D ˜ +1 −Z Δi,m Xj , Δi = N Z = Δ ∼ ND (0, Σ) . ˜ = √ m m m ˜ j=i+1 The heuristic in Peligrad and Shao (1995) [142]’s variance estimator consists to ˜ i,m substitute Δi by Δ ˜ in this last relation. For different values of i the random ˜ On another variables Δi are independent only for values with a difference ≥ m. hand if i = i(n), j = j(n) are such that limn→∞ |i(n) − j(n)|/m(n) ˜ = α then the couple (Δi(n) , Δj(n) ) is Gaussian with Cov(Δi(n) , Δj(n) ) = 1 − α, if 0 ≤ α ≤ 1. This implies ⎛ lim Var ⎝ #
n→∞
1 n − m(n) ˜
⎞
n−m(n) ˜
i=1
˜ i,m(n) F (Δ )⎠ ˜
= 0
1
Cov F Z(1 , F Z(1 + α) − Z(α) dα.
296
CHAPTER 13. ECONOMETRICS AND RESAMPLING
This integral may be explicitly computed with the help of an Hermite expansion but it is however somewhat complicates. We thus subsample this sums setting ˜ ˜m Δi,m ˜ = Δim, ˜ , then, N 1 F (Δi ) →N →∞ N i=1 N 1 √ (F (Δi ) − EF (Δi )) N i=1
D
→N →∞
EF (Δ)
(13.3.4)
ND (0, Var F (Δ)) .
(13.3.5)
Examples • F (x) = x x yields the estimation of the covariance matrix Σ. • Let F (x) = |x a| for a fixed vector a ∈ RD , then EF (Δ) = E|N (0, 1)|.
√ a Σa ·
• Setting more generally F (x1 , . . . , xD ) = (|xi + xj |2 )1≤i≤j≤D ∈ RD(D+1)/2 provides estimates for all the coefficients of the matrix Σ: for this, use the polarity argument 1 1 1 (i) (j) (i) (j) σi,j = . Var (Δ + Δ ) − Var 2Δ − Var 2Δ 2 4 4 In order to make the heuristic (13.3.5) work, we better consider sequences = (n), m = m(n), m ˜ = m + and N = N (n) converging to infinity and such that n ≥ N (m + ) − . Now we set 1 Δi,m = √ m
(i−1)(m+)+m
Xj ,
(13.3.6)
j=(i−1)(m+)+1
and EF (Δ) is estimated by FDn =
N (n) 1 F (Δi,m(n) ). N (n) i=1
(13.3.7)
Peligrad and Shao (1995) [142] consider instead that EF (Δ) is estimated by Fn =
n− m ˜ 1 ˜ i,m(n) F (Δ ). ˜ n−m ˜ i=1
We need to quote that in this case the sum runs on 1 ≤ i ≤ n − m ˜ which is a number of much larger order than N . In this case, the Gaussian limiting
13.3. LIMIT VARIANCE ESTIMATES
297
variables are not independent. This makes the limiting distribution a bit more complicated and we thus avoid this estimation by working analogously to Carlstein (1986) [34]. More precisely, this author proves that if i = i(n), j = j(n) ˜ ≥ 1 then vary in such a way that lim inf n→∞ |i(n) − j(n)|/m(n) ˜ j(n),m(n) ˜ i(n),m(n) →n→∞ 0 ,Δ Cov Δ ˜ ˜ This is exactly our situation and the variables Δi,m are here asymptotically independent as n ↑ ∞ for any choice of . The setting in Carlstein is that of a strong mixing sequence and other type of increment are also investigated. The forthcoming section is aimed to derive the asymptotic behaviour for this estimation.
13.3.3
Law of the large numbers
Lemma 13.2. Let now F : RD → R be a Lipschitz function. Assume that the sequence (Xn )n∈Z is stationary and η−weakly dependent then N 1 L2 F (Δi,m ) →N →∞ EF (Δ) , N i=1
(13.3.8)
∞ b−4 (a) 1 (r + 1)2 η(r) b−1 < ∞ for some b > 4. if μ = max1≤a≤D E|X0 |b b < ∞, r=0
Proof of lemma The proof 13.2. followsNfrom two steps. N 1 1 Step 1. Var F (Δi,m ) ≤ |Cov(F (Δ1,m ), F (Δi,m ))| →N →∞ 0. N i=1 N i=1 For this we first write Fi,m = F (Δi,m ) f (x1 , . . . , xm )
= f (X(i−1)(m+)+1 , . . . , X(i−1)(m+)+m ), x1 + · · · + xm √ = F , xi ∈ RD , m
with
√ thus Lip f ≤ Lip F/ m; this means that covariances will not be directly controlled with weak dependence. We thus have to control |Cov(F (Δ1,m ), F (Δi,m ))| ,
1≤i≤N
(13.3.9)
We first consider the variance term obtained with i = 1. If we replace F by F − F (0) the expression (13.3.9) is unchanged so that we assume that F (0) = 0 m
*√
thus |F1,m | ≤ Lip F
Xj
m and j=1 m D m D m
2 2 2
(k) (k) E
Xj ≤ E Xj ≤ D E Xj . j=1
k=1
j=1
k=1
j=1
298
CHAPTER 13. ECONOMETRICS AND RESAMPLING
2 b 1 (k) (k) With this relation and Cov(X0 , Xr ) ≤ 23+ b−1 μ b−1 η(r)1− b−1 obtained from proposition 13.1, we derive: E|F1,m |2
≤
2D(Lip F )2 2
D m−1 (k) Cov(X0 , Xr(k) ) k=1 r=0 m−1 2
≤
24+ b−1 μ b−1 D
<
∞,
b
1
η(r)1− b−1
r=0
from our assumption.
Now consider F (x) = F (x) ∨ (−M ) ∧ M for some M > 0 to be defined later, we set F i,m = F (Δi,m ), then F i,m = f (X(i−1)(m+)+1 , . . . , X(i−1)(m+)+m ), √ √ where f (x1 , . . . , xm ) = F (x1 + · · · + xm / m) ; thus Lip f ≤ Lip F/ m and f ∞ ≤ M and we derive, |Cov (F1,m , Fi,m )| ≤ Cov F 1,m , F i,m + Cov F1,m , Fi,m − F i,m + Cov F1,m − F 1,m , F i,m ≤ Cov F 1,m , F i,m + (F1,m 2 + F i,m 2 )Fi,m − F i,m 2 ≤ Cov F 1,m , F i,m + 2F1,m 2 Fi,m − F i,m 2 . Then
FM Cov F 1,m , F i,m ≤ 2Lip √ η((i − 2)( + m) + ). m
On the other hand we already obtained F1,m 2 ≤ CD for some constant C > 0 and the following lemma 13.3 with p = 2 and q = 4 implies with corollary 13.1, (k) Δi,m 44 ≤ cμ4 for some constant if ∞
b−4
(r + 1)2 η(r) b−1 < ∞, and Fi,m − F i,m 2 ≤ CD2 M −1 .
r=0 (1)
(D)
Lemma 13.3. Let q > p ≥ 1 and 1 ≤ i ≤ N then if Δi,m = (Δi,m , . . . , Δi,m ), (k)
Fi,m − F i,m p ≤ 2Dq/p Lip q/p F · M 1−q/p max Δi,m q/p q . 1≤k≤D
If now q is an even integer, then if
∞
b−q
(r + 1)q−2 η(r) b−1 < ∞ the last moment
r=0
is bounded uniformly wrt i and m. Useful cases are given for q = 2 and q = 4.
13.3. LIMIT VARIANCE ESTIMATES
299
Proof of lemma 13.3. We first quote that Fi,m − F i,m p ≤ 2Fi,m 1|Fi,m |≥M p , then E|Fi,m |p 1|Fi,m |≥M
Fi,m − F i,m p
≤
M p−q E|Fi,m |q
≤
Lip q F · M p−q E|Δi,m |q
≤
Dq Lip q F · M p−q max Δi,m qq
≤
2Dq/p Lip q/p F · M 1−q/p max Δi,m q/p q .
(k)
1≤k≤D
(k)
1≤k≤D
(k)
Now corollary 13.1 implies that max1≤k≤D Δi,m qq is uniformly bounded. End of the proof of lemma 13.2. We thus have proved that there is some constant C > 0 such that N D2 1 M 3 −1 Var F (Δi,m ) ≤ C D M + √ η() + , N i=1 m N D2
3/2 −1/4 √ η() + ≤ C D m . N The last inequality holds for some C > 0 with the choice M = D3/2 m1/4 η()−1/2 . Step 2. We now need to prove that EF (Δ1,m ) →m→∞ EF (Δ). We first note that the condition η(r) = O(r−α ) for α > 2 + 1/(b − 2) from Bardet, Le´ on and Doukhan (2005) [11] for CLT to hold is implied by the assumptions in our result. Consider again the truncated function F but for some M which may vary with m, |EF (Δ1,m ) − EF (Δ)| ≤ A + B + C, where A = E|F (Δ1,m ) − F (Δ1,m )|, B = E|F (Δ1,m ) − F (Δ)|, C = E|F (Δ) − F (Δ)|. Now as M → ∞, A ≤ F (Δ1,m ) − F (Δ1,m )2 = O(D2 /M ) converges to 0 (uniformly wrt m), and C →M→∞ 0, because E|F (Δ)| < ∞. Now, from the central limit theorem, B → 0 as m → ∞. Thus we may choose a sequence M = M (m) ↑ ∞ such that EF (Δ1,m ) → EF (Δ).
13.3.4
Central limit theorem
√1 (F (Δi,m ) − EF (Δi,m )). Theorem 13.1. Set Zn = N i=1 xi,n with xi,n = N Assume that the centered and stationary sequence (Xt ) satisfies E|X0 |b < ∞ for some b > 4 and this is η−weakly dependent with η(r) = O (r−α ) for some α > 2 + 1/(b − 2). Then, D
Zn →n→∞ N (0, Var F (Δ)), where N = N (n) is the largest number such that n ≥ N (m + ) − , and where m = m(n) = [mγ ], = (n) = [mδ ] satisfy α > 12 /δ + 43 (1 − γ)/δ.
300
CHAPTER 13. ECONOMETRICS AND RESAMPLING
Remark. For α > 2 + 1/(b − 2) one may choose such rates with γ > δ > 0. Proof. Consider Z ∼ N (0, Var F (Δ)). For f ∈ C 3 (R, R), we want to control |Ef (Zn ) − f (Z)| ≤ |Ef (Z (n) ) − f (Z)| +
N
|Ui |,
i=1
i−1 with Ui = E (fi (wi,n + xi,n ) − fi (wi,n + yi,n )), wi,n = j=1 xi,n and fi (t) =
n f t + j=i+1 yi,n , where, as usual, empty sums are set equal to 0 and where Z (n) =
N
yi,n , yi,n ∼ N (0, Var xi,n ), and the Gaussian random variables yi,n
i=1
are set as jointly independent and independent of the process (Xt )t∈Z . (j)
Step 1. Lindeberg technique. Notice that fi ∞ ≤ f (j) ∞ for j = 0, 1, 2, 3 and for each i ≤ N . From a Taylor expansion and from the bound E|yi,n |3 ≤ 3/2 ≤ E|N (0, 1)|3 E|xi,n |3 , there exists a constant c > 0 such E|N (0, 1)|3 Ex2i,n that N
N c 1 |Ui | ≤ √ E|Δ1,m |3 + |Cov(fi (wi,n , xi,n )| + Cov(fi
(wi,n , x2i,n ) . 2 N i=1 i=1
√ As mentioned in the previous section the first term is bounded as O(1/ N ) if the moment of order 4 of a sum is suitably controlled. In order to make use of the weak dependence condition, we have to truncate the random variables level M > 0 precisely xi,n . Set F (t) = F (t) ∨ (−M ) ∧ M for a truncation set later, then xi,n = √1N F (Δi,m ) − EF (Δi,m ) writes as a function xi,n = g(X(i−1)(m+)+1 , . . . , X(i−1)(m+)+n ) where t1 + · · · + tm 1 √ F g(t1 , . . . , tm ) = √ − EF (Δi,m ) , m N √ √ thus g∞ ≤ 2M/ N and Lip g ≤ Lip F/ N m. By another hand, we may (j) write fi (wi,n ) = G (Xs )s≤(i−1)(m+)− where G : (RD )(i−1)(m+) → R satis√ fies G∞ ≤ f (j) ∞ , Lip G ≤ f (j+1) ∞ Lip F/ mN and is a function of less than n variables in RD . |Cov(fi (wi,n ), xi,n )| |Cov(fi (wi,n ), xi,n )| Fi,m − F i,m 1
2f ∞ F (Δ1,m ) − F (Δ1,m )1 ≤ |Cov(fi (wi,n ), xi,n )| + √ N √ √ ≤ c mM η(i−1)(m+)+ ≤ c mM η , with η dependence, ≤ cD2 M −3 ,
from lemma 13.3.
13.3. LIMIT VARIANCE ESTIMATES
301
With M = η()−1/4 m−1/8 N −1/8 D1/4 we thus arrive with some constant c > 0 to |Cov(fi (wi,n ), xi,n )| ≤ cD1/4 m3/8 N 3/8 η()3/4 ≤ cD1/4 n3/8 η()3/4 . Analogously Cov(f
(wi,n ), x2 ) ≤ Cov(f
(wi,n ), x2 ) + 2f
∞ E|x2 − x2 | i i,n i i,n i,n i,n 2 1 M M Cov(fi
(wi,n ), x2i,n ) ≤ c n √ √ + m√ η(), mN N mN N √ M2 n η(), with η dependence. ≤ c N 2 E|x2i,n − x2i,n | ≤ F (Δi,m )2 F (Δi,m ) − F (Δi,m )2 N D3 , from lemma 13.3. ≤ c MN Here we choose M = η()−1/3 n−1/6 D to obtain, for some constant c > 0, Cov(fi
(wi,n ), x2i,n )
≤
c
D2 n1/6 η()1/3 . N
We thus have proved that N 1 |Ui | ≤ c N D1/4 n3/8 η()3/4 + D2 n1/6 η()1/3 + √ . N i=1 Now consider m(n) = [nγ ] and (n) = [nδ ]. Then N (n) ∼ cnβ where β = 1 − γ. In order that the previous expression converges to 0, we only need β > 0,
α>
1 , 2δ
α>
1 4(1 − γ) + , 2δ 3δ
δ < γ.
To conclude, this is enough to choose δ < γ, such that those relations hold. If δ and γ are both close enough to 1, α > 2 + 1/(b − 2) implies those inequalities. Step 2. Gaussian approximation. To conclude we still need to bound the expression |E(f (Z (n) ) − f ((Z))|. For this set Z = σN and Z (n) = σm N for some 2 standard Gaussian random variable N and σm = Var F (Δ1,m ), σ 2 = Var F (Δ). Then |E(f (Z (n) ) − f ((Z))| ≤ f
∞ |σ − σm |2 ≤
f
∞ 2 2 |σ − σm |. σ
We need to prove that EF 2 (Δ1,m ) →m→∞ EF 2 (Δ) and we use step 2 in the proof of lemma 13.2. We first note that the condition η(r) = O(r−α ) with α > 2 + 1/(b − 2) from Bardet, Le´ on and Doukhan (2005) [11] implies the CLT. We consider the truncated function F for some M which may vary with m, |EF 2 (Δ1,m ) − EF 2 (Δ)| ≤
A + B + C,
302
CHAPTER 13. ECONOMETRICS AND RESAMPLING 2
2
2
where we set A = E|F 2 (Δ1,m )−F (Δ1,m )|, B = E|F (Δ1,m )−F (Δ)|, and C = 2 E|F (Δ)−F 2 (Δ)|. Now A ≤ 2F 2 (Δ1,m )2 F (Δ1,m )−F (Δ1,m )2 = O(D2 /M ) (as M → ∞) converges to 0 (uniformly wrt m), and C →M→∞ 0 because E|F 2 (Δ)| < ∞. From the central limit theorem, B → 0 as m → ∞. Thus we may choose a sequence M = M (m) ↑ ∞ such that EF 2 (Δ1,m ) → EF 2 (Δ).
13.3.5
A non centered variant
An alternative more attractive result involves Tn =
N
ti,n ,
where
i=1
1 ti,n = √ (F (Δi,m ) − EF (Δ)) . N
A central limit theorem for this non centered quantity looks much more convenient to consider the estimation of the parameter EF (Δ). Such results are not considered by Peligrad & Shao (1995) [142]. In order to derive them one still uses the Lindeberg technique with blocks. Hence, for some bounded and C 3 function f we bound again: |E(f (Tn ) − f (Z))| ≤ ≤
|E(f (Tn ) − f (Zn ))| + |E(f (Zn ) − f (Z))| √ N f ∞ |EF (Δ1,m ) − EF (Δ)| + |E(f (Zn ) − f (Z))|
with Z ∼ N (0, Var F (Δ)). This means that the previous convergence relies on the decay rate of the expression 1 |EF (Δ1,m ) − 2EF (Δ)|; as stressed in the if F (x) = (x a) and the previous quantity forthcoming remark, this is O m tends to zero under the assumption: limn→∞ N/m2 = 0. Examples of functions F
• Case D = 1. In this case for F such that |F (x)|/(1 + x2 ) dx < ∞, Petrov (1996) lemma 5.4 page 152 [144] implies |F (x)| dx |EF (Δ1,m ) − EF (Δ)| ≤ c(vn + δ) 1 + x2 with vm = supx∈R |P(Δ1,m ≤ x) − P(Δ ≤ x)| = O m−λ for some λ < 18 given in Bardet, Doukhan and Le´on (2005) [11] and δ = EΔ21,m − EΔ2 = 1 O m . This implies that the cases F (x) = |x|β for some β ≤ 2 is also obtained if, now, limn→∞ N/m2λ = 0; Peligrad and Shao (1995) [142] consider the special case β = 1. Here λ → 18 as a, b → ∞ if η(r) = O(r−a ) and E|X0 |b < ∞.
• Case D = 1. If now F (x) = xp then theorem 4.6 provides a bound O(1/m) for each integer p which resembles the Rosenthal inequality in the independent case (see Hall & Heyde, 1980 [100]).
13.3. LIMIT VARIANCE ESTIMATES
303
• Case D > 1. The special case F (x) = (x a)2 , for some a ∈ RD is the simplest multi-dimensional one. In this case, indeed setting ξt = Xt a, we still assume Eξt = 0 and one may write |s| EF (Δ1,m ) = 1− Eξ0 ξs , m
∞
EF (Δ) =
|s|<m
Eξ0 ξs ,
s=−∞
1 |s|Eξ0 ξs + Eξ0 ξs m |s|<m |s|≥m ∞ 1 |s| = |s|Eξ0 ξs + 1− Eξ0 ξs , m s=−∞ m
so that EF (Δ) − EF (Δ1,m ) =
|s|≥m
thus ∞ 1 |s|Eξ0 ξs ≤ |Eξ0 ξs | . EF (Δ) − EF (Δ1,m ) − m s=−∞
(13.3.10)
|s|≥m
Using proposition 13.1 now yields if a = (a(1) , . . . , a(D) ), 3b−1
1
|Eξ0 ξs | ≤ Da2 cX,2 (s) ≤ 2 b−1 Da2 μ b−1 η(s)1− b−1 . 1 This previous bound (13.3.10) is thus O Da2 η(s)1− b−1 . For fixed D this bound has order o(m−1 ) if
b
∞
s≥m 1
sη(s)1− b−1 < ∞, and thus
s=1
EF (Δ) = EF (Δ1,m ) +
∞ 1 1 |s|Eξ0 ξs + o . m s=−∞ m
Remark. The previous results given under η weak dependence will be extended in forthcoming papers under κ and λ-weak dependence.
Bibliography [1] Aldous, D. J., Eagleson, G. K. (1978) On mixing and stability of limit theorems. Ann. Probab. 6, 325-331. [2] Andrews, D. W. K. (1984) Non strong mixing autoregressive processes. J. Appl. Probab. 21, 930-934. [3] Andrews, D. W. K. (1988) Laws of large numbers for independent non identically distributed random variables. Econometric Theory 4, 458-467. [4] Andrews, D. W. K. (2002) Higher-order Improvements of a Computationally Attractive k-step Bootstrap for Extremum Estimators. Econometrica 70, 119-162. [5] Andrews, W. K., Pollard, D. (1994) An introduction to functional central limit theorems for dependent stochastic processes. Int. Statist. Rev. 62, 119-132. [6] Ango Nze, P., B¨ uhlman, P., Doukhan, P. (2002) Weak Dependence beyond Mixing and Asymptotics for Nonparametric Regression. Annals of Statistics, 30-2, 397-430. [7] Ango Nze, P., Doukhan, P. (2004) Weak dependence, models and applications to econometrics. Econometric Theory 20-6, 995-1045. [8] Bakhtin, Yu., Bulinski, A. V. (1997) Moment inequalities for sums of dependent multiindexed random variables. Fundamentalnaya i prikladnaya matematika 3-4, 1101-1108. [9] Barbour, A. D., Gerrard, R. M., Reinert, G. (2000) Iterates of expanding maps. Probab. Theory Rel. Fields 116 151-180. [10] Bardet, J. M., Doukhan, P., Lang, G., Ragache, N. (2006) A Lindeberg central limit theorem for dependent processes and its statistical applications. Submitted. [11] Bardet, J. M., Doukhan, P., Le´ on, J. R. (2006) A uniform central limit theorem for the periodogram and its applications to Whittle parametric estimation for weakly dependent time series. Submitted. [12] Barron, A. R., Birg´e, L., Massart, P. (1999) Risk bounds for model selection by penalization. Probab. Theory and Rel. Fields 113, 301-413. [13] Bernstein, S. (1939) Quelques remarques sur le th´eor`eme limite Liapounoff. C. R. (Dokl.) Acad. Sci. USSR, Ser. 24, 1939, 3-8. [14] Bena¨ım, M. 
(1996) A dynamical system approach to stochastic approximation. SIAM J. control and optimization 34-2, 437-472.
305
306
BIBLIOGRAPHY
[15] Benveniste, A., M´etivier, M. Priouret, P. (1987) Algorithmes adaptatifs et approximations stochastiques. Masson. [16] Berbee, H. C. P. (1979) Random walks with stationary increments and renewal theory, Math. centre tracts, Amsterdam. [17] Bickel, P., B¨ uhlmann, P. (1995) Mixing property and functional central limit theorems for a sieve bootstrap in times series. Techn. Rep. 440, Dept. Stat., U. C. Berkeley. [18] Bickel, P., B¨ uhlmann, P. (1999) A new mixing notion and functional central limit theorems for a sieve bootstrap in time series Bernoulli 5-3, 413-446. [19] Bickel, P. J., Wichura, M. J. (1971) Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Stat. 42, 1656-1670. [20] Billingsley, P. (1968) Convergence of Probability Measures. Wiley, New-York. [21] Billingsley, P. (1995) Probability and measure. Third edition. Wiley Series in Probability and Mathematical Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York. [22] Birkel, T. (1988) Moment bounds for associated sequences. Ann. Probab. 16, 1184-1193. [23] Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity J. of Econometrics, 31, 307-327. [24] Bolthausen, E. (1982) On the central limit theorem for stationary random fields. Ann. Probab. 10, 1047-1050. [25] Borovkova, S., Burton, R., Dehling, H. (2001). Limit theorems for functionals of mixing processes with application to U -statistics and dimension estimation. Trans. Amer. Math. Soc. 353, 4261-4318. [26] Bosq, D. (1993) Bernstein’s type large deviation inequalities for partial sums of strong mixing process. Statistics 24, 59-70. [27] Bougerol, P. (1993) Kalman filtering with random coefficients and contractions. SIAM J. Control Optim. 31, 942-959. [28] Brockwell, P., Davis, R. (1987) Time Series: Theory and Methods. SpringerVerlag. [29] Bradley, R. C. (1983) Approximations theorems for strongly mixing random variables. Michigan Math. J. 30, 69-81. 
[30] Bradley, R. C. (2002) Introduction to strong mixing conditions, Volume 1. Technical Report, Department of Mathematics, I.U. Bloomington.
[31] Broise, A. (1996) Transformations dilatantes de l'intervalle et théorèmes limites. Études spectrales d'opérateurs de transfert et applications, Astérisque 238, 1-109.
[32] Brandière, O., Doukhan, P. (2004) Dependent noise for stochastic algorithms. Probab. Math. Statist. 2-24, 381-399.
[33] Bulinski, A. V., Shashkin, A. P. (2005) Strong invariance principle for dependent multi-indexed random variables. Doklady Mathematics 72-11, 503-506, MAIK Nauka/Interperiodica.
[34] Carlstein, E. (1986) The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Annals of Statistics 14-3, 1171-1179.
[35] Castaing, C., Raynaud de Fitte, P., Valadier, M. (2004) Young Measures on Topological Spaces. Mathematics and its Applications, Kluwer.
[36] Chen, H-F. (1985) Stochastic Approximation and Its Applications. Nonconvex Optimization and Its Applications 64, Kluwer Academic Publishers.
[37] Chow, Y. S. (1966) Some convergence theorems for independent random variables. Ann. Math. Stat. 37, 1482-1493 and 36-4, 1293-1314.
[38] Collomb, G. (1984) Propriétés de convergence presque complète du prédicteur de noyau. Z. Wahrsch. Verw. Gebiete 66, 441-446.
[39] Comtet, L. (1970) Analyse Combinatoire. Tome 1. Coll. S.U.P. Presses Universitaires de France.
[40] Coulon-Prieur, C., Doukhan, P. (2000) A triangular CLT for weakly dependent sequences. Statist. Probab. Lett. 47, 61-68.
[41] Daubechies, I. (1988) Orthonormal bases of compactly supported wavelets. Comm. Pure & Appl. Math. 41, 909-996.
[42] Dedecker, J. (2001) Exponential inequalities and functional central limit theorems for random fields. ESAIM Probab. & Stat. 5, 77-104. http://www.emath.fr/ps/
[43] Dedecker, J., Doukhan, P. (2003) A new covariance inequality and applications. Stoch. Proc. Appl. 106-1, 63-80.
[44] Dedecker, J., Merlevède, F. (2002) Necessary and sufficient conditions for the conditional central limit theorem. Ann. Probab. 30, 1044-1081.
[45] Dedecker, J., Prieur, C. (2004a) Coupling for τ-dependent sequences and applications. Journal of Theor. Probab. 17-4, 861-885.
[46] Dedecker, J., Prieur, C. (2004b) Couplage pour la distance minimale. C. R. Acad. Sci. Paris Série 1 338-10, 805-808.
[47] Dedecker, J., Prieur, C., Raynaud de Fitte, P. (2006) Parametrized Kantorovich-Rubinstein theorem and application to the coupling of random variables. In Dependence in Probability and Statistics, Lecture Notes in Statist. 187, Springer, 105-121.
[48] Dedecker, J., Prieur, C. (2005) New dependence coefficients. Examples and applications to statistics. Probab. Theory Relat. Fields 132, 203-236.
[49] Dedecker, J., Prieur, C. (2006) An empirical central limit theorem for dependent sequences. Stoch. Proc. Appl. 117, 121-142.
[50] Dedecker, J., Rio, E. (2000) On the functional central limit theorem for stationary processes. Ann. I. H. P. ser. B 36-1, 1-34.
[51] Deheuvels, P. (1979) La fonction de dépendance empirique et ses propriétés. Acad. Roy. Belg., Bull. Cl. Sci. 5ème sér. 65, 274-292.
[52] Deheuvels, P. (1981a) A Kolmogorov-Smirnov type test for independence and multivariate samples. Rev. Roum. Math. Pures et Appl. 26-2, 213-226.
[53] Deheuvels, P. (1981b) A nonparametric test for independence. Pub. Inst. Stat. Univ. Paris 26-2, 29-50.
[54] Dehling, H., Philipp, W. (1982) Almost sure invariance principles for weakly dependent vector-valued random variables. Ann. Probab. 10, 689-701.
[55] Dehling, H., Taqqu, M. (1989) The empirical process of some long-range dependent sequences with applications to U-statistics. Ann. Statist. 17, 1767-1783.
[56] De Jong, R. M. (1996) A strong law of large numbers for triangular mixingale arrays. Statist. and Probab. Lett. 27, 1-9.
[57] Delyon, B. (1990) Limit theorem for mixing processes. Tech. Report IRISA, Rennes 1, 546.
[58] Delyon, B. (1996) General convergence result on stochastic approximation. IEEE Trans. Automatic Control 41, 9.
[59] DeVore, R., Lorentz, G. (1993) Constructive Approximation. Springer-Verlag, Berlin.
[60] Diaconis, P., Freedman, D. (1999) Iterated random functions. SIAM Rev. 41-1, 45-76.
[61] Doukhan, P. (1994) Mixing: Properties and Examples. Lecture Notes in Statistics 85, Springer Verlag.
[62] Doukhan, P. (2002) Models, inequalities and limit theorems for stationary sequences. In Theory and Applications of Long Range Dependence (Doukhan et alii ed.), Birkhäuser, 43-101.
[63] Doukhan, P., Lang, G. (2002) Rates of convergence in the weak invariance principle for the empirical repartition process of weakly dependent sequences. Stat. Inf. for Stoch. Proc. 5, 199-228.
[64] Doukhan, P., Lang, G., Louhichi, S., Ycart, B. (2005) A functional central limit theorem for interacting particle systems on transitive graphs. Available from http://arxiv.org/abs/math-ph/0509041.
[65] Doukhan, P., Latour, A., Oraichi, D. (2006) Simple integer-valued bilinear time series model. Adv. Appl. Probab. 38-2, 559-578.
[66] Doukhan, P., León, J. R. (1989) Cumulants for mixing sequences and applications to empirical spectral density. Probab. Math. Stat. 10-1, 11-26.
[67] Doukhan, P., Louhichi, S. (1999) A new weak dependence condition and applications to moment inequalities. Stoch. Proc. Appl. 84, 313-342.
[68] Doukhan, P., Louhichi, S. (2001) Functional estimation for weakly dependent stationary time series. Scand. J. Statist. 28, 325-342.
[69] Doukhan, P., Madre, H., Rosenbaum, M. (2006) Weak dependence beyond mixing for infinite ARCH-type bilinear models. Statistics.
[70] Doukhan, P., Massart, P., Rio, E. (1994) The functional central limit theorem for strongly mixing processes. Ann. I. H. P. ser. B 30-1, 63-82.
[71] Doukhan, P., Neumann, M. (2006) A Bernstein type inequality for time series. Submitted to Stoch. Proc. Appl.
[72] Doukhan, P., Oppenheim, G., Taqqu, M., editors (2003) Theory and Applications of Long-range Dependence. Birkhäuser, Boston.
[73] Doukhan, P., Portal, F. (1983) Moments de variables aléatoires mélangeantes. C. R. Acad. Sci. Paris Série 1 297, 129-132.
[74] Doukhan, P., Sifre, J.-C. (2001) Cours d'analyse - Analyse réelle et intégration. Dunod.
[75] Doukhan, P., Teyssière, G., Winant, P. (2006) Vector valued ARCH infinity processes. In Dependence in Probability and Statistics, Patrice Bertail, Paul Doukhan and Philippe Soulier editors, Springer, New York.
[76] Doukhan, P., Truquet, L. (2006) Weakly dependent random fields with infinite interactions. Preprint, http://samos.univ-paris1.fr/ppub2006.html.
[77] Doukhan, P., Wintenberger, O. (2005) An invariance principle for weakly dependent stationary general models. Preprint, http://samos.univ-paris1.fr/ppub2005.html.
[78] Doukhan, P., Wintenberger, O. (2006) Weakly dependent chains with infinite memory. Preprint, http://samos.univ-paris1.fr/ppub2005.html.
[79] Dudley, R. M. (1968) Distances of probability measures and random variables. Ann. Math. Statist. 39, 1563-1572.
[80] Dudley, R. M. (1989) Real Analysis and Probability. Wadsworth Inc., Belmont, California.
[81] Duflo, M. (1990) Méthodes récursives aléatoires. Masson, Paris (English edition, Springer, 1996).
[82] Duflo, M. (1996) Algorithmes Stochastiques. Collection Math. et Appl. 23, Springer.
[83] Eberlein, E., Taqqu, M., editors (1986) Dependence in Probability and Statistics. Birkhäuser, Boston.
[84] Engle, R. F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.
[85] Esary, J., Proschan, F., Walkup, D. (1967) Association of random variables with applications. Ann. Math. Statist. 38, 1466-1476.
[86] Fermanian, J-D., Radulovic, D., Wegkamp, M. (2004) Weak convergence of empirical copula processes. Bernoulli 10, 847-860.
[87] Fortuin, C. M., Kasteleyn, P. W., Ginibre, J. (1971) Correlation inequalities on some partially ordered sets. Comm. Math. Phys. 22, 89-103.
[88] Fréchet, M. (1957) Sur la distance de deux lois de probabilité. C. R. Acad. Sci. Paris 244-6, 689-692.
[89] Fuk, D. Kh., Nagaev, S. V. (1971) Probability inequalities for sums of independent random variables. Theory Probab. Appl. 16, 643-660.
[90] Garsia, A. M. (1965) A simple proof of E. Hopf's maximal ergodic theorem. J. of Maths and Mech. 14, 381-382.
[91] Ghosal, S., Chandra, T. K. (1998) Complete convergence of martingale arrays. J. Theor. Probab. 11-3, 621-631.
[92] Giraitis, L., Kokoszka, P., Leipus, R. (2000) Stationary ARCH models: dependence structure and central limit theorem. Econometric Theory 16, 3-22.
[93] Giraitis, L., Leipus, R., Surgailis, D. (2006) Recent advances in ARCH modelling. In Long Memory in Economics, Ed. Gilles Teyssière and Alan Kirman, Springer Verlag.
[94] Giraitis, L., Robinson, P. M. (2001) Whittle estimation of ARCH models. Econom. Theory 17-3, 608-631.
[95] Giraitis, L., Surgailis, D. (2002) ARCH-type bilinear models with double long memory. Stoch. Proc. and their Applic. 100, 275-300.
[96] Goldstein, S. (1979) Maximal coupling. Z. Wahrsch. Verw. Gebiete 46, 193-204.
[97] Gordin, M. I. (1969) The central limit theorem for stationary processes. Dokl. Akad. Nauk SSSR 188, 739-741.
[98] Gordin, M. I. (1973) Abstracts of Communication, T.1: A-K. International Conference on Probability Theory, Vilnius.
[99] Götze, F., Hipp, C. (1978) Asymptotic expansions in the central limit theorem under moment conditions. Z. Wahr. und Verw. Gebiete 42-1, 67-87.
[100] Hall, P., Heyde, C. C. (1980) Martingale Limit Theory and Its Applications. Academic Press.
[101] Hall, P., Horowitz, J. (1996) Bootstrap critical values for tests based on generalized method of moment estimators. Econometrica 64-4, 891-916.
[102] Hannan, E. J. (1973) The estimation of frequency. J. Appl. Probab. 10, 510-519.
[103] Heyde, C. C. (1974) On the central limit theorem for stationary processes. Z. Wahrsch. Verw. Gebiete 30, 315-320.
[104] Heyde, C. C., Scott, D. J. (1973) Invariance principles for the law of the iterated logarithm for martingales and processes with stationary increments. Ann. Probab. 1, 428-436.
[105] Heyde, C. C. (1975) On the central limit theorem and iterated logarithm law for stationary processes. Bull. Austral. Math. Soc. 12, 1-8.
[106] Hofbauer, F., Keller, G. (1982) Ergodic properties of invariant measures for piecewise monotonic transformations. Math. Z. 180, 119-140.
[107] Hoffmann-Jørgensen, J. (1974) Sums of independent Banach space valued random variables. Studia Math. 52, 159-186.
[108] Horváth, L., Shao, Q. M. (1999) Limit theorems for quadratic forms with applications to Whittle's estimate. Ann. Appl. Probab. 9-1, 146-187.
[109] Hsu, P. L., Robbins, H. (1947) Complete convergence and the strong law of large numbers. Proc. Nat. Acad. Sci. 33, 25-31.
[110] Ibragimov, I. A. (1962) Some limit theorems for stationary processes. Theory Probab. Appl. 7, 349-382.
[111] Kallenberg, O. (1997) Foundations of Modern Probability. Springer-Verlag, New York.
[112] Kolmogorov, A. N., Rozanov, Y. A. (1960) On the strong mixing conditions for stationary Gaussian sequences. Th. Probab. Appl. 5, 204-207.
[113] Künsch, H. R. (1989) The jackknife and bootstrap for general stationary observations. Ann. Statist. 17, 1217-1241.
[114] Kushner, H. J., Clark, D. S. (1978) Stochastic Approximation for Constrained and Unconstrained Systems. Applied Math. Science Series 26, Springer.
[115] Lasota, A., Yorke, J. A. (1974) On the existence of invariant measures for piecewise monotonic transformations. Trans. Amer. Math. Soc. 186, 481-488.
[116] Latour, A. (1997) The multivariate GINAR(p) process. Adv. Appl. Prob. 29, 228-247.
[117] Latour, A. (1998) Existence and spectral density of the GINAR(p) process. J. Time Ser. Anal. 19, 439-455.
[118] Lee, M. L. T., Rachev, S. T., Samorodnitsky, G. (1990) Association of stable random variables. Ann. Probab. 18-4, 1759-1764.
[119] Leonov, V. P., Shiryaev, A. N. (1959) On a method of calculation of semi-invariants. Th. Probab. Appl. 5, 460-464.
[120] Li, D., Rao, M. B., Jiang, T., Wang, X. (1995) Complete convergence and almost sure convergence of weighted sums of random variables. J. Theor. Prob. 8, 49-76.
[121] Liebscher, E. (1996) Strong convergence of sums of α-mixing random variables with applications to density estimation. Stoch. Proc. Appl. 65, 69-98.
[122] Liggett, T. M. (1985) Interacting Particle Systems. Springer-Verlag, New York.
[123] Liverani, C., Saussol, B., Vaienti, S. (1999) A probabilistic approach to intermittency. Ergodic Theory and Dynamical Systems 19, 671-685.
[124] Louhichi, S. (2000) Weak convergence for empirical processes of associated sequences. Ann. I. H. P. ser. B 36-5, 547-567.
[125] Louhichi, S. (2003) Moment inequalities for sums of certain dependent random variables. Theor. Prob. Appl. 47-4.
[126] Major, P. (1978) On the invariance principle for sums of identically distributed random variables. J. Multivariate Anal. 8, 487-517.
[127] McLeish, D. L. (1975) A generalization of martingales and mixing sequences. Adv. in Appl. Probab. 7-2, 247-258.
[128] McLeish, D. L. (1975a) A maximal inequality and dependent strong laws. Ann. Probab. 3, 829-839.
[129] McLeish, D. L. (1975b) Invariance principles and mixing random variables. Z. Wahrsch. Verw. Gebiete 32, 165-178.
[130] Merlevède, F., Peligrad, M. (2002) On the coupling of dependent random variables and applications. In Empirical Process Techniques for Dependent Data, 171-193, Birkhäuser.
[131] Mikosch, T., Straumann, D. (2002) Whittle estimation in a heavy-tailed GARCH(1,1) model. Stoch. Proc. Appl. 100, 187-222.
[132] Mokkadem, A. (1990) Propriétés de mélange des modèles autorégressifs polynomiaux. Ann. I. H. P. ser. B 26-2, 219-260.
[133] Móricz, F. A., Serfling, R. J., Stout, W. F. (1982) Moment and probability bounds with quasi-superadditive structure for the maximum partial sum. Ann. Probab. 10, 1032-1040.
[134] Nelsen, R. B. (1999) An Introduction to Copulas. Lecture Notes in Statistics 139, Springer.
[135] Newman, C. M. (1980) Normal fluctuations and the FKG inequalities. Comm. Math. Phys. 74-2, 119-128.
[136] Newman, C. M. (1984) Asymptotic independence and limit theorems for positively and negatively dependent random variables. In Y. L. Tong, Editor, Inequalities in Statistics and Probability, IMS Lecture Notes-Monograph Series 5, 127-140.
[137] Oodaira, H., Yoshihara, K. I. (1971) Note on the law of the iterated logarithm for stationary processes satisfying mixing conditions. Kodai Math. Sem. Rep. 23, 335-342.
[138] Ornstein, D. S. (1973) An example of a Kolmogorov automorphism which is not a Bernoulli shift. Adv. in Math. 10, 49-62.
[139] Peligrad, M. (1983) A note on two measures of dependence and mixing sequences. Adv. in Appl. Probab. 15-2, 461-464.
[140] Peligrad, M. (2002) Some remarks on coupling of dependent random variables. Stat. and Prob. Lett. 60, 201-209.
[141] Peligrad, M., Shao, Q. M. (1994) Self-normalizing central limit theorem for sums of weakly dependent random variables. J. Theoret. Probab. 7, 309-338.
[142] Peligrad, M., Shao, Q. M. (1995) Estimation of the variance for partial sums for ρ-mixing random variables. J. of Multiv. Analysis 52, 140-157.
[143] Peligrad, M., Utev, S. (1997) Central limit theorem for linear processes. Ann. Probab. 25-1, 443-456.
[144] Petrov, V. V. (1996) Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Clarendon Press, Oxford.
[145] Phillips, P. C. B. (1987) Time series regression with a unit root. Econometrica 55-2, 277-301.
[146] Pitt, L. (1982) Positively correlated normal variables are associated. Ann. Probab. 10, 496-499.
[147] Pham, T. D., Tran, L. T. (1985) Some mixing properties of time series models. Stoch. Process. Appl. 19, 297-303.
[148] Pham, T. D. (1986) Bilinear Markovian representation and bilinear models. Stoch. Proc. Appl. 20, 295-306.
[149] Phillips, P. C. B. (1987) Time series regression with a unit root. Econometrica 55, 277-301.
[150] Philipp, W. (1986) Invariance principles for independent and weakly dependent random variables. In Dependence in Probability and Statistics (Oberwolfach, 1985), 225-268. Prog. Probab. Statist. 11, Birkhäuser, Boston.
[151] Pollard, D. (1984) Convergence of Stochastic Processes. Springer-Verlag, Berlin.
[152] Pötscher, B. M., Prucha, I. R. (1991) Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part 1: consistency and approximation concepts, part 2: asymptotic normality. Econometric Reviews 10, 125-216 and 253-325.
[153] Prakasa Rao, B. L. S. (1983) Nonparametric Functional Estimation. Academic Press, New York.
[154] Prieur, C. (2001) Density estimation for one-dimensional dynamical systems. ESAIM Probab. & Statist. 5, 51-76. www.edpsciences.com/ps/
[155] Prieur, C. (2002) An empirical functional central limit theorem for weakly dependent sequences. Probab. Math. Statist. 22-2, 259-287.
[156] Rényi, A. (1963) On stable sequences of events. Sankhyā Ser. A 25, 293-302.
[157] Rio, E. (1993) Covariance inequalities for strongly mixing processes. Ann. I. H. P. ser. B 29-4, 587-597.
[158] Rio, E. (1995a) About the Lindeberg method for strongly mixing sequences. ESAIM-PS 1, 35-61.
[159] Rio, E. (1995b) The functional law of iterated logarithm for strongly mixing processes. Ann. Probab. 23, 1188-1203.
[160] Rio, E. (1996) Sur le théorème de Berry-Esseen pour les suites faiblement dépendantes. Probab. Theory Relat. Fields 104, 255-282.
[161] Rio, E. (2000) Théorie asymptotique pour des processus aléatoires faiblement dépendants. SMAI, Mathématiques et Applications 31, Springer.
[162] Rios, R. (1996) Utilisation des techniques non paramétriques et semi paramétriques en statistique des données dépendantes. Ph. D. Thesis, Université Paris-Sud, Paris.
[163] Robinson, P. M. (1983) Nonparametric estimators for time series. Journal of Time Series Analysis 4, 185-207.
[164] Robinson, P. M. (1989) Hypothesis testing in semiparametric and nonparametric models for econometric time series. The Review of Economic Studies 56-4, 511-534.
[165] Robinson, P. M. (1991) Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. Journal of Econometrics 47, 67-84.
[166] Rosenblatt, M. (1956) A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. U.S.A. 42, 43-47.
[167] Rosenblatt, M. (1961) Independence and dependence. In Proc. 4th Berkeley Symp. Math. Stat. Prob., 411-443. Berkeley University Press.
[168] Rosenblatt, M. (1985) Stationary Sequences and Random Fields. Birkhäuser.
[169] Rosenblatt, M. (1991) Stochastic Curve Estimation. NSF-CBMS Regional Conference Series in Probability and Statistics 3.
[170] Rosenthal, H. P. (1970) On the subspaces of Lp (p > 2) spanned by sequences of independent random variables. Israel Journal of Math. 8, 273-303.
[171] Rüschendorf, L. (1985) The Wasserstein distance and approximation theorems. Z. Wahrsch. Verw. Gebiete 70, 117-129.
[172] Samorodnitsky, G., Taqqu, M. S. (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York.
[173] Saulis, L., Statulevicius, V. A. (1991) Limit Theorems for Large Deviations. Birkhäuser.
[174] Shao, Q. M. (1993) Almost sure invariance principle for mixing sequences of random variables. Stoch. Proc. Appl. 48, 319-334.
[175] Shao, Q. M., Yu, H. (1996) Weak convergence for weighted empirical processes of dependent sequences. Ann. Probab. 24-4, 2098-2127.
[176] Shashkin, A. P. (2005) A weak dependence property of a spin system. Preprint.
[177] Shixin, E. G. (1999) On the convergence of weighted sums of Lq-mixingale arrays. Acta Math. Hungar. 82(1-2), 113-120.
[178] Sklar, A. (1959) Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8, 229-231.
[179] Skorohod, A. V. (1976) On a representation of random variables. Theory Probab. Appl. 21, 628-632.
[180] Strassen, V. (1964) An invariance principle for the law of the iterated logarithm. Z. Wahrsch. Verw. Gebiete 3, 211-226.
[181] Straumann, D. (2005) Estimation in Conditionally Heteroscedastic Time Series. Lecture Notes in Statistics 181, Springer, New York.
[182] Szegő, G. (1967) Orthogonal Polynomials. A. M. S. Colloquium Publications 23, New York.
[183] van der Vaart, A. W., Wellner, J. A. (1996) Weak Convergence and Empirical Processes. Springer, New York.
[184] Valadier, M. (1973) Désintégration d'une mesure sur un produit. C. R. Acad. Sc. Paris 276, 33-35.
[185] Viennet, G. (1997) Inequalities for absolutely regular sequences: application to density estimation. Prob. Th. Rel. Fields 107, 467-492.
[186] Withers, C. S. (1981) Central limit theorems for dependent variables. Z. Wahrsch. Verw. Gebiete 57, 509-534.
[187] Wolkonski, V. A., Rozanov, Y. A. (1959, 1961) Some limit theorems for random functions, Part I: Theory Probab. Appl. 4, 178-197; Part II: Theory Probab. Appl. 6, 186-198.
[188] Wu, W. B. (2005a) Nonlinear system theory: another look at dependence. Proc. Nat. Acad. Sci. USA 102, 14150-14154.
[189] Wu, W. B. (2005b) On the Bahadur representation of sample quantiles for dependent sequences. Ann. Statist. 33, 1934-1963.
[190] Wu, W. B. (2005c) Fourier transforms of stationary processes. Proc. Amer. Math. Soc. 133, 285-293.
[191] Wu, W. B. (2006) M-estimation of linear models with dependent errors. Ann. Statist., to appear.
[192] Wu, W. B., Zhao, Z. (2005) Inference of trends in time series. Under review, JRSSB.
[193] Yoshihara, K. (1973) The law of the iterated logarithm for the processes generated by mixing processes. Yokohama Math. J. 21, 67-72.
[194] Young, L. S. (1999) Recurrence times and rates of mixing. Israel Journal of Math. 110, 153-188.
[195] Yu, H. (1993) A Glivenko-Cantelli lemma and weak convergence for empirical processes of associated sequences. Probab. Th. Rel. Fields 95, 357-370.
[196] Yu, K. F. (1990) Complete convergence of weighted sums of martingale differences. J. Theor. Prob. 3, 339-347.
Index

algorithm
  Kiefer-Wolfowitz, 143
  linear regression, 145
Appell polynomial, 24
associated processes, 53
associated sequences, 225
association, 3, 6
autoregressive processes, 59
Bernoulli shift, 21, 35
  causal, 23, 31, 32
  noncausal, 24, 25
  Volterra, 22, 26
Bernstein block, 230, 242
bracketing number, 98
Catalan number, 80
chaining, 98
concentration, 67
convergence
  almost sure, 252
  complete, 144
  finite dimensional, 202, 207, 230
  stable, 205
copula, 234
covariogram, 268, 269
cumulant, 85, 88, 91, 269, 277
dependence I, 67
  coefficient, 11, 15
  near epoch, 6
  projective measure, 19
  vectors, 137
dependent noise, 137, 140
diagram, 280
  formula, 279
distribution
  Bernoulli, 2
  Gaussian, 2
  Rademacher, 206
dynamical system, 38
  β-transformation, 41
  expanding map, 41
  Gauss map, 41
  neutral fixed point, 41
  spectral gap, 38
empirical process, 98, 223
estimation
  bias, 249, 258, 259, 261, 262, 264
  conditional quantile, 248
  consistence, 248, 270
  density, 248, 254
  functional, 247
  kernel, 248
  MISE, 254, 259
  non parametric, 247
  projection, 255, 263
  regression, 247, 248
  spectral, 265
    density, 275
  volatility, 247
  Whittle, 274
function
  ρ-regular, 248
  bounded variation, 11
  Lipschitz, 10
  polynomial, 260
Gaussian processes, 55
generalized inverse, 18
Hahn-Jordan, 11
heredity, 12
independence, 2, 215
inequality
  (2 + δ)-order, 140
  Bennett, 120
  Burkholder, 123
  covariance, 38, 110, 111, 137
    association, 7
  Doob maximal, 211
  exponential, 82, 93, 239
  Fuk-Nagaev, 121, 217
  Marcinkiewicz-Zygmund, 77, 140
  maximal, 132, 134
  moment, 201
    (2 + δ)-order, 70
  Rosenthal, 79, 95, 126, 130, 131, 138, 139, 227, 237, 251
  variance, 254
interacting particle systems, 56
  configuration, 56
  Feller semigroup, 56, 57
  infinitesimal generator, 56
  smooth function, 57
invariance principle
  conditional, 205
  Donsker, 200
  strong, 215
  weak, 199
iterated random function, 33
kernel function, 249
  de la Vallée-Poussin, 257, 262
  Dirichlet, 261, 262
  Fejér, 262, 276
  wavelet, 263
law of the iterated logarithm, 213
  functional, 215
linear process
  dependent innovations, 269
  non-causal, 266
martingale difference, 6
measure (signed), 11
mixing, 4, 8
  absolute regularity, 4
  maximal correlation, 4
  strong, 4, 215
  uniform, 4, 215
mixingale, 6, 140
model
  ARCH process, 36, 43
  ARCH(∞), 43
  autoregressive process, 36
  bilinear, 48
  GARCH(p, q), 43
  infinite memory, 64
  integer valued, 61
  LARCH(∞) process, 42
    bilinear, 42
    random field, 62
  Markov chain, 33
    autoregressive, 36
    branching, 37
moment
  centered, 85
  cumulant, 85, 92
  sum, 91
ODE method, 135
orthogonality, 2, 3, 7
periodogram, 267, 269, 270
  integrated, 270
  matrix, 276
random field, 11, 62, 200, 236
relative compactness, 205, 208
spectral density, 265, 268, 270, 274
  matrix, 275
Steutel-van Harn operator, 61
tail function, 103
tightness, 98, 200, 205, 209, 224, 231, 232
total variation, 11
unconditional system, 258