Lecture Notes in Statistics Edited by P. Bickel, P. Diggle, S. Fienberg, U. Gather, I. Olkin, S. Zeger
190
Jérôme Dedecker Paul Doukhan Gabriel Lang José Rafael León R. Sana Louhichi Clémentine Prieur
Weak Dependence: With Examples and Applications
Jérôme Dedecker
Laboratoire de Statistique Théorique et Appliquée, Université Paris 6, 175 rue du Chevaleret, 75013 Paris, France
[email protected]

Paul Doukhan
ENSAE, Laboratoire de Statistique du CREST (Paris Tech), 3 avenue Pierre Larousse, 92240 Malakoff, France
and SAMOS-MATISSE (Statistique Appliquée et Modélisation Stochastique), Centre d'Economie de la Sorbonne, Université Paris 1-Panthéon-Sorbonne, CNRS, 90 rue de Tolbiac, 75634 Paris Cedex 13, France
[email protected]

Gabriel Lang
Equipe MORSE, UMR MIA518, INRA Agro Paris Tech, F-75005 Paris, France
[email protected]

José Rafael León R.
Escuela de Matemática, Universidad Central de Venezuela, Ap. 47197 Los Chaguaramos, Caracas 1041-A, Venezuela
[email protected]

Sana Louhichi
Laboratoire de Mathématiques, Université Paris-Sud, Bât. 425, 91405 Orsay Cedex, France
[email protected]

Clémentine Prieur
Génie Mathématique et Modélisation, Laboratoire de Statistique et Probabilités, INSA Toulouse, 135 avenue de Rangueil, 31077 Toulouse Cedex 4, France
[email protected]

ISBN 978-0-387-69951-6        e-ISBN 978-0-387-69952-3

Library of Congress Control Number: 2007929938

© 2007 Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.

springer.com
Preface

Time series and random fields are main topics in modern statistical techniques; they are essential for applications where randomness plays an important role. Indeed, physical constraints mean that serious modelling cannot be done using only independent sequences. This is a real problem, because asymptotic properties are not always known in this case. The present work is devoted to providing a framework for the most commonly used time series.

In order to validate the main statistics, one needs rigorous limit theorems. In probability theory, the asymptotic behavior of partial sums may or may not be analogous to that of independent sequences; this book is concerned with the first case. Very sharp results have been proved for mixing processes, a notion introduced by Murray Rosenblatt [166]. Extensive discussions of this topic may be found in his Dependence in Probability and Statistics (a monograph published by Birkhäuser in 1986) and in Doukhan (1994) [61], and the sharpest results may be found in Rio (2000) [161]. However, a counterexample of a really simple non-mixing process was exhibited by Andrews (1984) [2]. The notion of weak dependence discussed here takes real account of the available models, which are discussed extensively. Our idea is that robustness of the limit theorems with respect to the model should be taken into account. In real applications, nobody can assert, for example, the existence of a density for the inputs of a given model, while such assumptions are always needed when dealing with mixing concepts. Our main task here is not only to provide the reader with the sharpest possible results but, as statisticians, to work within the largest possible framework. Doukhan and Louhichi (1999) [67] introduced a wide dependence framework that turns out to apply to the models used most often. Their simple way of weakening the independence property is mainly adapted to work with stationary sequences.
We thus discuss examples of weakly dependent models, limit theory for such sequences, and applications. The notions are mainly divided into the two following classes:

• The first class is that of "causal" dependence. In this case, the conditions may also be expressed in terms of conditional expectations, and thus the powerful martingale theory tools apply, such as Gordin's [97] device that allowed Dedecker and Doukhan (2003) [43] to derive a sharp Donsker principle.

• The second class is that of noncausal processes, such as two-sided linear processes, for which specific techniques need to be developed. Moment inequalities are a main tool in this context.
In order to make this book useful to practitioners, we also develop some applications in the fields of statistics, stochastic algorithms, resampling, and econometrics. We also think it useful to present here the notation for the concepts of weak dependence. Our aim was to make the book simple to read, and thus the mathematical level needed has been kept as low as possible. The book may be used in different ways:

• First, this is a mathematical textbook aimed at fixing the notions in the area discussed. We do not intend to cover all the topics, but the book may be considered an introduction to weak dependence.

• Second, our main objective in this monograph is to propose models and tools for practitioners; hence the sections devoted to examples are really extensive.

• Finally, some of the applications already developed are also quoted for completeness.

A preliminary version of this joint book on weak dependence concepts was used for a graduate course given by Paul Doukhan at the Escuela Latino Americana de Matemática in Merida (Venezuela) in September 2004; that course was especially useful for the preparation of our manuscript. The different contributors and authors of the present monograph participated in developing it jointly. We also want to thank the various coauthors of (published or not yet published) papers on the subject, namely Patrick Ango Nzé (Lille 3), Jean-Marc Bardet (Université Paris 1), Odile Brandière (Orsay), Alain Latour (Grenoble), Hélène Madre (Grenoble), Michael Neumann (Iena), Nicolas Ragache (INSEE), Mathieu Rosenbaum (Marne la Vallée), Gilles Teyssière (Göteborg), Lionel Truquet (Université Paris 1), Pablo Winant (ENS Lyon), Olivier Wintenberger (Université Paris 1), and Bernard Ycart (Grenoble). Even though not all of their work appears in these notes, it was really helpful for their conception.
We also want to thank the various referees who provided us with helpful comments either for this monograph or for papers submitted for publication and related to weak dependence. We now give some clarification concerning the origin of this notion of weak dependence. The seminal paper [67] was in fact submitted in 1996 and was part
of the PhD dissertation of Sana Louhichi in 1998. The main tool developed in this work was combinatorial moment inequalities; analogous moment inequalities are also given in Bakhtin and Bulinski (1997) [8]. Another close definition of weak dependence was provided in a preprint by Bickel and Bühlmann (1995) [17], anterior to [67] and also published in 1999 [18]. However, those authors aimed to work with the bootstrap; see Chapter 13 and Section 2.2 in [6]. The approach of Wu (2005) [188], detailed in Remark 3.1 and based on L2-conditions for causal Bernoulli shifts, also yields interesting and sharp results.

This monograph is essentially built in four parts.

Definitions and models. In the first chapter, we make precise some issues and tools for investigating dependence: this is a motivational chapter. The second chapter formally introduces the notion of weak dependence. Models are then presented in a long third chapter; indeed, in our mind, the richness of examples is at the core of the weak dependence properties.

Tools. Tools are given in two chapters (Chapters 4 and 5), concerned respectively with noncausal and causal properties. They are first used in the text for proving the forthcoming limit theorems, but they are essential for any type of further application. Two main tools may be found: moment bounds and coupling arguments. We also present specific tightness criteria adapted to working out empirical limit theorems.

Limit theorems. Laws of large numbers (and some applications), central limit theorems, invariance principles, laws of the iterated logarithm, and empirical central limit theorems are useful limit theorems in probability. They are precisely stated and worked out in Chapters 6-10.

Applications. The end of the monograph is dedicated to applications. We first present in Chapter 11 the properties of the standard nonparametric techniques. After this, we consider some issues of spectral estimation in Chapter 12.
Finally, Chapter 13 is devoted to some miscellaneous applications to econometrics, the bootstrap, and subsampling techniques. After the table of contents, a short list of notations allows rapid access to the main weak dependence coefficients and other useful notation.
Jérôme Dedecker, Paul Doukhan, Gabriel Lang, José R. León, Sana Louhichi, and Clémentine Prieur
Contents

Preface v

List of notations xiii

1 Introduction 1
  1.1 From independence to dependence 1
  1.2 Mixing 4
  1.3 Mixingales and near epoch dependence 5
  1.4 Association 6
  1.5 Nonmixing models 8

2 Weak dependence 9
  2.1 Function spaces 10
  2.2 Weak dependence 11
    2.2.1 η, κ, λ and ζ-coefficients 12
    2.2.2 θ and τ-coefficients 14
    2.2.3 α̃, β̃ and φ̃-coefficients 16
    2.2.4 Projective measure of dependence 19

3 Models 21
  3.1 Bernoulli shifts 21
    3.1.1 Volterra processes 22
    3.1.2 Noncausal shifts with independent inputs 24
    3.1.3 Noncausal shifts with dependent inputs 25
    3.1.4 Causal shifts with independent inputs 31
    3.1.5 Causal shifts with dependent inputs 32
  3.2 Markov sequences 33
    3.2.1 Contracting Markov chain 35
    3.2.2 Nonlinear AR(d) models 36
    3.2.3 ARCH-type processes 36
    3.2.4 Branching type models 37
  3.3 Dynamical systems 38
  3.4 Vector valued LARCH(∞) processes 42
    3.4.1 Chaotic expansion of LARCH(∞) models 43
    3.4.2 Bilinear models 48
  3.5 ζ-dependent models 53
    3.5.1 Associated processes 53
    3.5.2 Gaussian processes 55
    3.5.3 Interacting particle systems 56
  3.6 Other models 59
    3.6.1 Random AR models 59
    3.6.2 Integer valued models 61
    3.6.3 Random fields 62
    3.6.4 Continuous time 65

4 Tools for non causal cases 67
  4.1 Indicators of weakly dependent processes 67
  4.2 Low order moments inequalities 69
    4.2.1 Variances 69
    4.2.2 A (2 + δ)-order moment bound 70
  4.3 Combinatorial moment inequalities 73
    4.3.1 Marcinkiewicz-Zygmund type inequalities 77
    4.3.2 Rosenthal type inequalities 79
    4.3.3 A first exponential inequality 82
  4.4 Cumulants 84
    4.4.1 General properties of cumulants 84
    4.4.2 A second exponential inequality 93
    4.4.3 From weak dependence to the exponential bound 96
  4.5 Tightness criteria 98

5 Tools for causal cases 103
  5.1 Comparison results 103
  5.2 Covariance inequalities 110
    5.2.1 A covariance inequality for γ1 110
    5.2.2 A covariance inequality for β̃ and φ̃ 111
  5.3 Coupling 114
    5.3.1 A coupling result for real valued random variables 115
    5.3.2 Coupling in higher dimension 116
  5.4 Exponential and moment inequalities 119
    5.4.1 Bennett-type inequality 120
    5.4.2 Burkholder's inequalities 123
    5.4.3 Rosenthal inequalities using Rio techniques 125
    5.4.4 Rosenthal inequalities for τ1-dependent sequences 130
    5.4.5 Rosenthal inequalities under projective conditions 131
  5.5 Maximal inequalities 132

6 Applications of SLLN 135
  6.1 Stochastic algorithms with non causal dependent input 135
    6.1.1 Weakly dependent noise 137
    6.1.2 γ1-dependent noise 140
  6.2 Examples of application 142
    6.2.1 Robbins-Monro algorithm 142
    6.2.2 Kiefer-Wolfowitz algorithm 143
  6.3 Weighted dependent triangular arrays 143
  6.4 Linear regression 145

7 Central limit theorem 153
  7.1 Non causal case: stationary sequences 153
  7.2 Lindeberg method 155
    7.2.1 Proof of the main results 158
    7.2.2 Rates of convergence 161
  7.3 Non causal random fields 163
  7.4 Conditional central limit theorem (causal) 173
    7.4.1 Definitions and preliminary lemmas 174
    7.4.2 Invariance of the conditional variance 176
    7.4.3 End of the proof 178
  7.5 Applications 182
    7.5.1 Stable convergence 182
    7.5.2 Sufficient conditions for stationary sequences 184
    7.5.3 γ-dependent sequences 189
    7.5.4 α̃ and φ̃-dependent sequences 192
    7.5.5 Sufficient conditions for triangular arrays 194

8 Donsker principles 199
  8.1 Non causal stationary sequences 199
  8.2 Non causal random fields 200
    8.2.1 Moment inequality 201
    8.2.2 Finite dimensional convergence 202
    8.2.3 Tightness 205
  8.3 Conditional (causal) invariance principle 205
    8.3.1 Preliminaries 206
    8.3.2 Finite dimensional convergence 207
    8.3.3 Relative compactness 208
  8.4 Applications 209
    8.4.1 Sufficient conditions for stationary sequences 209
    8.4.2 Sufficient conditions for triangular arrays 212

9 Law of the iterated logarithm (LIL) 213
  9.1 Bounded LIL under a non causal condition 213
  9.2 Causal strong invariance principle 214

10 The empirical process 223
  10.1 A simple condition for the tightness 224
  10.2 η-dependent sequences 225
  10.3 α̃, β̃ and φ̃-dependent sequences 231
  10.4 θ and τ-dependent sequences 233
  10.5 Empirical copula processes 234
  10.6 Random fields 236

11 Functional estimation 247
  11.1 Some non-parametric problems 247
  11.2 Kernel regression estimates 248
    11.2.1 Second order and CLT results 249
    11.2.2 Almost sure convergence properties 252
  11.3 MISE for β̃-dependent sequences 254
  11.4 General kernels 260

12 Spectral estimation 265
  12.1 Spectral densities 265
  12.2 Periodogram 269
    12.2.1 Whittle estimation 274
  12.3 Spectral density estimation 275
    12.3.1 Second order estimate 277
    12.3.2 Dependence coefficients 279

13 Econometric applications and resampling 283
  13.1 Econometrics 283
    13.1.1 Unit root tests 284
    13.1.2 Parametric problems 285
    13.1.3 A semi-parametric estimation problem 285
  13.2 Bootstrap 287
    13.2.1 Block bootstrap 288
    13.2.2 Bootstrapping GMM estimators 288
    13.2.3 Conditional bootstrap 290
    13.2.4 Sieve bootstrap 290
  13.3 Limit variance estimates 292
    13.3.1 Moments, cumulants and weak dependence 293
    13.3.2 Estimation of the limit variance 295
    13.3.3 Law of the large numbers 297
    13.3.4 Central limit theorem 299
    13.3.5 A non centered variant 302

Bibliography 305

Index 317
List of notations

We recall here the main specific or unusual notation used throughout this monograph. As usual, #A denotes the cardinality of a finite set A, and N, Z, R, C are the standard sets of calculus. For clarity and homogeneity, we note in this list:

• a ≲ b or a = O(b), for a, b ≥ 0, means that a ≤ Cb for some constant C > 0,
• a ≪ b or a = o(b), for a, b ≥ 0, means that a/b → 0,
• a ∧ b, a ∨ b are the minimum and the maximum of the numbers a, b ≥ 0,
• M, U, V, A, B are σ-algebras, and (Ω, A, P) is a probability space,
• X, Y, Z, . . . denote random variables (usually ξ, ζ are inputs),
• L^p(E, E, m) are the classes of measurable functions f with ‖f‖_p = (∫_E |f(x)|^p dm(x))^{1/p} < ∞,
• F is a cumulative distribution function, Φ is the standard normal cumulative distribution function, and φ = Φ′ is its density,
• n is a time (space) delay, r, s ∈ N are "past-future" parameters, p is a moment order, and F, G are function spaces.
α(U, V)         p. 4, § 1.2, eqn. (1.2.1)      mixing coefficient
α̃(M, X)         p. 16, § 2.2.3, def. 2.5-1     dependence coefficient
α̃_r(n)          p. 19, § 2.2.3, def. 2.2.15    dependence sequence
BV, BV_1        p. 11, § 2.1                   bounded variation spaces
β(U, V)         p. 4, § 1.2, eqn. (1.2.4)      mixing coefficient
β̃(M, X)         p. 16, § 2.2.3, def. 2.5-2     dependence coefficient
β̃_r(n)          p. 19, § 2.2.3, def. 2.2.15    dependence sequence
C_{n,r}         p. 73, § 4.3, def. 4.1         moment coefficient sequence
c_{X,r}(n)      p. 87, § 4.4.1, eqn. (4.4.7)   vector moment sequence
c̄_{X,r}(n)      p. 87, § 4.4.1, eqn. (4.4.8)   maximum moment sequence
γ_p(M, X)       p. 19, § 2.2.4, def. 2.2.16    dependence coefficient
γ_p(n)          p. 19, § 2.2.4, def. 2.2.16    dependence sequence
δ_n             p. 25, § 3.1.2, def. 3.1.10    Bernoulli shift coefficient
ε(X, Y)         p. 11, § 2.2, eqn. (2.2.1)     general coefficient
ε(n)            pp. 4-11, §§ 1.1-2.2, eqn. (1.1.3)   coefficient sequence
η(n)            p. 12, § 2.2.1, eqn. (2.2.3)   dependence sequence
f^{-1}          p. 18, § 2.2.3, eqn. (2.2.14)  inverse of a monotonic f
F, G, Ψ         p. 11, § 2.2, eqn. (2.2.1)     weak dependence function classes
φ(U, V)         p. 4, § 1.2, eqn. (1.2.3)      mixing coefficient
φ̃(M, X)         p. 16, § 2.2.3, def. 2.5-3     dependence coefficient
φ̃_r(n)          p. 19, § 2.2.3, def. 2.2.15    dependence sequence
I               p. 67, § 4.1                   class of indicators
κ_p(n)          p. 88, § 4.4.1, def. 4.4.9     cumulant sequence
κ(n)            p. 12, § 2.2.1, eqn. (2.2.5)   dependence sequence
L               p. 38, § 3.3                   Perron-Frobenius operator
L^p-NED         p. 6, § 1.3, def. 1.2          NED notion
Λ(δ), Λ^{(1)}(δ)  p. 10, § 2.1                 Lipschitz spaces
λ(n)            p. 12, § 2.2.1, eqn. (2.2.4)   dependence sequence
μ_X(n)          p. 5, § 1.2, eqn. (1.2.5)      μ-mixing sequence
μ_{X,r,s}(n)    p. 5, § 1.2, eqn. (1.2.6)      μ-mixing field
ψ(n)            p. 5, § 1.3                    L^p-mixingale coefficient
Q_X(·)          p. 74, § 4.3, eqn. (4.3.3)     quantile function
ρ(U, V)         p. 4, § 1.2, eqn. (1.2.2)      mixing coefficient
θ(n)            p. 14, § 2.2.2, eqn. (2.2.7)   dependence sequence
θ_p(M, X)       p. 15, § 2.2.2, eqn. (2.2.8)   dependence coefficient
θ_{p,r}(n)      p. 15, § 2.2.2, eqn. (2.2.9)   dependence sequence
τ_p(M, X)       p. 16, § 2.2.2, eqn. (2.2.12)  coupling coefficient
τ_{p,r}(n)      p. 16, § 2.2.2                 coupling sequence
ζ(n)            p. 12, § 2.2.1, eqn. (2.2.6)   dependence sequence
ω_{p,n}         p. 22, § 3.1, eqn. (3.1.3)     shift increment coefficient
Chapter 1
Introduction

This chapter aims to justify some of our choices and to provide basic background on competing notions such as those linked to mixing conditions. In our view, mixing notions relate not to time series themselves but to σ-algebras; they are consequently better adapted to areas like finance, where the history (that is, the σ-algebra generated by the past) is of considerable importance. With the most elementary ideas in view, Doukhan and Louhichi (1999) [67] introduced the better-adapted weak dependence condition developed in this monograph. This definition makes explicit the asymptotic independence between 'past' and 'future'; this means that the 'past' is progressively forgotten. In terms of the initial time series, 'past' and 'future' are elementary events given through finite dimensional marginals. Roughly speaking, for convenient functions f and g, we shall assume that Cov(f('past'), g('future')) is small when the distance between the 'past' and the 'future' is sufficiently large. Such inequalities are significant only if the distance between the indices of the initial time series in the 'past' and 'future' terms grows to infinity. The convergence is not assumed to hold uniformly in the dimension of the 'past' or 'future' involved. Another direction for describing the asymptotic behavior of certain time series is based on projective methods; it will be proved that this is coherent with the previous items.
This chapter first provides general considerations on independence; we then define classical mixing coefficients, mixingales, and association, and conclude with simple counterexamples.
1.1
From independence to dependence
We recall here some very basic facts concerning the independence of random variables. Let P, F be random variables defined on the same probability space (Ω, A, P) and taking values in measurable spaces (E_P, E_P) and (E_F, E_F). Independence of the random variables P, F writes

    P(P ∈ A, F ∈ B) = P(P ∈ A) P(F ∈ B),   ∀(A, B) ∈ E_P × E_F.   (1.1.1)

Extending this identity by linearity to stepwise constant functions and using limits yields a formulation of this relation better adapted for applications:

    Cov(f(P), g(F)) = 0,   ∀(f, g) ∈ L^∞(E_P, E_P) × L^∞(E_F, E_F),

where, for instance, L^∞(E_P, E_P) denotes the subspace of ℓ^∞(E_P, R) (the space of bounded real valued functions) consisting of the measurable and bounded functions f : (E_P, E_P) → (R, B_R). If the spaces E_P, E_F are topological spaces endowed with their Borel σ-algebras (the σ-algebras generated by the open sets), then it is sufficient to state

    Cov(f(P), g(F)) = 0,   ∀(f, g) ∈ P × F,   (1.1.2)
where P, F are dense subsets of the spaces of continuous functions E_P → R and E_F → R. In order to equip both spaces with a simple topology, it will be convenient to assume that E_P and E_F are locally compact topological spaces; density in the space of continuous functions then refers to uniform convergence on compact subsets of E_P, E_F. From a general principle of economy, we should always look for the smallest possible classes P, F. The most intuitive (necessary) condition for independence is governed by the idea of orthogonality. With this idea in mind, we now develop some simple examples for which, however,

    Orthogonality ⇒ Independence.
• Bernoulli trials. If E_P = E_F = {0, 1} are both two-point spaces, the random variables follow Bernoulli distributions, and independence follows from the simple orthogonality relation Cov(P, F) = 0. Indeed, from the standard bilinearity properties of the covariance, we also have Cov(1 − P, F) = 0, Cov(P, 1 − F) = 0, and Cov(1 − P, 1 − F) = 0, which means that eqn. (1.1.1) holds for (A, B) = ({a}, {b}) with, respectively, (a, b) = (1, 1), (0, 1), (1, 0), and (0, 0). The drawback of this example is that the σ-algebras generated by P, F are very poor, so it will not fit the forthcoming case of an 'important' past and future.

• Gaussian vectors. If now E_P = R^p and E_F = R^q, and if the vector Z = (P, F) ∈ R^{p+q} is Gaussian, then its distribution depends only on its second order properties*; hence coordinate functions are enough to determine independence. More precisely, two random vectors P = (P_1, . . . , P_p) and F = (F_1, . . . , F_q) are independent if and only if Cov(P_i, F_j) = 0 for any 1 ≤ i ≤ p and any 1 ≤ j ≤ q. This very simple characterization of independence clearly does not extend to arbitrary random variables. For example, let X, Y ∈ R be independent and symmetric random variables such that P(X = 0) = 0. If we set P = X and F = sign(X) Y, then Cov(P, F) = E(|X| Y) − EX · EF = E|X| · EY − EX · EF = 0, because X and Y are centered random variables, although P and F are not independent as soon as the support of Y's distribution contains more than two points. A simple example of uncorrelated, individually standard Gaussian random variables that are not independent is provided† by the couple (P, F) = (N, RN), where N ∼ N(0, 1) and R is a Rademacher random variable (that is, P(R = 1) = P(R = −1) = 1/2) independent of N.

• Associated vectors (cf. § 1.4). Again assume that E_P = R^p and E_F = R^q. The random vector X = (P, F) ∈ R^{p+q} = R^d is called associated if Cov(h(X), k(X)) ≥ 0 for any couple of measurable functions h, k : R^d → R such that E(h²(X) + k²(X)) < ∞ and the partial functions x_j → h(x_1, . . . , x_d) and x_j → k(x_1, . . . , x_d) are nondecreasing for any choice of the remaining coordinates x_1, . . . , x_{j−1}, x_{j+1}, . . . , x_d ∈ R and any 1 ≤ j ≤ d. An essential property of such vectors is that here too, orthogonality implies independence; this will be developed in a forthcoming chapter. An example of associated vectors is that of independent coordinates. Even if this looks like an uninteresting case in our dependent setting, it leads to much more involved examples of associated random vectors through monotonic functions.

The class of such coordinatewise increasing functions is a cone of L²(X), the class of functions h with Eh²(X) < ∞; hence the set of associated distributions looks like a (very thin) cone in the space of distributions on R^d. The same idea applies to Gaussian distributions, which even form a finite dimensional family within the large set of laws on R^d.

* This only means that Z's distribution depends only on the expressions EZ_i and EZ_iZ_j for 1 ≤ i, j ≤ d if Z = (Z_1, . . . , Z_d).
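The uncorrelated-but-dependent pair (P, F) = (N, RN) above is easy to check by simulation. The following minimal Python sketch (the simulation size and seed are our own illustrative choices, not from the text) confirms that the empirical covariance is negligible while |P| = |F| holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# N ~ N(0,1) and an independent Rademacher sign R.
N = rng.standard_normal(n)
R = rng.choice([-1.0, 1.0], size=n)
P, F = N, R * N            # both marginals are standard Gaussian

# Uncorrelated: Cov(P, F) = E(R) E(N^2) = 0.
cov_PF = np.cov(P, F)[0, 1]
print(cov_PF)              # small, of order 1/sqrt(n)

# ... yet strongly dependent: |P| = |F| almost surely.
max_gap = np.max(np.abs(np.abs(P) - np.abs(F)))
print(max_gap)             # exactly 0.0
```

Note that nonlinear test functions do reveal the dependence here: for instance Cov(P², F²) = Var(N²) = 2 > 0, which is exactly why classes P, F richer than coordinate functions are needed outside the Gaussian-vector case.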
Consider now a time series X = (X_n)_{n∈Z} with values in a locally compact topological space E (typically E = R^d); we may take one variable P of the past and one variable F of the future: P = (X_{i_1}, . . . , X_{i_u}),
F = (Xj1 , . . . , Xjv ),
† In this case both variables are indeed centered with Normal distributions and E{N (RN )} = ER · EN 2 = 0 while |N | = |RN | is not independent of itself since it is not a.s. a constant.
for i_1 ≤ i_2 ≤ · · · ≤ i_u < j_1 ≤ j_2 ≤ · · · ≤ j_v, u, v ∈ N* = {1, 2, . . .}. Independence of the time series X thus writes as the independence of P and F for any choice of i_1 ≤ i_2 ≤ · · · ≤ i_u < j_1 ≤ j_2 ≤ · · · ≤ j_v. Independence of the time series up to a lag m, also called m-dependence, is characterized as independence of P and F whenever i_u + m ≤ j_1. Finally, asymptotic independence of the past and the future will be given through coefficients of the type

    ε(r) = sup_{d(P,F) ≥ r} sup_{(f,g) ∈ F × G} |Cov(f(P), g(F))|,   (1.1.3)

where d(P, F) = j_1 − i_u. The only problem with the previous definition is that the corresponding dependence coefficient should also be indexed by suitable multiindices (i_1, i_2, . . . , i_u) and (j_1, j_2, . . . , j_v). This definition will be completed in Chapter 2 by considering classes P_u ⊂ P and F_v ⊂ F, with suprema taken as well over ordered multi-indices (i_1, i_2, . . . , i_u) and (j_1, j_2, . . . , j_v) such that j_1 − i_u ≥ r.
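To make the decay in (1.1.3) concrete, here is a small simulation sketch; the AR(1) model, the single test function f = g = sin, and all parameters are illustrative assumptions of ours, not prescribed by the text. For X_t = aX_{t−1} + ξ_t, the covariance gap with P = X_0, F = X_r decays roughly like a^r:

```python
import numpy as np

rng = np.random.default_rng(1)
a, n = 0.5, 500_000

# Simulate a stationary AR(1) path X_t = a X_{t-1} + xi_t.
xi = rng.standard_normal(n)
X = np.empty(n)
X[0] = xi[0] / np.sqrt(1 - a ** 2)   # draw X_0 from the stationary law
for t in range(1, n):
    X[t] = a * X[t - 1] + xi[t]

def cov_gap(r, f=np.sin):
    """Empirical |Cov(f(X_0), f(X_r))| for one bounded Lipschitz f."""
    u, v = f(X[:-r]), f(X[r:])
    return abs(np.mean(u * v) - np.mean(u) * np.mean(v))

for r in (1, 2, 4, 8):
    print(r, cov_gap(r))   # shrinks geometrically as r grows
```

A full coefficient ε(r) would take a supremum over whole classes F, G and over multi-indices; the single pair above only illustrates the covariance-decay phenomenon being quantified.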
1.2
Mixing
Mixing conditions, as introduced by Rosenblatt (1956) [166], are weak dependence conditions expressed in terms of the σ-algebras generated by a random sequence. In order to define such conditions, we first introduce coefficients relative to two sub-σ-algebras U, V ⊂ A of an abstract probability space (Ω, A, P):

    α(U, V) = sup_{U ∈ U, V ∈ V} |P(U ∩ V) − P(U)P(V)|,   (1.2.1)

    ρ(U, V) = sup_{u ∈ L²(U), v ∈ L²(V)} |Corr(u, v)|,   (1.2.2)

    φ(U, V) = sup_{U ∈ U, V ∈ V, P(U) > 0} |P(U ∩ V)/P(U) − P(V)|,   (1.2.3)

    β(U, V) = (1/2) sup Σ_{i=1}^{I} Σ_{j=1}^{J} |P(U_i ∩ V_j) − P(U_i)P(V_j)|,   (1.2.4)

where in the definition of β the supremum is taken over all I, J ≥ 1 and all measurable partitions (U_i)_{1≤i≤I}, (V_j)_{1≤j≤J} of Ω. The above coefficients are, respectively, Rosenblatt (1956) [166]'s strong mixing coefficient α(U, V), Wolkonski and Rozanov (1959) [187]'s absolute regularity coefficient β(U, V), Kolmogorov and Rozanov (1960) [112]'s maximal correlation coefficient ρ(U, V), and Ibragimov (1962) [110]'s uniform mixing coefficient φ(U, V). A more comprehensible formulation for β is written in terms of a total variation norm:

    β(U, V) = ‖P_{U⊗V} − P_U ⊗ P_V‖_{TV};
here PU , PV denote the restrictions of P to σ−fields U, V and PU ⊗V is a law on the product σ−fields defined of rectangles by PU ⊗V (U, V ) = P(U, V ). In case U, V are generated by random variables U, V this may be written β(U, V) = P(U,V ) − PU ⊗ PV T V as the total variation norm of distributions of (U, V ) and (U, V ) for some independent copy V of V. The Markov frame is however adapted to prove β-mixing since this condition holds under positive recurrence. In fact any coefficient μ such that μ(U, V) ∈ [0, +∞] is well defined and such that independence of U, V implies μ(U, V) = 0 may be considered as a mixing coefficient. Once a mixing coefficient has been chosen, the corresponding mixing condition is defined for random processes (Xt )t∈Z and for random fields (Xt )t∈Zd : μX (r) = sup c(σ(Xt , t ≤ i), σ(Xt , t ≥ i + r)) (1.2.5) i∈Z
and the random process is called μ-mixing in case μ_X(r) → 0 as r → ∞. Taking μ = α, β, φ or ρ thus yields the coefficient sequences α_X(r), β_X(r), φ_X(r) or ρ_X(r); many other coefficients may also be introduced. For the more difficult case of random fields, one needs a more intricate definition. The one we propose depends on two additional integers: the random field (X_t)_{t∈Z^d} is μ-mixing in case, for any u, v ∈ N*, μ_{X,u,v}(r) → 0 as r → ∞, where now

μ_{X,u,v}(r) = sup_{#A=u, #B=v, d(A,B)≥r} μ(σ(X_t, t ∈ A), σ(X_t, t ∈ B)),  (1.2.6)

the supremum being taken over finite subsets A, B ⊂ Z^d with cardinalities u and v at distance at least r (where a metric has been fixed on Z^d). The following relations hold:

φ-mixing ⇒ ρ-mixing ⇒ α-mixing,  φ-mixing ⇒ β-mixing ⇒ α-mixing,

and no reverse implication holds in general. Examples for such conditions to hold are investigated in Doukhan (1994) [61], and Rio (2000) [161] provides up-to-date results in this setting. We only note here that these conditions are usually difficult to check.
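These coefficients become computable when U and V are generated by finitely-valued random variables: β is attained on the finest partitions (the atoms), while α and φ require a supremum over all events, i.e. all unions of atoms. A minimal numerical sketch (the helper name `mixing_coefficients` and the probability tables are ours, for illustration only):

```python
import itertools

def mixing_coefficients(P):
    """alpha, beta, phi between the sigma-algebras generated by two discrete
    random variables U, V with joint probability table P[i][j]."""
    I, J = len(P), len(P[0])
    pu = [sum(row) for row in P]                              # marginal of U
    pv = [sum(P[i][j] for i in range(I)) for j in range(J)]   # marginal of V
    # beta: attained on the finest partitions, i.e. on the atoms themselves
    beta = 0.5 * sum(abs(P[i][j] - pu[i] * pv[j])
                     for i in range(I) for j in range(J))
    # alpha and phi: supremum over all events = all unions of atoms
    events = lambda n: itertools.chain.from_iterable(
        itertools.combinations(range(n), m) for m in range(1, n + 1))
    alpha = phi = 0.0
    for rows in events(I):
        pU = sum(pu[i] for i in rows)
        for cols in events(J):
            pV = sum(pv[j] for j in cols)
            pUV = sum(P[i][j] for i in rows for j in cols)
            alpha = max(alpha, abs(pUV - pU * pV))
            if pU > 0:
                phi = max(phi, abs(pUV / pU - pV))
    return alpha, beta, phi

a, b, f = mixing_coefficients([[0.4, 0.1], [0.1, 0.4]])
# for this table alpha = 0.15, beta = 0.3 and phi = 0.3; all three vanish
# for an independent table such as [[0.25, 0.25], [0.25, 0.25]]
```

On such tables one can also check the classical comparisons 2α ≤ β ≤ φ quoted above.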
1.3
Mixingales and Near Epoch Dependence
Definition 1.1 (McLeish (1975) [129], Andrews (1988) [3]). Let p ≥ 1 and let (F_n)_{n∈Z} be an increasing sequence of σ-algebras. The sequence (X_n, F_n)_{n∈Z} is called an L^p-mixingale if there exist nonnegative sequences (c_n)_{n∈Z} and (ψ(n))_{n∈N} such that ψ(n) → 0 as n → ∞ and, for all integers n ∈ Z and k ≥ 0,

‖X_n − E(X_n | F_{n+k})‖_p ≤ c_n ψ(k + 1),  (1.3.1)
‖E(X_n | F_{n−k})‖_p ≤ c_n ψ(k).  (1.3.2)
This property of fading memory is easier to handle than the martingale condition. A more general concept is near epoch dependence (NED) on a mixing process. Its definition can be traced back to Billingsley (1968) [20], who considered functions of φ-mixing processes.

Definition 1.2 (Pötscher and Prucha (1991) [152]). Let p ≥ 1. We consider a μ-mixing process (V_n)_{n∈Z} (with μ_V(r) defined as in eqn. (1.2.5)). For any integers i ≤ j, set F_i^j = σ(V_i, ..., V_j). The sequence (X_n)_{n∈Z} is called an L^p-NED process on the μ-mixing process (V_n)_{n∈Z} if there exist nonnegative sequences (c_n)_{n∈Z} and (ψ(n))_{n∈N} such that ψ(n) → 0 as n → ∞ and, for all integers n ∈ Z and k ≥ 0,

‖X_n − E(X_n | F_{n−k}^{n+k})‖_p ≤ c_n ψ(k).

This approach is developed in detail in Pötscher and Prucha (1991) [152]. Functions of MA(∞) processes can be handled using the NED concept; for instance, limit theorems can be deduced for sums of such functions of MA(∞) processes. The previous definitions express the fact that a k-period projection (ahead in the first case, both ahead and backwards in the second definition) converges to the unconditional mean. They are known to be satisfied by a wide class of models. For example, martingale difference sequences are L¹-mixingales, as are linear processes with martingale difference innovations.
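As an elementary illustration of Definition 1.1, consider the stationary AR(1) process X_n = aX_{n−1} + ξ_n with |a| < 1 and i.i.d. centered innovations: by causality, E(X_n | F_{n−k}) = a^k X_{n−k}, so that (1.3.2) holds with c_n = ‖X_0‖_2 and ψ(k) = |a|^k. A numerical sketch under these assumptions (the AR(1) example and all names here are ours, not from the text):

```python
import math
import random

# Stationary AR(1): X_n = a X_{n-1} + xi_n with xi_n i.i.d. N(0, 1).
# Causality gives E(X_n | F_{n-k}) = a^k X_{n-k} exactly, hence the
# L^2-mixingale bound (1.3.2) with c_n = ||X_0||_2 and psi(k) = |a|^k.
random.seed(0)
a, n_paths, burn = 0.7, 5000, 200

def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

X = []
for _ in range(n_paths):
    x = 0.0
    for _ in range(burn):                 # forget the arbitrary start value
        x = a * x + random.gauss(0.0, 1.0)
    X.append(x)

c = l2_norm(X)                            # estimates ||X_0||_2 = 1/sqrt(1-a^2)
for k in (1, 3, 6):
    proj = [a ** k * x for x in X]        # the exact conditional expectation
    assert l2_norm(proj) <= c * abs(a) ** k + 1e-12
```

Here c should be close to the stationary standard deviation 1/√(1 − a²) ≈ 1.4, and the geometric decay ψ(k) = |a|^k makes the fading-memory property explicit.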
1.4
Association
The notion of association was introduced independently by Esary, Proschan and Walkup (1967) [85] and by Fortuin, Kasteleyn and Ginibre (1971) [87]. The motivations of these authors were radically different: the former were working in reliability theory, the latter in statistical mechanics, where their condition is known as the FKG inequality.

Definition 1.3. The sequence (X_t)_{t∈Z} is associated if, for all coordinatewise increasing real-valued functions h and k,

Cov(h(X_t, t ∈ A), k(X_t, t ∈ B)) ≥ 0

for all finite disjoint subsets A and B of Z, provided moreover that E[h(X_t, t ∈ A)² + k(X_t, t ∈ B)²] < ∞.
This extends the positive correlation assumption in order to model the notion that two stochastic processes have a tendency to evolve in a similar way. The definition is deeper than simple positivity of the correlations. Besides the evident fact that it does not assume that variances exist, one can easily construct orthogonal (hence positively correlated) sequences that do not have the association property. An important feature of association is that, for an associated sequence, uncorrelatedness implies independence (Newman, 1984 [136]). Let for instance (ξ_k) and (η_k) be two independent i.i.d. N(0, 1) sequences. Then the sequence (X_n)_{n∈Z} defined by X_k = ξ_k(η_k − η_{k−1}) is uncorrelated but not independent; hence it is not an associated sequence. Heredity of association only holds under monotonic transformations. This unpleasant restriction will disappear under the assumption of weak dependence. The following property of associated sequences was a guideline for the forthcoming definition of weak dependence. Association does not imply any mixing assumption‡. The forthcoming inequality (1.4.1) also contains the idea that weakly correlated associated sequences are 'weakly dependent'. The following result provides a quantitative idea of the gap between association and independence:

Theorem 1.1 (Newman, 1984 [136]). For a pair of measurable numeric functions (f, g) defined on A ⊂ R^k, we write f ≪ g if both functions g + f and g − f are nondecreasing with respect to each argument. Let now X be any associated random vector with range in A. Then

(f_i ≪ g_i for i = 1, 2) ⇒ |Cov(f₁(X), f₂(X))| ≤ Cov(g₁(X), g₂(X)).

This theorem follows simply from several applications of the definition of association to the coordinatewise nondecreasing functions g_i − f_i and g_i + f_i. By an easy application of the above inequalities one can check that
|Cov(f(X), g(Y))| ≤ Σ_{i=1}^k Σ_{j=1}^l ‖∂f/∂x_i‖_∞ ‖∂g/∂y_j‖_∞ Cov(X_i, Y_j),  (1.4.1)

for R^k- or R^l-valued associated random vectors X and Y and C¹ functions f and g with bounded partial derivatives. For this, it suffices to note that f ≪ f₁ if one makes use of Theorem 1.1 with f₁(x₁, ..., x_p) = Σ_{i=1}^p ‖∂f/∂x_i‖_∞ x_i.
Denote by ℜ(z) the real part of the complex number z. Theorem 1.1 can be extended to complex-valued functions, up to a factor 2 in inequality (1.4.1). Indeed, we can now set f ≪ g if, for any real number ω, the mapping t = (t₁, ..., t_k) ↦ ℜ(g(t) + e^{iω(t₁+···+t_k)} f(t)) is nondecreasing with respect to each argument. Also, for any real numbers t₁, ..., t_k,

|E e^{i(t₁X₁+···+t_kX_k)} − E e^{it₁X₁} ··· E e^{it_kX_k}| ≤ 2 Σ_{i=1}^k Σ_{j=1}^k |t_i||t_j| Cov(X_i, X_j).

‡ E.g. Gaussian processes with nonnegative covariances are associated, while it is well known that this condition does not imply mixing; see [61], page 62.
On the opposite side, negatively associated sequences of random variables are defined through a relation similar to the aforementioned covariance inequality, except that the sense of the inequality is reversed. This reversal breaks the seemingly parallel definitions of positively and negatively associated sequences.
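The counterexample quoted above is easy to check numerically: for independent i.i.d. N(0, 1) sequences (ξ_k) and (η_k), the sequence X_k = ξ_k(η_k − η_{k−1}) is centered and uncorrelated, yet X_k² and X_{k+1}² are positively correlated (a direct Gaussian moment computation gives Cov(X_k², X_{k+1}²) = 2), so the X_k are not independent and hence, by Newman's result, cannot be associated. A simulation sketch (all names are ours):

```python
import random

# Newman's counterexample: X_k = xi_k * (eta_k - eta_{k-1}) with (xi_k) and
# (eta_k) independent i.i.d. N(0,1) sequences. Consecutive terms are
# uncorrelated, but their squares are not, so the sequence is dependent
# and therefore cannot be associated (uncorrelated + associated => independent).
random.seed(1)
n = 200_000
xi = [random.gauss(0, 1) for _ in range(n + 1)]
eta = [random.gauss(0, 1) for _ in range(n + 1)]
X = [xi[k] * (eta[k] - eta[k - 1]) for k in range(1, n + 1)]

mean = sum(X) / n
cov1 = sum(X[k] * X[k + 1] for k in range(n - 1)) / (n - 1) - mean * mean
S = [x * x for x in X]
m2 = sum(S) / n
cov2 = sum(S[k] * S[k + 1] for k in range(n - 1)) / (n - 1) - m2 * m2

assert abs(cov1) < 0.05   # consecutive terms are uncorrelated
assert cov2 > 1.0         # but their squares are correlated (true value: 2)
```

The same simulation shows why positivity of correlations alone says nothing here: all the lag-one covariances vanish while the dependence is carried by second-order functionals.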
1.5
Nonmixing models
Finite moving averages X_n = H(ξ_n, ξ_{n−1}, ..., ξ_{n−m}) are trivially m-dependent. However this does not remain true as m → ∞. For example, the Bernoulli shift X_n = H(ξ_n, ξ_{n−1}, ...), with H(x) = Σ_{k=0}^∞ 2^{−(k+1)} x_k and (ξ_k) i.i.d. b(1/2), is not mixing; this shift is an example of a Markovian, nonmixing sequence.
Indeed, its stationary representation writes X_n = Σ_{k=0}^∞ 2^{−k−1} ξ_{n−k}. Here ξ_{n−k} is the (k+1)-th digit in the binary expansion of the uniformly distributed number X_n = 0.ξ_n ξ_{n−1} ··· ∈ [0, 1]. Hence X_0 (the fractional part of 2^n X_n) is a deterministic function of X_n, which is the main argument to derive that such models are not mixing ([61], page 77, counterexample 2, or [2]). More precisely, as X_0 is a deterministic function of X_n, the event A = (X_0 ≤ 1/2) belongs both to the σ-algebra of the past σ(X_t, t ≤ 0) and to the σ-algebra of the future σ(X_t, t ≥ n); hence, with the notation of § 1.2,

α(n) ≥ |P(A ∩ A) − P(A)P(A)| = 1/2 − 1/4 = 1/4.
The same arguments apply to the model, described before, of an autoregressive process whose innovations take p distinct values. The difference of two independent processes of this type, or ((−1)^n X_n)_n, provides examples of nonassociated and nonmixing processes. Assume now, more generally, that ξ_j ∼ b(s) follows a Bernoulli distribution with parameter 0 < s < 1. Concentration properties then hold: e.g. X_n is uniform if s = 1/2, and it has a Cantor marginal distribution if s = 1/3. Many more stationary models may in fact be proved to be nonmixing; e.g. for the integer-valued models (3.6.2) it is simple to prove that X_t = 0 ⇒ X_0 = 0 and P(X_0 = 0) ∈ ]0, 1[. With stationarity this easily excludes strong mixing for such a model since, setting p := P(X_0 = 0),

α(n) ≥ P((X_0 = 0) ∩ (X_n = 0)) − P(X_0 = 0)P(X_n = 0) = p(1 − p) > 0.
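The nonmixing mechanism of the Bernoulli shift above can be watched directly: the binary digits of X_n are the innovations, so the whole past can be read off any single future observation. A truncated simulation sketch (the names and the truncation depth are ours):

```python
import random

# Bernoulli shift X_n = sum_{k>=0} 2^{-k-1} xi_{n-k} with xi_j i.i.d. b(1/2):
# X_n = 0.xi_n xi_{n-1}... in binary, so dropping the first n binary digits
# of X_n recovers X_0. The event A = (X_0 <= 1/2) = (xi_0 = 0) thus lies in
# sigma(X_0) and in sigma(X_n), giving alpha(n) >= P(A) - P(A)^2 = 1/4.
random.seed(2)
depth, n = 50, 20                  # series truncated after `depth` digits

def shift(xi, t):
    return sum(2.0 ** -(k + 1) * xi[t - k] for k in range(depth))

inA = 0
for _ in range(2000):
    xi = {j: random.randint(0, 1) for j in range(-depth, n + 1)}
    x0, xn = shift(xi, 0), shift(xi, n)
    # X_0 equals the fractional part of 2^n X_n (up to the truncation error)
    assert abs((2 ** n * xn) % 1.0 - x0) < 1e-6
    inA += x0 <= 0.5
pA = inA / 2000                    # close to P(A) = 1/2, so pA*(1-pA) ~ 1/4
```

The per-trial assertion is exactly the "past is a function of the future" property that rules out mixing, while pA·(1 − pA) numerically reproduces the lower bound 1/4 for α(n).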
Chapter 2
Weak dependence

Many authors have used one of the two following types of dependence: on the one hand, mixing properties, introduced by Rosenblatt (1956) [166]; on the other hand, martingale approximations or mixingales, following the works of Gordin (1969, 1973) [97], [98] and McLeish (1974, 1975) [127], [129]. Concerning strongly mixing sequences, very deep and elegant results have been established: for recent works, we mention the books of Rio (2000) [161] and Bradley (2002) [30]. However, many classes of time series do not satisfy any mixing condition, as noted e.g. in Eberlein and Taqqu (1986) [83] or Doukhan (1994) [61]. Conversely, most of these time series enter the scope of mixingales, but limit theorems and moment inequalities are more difficult to obtain in this general setting. Between those directions, Bickel and Bühlmann (1999) [18] and, simultaneously, Doukhan and Louhichi (1999) [67] introduced a new idea of weak dependence. Their notion of weak dependence makes explicit the asymptotic independence between 'past' and 'future'; this means that the 'past' is progressively forgotten. In terms of the initial time series, 'past' and 'future' are elementary events given through finite dimensional marginals. Roughly speaking, for convenient functions f and g, we shall assume that Cov(f('past'), g('future')) is small when the distance between the 'past' and the 'future' is sufficiently large. Such inequalities are significant only if the distance between the indices of the initial time series in the 'past' and 'future' terms grows to infinity. The convergence is not assumed to hold uniformly in the dimension of the 'past' or 'future' involved. The main advantage is that this kind of dependence covers lots of pertinent examples and can be used in various situations: empirical central limit theorems are proved in Doukhan and Louhichi (1999) [67] and Borovkova, Burton and
Dehling (2001) [25], while applications to the bootstrap are given by Bickel and Bühlmann (1999) [18] and Ango Nzé et al. (2002) [6], and to functional estimation by Coulon-Prieur and Doukhan (2000) [40]. In this chapter, a first section introduces the function spaces necessary to define the various dependence coefficients of the second section. These are classified in separate subsections. We shall first consider noncausal coefficients and then their causal counterparts; in both cases the underlying spaces are Lipschitz spaces. A further case, associated with bounded variation spaces, is provided in the following subsection. Projective measures of dependence are included in the last subsection.
2.1
Function spaces
In this section, we give the definitions of some function spaces used in this book.

• Let m be any measure on a measurable space (Ω, A). For any p ≥ 1, we denote by L^p(m) the space of measurable functions f from Ω to R such that

‖f‖_{p,m} = (∫ |f(x)|^p m(dx))^{1/p} < ∞,
‖f‖_{∞,m} = inf{M > 0 : m(|f| > M) = 0} < ∞, for p = ∞.

For simplicity, when no confusion can arise, we shall write L^p and ‖·‖_p instead of L^p(m) and ‖·‖_{p,m}. Let X be a Polish space and δ be some metric on X (X need not be Polish with respect to δ).

• Let Λ(δ) be the set of Lipschitz functions from X to R with respect to the distance δ. For f ∈ Λ(δ), denote by Lip(f) the Lipschitz constant of f. Let Λ^(1)(δ) = {f ∈ Λ(δ) : Lip(f) ≤ 1}.

• Let (Ω, A, P) be a probability space. Let X be a Polish space and δ be a distance on X. For any p ∈ [1, ∞], we say that a random variable X with values in X is L^p-integrable if, for some x₀ in X, the real-valued random variable δ(X, x₀) belongs to L^p(P).

Another type of function class will be used in this chapter: the class of functions with bounded variation on the real line. To be complete, we recall:
Definition 2.1. A σ-finite signed measure is the difference of two positive σ-finite measures, one of them at least being finite. We say that a function h from R to R is σ-BV if there exists a σ-finite signed measure dh such that h(x) = h(0) + dh([0, x[) if x ≥ 0 and h(x) = h(0) − dh([x, 0[) if x ≤ 0 (h is left continuous). The function h is BV if the signed measure dh is finite.

Recall also the Hahn–Jordan decomposition: for any σ-finite signed measure μ, there is a set D such that

μ⁺(A) = μ(A ∩ D) ≥ 0,  −μ⁻(A) = μ(A \ D) ≤ 0.

The measures μ⁺ and μ⁻ are mutually singular, at least one of them is finite, and μ = μ⁺ − μ⁻. The measure |μ| = μ⁺ + μ⁻ is called the total variation measure of μ, and the total variation of μ is ‖μ‖ = |μ|(R). Now we are in a position to introduce

• BV₁, the space of BV functions h : R → R such that ‖dh‖ ≤ 1.
2.2
Weak dependence
Let (Ω, A, P) be a probability space and let X be a Polish space. Let

F = ∪_{u∈N*} F_u and G = ∪_{u∈N*} G_u,

where F_u and G_u are two classes of functions from X^u to R.

Definition 2.2. Let X and Y be two random variables with values in X^u and X^v respectively. If Ψ is some function from F × G to R₊, define the (F, G, Ψ)-dependence coefficient ε(X, Y) by

ε(X, Y) = sup_{f∈F_u, g∈G_v} |Cov(f(X), g(Y))| / Ψ(f, g).  (2.2.1)

Let (X_n)_{n∈Z} be a sequence of X-valued random variables. Let Γ(u, v, k) be the set of (i, j) in Z^u × Z^v such that i₁ < ··· < i_u ≤ i_u + k ≤ j₁ < ··· < j_v. The dependence coefficient ε(k) is defined by

ε(k) = sup_{u,v} sup_{(i,j)∈Γ(u,v,k)} ε((X_{i₁}, ..., X_{i_u}), (X_{j₁}, ..., X_{j_v})).

The sequence (X_n)_{n∈Z} is (F, G, Ψ)-dependent if the sequence (ε(k))_{k∈N} tends to zero. If F = G we simply speak of (F, Ψ)-dependence.

Remark 2.1. Definition 2.2 above easily extends to general metric sets of indices T equipped with a distance δ (e.g. T = Z^d yields the case of random fields). The set Γ(u, v, k) is then the set of (i, j) in T^u × T^v such that k = min{δ(i_ℓ, j_m) : 1 ≤ ℓ ≤ u, 1 ≤ m ≤ v}.
2.2.1
η, κ, λ and ζ-coefficients
In this section, we focus on the case where F_u = G_u. If f belongs to F_u, we set d_f = u. First, F_u is the set of bounded functions from X^u to R which are Lipschitz with respect to the distance δ₁ on X^u defined by

δ₁(x, y) = Σ_{i=1}^u δ(x_i, y_i).  (2.2.2)

In that case:

• the coefficient η corresponds to

Ψ(f, g) = d_f ‖g‖_∞ Lip(f) + d_g ‖f‖_∞ Lip(g),  (2.2.3)

• the coefficient λ corresponds to

Ψ(f, g) = d_f ‖g‖_∞ Lip(f) + d_g ‖f‖_∞ Lip(g) + d_f d_g Lip(f) Lip(g).  (2.2.4)

To define the coefficients κ and ζ, we consider for F_u the wider set of functions from X^u to R which are Lipschitz with respect to the distance δ₁ on X^u but which are not necessarily bounded. In that case we assume that the variables X_i are L¹-integrable.

• the coefficient κ corresponds to

Ψ(f, g) = d_f d_g Lip(f) Lip(g),  (2.2.5)

• the coefficient ζ corresponds to

Ψ(f, g) = min(d_f, d_g) Lip(f) Lip(g).  (2.2.6)
These coefficients have some hereditary properties. For example, let h : X → R be a Lipschitz function with respect to δ; then, if the sequence (X_n)_{n∈Z} is η-, κ-, λ- or ζ-weakly dependent, the same is true for the sequence (h(X_n))_{n∈Z}. One can also obtain hereditary properties for functions which are not Lipschitz on the whole space X, as shown by Proposition 2.1 below, in the special case where X = R^k equipped with the distance δ(x, y) = max_{1≤i≤k} |x_i − y_i|.

Proposition 2.1 (Bardet, Doukhan, León, 2006 [11]). Let (X_n)_{n∈Z} be a sequence of R^k-valued random variables. Let p > 1. We assume that there exists some constant C > 0 such that max_{1≤i≤k} ‖X_i‖_p ≤ C. Let h be a function from R^k to R such that h(0) = 0 and, for x, y ∈ R^k, there exist a ∈ [1, p[ and c > 0 such that

|h(x) − h(y)| ≤ c|x − y|(|x|^{a−1} + |y|^{a−1}).

We define the sequence (Y_n)_{n∈Z} by Y_n = h(X_n). Then,
• if (X_n)_{n∈Z} is η-weakly dependent, then so is (Y_n)_{n∈Z}, and η_Y(n) = O(η(n)^{(p−a)/(p−1)});

• if (X_n)_{n∈Z} is λ-weakly dependent, then so is (Y_n)_{n∈Z}, and λ_Y(n) = O(λ(n)^{(p−a)/(p+a−2)}).

Remark 2.2. The function h(x) = x² satisfies the previous assumptions with a = 2. The condition is satisfied by polynomials of degree a.

Proof of Proposition 2.1. Let f and g be two real functions in F_u and F_v respectively. Denote x^(M) = (x ∧ M) ∨ (−M) for x ∈ R; for x = (x₁, ..., x_k) ∈ R^k, we analogously denote x^(M) = (x₁^(M), ..., x_k^(M)). Assume that (i, j) belongs to the set Γ(u, v, r) defined in Definition 2.2. Define X_i = (X_{i₁}, ..., X_{i_u}) and X_j = (X_{j₁}, ..., X_{j_v}). We then define functions F : R^{uk} → R and G : R^{vk} → R through the relations:

F(X_i) = f(h(X_{i₁}), ..., h(X_{i_u})),  F^(M)(X_i) = f(h(X_{i₁}^(M)), ..., h(X_{i_u}^(M))),
G(X_j) = g(h(X_{j₁}), ..., h(X_{j_v})),  G^(M)(X_j) = g(h(X_{j₁}^(M)), ..., h(X_{j_v}^(M))).

Then:

|Cov(F(X_i), G(X_j))| ≤ |Cov(F(X_i), G(X_j) − G^(M)(X_j))| + |Cov(F(X_i), G^(M)(X_j))|
≤ 2‖f‖_∞ E|G(X_j) − G^(M)(X_j)| + 2‖g‖_∞ E|F(X_i) − F^(M)(X_i)| + |Cov(F^(M)(X_i), G^(M)(X_j))|.

But we also have, from the assumptions on h and the Markov inequality,

E|G(X_j) − G^(M)(X_j)| ≤ Lip(g) Σ_{l=1}^v E|h(X_{j_l}) − h(X_{j_l}^(M))|
≤ 2c Lip(g) Σ_{l=1}^v E(|X_{j_l}|^a 1_{|X_{j_l}|>M})
≤ 2c v Lip(g) C^p M^{a−p}.

The same bound holds for F. Moreover, the functions F^(M) : R^{uk} → R and G^(M) : R^{vk} → R satisfy Lip(F^(M)) ≤ 2cM^{a−1} Lip(f) and Lip(G^(M)) ≤ 2cM^{a−1} Lip(g), together with ‖F^(M)‖_∞ ≤ ‖f‖_∞ and ‖G^(M)‖_∞ ≤ ‖g‖_∞. Thus, from the definition of the weak dependence of X and the choice of (i, j), we obtain respectively, if M ≥ 1,

|Cov(F^(M)(X_i), G^(M)(X_j))| ≤ 2c(u Lip(f)‖g‖_∞ + v Lip(g)‖f‖_∞) M^{a−1} η(r),
|Cov(F^(M)(X_i), G^(M)(X_j))| ≤ 2c(d_f Lip(f)‖g‖_∞ + d_g Lip(g)‖f‖_∞) M^{a−1} λ(r) + 4c² d_f d_g Lip(f) Lip(g) M^{2a−2} λ(r).

Finally, we obtain respectively, if M ≥ 1:

|Cov(F(X_i), G(X_j))| ≤ 2c(u Lip(f)‖g‖_∞ + v Lip(g)‖f‖_∞) M^{a−1} η(r) + 2C^p M^{a−p},
|Cov(F(X_i), G(X_j))| ≤ c(u Lip(f) + v Lip(g) + uv Lip(f) Lip(g)) (M^{2a−2} λ(r) + M^{a−p}).

Choosing M = η(r)^{1/(1−p)} and M = λ(r)^{−1/(p+a−2)} respectively, we obtain the result.

In the definition of the coefficients η, κ, λ and ζ, we assume some regularity conditions on F_u = G_u. In the case where the sequence (X_n)_{n∈Z} is a process adapted to some increasing filtration (M_i)_{i∈Z}, it is often more suitable to work without assuming any regularity condition on F_u. In that case G_u is some space of regular functions and F_u ≠ G_u. This last case is called the causal case. In the situations where both F_u and G_u are spaces of regular functions, we say that we are in the noncausal case.
2.2.2
θ and τ -coefficients
Let F_u be the class of bounded functions from X^u to R, and let G_u be the class of functions from X^u to R which are Lipschitz with respect to the distance δ₁ defined by (2.2.2). We assume that the variables X_i are L¹-integrable.

• The coefficient θ corresponds to

Ψ(f, g) = d_g ‖f‖_∞ Lip(g).  (2.2.7)
The coefficient θ has some hereditary properties. For example, Proposition 2.2 below gives hereditary properties similar to those given for the coefficients η and λ in Proposition 2.1.

Proposition 2.2. Let (X_n)_{n∈Z} be a sequence of R^k-valued random variables. We define the sequence (Y_n)_{n∈Z} by Y_n = h(X_n). The assumptions on (X_n)_{n∈Z} and on h are the same as in Proposition 2.1. Then,
• if (X_n)_{n∈Z} is θ-weakly dependent, then so is (Y_n)_{n∈Z}, and θ_Y(n) = O(θ(n)^{(p−a)/(p−1)}).

The proof of Proposition 2.2 follows the same lines as the proof of Proposition 2.1 and therefore is not detailed. We shall see that the coefficient θ defined above belongs to a more general class of dependence coefficients, defined through conditional expectations with respect to the filtration σ(X_j, j ≤ i).

Definition 2.3. Let (Ω, A, P) be a probability space, and let M be a σ-algebra of A. Let X be a Polish space and δ a distance on X. For any L^p-integrable random variable X (see § 2.1) with values in X, we define

θ_p(M, X) = sup{‖E(g(X)|M) − E(g(X))‖_p : g ∈ Λ^(1)(δ)}.  (2.2.8)

Let (X_i)_{i∈Z} be a sequence of L^p-integrable X-valued random variables, and let (M_i)_{i∈Z} be a sequence of σ-algebras of A. On X^l, we consider the distance δ₁ defined by (2.2.2). The sequence of coefficients θ_{p,r}(k) is then defined by

θ_{p,r}(k) = max_{1≤l≤r} sup_{(i,j)∈Γ(1,l,k)} (1/l) θ_p(M_i, (X_{j₁}, ..., X_{j_l})).  (2.2.9)

When it is not clearly specified, we shall always take M_i = σ(X_k, k ≤ i). The two preceding definitions are coherent, as proved below.

Proposition 2.3. Let (X_i)_{i∈Z} be a sequence of L¹-integrable X-valued random variables, and let M_i = σ(X_j, j ≤ i). With the definition of θ(k) from § 2.2.2 and with Definition 2.3, we have the equality

θ(k) = θ_{1,∞}(k).  (2.2.10)
Proof of Proposition 2.3. The fact that θ(k) ≤ θ_{1,∞}(k) is clear since, for any f in F_u, g in G_v, and any (i, j) ∈ Γ(u, v, k),

|Cov( f(X_{i₁}, ..., X_{i_u})/‖f‖_∞ , g(X_{j₁}, ..., X_{j_v})/(v Lip(g)) )|
≤ ‖E( g(X_{j₁}, ..., X_{j_v})/(v Lip(g)) | M_{i_u}) − E( g(X_{j₁}, ..., X_{j_v})/(v Lip(g)) )‖₁ ≤ θ_{1,∞}(k).

To prove the converse inequality, we first notice that

θ₁(M_i, (X_{j₁}, ..., X_{j_v})) = lim_{k→−∞} θ₁(M_{k,i}, (X_{j₁}, ..., X_{j_v})),  (2.2.11)
where M_{k,i} = σ(X_j, k ≤ j ≤ i). Now, letting

f(X_k, ..., X_i) = sign( E(g(X_{j₁}, ..., X_{j_v}) | M_{k,i}) − E(g(X_{j₁}, ..., X_{j_v})) ),

we have, for (i, j) in Γ(1, v, k) and g in Λ^(1)(δ₁),

‖E(g(X_{j₁}, ..., X_{j_v}) | M_{k,i}) − E(g(X_{j₁}, ..., X_{j_v}))‖₁ = Cov(f(X_k, ..., X_i), g(X_{j₁}, ..., X_{j_v})) ≤ vθ(k).

We infer that (1/v) θ₁(M_{k,i}, (X_{j₁}, ..., X_{j_v})) ≤ θ(k), and we conclude from (2.2.11) that θ_{1,∞}(k) ≤ θ(k). The proof is complete.

Having in view the coupling arguments of § 5.3, we now define a variation of the coefficient (2.2.8) in which the order of ‖·‖_p and the supremum are exchanged. This is the same step as passing from α-mixing to β-mixing, which is known to ensure nice coupling arguments (see Berbee, 1979 [16]).

Definition 2.4. Let (Ω, A, P) be a probability space, and M a σ-algebra of A. Let X be a Polish space and δ a distance on X. For any L^p-integrable (see § 2.1) X-valued random variable X, we define the coefficient τ_p by

τ_p(M, X) = ‖ sup_{g∈Λ^(1)(δ)} | ∫ g(x) P_{X|M}(dx) − ∫ g(x) P_X(dx) | ‖_p,  (2.2.12)

where P_X is the distribution of X and P_{X|M} is a conditional distribution of X given M. We clearly have

θ_p(M, X) ≤ τ_p(M, X).  (2.2.13)

Let (X_i)_{i∈Z} be a sequence of L^p-integrable X-valued random variables. The coefficients τ_{p,r}(k) are defined from τ_p as in (2.2.9).
2.2.3
α̃, β̃ and φ̃-coefficients
In the case where X = (R^d)^r, we introduce some new coefficients based on indicators of quadrants. Recall that if x and y are two elements of R^d, then x ≤ y if and only if x_i ≤ y_i for any 1 ≤ i ≤ d.

Definition 2.5. Let X = (X₁, ..., X_r) be an (R^d)^r-valued random variable and M a σ-algebra of A. For t_i in R^d and x in R^d, let g_{t_i,i}(x) = 1_{x≤t_i} − P(X_i ≤ t_i). Keeping the same notations as in Definition 2.4, define for t = (t₁, ..., t_r) in (R^d)^r,

L_{X|M}(t) = ∫ ∏_{i=1}^r g_{t_i,i}(x_i) P_{X|M}(dx)  and  L_X(t) = E ∏_{i=1}^r g_{t_i,i}(X_i).

Define now the coefficients
1. α̃(M, X) = sup_{t∈(R^d)^r} ‖L_{X|M}(t) − L_X(t)‖₁,

2. β̃(M, X) = ‖ sup_{t∈(R^d)^r} |L_{X|M}(t) − L_X(t)| ‖₁,

3. φ̃(M, X) = sup_{t∈(R^d)^r} ‖L_{X|M}(t) − L_X(t)‖_∞.
Remark 2.3. Note that if r = 1, d = 1 and δ(x, y) = |x − y| then, with the above notation,

τ₁(M, X) = ∫ ‖L_{X|M}(t)‖₁ dt.

The proof of this equality follows the same lines as the proof of the coupling property of τ₁ (see Chapter 5, proof of Lemma 5.2). In the definition of the coefficients θ and τ, we have used the class of functions Λ^(1)(δ). In the case where d = 1, we can define the coefficients α̃(M, X), β̃(M, X) and φ̃(M, X) with the help of bounded variation functions. This is the purpose of the following lemma.

Lemma 2.1. Let (Ω, A, P) be a probability space, X = (X₁, ..., X_r) an R^r-valued random variable and M a σ-algebra of A. If f is a function in BV₁, let f^(i)(x) = f(x) − E(f(X_i)). The following relations hold:

1. α̃(M, X) = sup_{f₁,...,f_r∈BV₁} ‖E(∏_{i=1}^r f_i^(i)(X_i) | M) − E(∏_{i=1}^r f_i^(i)(X_i))‖₁,

2. β̃(M, X) = ‖ sup_{f₁,...,f_r∈BV₁} | ∫ ∏_{i=1}^r f_i^(i)(x_i) (P_{X|M} − P_X)(dx) | ‖₁,

3. φ̃(M, X) = sup_{f₁,...,f_r∈BV₁} ‖E(∏_{i=1}^r f_i^(i)(X_i) | M) − E(∏_{i=1}^r f_i^(i)(X_i))‖_∞.
Remark 2.4. For r = 1 and d = 1, the coefficient α̃(M, X) was introduced by Rio (2000, equation 1.10c, [161]) and used by Peligrad (2002) [140], while τ₁(M, X) was introduced by Dedecker and Prieur (2004a) [45]. Let α(M, σ(X)), β(M, σ(X)) and φ(M, σ(X)) be the usual mixing coefficients, defined respectively by Rosenblatt (1956) [166], Wolkonski and Rozanov (1959) [187] and Ibragimov (1962) [110]. Starting from Definition 2.5 one can easily prove that

α̃(M, X) ≤ 2α(M, σ(X)),  β̃(M, X) ≤ β(M, σ(X)),  φ̃(M, X) ≤ φ(M, σ(X)).
Proof of Lemma 2.1. Let f_i be a function in BV₁. Assume without loss of generality that f_i(−∞) = 0. Then

f_i^(i)(x) = −∫ (1_{x≤t} − P(X_i ≤ t)) df_i(t).

Hence,

∫ ∏_{i=1}^r f_i^(i)(x_i) P_{X|M}(dx) = (−1)^r ∫ ( ∫ ∏_{i=1}^r g_{t_i,i}(x_i) P_{X|M}(dx) ) ∏_{i=1}^r df_i(t_i),

and the same is true with P_X in place of P_{X|M}. From these identities and the fact that |df_i|(R) ≤ 1, we infer that

sup_{f₁,...,f_r∈BV₁} | ∫ ∏_{i=1}^r f_i^(i)(x_i) P_{X|M}(dx) − ∫ ∏_{i=1}^r f_i^(i)(x_i) P_X(dx) | ≤ sup_{t∈R^r} |L_{X|M}(t) − L_X(t)|.
The converse inequality follows by noting that x ↦ 1_{x≤t} belongs to BV₁. The following proposition gives the hereditary properties of these coefficients.

Proposition 2.4. Let (Ω, A, P) be a probability space, X an R^r-valued random variable and M a σ-algebra of A. Let g₁, ..., g_r be any nondecreasing functions, and let g(X) = (g₁(X₁), ..., g_r(X_r)). We have the inequalities α̃(M, g(X)) ≤ α̃(M, X), β̃(M, g(X)) ≤ β̃(M, X) and φ̃(M, g(X)) ≤ φ̃(M, X). In particular, if F_i is the distribution function of X_i, we have α̃(M, F(X)) = α̃(M, X), β̃(M, F(X)) = β̃(M, X) and φ̃(M, F(X)) = φ̃(M, X).

Notations 2.1. For any distribution function F, we define the generalized inverse as

F⁻¹(x) = inf{t ∈ R : F(t) ≥ x}.  (2.2.14)

For any nonincreasing càdlàg function f : R → R we analogously define the generalized inverse f⁻¹(u) = inf{t : f(t) ≤ u}.

Proof of Proposition 2.4. The fact that α̃(M, g(X)) ≤ α̃(M, X) is immediate from the definition. We infer that α̃(M, F(X)) ≤ α̃(M, X). Applying the first result once more, we obtain that α̃(M, F⁻¹(F(X))) ≤ α̃(M, F(X)). To conclude, it suffices to note that F⁻¹ ∘ F(X) = X almost surely, so that α̃(M, X) ≤ α̃(M, F(X)). Of course, the same arguments apply to β̃(M, X) and φ̃(M, X).

We now define the coefficients α̃_r(k), β̃_r(k) and φ̃_r(k) for a sequence of σ-algebras and a sequence of R^d-valued random variables.
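The generalized inverse (2.2.14) is the usual quantile function, and the almost-sure identity F⁻¹ ∘ F(X) = X used in the proof above can be observed directly on the empirical distribution function of an atomless sample. A small sketch (the helper names are ours):

```python
import bisect
import random

# Empirical distribution function of a sample and its generalized inverse
# F^{-1}(u) = inf{t : F(t) >= u}, as in (2.2.14).
random.seed(3)
sample = sorted(random.gauss(0.0, 1.0) for _ in range(1000))
n = len(sample)

def F(t):
    return bisect.bisect_right(sample, t) / n

def F_inv(u):
    # smallest sample point t with F(t) >= u
    for i, t in enumerate(sample):
        if (i + 1) / n >= u:
            return t
    return sample[-1]

# F^{-1}(F(x)) = x for every sample point (the sample is a.s. atomless)
for x in random.sample(sample, 50):
    assert F_inv(F(x)) == x
```

The same quantile mechanism underlies the invariance α̃(M, F(X)) = α̃(M, X) of Proposition 2.4: monotone transformations neither create nor destroy the quadrant events on which the coefficients are built.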
Definition 2.6. Let (Ω, A, P) be a probability space. Let (X_i)_{i∈Z} be a sequence of R^d-valued random variables, and let (M_i)_{i∈Z} be a sequence of σ-algebras of A. For r ∈ N* and k ≥ 0, define

α̃_r(k) = max_{1≤l≤r} sup_{(i,j)∈Γ(1,l,k)} α̃(M_i, (X_{j₁}, ..., X_{j_l})).  (2.2.15)
2.2.4
Projective measure of dependence
Sometimes it is not necessary to introduce a supremum over a class of functions; we can work with the following simple projective measure of dependence.

Definition 2.7. Let (Ω, A, P) be a probability space, and M a σ-algebra of A. Let p ∈ [1, ∞]. For any L^p-integrable real-valued random variable X, define

γ_p(M, X) = ‖E(X|M) − E(X)‖_p.  (2.2.16)

Let (X_i)_{i∈Z} be a sequence of L^p-integrable real-valued random variables, and let (M_i)_{i∈Z} be a sequence of σ-algebras of A. The sequence of coefficients γ_p(k) is then defined by

γ_p(k) = sup_{i∈Z} γ_p(M_i, X_{i+k}).  (2.2.17)
When it is not clearly specified, we shall always take M_i = σ(X_k, k ≤ i).

Remark 2.5. These coefficients are defined in Gordin (1969) [97] for p ≥ 2 and in Gordin (1973) [98] for p = 1. McLeish (1975a) [128] and (1975b) [129] uses these coefficients in order to derive various limit theorems. Let us notice that

γ_p(M, X) ≤ θ_p(M, X).  (2.2.18)
Chapter 3
Models

The chapter is organized as follows: we first introduce Bernoulli shifts, a very broad class of models that contains the major part of the processes derived from a stationary sequence. As an example, we define the class of Volterra processes, which are multipolynomial transformations of the stationary sequence. We discuss the dependence properties of Bernoulli shifts according to whether the underlying stationary sequence is dependent or independent. When the innovation sequence is independent, we distinguish between causal and noncausal processes. After these general properties, we focus on Markov models and some of their extensions, as well as on dynamical systems, which may be studied as Markov chains up to a time reversal. After this we shall consider LARCH(∞) models, which are built by blending the definitions of Volterra series and Markov processes, and which provide an attractive class of nonlinear and non-Markovian time series. To conclude, we consider associated processes and we review some other types of stationary processes or random fields which satisfy some weak dependence condition.
3.1
Bernoulli shifts
Definition 3.1. Let H : R^Z → R be a measurable function. Let (ξ_n)_{n∈Z} be a strictly stationary sequence of real-valued random variables. A Bernoulli shift with innovation process (ξ_n)_{n∈Z} is defined as

X_n = H((ξ_{n−i})_{i∈Z}),  n ∈ Z.  (3.1.1)

This sequence is strictly stationary. Remark that the expression (3.1.1) is certainly not always clearly defined; as H is a function depending on an infinite number of arguments, it is generally given in the form of a series, which is usually only defined in some L^p space. In order to
define (3.1.1) in a general setting, we denote, for any subset J ⊂ Z,

H((ξ_{−j})_{j∈J}) = H((ξ_{−j} 1_{j∈J})_{j∈Z}).

For finite subsets J this expression is generally well defined and simple to handle. In order to define such models in L^m we may assume

Σ_{n=1}^∞ ω_{m,n} < ∞,  (3.1.2)

where, for some m ≥ 1,

ω_{m,n}^m = E|H((ξ_{−j})_{|j|≤n}) − H((ξ_{−j})_{|j|<n})|^m.  (3.1.3)

This condition indeed proves that the sequence H((ξ_{−j})_{|j|≤n}) has the Cauchy property and thus converges in the Banach space L^m of the classes of random variables with a finite moment of order m. In fact the strict definition of the function H as an element of the space L^m(R^Z, B(R^Z), μ) is the following. Denote by μ the distribution of the process ξ = (ξ_t)_{t∈Z}; the measure μ is a probability distribution on the measurable space (R^Z, B(R^Z)). If, as before, we assume that ξ is stationary, that S ⊂ R is the support of the distribution of ξ₀, and that 𝒮 ⊂ R^Z is the support of the distribution of the sequence ξ, then the random variable H defines a function over 𝒮 ⊃ S^(Z), where S^(Z) is the set of sequences with values 0 except for finitely many indices. Now, given a function defined over R^(Z), the previous condition (3.1.2) ensures that such a function may be extended to a function H ∈ L^m(R^Z, B(R^Z), μ).

Dependence properties. No mixing properties have been derived for such models, except in the simple case of m-dependent Bernoulli shifts, i.e. when H depends only on a finite number of variables.
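For the binary shift of Section 1.5, H(x) = Σ_{k≥0} 2^{−k−1} x_k, the truncation coefficients of (3.1.3) can be evaluated in closed form: adding the n-th innovation changes H by exactly 2^{−n−1} ξ_n, so ω_{1,n} = 2^{−n−1} E|ξ₀| = 2^{−n−2} for ξ₀ ∼ b(1/2), and the series (3.1.2) converges geometrically. A Monte Carlo check of this decay (this causal, one-sided variant of the truncation is our own simplification of the two-sided condition above):

```python
import random

# Causal Bernoulli shift H(x) = sum_{k>=0} 2^{-k-1} x_k with xi_k i.i.d.
# b(1/2). The truncation differences of (3.1.3) satisfy
# omega_{1,n} = E|H((xi_k)_{k<=n}) - H((xi_k)_{k<n})| = 2^{-(n+2)},
# so the series (3.1.2) converges and X_n is well defined in L^1.
random.seed(4)

def H(xi, n):                       # truncated shift, digits 0..n
    return sum(2.0 ** -(k + 1) * xi[k] for k in range(n + 1))

trials = 4000
for n in (3, 6, 9):
    xi_samples = [[random.randint(0, 1) for _ in range(n + 1)]
                  for _ in range(trials)]
    omega = sum(abs(H(xi, n) - H(xi, n - 1)) for xi in xi_samples) / trials
    # geometric decay: omega_{1,n} = 2^{-(n+2)}
    assert abs(omega - 2.0 ** -(n + 2)) < 2.0 ** -(n + 2)
```

The same scheme applies verbatim to any Bernoulli shift whose coefficients are summable: only the decay rate of ω_{m,n} changes.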
3.1.1
Volterra processes
The simplest case of an infinitely dependent Bernoulli shift is the infinite moving average process with independent innovations:

X_t = Σ_{i=−∞}^{∞} a_i ξ_{t−i}.  (3.1.4)

This simple case is generalised by Volterra processes, defined with the use of polynomials of the innovation process. A Volterra process is a stationary process
defined through a convergent Volterra expansion Xt
= v0 +
∞
Vk;t ,
where
(3.1.5)
k=1
Vk;t
=
ak;i1 ,...,ik ξt−i1 · · · ξt−ik ,
(3.1.6)
i1 <···
and v0 denotes a constant and (ak;i1 ,...,ik )(i1 ,...,ik )∈Zk are real numbers for each k ≥ 1. Let p ≥ 1, then this expression converges in Lp , provided that E|ξ0 |p < ∞ and the weights satisfy ∞
p
|ak;i1 ,...,ik | < ∞.
k=1 i1 <···
If the sequences ak;i1 ,...,ik = 0 as i1 < 0 then the process is causal in the sense that Xt is measurable with respect to σ{ξi , i ≤ t}. In this case t may be seen as the usual time σ{ξi , i ≤ t} denotes the history at epoch t. Assume now that p = 2, E ξ0 = 0 and E ξi2 = 1, then the k−th order homogeneous chaotic processes Vk;t are pairwise orthogonal, and it is thus enough to prove the existence of such homogeneous processes (3.1.6) in L2 in order to obtain the existence of the more general Volterra processes (3.1.5). Normal convergence of Vk;t follows clearly from the convergence of the series defining its variance Γ2k Γ2k = a2i1 ,...,ik−1 ,ik < ∞. 0≤i1 <···
For the general infinite-order Volterra series (3.1.5), the corresponding variance follows by orthogonality:
$$\Gamma^2 = \sum_{k=1}^{\infty} \Gamma_k^2 < \infty.$$
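As a quick numerical check of the orthogonality of the homogeneous chaoses, the following sketch simulates a Volterra process truncated to finitely many indices; the weights and the truncation level are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncated second-order Volterra expansion with illustrative summable
# weights a_{1;i} = 2^{-(i+1)} and a_{2;i1,i2} = 3^{-(i1+i2+2)} for i1 < i2.
M = 20
a1 = 2.0 ** -(np.arange(M) + 1)
i1, i2 = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
a2 = np.where(i1 < i2, 3.0 ** -(i1 + i2 + 2.0), 0.0)

n = 200_000
xi = rng.standard_normal(n + M)                       # i.i.d. N(0,1) innovations
P = np.column_stack([xi[M - i: M - i + n] for i in range(M)])  # P[t, i] = xi_{t-i}

V1 = P @ a1                                           # first-order chaos
V2 = np.einsum("ti,ij,tj->t", P, a2, P)               # second-order chaos (i1 < i2)
X = 1.0 + V1 + V2                                     # v0 = 1

gamma1_sq = np.sum(a1 ** 2)                           # Gamma_1^2 = sum_i a_{1;i}^2
gamma2_sq = np.sum(a2 ** 2)                           # Gamma_2^2 = sum_{i1<i2} a^2
print(np.cov(V1, V2)[0, 1])                           # ~ 0: the chaoses are orthogonal
print(X.var(), gamma1_sq + gamma2_sq)                 # Gamma^2 = Gamma_1^2 + Gamma_2^2
```

The empirical covariance of the two chaoses is close to zero and the total variance matches $\Gamma_1^2 + \Gamma_2^2$, as predicted by the orthogonality argument.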
The formula defining Volterra processes can be generalized to expansions
$$X_t = v_0 + \sum_{k=1}^{\infty} V_{k;t}, \qquad\text{where}\qquad V_{k;t} = \sum_{(i_1,\dots,i_k)\in\mathbb{Z}^k} a_{k;i_1,\dots,i_k}\, \xi_{t-i_1}\cdots\xi_{t-i_k}, \qquad (3.1.7)$$
where $v_0$ denotes a constant and $(a_{k;i_1,\dots,i_k})_{(i_1,\dots,i_k)\in\mathbb{Z}^k}$ are real numbers for each $k \ge 1$. The major difference with the preceding definition is that the indices in the product are no longer required to be all different. Let $p \ge 1$; then the series converges in $L^p$ provided that the weights satisfy
$$\sum_{k=1}^{\infty} E|\xi_0|^{pk} \sum_{i_1 \le \cdots \le i_k} |a_{k;i_1,\dots,i_k}|^p < \infty. \qquad (3.1.8)$$
The drawback of this generalization is the loss of the orthogonality and moment properties derived for the models (3.1.5). It is however possible to rewrite the process as a sum of orthogonal terms by relaxing the condition of identical distribution and independence of the innovation process. Consider more general Volterra processes defined, under the same condition (3.1.8), by
$$V_{k;t} = \sum_{j_1 < \cdots < j_k} a^{(k)}_{j_1,\dots,j_k}\, \xi^{(k,1)}_{t-j_1} \cdots \xi^{(k,k)}_{t-j_k}. \qquad (3.1.9)$$
For a fixed $k > 0$, the sequences $(\xi^{(k,l)}_t)_{t\in\mathbb{Z}}$ are i.i.d. and mutually orthogonal for $l \le k$. Clearly, the models (3.1.5) have this form, but it is also interesting to see that the models (3.1.7) may be written as sums of such models. Consider an expansion (3.1.7); we may assume without loss of generality that $j_1 \le \cdots \le j_k$ and that $E\xi_0 = 0$; we replace each power of an innovation variable by its decomposition on the Appell polynomials of the distribution of $\xi_0$. For example, the squares are replaced by
$$\xi_{t-i}^2 = (\xi_{t-i}^2 - \sigma^2) + \sigma^2 = A_2(\xi_{t-i}) + \sigma^2.$$
For higher-order polynomials, recall that Appell polynomials (see e.g. Doukhan, 2002 [62]) are defined as $A_k(\xi_{t-i}) = \xi_{t-i}^k + \cdots$ in such a way that $EA_k(\xi_t)P(\xi_t) = 0$ if the degree of the polynomial $P$ is less than $k$. Replacing all the powers with the help of such Appell polynomials leads to a decomposition (3.1.9) in orthogonal terms.

Dependence properties. Note that such models may have no weak dependence properties, as in the case of simple moving averages; see Doukhan, Oppenheim and Taqqu (2003) [72] for a thorough survey of strongly dependent Volterra processes. No mixing property has been derived in the general case. The degenerate case of $m$-dependence, when $V_t$ depends only on the $\xi_{t-i}$ for $i = 1$ to $m$ (so that only a finite number of coefficients in each series are nonzero), satisfies all of the mixing properties. The mixing properties of causal linear processes, corresponding to the term $V_{1;t}$ with $a_{i_1} = 0$ when $i_1 < 0$, were derived under the strong additional assumption that the distribution of $\xi_0$ admits a density which is itself an absolutely continuous function; see Doukhan (1994) [61] for references; in that monograph the proof of mixing for non-causal linear processes is not complete.
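For a standard Gaussian innovation the Appell polynomials coincide with the probabilists' Hermite polynomials, so the orthogonality property $EA_k(\xi)P(\xi) = 0$ for $\deg P < k$ can be checked numerically; the sketch below assumes $\xi_0 \sim \mathcal{N}(0,1)$, which is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.standard_normal(500_000)
sigma2 = 1.0

# For a standard Gaussian, A_2 is the Hermite polynomial H_2:
# A_2(x) = x^2 - sigma^2.
A2 = xi ** 2 - sigma2

# E[A_2(xi) P(xi)] = 0 for every polynomial P of degree < 2:
print(np.mean(A2 * 1.0))   # degree 0: close to 0
print(np.mean(A2 * xi))    # degree 1: close to 0 (E xi^3 = 0)
```

Thus $\xi^2 = A_2(\xi) + \sigma^2$ splits the square into a centered part, orthogonal to all lower-degree polynomials, plus a constant, which is exactly the mechanism used in the decomposition above.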
3.1.2
Noncausal shifts with independent inputs
Assume here that the shift is well defined and that the sequence of innovations (ξt ) is i.i.d.
3.1. BERNOULLI SHIFTS
25
Dependence properties. In order to prove weak dependence properties, once the existence and measurability of the function $H$ are ensured, it is sufficient to assume that the sequence $(\delta_r)_{r\in\mathbb{N}}$ defined by
$$E\big|H(\xi_{t-j},\ j\in\mathbb{Z}) - H\big(\xi_{t-j}\mathbf{1}_{|j|\le r},\ j\in\mathbb{Z}\big)\big| = \delta_r \qquad (3.1.10)$$
converges to 0 as $r$ tends to infinity. Note that a simple bound for $\delta_r$ is $\delta_r \le \sum_{i>r}\omega_{1,i}$. The following elementary lemma is easily proved:

Lemma 3.1. Bernoulli shifts are $\eta$-weakly dependent with $\eta(r) \le 2\delta_{[r/2]}$.

Proof. Let $X_n^{(s)} = H\big((\xi_{n-i}\mathbf{1}_{|i|\le s})\big)$. Clearly, the two sequences $(X_n^{(s)})_{n\le i}$ and $(X_n^{(s)})_{n\ge i+r}$ are independent if $r > 2s$. Now consider $\mathrm{Cov}(f, g)$ for the functions $f = f(X_{i_1},\dots,X_{i_u})$ and $g = g(X_{j_1},\dots,X_{j_v})$, where $f$ and $g$ are bounded and $f, g \in \Lambda^{(1)}(|\cdot|_1)$, with $|\cdot|_1$ defined by (2.2.2). Let $i_1 \le \cdots \le i_u$ and $j_1 \le \cdots \le j_v$ be such that $j_1 - i_u > 2s$. From the previous remark, $f^{(s)} = f(X_{i_1}^{(s)},\dots,X_{i_u}^{(s)})$ and $g^{(s)} = g(X_{j_1}^{(s)},\dots,X_{j_v}^{(s)})$ are independent, and consequently
$$|\mathrm{Cov}(f, g)| \le \big|\mathrm{Cov}(f - f^{(s)}, g)\big| + \big|\mathrm{Cov}(f^{(s)}, g - g^{(s)})\big| \le 2\|g\|_\infty E|f - f^{(s)}| + 2\|f\|_\infty E|g - g^{(s)}|$$
$$\le 2\|g\|_\infty \mathrm{Lip}\,f \sum_{t=1}^{u} E\big|X_{i_t} - X_{i_t}^{(s)}\big| + 2\|f\|_\infty \mathrm{Lip}\,g \sum_{t=1}^{v} E\big|X_{j_t} - X_{j_t}^{(s)}\big| \le 2\big(u\|g\|_\infty \mathrm{Lip}\,f + v\|f\|_\infty \mathrm{Lip}\,g\big)\delta_s.$$
The sequence $(\delta_k)_k$ is related to the modulus of uniform continuity of $H$. Under the regularity condition
$$|H(u_i,\ i\in\mathbb{Z}) - H(v_i,\ i\in\mathbb{Z})| \le \sum_{i\in\mathbb{Z}} a_i |u_i - v_i|^b,$$
for some $0 < b \le 1$ and some non-negative constants $(a_i)_{i\in\mathbb{Z}}$, and if the sequence $(\xi_i)_{i\in\mathbb{Z}}$ has a finite $b$-th order moment, then
$$\delta_k \le \sum_{|i|>k} a_i\, E|\xi_i|^b.$$
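The bound $\delta_k \le \sum_{|i|>k} a_i E|\xi_i|^b$ can be illustrated by a Monte Carlo computation; the linear shift $H(u) = \sum_i a_i u_i$ with weights $a_i = 2^{-|i|}$ and $b = 1$ used below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative linear Bernoulli shift H(u) = sum_i a_i u_i with a_i = 2^{-|i|};
# the regularity condition holds with b = 1.
M = 30                                      # outer truncation, much larger than k
idx = np.arange(-M, M + 1)
a = 2.0 ** -np.abs(idx)

n = 50_000
xi = rng.standard_normal((n, 2 * M + 1))    # column j holds the innovation at lag idx[j]
H_full = xi @ a

results = {}
for k in (5, 10):
    mask = (np.abs(idx) <= k).astype(float)
    delta_k = np.mean(np.abs(H_full - (xi * mask) @ a))       # Monte Carlo delta_k
    bound = np.sum(a[np.abs(idx) > k]) * np.mean(np.abs(xi))  # sum_{|i|>k} a_i E|xi|
    results[k] = (delta_k, bound)
    print(k, delta_k, bound)                                  # delta_k <= bound
```

The simulated $\delta_k$ decreases geometrically in $k$ and stays below the theoretical bound, as the inequality predicts.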
Recall here that processes can be η-weakly dependent and nonmixing, see § 1.5.
3.1.3
Noncausal shifts with dependent inputs
The condition of independent inputs $\xi$ may be relaxed. For instance, in equation (3.1.4), assume instead of independence that the sequence $(\xi_n)_{n\in\mathbb{Z}}$ is $\eta_\xi$-weakly dependent; then the process $(X_n)_{n\in\mathbb{Z}}$ is $\eta$-weakly dependent with $\eta(r) \le \eta_\xi(r/2) + \delta_{r/2}$. Such a heredity property of weak dependence is unknown under mixing. A general statement of this property is provided below in Lemma 3.3. Let us now denote by $(\xi_i)_{i\in\mathbb{Z}}$ a weakly dependent innovation process. The coefficient $\lambda$ proves very useful for studying Bernoulli shifts $X_n = H(\xi_{n-j},\ j\in\mathbb{Z})$ with a weakly dependent innovation process $(\xi_i)_i$, via the forthcoming lemma (see Doukhan and Wintenberger, 2005 [77]). Let $H: \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ be a measurable function and $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$. In order to define $X_n$, we assume that $H$ satisfies: for each $s\in\mathbb{Z}$, if $x, y \in \mathbb{R}^{\mathbb{Z}}$ satisfy $x_i = y_i$ for each index $i \ne s$, then
$$|H(x) - H(y)| \le b_s\Big(\sup_{i\ne s}|x_i|^l \vee 1\Big)|x_s - y_s|, \qquad (3.1.11)$$
where the supremum may equivalently be taken over the sequence $z$ defined by $z_s = 0$ and $z_i = x_i = y_i$ for each $i \ne s$. This assumption is stronger than in the case of independent innovations (see equation (3.1.10)). The following lemma proves the existence of such models:

Lemma 3.2. Let $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ be a Bernoulli shift such that $H: \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ satisfies condition (3.1.11) with $l \ge 0$ and some sequence $b_s \ge 0$ such that $\sum_s |s|\, b_s < \infty$. Assume that $E|\xi_0|^{m'} < \infty$ with $lm + 1 < m'$ for some $m' > 2$. Then $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ is a strongly stationary process, well defined in $L^m$.

The existence of example (3.1.4) was stated without proof; we now describe more involved examples of Bernoulli shifts with dependent innovations:

Example 3.1 (Volterra models with dependent inputs). Consider
$$H(x) = \sum_{k=0}^{K} \sum_{j_1,\dots,j_k} a^{(k)}_{j_1,\dots,j_k}\, x_{j_1}\cdots x_{j_k};$$
then, if $x, y$ are as in equation (3.1.11),
$$H(x) - H(y) = \sum_{1\le u\le k\le K}\ \sum_{j_1,\dots,j_{u-1},\,j_{u+1},\dots,j_k} a^{(k)}_{j_1,\dots,j_{u-1},s,j_{u+1},\dots,j_k}\, x_{j_1}\cdots x_{j_{u-1}}(x_s - y_s)\,x_{j_{u+1}}\cdots x_{j_k}.$$
From the triangle inequality we derive that suitable constants in condition (3.1.11) may be chosen as $l = K - 1$ and
$$b_s = \sum_{k=1}^{K}\ \sum^{(k,s)} \big|a^{(k)}_{j_1,\dots,j_k}\big|,$$
where $\sum^{(k,s)}$ stands for the sum over all indices $(j_1,\dots,j_k) \in \mathbb{Z}^k$ such that one of the indices takes the value $s$.
Example 3.2 (Uniform Lipschitz Bernoulli shifts). Assume that condition (3.1.11) holds with $l = 0$; then the previous result still holds. An example of such a situation is the case of LARCH($\infty$) non-causal processes with bounded ($m' = +\infty$) and dependent stationary innovations.

Proof of Lemma 3.2. We first prove the existence of Bernoulli shifts with dependent innovations in $L^1$. The same proof leads to the existence in $L^m$ for all $m \ge 1$ such that $lm + 1 \le m'$. Here we set $\xi^{(s)} = (\xi_{-i}\mathbf{1}_{|i|<s})_{i\in\mathbb{Z}}$ and $\xi^{(s)}_+ = (\xi_{-i}\mathbf{1}_{-s\le i<s})_{i\in\mathbb{Z}}$. With (3.1.11) we obtain
$$|H(\xi^{(1)}) - H(0)| \le b_0|\xi_0|,$$
$$|H(\xi^{(s+1)}) - H(\xi^{(s)}_+)| \le b_s\Big(\sup_{-s\le i<s}|\xi_{-i}|^l \vee 1\Big)|\xi_{-s}|,$$
$$|H(\xi^{(s)}_+) - H(\xi^{(s)})| \le b_{-s}\Big(\sup_{|i|<s}|\xi_{-i}|^l \vee 1\Big)|\xi_{s}|.$$
The Hölder inequality then yields
$$E\big|H(\xi^{(1)}) - H(0)\big| + \sum_{s=1}^{\infty}\Big(E\big|H(\xi^{(s+1)}) - H(\xi^{(s)}_+)\big| + E\big|H(\xi^{(s)}_+) - H(\xi^{(s)})\big|\Big) \le \sum_{i\in\mathbb{Z}} 2|i|b_i\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big). \qquad (3.1.12)$$
Hence the assumption $l + 1 \le m'$ together with $\sum_{i\in\mathbb{Z}}|i|b_i < \infty$ imply that the variable $H(\xi)$ is well defined. The same argument proves that the process $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ is a well-defined process in $L^1$ and that it is strongly stationary. We can extend this result to $L^m$ for all $m \ge 1$ such that $lm + 1 \le m'$.

Dependence properties. Such models are proved to exhibit either $\lambda$- or $\eta$-weak dependence properties, as described below.

Lemma 3.3. Assume that the conditions of Lemma 3.2 are satisfied.
• If the innovation process $(\xi_i)_{i\in\mathbb{Z}}$ is $\lambda$-weakly dependent (with coefficients $\lambda_\xi(r)$), then $(X_n)$ is $\lambda$-weakly dependent with
$$\lambda(k) = c \inf_{r\le[k/2]}\left\{\sum_{i\ge r}|i|b_i \ \vee\ (2r+1)^2\,\lambda_\xi(k-2r)^{\frac{m'-1-l}{m'-1+l}}\right\}.$$
• If the innovation process $(\xi_i)_{i\in\mathbb{Z}}$ is $\eta$-weakly dependent (with coefficients $\eta_\xi(r)$), then $(X_n)$ is $\eta$-weakly dependent and there exists a constant $c > 0$ such that
$$\eta(k) = c \inf_{r\le[k/2]}\left\{\sum_{i\ge r}|i|b_i \ \vee\ (2r+1)^{1+\frac{l}{m'-1}}\,\eta_\xi(k-2r)^{\frac{m'-2}{m'-1}}\right\}.$$
Because Bernoulli shifts of $\kappa$-weakly dependent innovations are neither $\kappa$- nor $\eta$-weakly dependent, the case of $\kappa$-dependent innovations is here included in that of $\lambda$-dependent inputs. The proof of Lemma 3.3 will be given below. If the weak dependence coefficients of $\xi$ are regularly decreasing, it is easy to make explicit the decay of the weak dependence coefficients of $X$:

Proposition 3.1. Here $\lambda > 0$ and $\eta > 0$ are constants which can differ in each case.
• If $b_i = O(i^{-b})$ for some $b > 2$ and $\lambda_\xi(i) = O(i^{-\lambda})$, resp. $\eta_\xi(i) = O(i^{-\eta})$ (as $i\uparrow\infty$), then a simple calculation optimizing both terms proves that $\lambda(k) = O\big(k^{-\lambda(1-\frac{2}{b})\frac{m'-1-l}{m'-1+l}}\big)$, resp. $\eta(k) = O\big(k^{-\eta\frac{(b-2)(m'-2)}{(b-1)(m'-1)-l}}\big)$. Note that in the case $m' = \infty$ this exponent is arbitrarily close to $\lambda$ for large values of $b > 0$ and takes all possible values between 0 and $\lambda$.
• If $b_i = O(e^{-ib})$ for some $b > 0$ and $\lambda_\xi(i) = O(e^{-i\lambda})$, resp. $\eta_\xi(i) = O(e^{-i\eta})$ (as $i\uparrow\infty$), we have $\lambda(k) = O\big(k^2\, e^{-\lambda k\frac{b(m'-1-l)}{b(m'-1+l)+2\lambda(m'-1-l)}}\big)$, resp. $\eta(k) = O\big(k^{1+\frac{l}{m'-1}}\, e^{-\eta k\frac{b(m'-2)}{b(m'-1)+2\eta(m'-2)}}\big)$. The geometric decays of both $(b_i)_i$ and the coefficients of the innovations thus ensure the geometric decay of the weak dependence coefficients of the Bernoulli shift.
• If the Bernoulli shift has coefficients with geometric decay, say $b_i = O(e^{-ib})$, and $\lambda_\xi(i) = O(i^{-\lambda})$, resp. $\eta_\xi(i) = O(i^{-\eta})$ (as $i\uparrow\infty$), we find $\lambda(k) = O\big((\log k)^2\, k^{-\lambda\frac{m'-1-l}{m'-1+l}}\big)$, resp. $\eta(k) = O\big((\log k)^{1+\frac{l}{m'-1}}\, k^{-\eta\frac{m'-2}{m'-1}}\big)$. If $m' = \infty$ this means that we only lose at most a factor $\log^2 k$ with respect to the dependence coefficients of the input series $(\xi_i)_i$.

Proof of Lemma 3.3. We exhibit Lipschitz functions by using a convenient truncation. Write $\overline{\xi} = \xi \vee (-T) \wedge T$ for a truncation level $T$ which will be fixed later. As in the proof of Lemma 3.1, we denote $X_n^{(r)} = H\big((\xi_{n-i}\mathbf{1}_{|i|\le r})\big)$ and $\overline{X}_n^{(r)} = H\big((\overline{\xi}_{n-i}\mathbf{1}_{|i|\le r})\big)$. Furthermore, for any $k\ge0$ and any $(u+v)$-tuples such that $s_1 < \cdots < s_u \le s_u + k \le t_1 < \cdots < t_v$, set $X_s = (X_{s_1},\dots,X_{s_u})$, $X_t = (X_{t_1},\dots,X_{t_v})$ and $\overline{X}_s^{(r)} = (\overline{X}_{s_1}^{(r)},\dots,\overline{X}_{s_u}^{(r)})$, $\overline{X}_t^{(r)} = (\overline{X}_{t_1}^{(r)},\dots,\overline{X}_{t_v}^{(r)})$. Then we have, for all $f, g$ satisfying $\|f\|_\infty, \|g\|_\infty \le 1$ and $\mathrm{Lip}\,f + \mathrm{Lip}\,g < \infty$:
$$|\mathrm{Cov}(f(X_s), g(X_t))| \le \big|\mathrm{Cov}\big(f(X_s) - f(\overline{X}_s^{(r)}),\ g(X_t)\big)\big| \qquad (3.1.13)$$
$$\qquad + \big|\mathrm{Cov}\big(f(\overline{X}_s^{(r)}),\ g(X_t) - g(\overline{X}_t^{(r)})\big)\big| \qquad (3.1.14)$$
$$\qquad + \big|\mathrm{Cov}\big(f(\overline{X}_s^{(r)}),\ g(\overline{X}_t^{(r)})\big)\big|. \qquad (3.1.15)$$
Using that $\|g\|_\infty \le 1$, the term (3.1.13) in the sum is bounded by
$$2\,\mathrm{Lip}\,f \sum_{i=1}^{u} E\big|X_{s_i} - \overline{X}_{s_i}^{(r)}\big| \le 2u\,\mathrm{Lip}\,f\Big(\max_{1\le i\le u} E\big|X_{s_i} - X_{s_i}^{(r)}\big| + \max_{1\le i\le u} E\big|X_{s_i}^{(r)} - \overline{X}_{s_i}^{(r)}\big|\Big).$$
With the same arguments as in the proof of the existence of $H(\xi^{(\infty)})$ (see equation (3.1.12)), the first term on the right-hand side is bounded by $\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big)\sum_{i\ge r} 2|i|b_i$. Notice now that if $x, y$ are sequences with $x_i = y_i = 0$ for $|i| \ge r$, then a repeated application of the previous inequality (3.1.11) yields
$$|H(x) - H(y)| \le L\big(\|x\|_\infty^l \vee \|y\|_\infty^l \vee 1\big)\|x - y\|_1, \qquad (3.1.16)$$
where $L = \sum_{i\in\mathbb{Z}}|i|b_i < \infty$. The second term is bounded by using (3.1.16):
$$E\big|X_{s_i}^{(r)} - \overline{X}_{s_i}^{(r)}\big| \le L\,E\Big(\max_{-r\le i\le r}\big(|\overline{\xi}_i|^l \vee 1\big)\sum_{-r\le j\le r}|\xi_j|\mathbf{1}_{|\xi_j|\ge T}\Big) \le L(2r+1)^2\, E\Big(\max_{-r\le i,j\le r}|\overline{\xi}_i|^l\,|\xi_j|\mathbf{1}_{|\xi_j|\ge T}\Big) \le L(2r+1)^2\,\|\xi_0\|_{m'}^{m'}\,T^{l+1-m'}.$$
The term (3.1.14) is bounded analogously. We write (3.1.15) as
$$\mathrm{Cov}\Big(F^{(r)}\big(\overline{\xi}_{s_i+j},\ 1\le i\le u,\ |j|\le r\big),\ G^{(r)}\big(\overline{\xi}_{t_i+j},\ 1\le i\le v,\ |j|\le r\big)\Big),$$
where $F^{(r)}: \mathbb{R}^{u(2r+1)}\to\mathbb{R}$ and $G^{(r)}: \mathbb{R}^{v(2r+1)}\to\mathbb{R}$. If $r \le [k/2]$, assume that $\overline{\xi}$ is $\eta$-weakly dependent (resp. $\lambda$-weakly dependent) to bound this covariance by
$$\psi\big(\mathrm{Lip}\,F^{(r)},\ \mathrm{Lip}\,G^{(r)},\ u(2r+1),\ v(2r+1)\big)\,\epsilon_{k-2r},$$
where $\psi(u,v,a,b) = ua + vb$ and $\epsilon_i = \eta_i$ (resp. $\psi(u,v,a,b) = ua + vb + uvab$ and $\epsilon_i = \lambda_i$). Let $x = (x_1,\dots,x_u)$ and $y = (y_1,\dots,y_u)$, where $x_i, y_i \in \mathbb{R}^{2r+1}$; a bound for $\mathrm{Lip}\,F^{(r)}$ is obtained as the supremum over sequences $x, y$ of
$$\frac{\big|f\big(H(x_{s_i+l},\ 1\le i\le u,\ |l|\le r)\big) - f\big(H(y_{s_i+l},\ 1\le i\le u,\ |l|\le r)\big)\big|}{\sum_{j=1}^{u}\|x_j - y_j\|_1}.$$
Using (3.1.16), we have
$$|F^{(r)}(x) - F^{(r)}(y)| \le \mathrm{Lip}\,f\ L\sum_{i=1}^{u}\big(\|x_{s_i}\|_\infty^l \vee \|y_{s_i}\|_\infty^l \vee 1\big)\|x_{s_i} - y_{s_i}\|_1 \le \mathrm{Lip}\,f\ L\,T^l \sum_{i=1}^{u}\sum_{-r\le l\le r}|x_{s_i+l} - y_{s_i+l}|.$$
Hence $\mathrm{Lip}\,F^{(r)} \le \mathrm{Lip}\,f\cdot L\cdot T^l$ and, similarly, $\mathrm{Lip}\,G^{(r)} \le \mathrm{Lip}\,g\cdot L\cdot T^l$.
• Under $\eta$-weak dependence, we bound the covariance as
$$|\mathrm{Cov}(f(X_s), g(X_t))| \le (u\,\mathrm{Lip}\,f + v\,\mathrm{Lip}\,g)\Big(4\sum_{i\ge r}|i|b_i\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big) + (2r+1)L\big(2(2r+1)\|\xi_0\|_{m'}^{m'}T^{l+1-m'} + T^l\,\eta_\xi(k-2r)\big)\Big).$$
We then fix the truncation $T^{m'-1} = 2(2r+1)\|\xi_0\|_{m'}^{m'}/\eta_\xi(k-2r)$ to obtain the result of Lemma 3.3 in the $\eta$-weakly dependent case.
• Under $\lambda$-weak dependence, we obtain
$$|\mathrm{Cov}(f(X_s), g(X_t))| \le (u\,\mathrm{Lip}\,f + v\,\mathrm{Lip}\,g + uv\,\mathrm{Lip}\,f\,\mathrm{Lip}\,g)\Big(4\sum_{i\ge r}|i|b_i\big(\|\xi_0\|_1 + \|\xi_0\|_{l+1}^{l+1}\big) + (2r+1)L\big(2(2r+1)T^{l+1-m'}\|\xi_0\|_{m'}^{m'} + T^l\,\lambda_\xi(k-2r)\big) \vee (2r+1)^2L^2T^{2l}\,\lambda_\xi(k-2r)\Big).$$
With the truncation such that $T^{l+m'-1} = 2\|\xi_0\|_{m'}^{m'}/\big(L\,\lambda_\xi(k-2r)\big)$, we obtain the result of Lemma 3.3 in the $\lambda$-weakly dependent case.
3.1.4
Causal shifts with independent inputs
Let (ξi )i∈Z be a stationary sequence of random variables with values in a measurable space X . Assume that there exists a function H defined on a subset of X N , with values in R and such that H(ξ0 , ξ−1 , ξ−2 , . . .) is defined almost surely. The stationary sequence (Xn )n∈Z defined by Xn = H(ξn , ξn−1 , ξn−2 , . . .)
(3.1.17)
is called a causal function of $(\xi_i)_{i\in\mathbb{Z}}$. In this section, we assume that $(\xi_i)_{i\in\mathbb{Z}}$ is i.i.d. In this causal case, another way to define a coupling coefficient is to consider a non-increasing sequence $(\tilde\delta_{p,n})_{n\ge0}$ ($p$ may be infinite) such that
$$\tilde\delta_{p,n} \ge \big\|X_n - \tilde{X}_n\big\|_p, \qquad (3.1.18)$$
where $\tilde{X}_t = H(\tilde\xi_t, \tilde\xi_{t-1}, \tilde\xi_{t-2}, \dots)$, with $\tilde\xi_n = \xi_n$ if $n > 0$ and $\tilde\xi_n = \xi'_n$ for $n \le 0$, for an independent copy $(\xi'_t)_{t\in\mathbb{Z}}$ of $(\xi_t)_{t\in\mathbb{Z}}$. Here $\tilde{X}_t$ has the same distribution as $X_t$ and is independent of $\mathcal{M}_0 = \sigma(X_i,\ i\le0)$.

Dependence properties. In this section, we shall use the results of Chapter 5 to give upper bounds for the coefficients $\theta_{p,\infty}(n)$, $\tau_{p,\infty}(n)$, $\tilde\alpha_k(n)$, $\tilde\beta_k(n)$ and $\tilde\phi_k(n)$. More precisely, we have:

1. $\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le \tilde\delta_{p,n}$.

2. Assume that $X_0$ has a continuous distribution function with modulus of continuity $w$. Let $g_p(y) = y(w(y))^{1/p}$. For any $1 \le p < \infty$, we have
$$\tilde\alpha_k(n) \le \tilde\beta_k(n) \le 2k\left(\frac{\tilde\delta_{p,n}}{g_p^{-1}(\tilde\delta_{p,n})}\right)^p.$$
In particular, if $X_0$ has a density bounded by $K$, we obtain the inequality $\tilde\beta_k(n) \le 2k\big(K\tilde\delta_{p,n}\big)^{p/(p+1)}$.

3. Assume that $X_0$ has a continuous distribution function, with modulus of uniform continuity $w$. Then $\tilde\alpha_k(n) \le \tilde\beta_k(n) \le \tilde\phi_k(n) \le k\,w(\tilde\delta_{\infty,n})$.

4. For $\tilde\phi_k(n)$ it is sometimes interesting to use the coefficient
$$\delta'_{p,n} = \Big\|E\big(|X_n - \tilde{X}_n|^p\ \big|\ \mathcal{M}_0\big)^{1/p}\Big\|_\infty.$$
With the same notations as in point 2,
$$\tilde\phi_k(n) \le 2k\left(\frac{\delta'_{p,n}}{g_p^{-1}(\delta'_{p,n})}\right)^p.$$
Point 1 can be proved by using Lemma 5.3. Points 2, 3 and 4 can be proved as point 3 of Lemma 5.1, by using Proposition 5.1 and the Markov inequality.

Application (causal linear processes). In that case $X_n = \sum_{j\ge0} a_j\xi_{n-j}$. One can take $\tilde\delta_{p,n} = 2\|\xi_0\|_p \sum_{j\ge n}|a_j|$. For $p = 2$ and $E\xi_0 = 0$ one may set
$$\tilde\delta_{2,n} = 2\Big(E\xi_0^2 \sum_{j\ge n} a_j^2\Big)^{1/2}.$$
For instance, if $a_i = 2^{-i-1}$ and $\xi_0 \sim \mathcal{B}(1/2)$, then $\tilde\delta_{\infty,i} \le 2^{-i}$. Since $X_0$ is uniformly distributed over $[0,1]$, we have $\tilde\phi_1(i) \le 2^{-i}$. Recall that this sequence is not strongly mixing (see Section 1.5).

Remark 3.1. Interpreting causal Bernoulli shifts as physical systems, written $X_t = g(\dots, \epsilon_{t-1}, \epsilon_t)$, Wu (2005) [188] introduces physical dependence coefficients quantifying the dependence of the outputs $(X_t)$ on the inputs $(\epsilon_t)$. He considers the nonlinear system theory coefficient
$$\bar\delta_t = \big\|g(\dots, \epsilon_{-1}, \epsilon_0, \epsilon_1, \dots, \epsilon_t) - g(\dots, \epsilon_{-1}, \epsilon'_0, \epsilon_1, \dots, \epsilon_t)\big\|_2,$$
where $\epsilon'_0$ is an independent copy of $\epsilon_0$. This provides a sharp framework for the study of the CLT for random processes and sheds new light on a variety of problems, including estimation of linear models with dependent errors in Wu (2006) [191], nonparametric inference for time series in Wu (2005) [192], representations of sample quantiles (Wu 2005 [189]) and spectral estimation (Wu 2005 [190]), among others. This specific $L^2$-formulation is rather adapted to the CLT, and it is not directly possible to compare it with $\tau$-dependence because coupling is given here with only one element in the past. Justification of the Bernoulli shift representation follows from Ornstein (1973) [138].
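The Bernoulli(1/2) example above can be checked by direct simulation; the truncation of the series to 60 binary digits below is an implementation assumption (the tail it drops is smaller than $2^{-60}$):

```python
import numpy as np

rng = np.random.default_rng(3)

# X_n = sum_{i>=0} 2^{-i-1} xi_{n-i} with xi ~ Bernoulli(1/2): X_n is the
# number 0.xi_n xi_{n-1} ... in binary, hence (almost) uniform on [0, 1].
N, n = 200_000, 10
xi = rng.integers(0, 2, size=(N, 60))             # column i holds xi at lag i
w = 2.0 ** -(np.arange(60) + 1)
X = xi @ w
print(X.mean(), X.var())                          # close to 1/2 and 1/12 (uniform law)

# Coupling: replace the innovations at lags >= n by an independent copy;
# the two outputs differ by at most sum_{i>=n} 2^{-i-1} = 2^{-n}.
xi_cpy = xi.copy()
xi_cpy[:, n:] = rng.integers(0, 2, size=(N, 60 - n))
X_cpl = xi_cpy @ w
print(np.max(np.abs(X - X_cpl)), 2.0 ** -n)       # sup-distance <= 2^{-n}
```

The maximal coupling distance observed over all samples never exceeds $2^{-n}$, which is exactly the bound $\tilde\delta_{\infty,n} \le 2^{-n}$ used in the text.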
3.1.5
Causal shifts with dependent inputs
In this section, the innovations are not required to be i.i.d., but the method introduced in the preceding section still works. More precisely, assume that there exists a stationary sequence $(\xi'_i)_{i\in\mathbb{Z}}$ distributed as $(\xi_i)_{i\in\mathbb{Z}}$ and independent of $(\xi_i)_{i\le0}$. Define $\tilde{X}_n = H(\xi'_n, \xi'_{n-1}, \xi'_{n-2}, \dots)$. Clearly $\tilde{X}_n$ is independent of $\mathcal{M}_0 = \sigma(X_i,\ i\le0)$ and distributed as $X_n$. Hence one can apply the result of Lemma 5.3: if $(\tilde\delta_{p,n})_{n\ge0}$ is a non-increasing sequence satisfying (3.1.18), then the upper bounds 1, 2, 3 and 4 of the preceding section hold. In particular, these results apply to the case where the sequence $(\xi_i)_{i\in\mathbb{Z}}$ is $\beta$-mixing. According to Theorem 4.4.7 in Berbee (1979) [16], if $\Omega$ is rich enough, there exists $(\xi'_i)_{i\in\mathbb{Z}}$ distributed as $(\xi_i)_{i\in\mathbb{Z}}$ and independent of $(\xi_i)_{i\le0}$ such that
$$P\big(\xi'_i \ne \xi_i \text{ for some } i\ge k\big) = \beta\big(\sigma(\xi_i,\ i\le0),\ \sigma(\xi_i,\ i\ge k)\big).$$
Application (causal linear processes). In that case $X_n = \sum_{j\ge0} a_j\xi_{n-j}$. For any $p\ge1$, we can take the non-increasing sequence $(\tilde\delta_{p,n})_{n\ge0}$ such that
$$\tilde\delta_{p,n} \ge \|\xi_0 - \xi'_0\|_p\sum_{j\ge n}|a_j| + \sum_{j=0}^{n-1}|a_j|\,\|\xi_{n-j} - \xi'_{n-j}\|_p \ge \sum_{j\ge0}|a_j|\,\|\xi_{n-j} - \xi'_{n-j}\|_p.$$
From Proposition 2.3 in Merlevède and Peligrad (2002) [130], one can take
$$\tilde\delta_{p,n} \ge \|\xi_0 - \xi'_0\|_p\sum_{j\ge n}|a_j| + \sum_{j=0}^{n-1}|a_j|\left(2p\int_0^{\beta(\sigma(\xi_k,\,k\le0),\,\sigma(\xi_k,\,k\ge n-j))} Q_{\xi_0}^p(u)\,du\right)^{1/p},$$
where $Q_{\xi_0}$ is the generalized inverse of the tail function $x\mapsto P(|\xi_0|>x)$ (see Lemma 5.1 for the precise definition).
3.2
Markov sequences
Let $(X_n)_{n\ge1-d}$ be a sequence of random variables with values in a Banach space $(\mathbb{B}, \|\cdot\|)$. Assume that $X_n$ satisfies the recurrence equation
$$X_n = F(X_{n-1},\dots,X_{n-d};\ \xi_n), \qquad (3.2.1)$$
where $F$ is a measurable function with values in $\mathbb{B}$, the sequence $(\xi_n)_{n>0}$ is i.i.d., and $(\xi_n)_{n>0}$ is independent of $(X_0,\dots,X_{1-d})$. Note that if $X_n$ satisfies (3.2.1), then the random variable $Y_n = (X_n,\dots,X_{n-d+1})$ defines a Markov chain such that $Y_n = M(Y_{n-1};\xi_n)$ with $M(x_1,\dots,x_d;\xi) = (F(x_1,\dots,x_d;\xi),\ x_1,\dots,x_{d-1})$.

Dependence properties. Assume that $(X_n)_{n\ge1-d}$ is a stationary solution to (3.2.1). As previously, let $Y_0 = (X_0,\dots,X_{1-d})$, and let $\tilde{Y}_0 = (\tilde{X}_0,\dots,\tilde{X}_{1-d})$ be an independent vector with the same law as $Y_0$ (that is, a distribution invariant by $M$). Then let $\tilde{X}_n = F(\tilde{X}_{n-1},\dots,\tilde{X}_{n-d};\xi_n)$. Clearly, for $n>0$, $\tilde{X}_n$ is distributed as $X_n$ and independent of $\mathcal{M}_0 = \sigma(X_i,\ 1-d\le i\le0)$. As in the preceding sections, let $(\tilde\delta_{p,n})_{n\ge0}$ ($p$ may be infinite) be a non-increasing sequence such that
$$\tilde\delta_{p,n} \ge \big(E\|X_n - \tilde{X}_n\|^p\big)^{1/p}. \qquad (3.2.2)$$
Applying Lemma 5.3, we infer that $\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le \tilde\delta_{p,n}$.
For the coefficients $\tilde\alpha_k(n)$, $\tilde\beta_k(n)$ and $\tilde\phi_k(n)$ in the case $\mathbb{B} = \mathbb{R}$, the upper bounds 2, 3 and 4 of Section 3.1.4 also hold under the same conditions on the distribution function of $X_0$. Assume that, for some $p\ge1$, the function $F$ satisfies
$$\big(E\|F(x;\xi_1) - F(y;\xi_1)\|^p\big)^{1/p} \le \sum_{i=1}^{d} a_i\|x_i - y_i\|, \qquad \sum_{i=1}^{d} a_i < 1. \qquad (3.2.3)$$
Then one can prove that the coefficient $\tau_{p,\infty}(n)$ decreases at an exponential rate. Indeed, we have
$$\tilde\delta_{p,n} \le \sum_{i=1}^{d} a_i\,\tilde\delta_{p,n-i}.$$
For two vectors $x, y$ in $\mathbb{R}^d$, we write $x\le y$ if $x_i\le y_i$ for any $1\le i\le d$. Using this notation, we have
$$(\tilde\delta_{p,n},\dots,\tilde\delta_{p,n-d+1})^t \le A\,(\tilde\delta_{p,n-1},\dots,\tilde\delta_{p,n-d})^t,$$
with the matrix $A$ equal to
$$A = \begin{pmatrix}
a_1 & a_2 & \cdots & a_{d-1} & a_d \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.$$
Iterating this inequality, we obtain
$$(\tilde\delta_{p,n},\dots,\tilde\delta_{p,n-d+1})^t \le A^n\,(\tilde\delta_{p,0},\dots,\tilde\delta_{p,1-d})^t.$$
Since $\sum_{i=1}^{d} a_i < 1$, the matrix $A$ has a spectral radius strictly smaller than 1. Hence there exist $C > 0$ and $\rho$ in $[0,1[$ such that $\tilde\delta_{p,n} \le C\rho^n$. Consequently $\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le C\rho^n$. If $\mathbb{B} = \mathbb{R}$ and the condition (3.2.3) holds for $p = 1$, and if the distribution function $F_X$ of $X_0$ is such that $|F_X(x) - F_X(y)| \le K|x-y|^\gamma$ for some $\gamma$ in $]0,1]$, then we have the upper bound
$$\tilde\alpha_k(n) \le \tilde\beta_k(n) \le 2k\,K^{1/(\gamma+1)}C^{\gamma/(\gamma+1)}\rho^{n\gamma/(\gamma+1)}.$$
If the condition (3.2.3) holds for $p = \infty$, then $\tilde\alpha_k(n) \le \tilde\beta_k(n) \le \tilde\phi_k(n) \le kKC^\gamma\rho^{n\gamma}$. We give below some examples of the general situation described in this section.
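The spectral-radius argument can be checked numerically; the coefficients $a_i$ below are illustrative values with $a_1 + a_2 + a_3 < 1$:

```python
import numpy as np

# Illustrative contraction coefficients a_1, ..., a_d with sum < 1.
a = np.array([0.5, 0.2, 0.1])
d = len(a)

# Companion matrix of the recursion delta_n <= sum_i a_i delta_{n-i}.
A = np.zeros((d, d))
A[0, :] = a
A[np.arange(1, d), np.arange(d - 1)] = 1.0    # subdiagonal of ones

rho = np.max(np.abs(np.linalg.eigvals(A)))
print(rho)                                     # spectral radius < 1

# Iterating the recursion shows the coupling bound decaying like rho^n.
delta = [1.0] * d
for _ in range(50):
    delta.append(float(np.dot(a, delta[::-1][:d])))
print(delta[-1])
```

Since $a_1 + a_2 + a_3 = 0.8 < 1$, the spectral radius is below 1 and the iterated bound $\tilde\delta_{p,n}$ collapses geometrically, as claimed.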
3.2.1
Contracting Markov chain.
For simplicity, let $\mathbb{B} = \mathbb{R}^d$, and let $\|\cdot\|$ be a norm on $\mathbb{R}^d$. Let $F$ be an $\mathbb{R}^d$-valued function and consider the recurrence equation
$$X_n = F(X_{n-1};\xi_n). \qquad (3.2.4)$$
Assume that $A_p = E\|F(0;\xi_1)\|^p < \infty$ and
$$E\|F(x;\xi_1) - F(y;\xi_1)\|^p \le a^p\|x - y\|^p, \qquad (3.2.5)$$
for some $a<1$ and $p\ge1$. Duflo (1996) [81] proves that condition (3.2.5) implies that the Markov chain $(X_i)_{i\in\mathbb{N}}$ has a stationary law $\mu$ with a finite moment of order $p$. In the sequel, we suppose that $\mu$ is the distribution of $X_0$ (i.e. the Markov chain is stationary). Bougerol (1993) [27] and Diaconis and Freedman (1999) [60] provide a wide variety of examples of stable Markov chains; see also Ango-Nzé and Doukhan (2002) [7].

Dependence properties. Mixing properties may be derived for Markov chains (see the previous references and Mokkadem (1990) [132]), but this property always needs an additional regularity assumption on the innovations: namely, the innovations must have some absolutely continuous component. By contrast, no assumption on the distribution of $\xi_0$ is necessary to obtain a geometric decay of the coefficient $\tau_{p,\infty}(n)$. More precisely, arguing as in the previous section, if $\tilde{X}_0$ is independent of $X_0$ and distributed as $X_0$, one has the upper bounds
$$\theta_{p,\infty}(n) \le \tau_{p,\infty}(n) \le \|\tilde{X}_0 - X_0\|_p\, a^n.$$
In the same way, if each component of $X_0$ has a distribution function which is Hölder, then the coefficients $\tilde\alpha_k(n)$ and $\tilde\beta_k(n)$ decrease geometrically (see Lemma 5.1). Let us show now that contractive Markov chains can be represented as Bernoulli shifts in the general situation where $X_t$ and $\zeta_t$ take values in the Euclidean spaces $\mathbb{R}^d$ and $\mathbb{R}^D$, respectively, $d, D\ge1$, with $\|\cdot\|$ denoting indifferently a norm on $\mathbb{R}^D$ or on $\mathbb{R}^d$. Any homogeneous Markov chain $X_t$ may also be represented as the solution of a recurrence equation
$$X_n = F(X_{n-1}, \xi_n)$$
(3.2.6)
where F (u, z) is a measurable function and (ξn )n>0 is an i.i.d. sequence independent of X0 , see e.g. Kallenberg (1997, Proposition 7.6) [111]. Proposition 3.2 (Stable Markov chains as Bernoulli shifts). The stationary iterative models (3.2.6) are Bernoulli shifts (3.1.17) if condition (3.2.5) holds.
Proof. We denote by $\mu$ the distribution of $\zeta = (\zeta_n)_{n\in\mathbb{N}}$ on $(\mathbb{R}^D)^{\mathbb{N}}$. We define the space $L^p(\mu, \mathbb{R}^d)$ of $\mathbb{R}^d$-valued Bernoulli shift functionals $H$ such that $H(\zeta_0, \zeta_1, \dots) \in L^p$. We shall prove, as in Doukhan and Truquet (2006) [76], that the operator $\Phi: H\mapsto K$ of $L^p(\mu,\mathbb{R}^d)$, with $K(\zeta_0,\zeta_1,\dots) = F(H(\zeta_1,\zeta_2,\dots), \zeta_0)$, satisfies the contraction principle; the Picard fixed point theorem then allows us to conclude. We first mention that condition (3.2.5) implies $\|F(x,\zeta_0)\|_p \le A_p^{1/p} + a\|x\|$; hence, with independence of the sequence $\zeta$, this yields $\|K\|_p \le A_p^{1/p} + a\|H\|_p$; thus $\Phi(L^p(\mu,\mathbb{R}^d)) \subset L^p(\mu,\mathbb{R}^d)$. Now, for $H, H' \in L^p(\mu,\mathbb{R}^d)$, we derive with analogous arguments that $\|\Phi(H) - \Phi(H')\|_p \le a\|H - H'\|_p$.

Remark 3.2. It is also possible to derive the Bernoulli shift representation through a recursive iteration of the autoregressive formula.
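The contraction behind Proposition 3.2 also implies that the chain forgets its initial condition geometrically: two copies driven by the same innovations from different starting points merge at rate $a^n$. The map $F$ below is an illustrative example satisfying (3.2.5) with $a = 0.7$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative contracting recursion F(x, xi) = 0.7*sin(x) + xi,
# so that |F(x, xi) - F(y, xi)| <= 0.7 |x - y|.
def F(x, xi):
    return 0.7 * np.sin(x) + xi

n = 40
xi = rng.standard_normal(n)
x, y = 10.0, -10.0                    # two different initial conditions
gap = []
for t in range(n):
    x, y = F(x, xi[t]), F(y, xi[t])   # same innovations: a coupling
    gap.append(abs(x - y))
print(gap[-1], 0.7 ** n * 20.0)       # |X_n - Y_n| <= a^n |X_0 - Y_0|
```

This is the numerical counterpart of the bound $\tau_{p,\infty}(n) \le \|\tilde X_0 - X_0\|_p\, a^n$ stated above.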
3.2.2
Nonlinear AR(d) models
For simplicity, let $\mathbb{B} = \mathbb{R}$. Autoregressive models of order $d$ are models such that
$$X_n = r(X_{n-1},\dots,X_{n-d}) + \xi_n. \qquad (3.2.7)$$
In such a case, the function $F$ is given by $F(u_1,\dots,u_d;\xi) = r(u_1,\dots,u_d) + \xi$. Assume that $E|\xi_1|^p < \infty$ and that
$$|r(u_1,\dots,u_d) - r(v_1,\dots,v_d)| \le \sum_{i=1}^{d} a_i|u_i - v_i|$$
for some $a_1,\dots,a_d\ge0$ such that $a = \sum_{i=1}^{d} a_i < 1$. Then the condition (3.2.3) holds, and we infer from Section 3.2 that the coefficients $\tau_{p,\infty}(n)$ decrease exponentially fast.
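A short simulation illustrates this exponential forgetting for a nonlinear AR(2); the function $r$ below is an illustrative example satisfying the Lipschitz condition with $a_1 + a_2 = 0.8 < 1$:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative nonlinear AR(2): r(u1, u2) = 0.5*tanh(u1) + 0.3*cos(u2) is
# Lipschitz with a_1 = 0.5, a_2 = 0.3 and a_1 + a_2 < 1.
def r(u1, u2):
    return 0.5 * np.tanh(u1) + 0.3 * np.cos(u2)

n = 150
xi = rng.standard_normal(n)
x = [5.0, 5.0]       # chain 1: initial values
y = [-5.0, -5.0]     # chain 2: different start, same innovations (a coupling)
for t in range(n):
    x.append(r(x[-1], x[-2]) + xi[t])
    y.append(r(y[-1], y[-2]) + xi[t])

gap = abs(x[-1] - y[-1])
print(gap)           # exponentially small after n steps
```

The coupling gap obeys the recursion $\delta_n \le 0.5\,\delta_{n-1} + 0.3\,\delta_{n-2}$, whose companion matrix has spectral radius below 1, so the gap is numerically negligible after 150 steps.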
3.2.3
ARCH-type processes
For simplicity, let B = R. Let F (u, z) = A(u) + B(u)z
(3.2.8)
for suitable Lipschitz functions $A(u), B(u)$, $u\in\mathbb{R}$. The corresponding iterative model (3.2.6) satisfies (3.2.5) if $a = \mathrm{Lip}(A) + \|\xi_1\|_p\,\mathrm{Lip}(B) < 1$.
If $p = 2$ and $E(\xi_n) = 0$, it is sufficient to assume that
$$a = \sqrt{(\mathrm{Lip}(A))^2 + E(\xi_1^2)\,(\mathrm{Lip}(B))^2} < 1.$$
Some examples of iterative Markov processes given by functions of type (3.2.8) are:
• nonlinear AR(1) processes (case $B \equiv 1$);
• stochastic volatility models (case $A \equiv 0$);
• classical ARCH(1) models (case $A(u) = \alpha u$, $B(u) = \sqrt{\beta + \gamma u^2}$, $\alpha, \beta, \gamma \ge 0$).
In the last example, the inequality (3.2.5) holds for $p = 2$ with $a^2 = \alpha^2 + \gamma\,E\xi_0^2$. A general description of these models can be found in Section 3.4.2.
3.2.4
Branching type models
Here $\mathbb{B} = \mathbb{R}$ and $\xi_n$ is $\mathbb{R}^D$-valued. Let $\xi_n = \big(\xi_n^{(1)},\dots,\xi_n^{(D)}\big)$. Let now $A_1,\dots,A_D$ be Lipschitz functions from $\mathbb{R}$ to $\mathbb{R}$, and let
$$F\big(u, z^{(1)},\dots,z^{(D)}\big) = \sum_{j=1}^{D} A_j(u)\,z^{(j)},$$
for $(z^{(1)},\dots,z^{(D)}) \in \mathbb{R}^D$. For such functions $F$, if $E\big(\xi_1^{(i)}\xi_1^{(j)}\big) = 0$ for $i\ne j$, the relation (3.2.5) holds with $p = 2$ if
$$a^2 = \sum_{j=1}^{D} \big(\mathrm{Lip}(A_j)\big)^2\, E\big(\xi_0^{(j)}\big)^2 < 1.$$
Some examples of this situation are:
• If $D = 2$, $\xi_1^{(1)} \sim b(\bar{p})$ is a Bernoulli variable independent of a centered variable $\xi_1^{(2)} \in L^2$, and $A_1(u) = u$, $A_2(u) = 1$, then the previous relations hold if $\bar{p} < 1$.
• If $D = 3$, $\xi_1^{(1)} = 1 - \xi_1^{(2)} \sim b(p)$ is independent of a centered variable $\xi_1^{(3)} \in L^2$; then one obtains the usual threshold models if $A_3 \equiv 1$. This only means that $X_n = F_n(X_{n-1}) + \xi_n^{(3)}$, where $(F_n)$ is an i.i.d. sequence, independent of the sequence $(\xi_n^{(3)})_{n\ge1}$, such that $F_n = A_1$ with probability $p$ and $F_n = A_2$ otherwise. The condition (3.2.5) with $p = 2$ writes here
$$a^2 = p\,\big(\mathrm{Lip}(A_1)\big)^2 + (1-p)\,\big(\mathrm{Lip}(A_2)\big)^2 < 1.$$
3.3
Dynamical systems
Let $I = [0,1]$, let $T$ be a map from $I$ to $I$ and define $X_i = T^i$. If $\mu$ is invariant by $T$, the sequence $(X_i)_{i\ge0}$ of random variables from $(I,\mu)$ to $I$ is strictly stationary. Denote by $\|\cdot\|_{1,\lambda}$ the $L^1$-norm with respect to the Lebesgue measure $\lambda$ on $I$, and by $\|\nu\| = |\nu|(I)$ the total variation of $\nu$.

Covariance inequalities. In many interesting cases, one can prove that, for any $BV$ function $h$ and any $k$ in $L^1(I,\mu)$,
$$|\mathrm{Cov}(h(X_0), k(X_n))| \le a_n\|k(X_n)\|_1\big(\|h\|_{1,\lambda} + \|dh\|\big), \qquad (3.3.1)$$
for some non-increasing sequence $a_n$ tending to zero as $n$ tends to infinity. Note that if (3.3.1) holds, then
$$|\mathrm{Cov}(h(X_0), k(X_n))| = |\mathrm{Cov}(h(X_0) - h(0), k(X_n))| \le a_n\|k(X_n)\|_1\big(\|h - h(0)\|_{1,\lambda} + \|dh\|\big).$$
Since $\|h - h(0)\|_{1,\lambda} \le \|dh\|$, we obtain
$$|\mathrm{Cov}(h(X_0), k(X_n))| \le 2a_n\|k(X_n)\|_1\|dh\|. \qquad (3.3.2)$$
The associated Markov chain. Define the operator $L$ from $L^1(I,\lambda)$ to $L^1(I,\lambda)$ via the equality
$$\int_0^1 L(h)(x)\,k(x)\,\lambda(dx) = \int_0^1 h(x)\,(k\circ T)(x)\,\lambda(dx),$$
where $h\in L^1(I,\lambda)$ and $k\in L^\infty(I,\lambda)$. The operator $L$ is called the Perron–Frobenius operator of $T$. Assume that $\mu$ is absolutely continuous with respect to the Lebesgue measure, with density $f_\mu$. Let $I^*$ be the support of $\mu$ (that is, $(I^*)^c$ is the largest open set in $I$ such that $\mu((I^*)^c) = 0$) and choose a version of $f_\mu$ such that $f_\mu > 0$ on $I^*$ and $f_\mu = 0$ on $(I^*)^c$. Note that one can always choose $L$ such that $L(f_\mu h)(x) = L(f_\mu h)(x)\mathbf{1}_{f_\mu(x)>0}$. Define a Markov kernel associated to $T$ by
$$K(h)(x) = \frac{L(f_\mu h)(x)}{f_\mu(x)}\mathbf{1}_{f_\mu(x)>0} + \mu(h)\mathbf{1}_{f_\mu(x)=0}. \qquad (3.3.3)$$
It is easy to check (see for instance Barbour et al. (2000) [9]) that $(X_0, X_1, \dots, X_n)$ has the same distribution as $(Y_n, Y_{n-1}, \dots, Y_0)$, where $(Y_i)_{i\ge0}$ is a stationary Markov chain with invariant distribution $\mu$ and transition kernel $K$.

Spectral gap. In many interesting cases, the spectral analysis of $L$ in the Banach space of $BV$ functions equipped with the norm $\|h\|_v = \|dh\| + \|h\|_{1,\lambda}$
can be done by using the theorem of Ionescu-Tulcea and Marinescu (see Lasota and Yorke (1974) [115]): assume that 1 is a simple eigenvalue of $L$ and that the rest of the spectrum is contained in a closed disk of radius strictly smaller than one. Then there exists a unique $T$-invariant absolutely continuous probability $\mu$ whose density $f_\mu$ is $BV$, and
$$L^n(h) = \lambda(h)f_\mu + \Psi^n(h) \qquad (3.3.4)$$
with $\Psi(f_\mu) = 0$ and $\|\Psi^n(h)\|_v \le D\rho^n\|h\|_v$, for some $0\le\rho<1$ and $D>0$. Assume moreover that
$$\Big\|\frac{1}{f_\mu}\mathbf{1}_{f_\mu>0}\Big\|_v = \gamma < \infty. \qquad (3.3.5)$$
Starting from (3.3.3), we have
$$K^n(h) = \mu(h) + \frac{\Psi^n(hf_\mu)}{f_\mu}\mathbf{1}_{f_\mu>0}.$$
Let $\|\cdot\|_{\infty,\lambda}$ be the essential supremum with respect to $\lambda$. Taking $C_1 = 2D\gamma(\|df_\mu\| + 1)$, we obtain
$$\|K^n(h) - \mu(h)\|_{\infty,\lambda} \le C_1\rho^n\|h\|_v.$$
This estimate implies (3.3.1) with $a_n = C_1\rho^n$. Indeed,
$$|\mathrm{Cov}(h(X_0), k(X_n))| = |\mathrm{Cov}(h(Y_n), k(Y_0))| \le \big\|k(Y_0)\big(E(h(Y_n)|\sigma(Y_0)) - E(h(Y_n))\big)\big\|_1 \le \|k(Y_0)\|_1\,\|K^n(h) - \mu(h)\|_{\infty,\lambda} \le C_1\rho^n\|k(Y_0)\|_1\big(\|dh\| + \|h\|_{1,\lambda}\big).$$
Moreover, we also have
$$\|dK^n(h)\| = \|dK^n(h - h(0))\| \le 2\gamma\|\Psi^n(f_\mu(h - h(0)))\|_v \le 8D\rho^n\gamma(1 + \|df_\mu\|)\|dh\|. \qquad (3.3.6)$$
Dependence properties. If (3.3.2) holds, the upper bound $\tilde\phi(\sigma(X_n), X_0) \le 2a_n$ follows from the following lemma.

Lemma 3.4. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $X$ a real-valued random variable and $\mathcal{M}$ a $\sigma$-algebra of $\mathcal{A}$. We have the equality
$$\tilde\phi(\mathcal{M}, X) = \sup\big\{|\mathrm{Cov}(Y, h(X))| : Y \text{ is } \mathcal{M}\text{-measurable},\ \|Y\|_1\le1 \text{ and } h\in BV_1\big\}.$$
Proof of Lemma 3.4. Write first $|\mathrm{Cov}(Y, h(X))| = |E(Y(E(h(X)|\mathcal{M}) - E(h(X))))|$. For any positive $\epsilon$, there exists $A_\epsilon$ in $\mathcal{M}$ such that $P(A_\epsilon) > 0$ and, for any $\omega$ in $A_\epsilon$,
$$|E(h(X)|\mathcal{M})(\omega) - E(h(X))| > \|E(h(X)|\mathcal{M}) - E(h(X))\|_\infty - \epsilon.$$
Define the random variable
$$Y_\epsilon := \frac{\mathbf{1}_{A_\epsilon}}{P(A_\epsilon)}\,\mathrm{sign}\big(E(h(X)|\mathcal{M}) - E(h(X))\big).$$
$Y_\epsilon$ is $\mathcal{M}$-measurable, $E|Y_\epsilon| = 1$ and
$$|\mathrm{Cov}(Y_\epsilon, h(X))| \ge \|E(h(X)|\mathcal{M}) - E(h(X))\|_\infty - \epsilon.$$
Since this holds for any positive $\epsilon$, we infer from Definition 2.5 that
$$\tilde\phi(\mathcal{M}, X) \le \sup\big\{|\mathrm{Cov}(Y, h(X))| : Y \text{ is } \mathcal{M}\text{-measurable},\ \|Y\|_1\le1 \text{ and } h\in BV_1\big\}.$$
The converse inequality follows straightforwardly from Definition 2.5.

Now, if (3.3.4) and (3.3.5) hold, we have: for any $n \ge i_l > \cdots > i_1 \ge 0$,
$$\tilde\phi\big(\sigma(X_k,\ k\ge n),\ X_{n-i_1},\dots,X_{n-i_l}\big) \le C(l)\rho^{i_1},$$
for some positive constant $C(l)$. This is a consequence of the following lemma, by using the upper bound (3.3.6).

Lemma 3.5. Let $(Y_i)_{i\ge0}$ be a real-valued Markov chain with transition kernel $K$. Assume that there exists a constant $C$ such that, for any $BV$ function $f$ and any $n>0$,
$$\|dK^n(f)\| \le C\|df\|. \qquad (3.3.7)$$
Then, for any $i_l > \cdots > i_1 \ge 0$,
$$\tilde\phi\big(\sigma(Y_k),\ Y_{k+i_1},\dots,Y_{k+i_l}\big) \le (1 + C + \cdots + C^{l-1})\,\tilde\phi\big(\sigma(Y_k),\ Y_{k+i_1}\big).$$
Consequently, if (3.3.4) and (3.3.5) hold, the coefficients $\tilde\phi_k(i)$ of the associated Markov chain $(Y_i)_{i\ge0}$ satisfy, for any $k>0$, $\tilde\phi_k(i) \le C(k)\rho^i$.

Proof of Lemma 3.5. We only give the proof for two points $i_1 = i$ and $i_2 = j$, the general case being similar. Let $f_k(x) = f(x) - E(f(Y_k))$. We have, almost surely,
$$E\big(f_{k+i}(Y_{k+i})g_{k+j}(Y_{k+j})\,|\,Y_k\big) - E\big(f_{k+i}(Y_{k+i})g_{k+j}(Y_{k+j})\big) = E\big(f_{k+i}(Y_{k+i})(K^{j-i}(g))_{k+i}(Y_{k+i})\,|\,Y_k\big) - E\big(f_{k+i}(Y_{k+i})(K^{j-i}(g))_{k+i}(Y_{k+i})\big).$$
Let $f$ and $g$ be two functions in $BV_1$. It is easy to see that
$$\big\|d\big((K^{j-i}(g))_{k+i}\, f_{k+i}\big)\big\| \le \|df_{k+i}\|\,\big\|(K^{j-i}(g))_{k+i}\big\|_\infty + \big\|d(K^{j-i}(g))_{k+i}\big\|\,\|f_{k+i}\|_\infty \le 1 + \big\|d(K^{j-i}(g))_{k+i}\big\|.$$
Hence, applying (3.3.7), the function $(K^{j-i}(g))_{k+i}\,f_{k+i}/(1+C)$ belongs to $BV_1$. The result follows from Definition 2.5.

Application: uniformly expanding maps. A large class of uniformly expanding maps $T$ is given in Broise (1996) [31], Section 2.1, page 11. If Broise's conditions are satisfied and if $T$ is mixing in the ergodic-theoretic sense, then the Perron–Frobenius operator $L$ satisfies the assumption (3.3.4). Let us recall some well-known examples (see Section 2.2 in Broise):
1. $T(x) = \beta x - [\beta x]$ for $\beta > 1$. These maps are called $\beta$-transformations.
2. $I$ is the finite union of disjoint intervals $(I_k)_{1\le k\le n}$, and $T(x) = a_k x + b_k$ on $I_k$, with $|a_k| > 1$.
3. $T(x) = a(x^{-1} - 1) - [a(x^{-1} - 1)]$ for some $a > 0$. For $a = 1$, this transformation is known as the Gauss map.

Remark 3.3 (Expanding maps with a neutral fixed point). For some maps which are non-uniformly expanding, in the sense that there exists a point at which the right (or left) derivative of $T$ is equal to 1, Young (1999) [194] gives some sharp upper bounds for the covariances of Hölder functions of $T^n$. For instance, let us consider the maps introduced by Liverani et al. (1999) [123]: for $0 < \gamma < 1$,
$$T(x) = \begin{cases} x(1 + 2^\gamma x^\gamma) & \text{if } x\in[0,1/2[\,,\\ 2x - 1 & \text{if } x\in[1/2,1]\,, \end{cases}$$
for which there exists a unique invariant probability $\mu$. Contrary to uniformly expanding maps, these maps are not $\tilde\phi$-dependent. For $\lambda\in]0,1]$, let $\delta_\lambda(x,y) = |x-y|^\lambda$ and let $\theta_1^{(\lambda)}$ be the coefficient associated to the distance $\delta_\lambda$ (see Definition 2.3). Starting from the upper bounds given by Young, one can prove that there exist positive constants $C_1(\lambda,\gamma)$ and $C_2(\lambda,\gamma)$ such that
$$C_1(\lambda,\gamma)\,n^{\frac{\gamma-1}{\gamma}} \le \theta_1^{(\lambda)}\big(\sigma(T^n), T\big) \le C_2(\lambda,\gamma)\,n^{\frac{\gamma-1}{\gamma}}.$$
Approximating the indicator function $f_x(t) = \mathbf{1}_{x\le t}$ by $\lambda$-Hölder functions for small enough $\lambda$, one can prove that for any $\epsilon>0$ there exists a positive constant $C_3(\epsilon,\gamma)$ such that
$$C_1(1,\gamma)\,n^{\frac{\gamma-1}{\gamma}} \le \tilde\alpha\big(\sigma(T^n), T\big) \le C_3(\epsilon,\gamma)\,n^{\frac{\gamma-1}{\gamma}+\epsilon}.$$
3.4 Vector valued LARCH(∞) processes
A vast literature is devoted to the study of conditionally heteroskedastic models. One of the best-known models is the GARCH model (Generalized Autoregressive Conditionally Heteroskedastic) introduced by Engle (1982) [84] and Bollerslev (1986) [23]. A usual GARCH(p, q) model can be written
$$r_t = \sigma_t \xi_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j r_{t-j}^2,$$
where $\alpha_0 \ge 0$, $\beta_i \ge 0$, $\alpha_j \ge 0$, $p \ge 0$, $q \ge 0$ are the model's parameters and the $\xi_t$ are i.i.d. If the $\beta_i$ are null, we obtain an ARCH(q) model, which can be extended to the LARCH(∞) model (see Robinson, 1991 [165]; Giraitis, Kokoszka and Leipus, 2000 [92]). These models are often used in finance because their properties are close to those observed on empirical financial data, such as volatility clustering, white noise behaviour, or autocorrelation of the squares of the series. To reproduce other properties of empirical data, such as the leverage effect, many extensions of the GARCH model have been introduced: EGARCH, TGARCH, etc.

A single equation in terms of a vector valued process allows simplifications in the definition of various ARCH-type models. Let $(\xi_t)_{t\in\mathbb Z}$ be an i.i.d. sequence of random $d \times m$ matrices, $(a_j)_{j\in\mathbb N^*}$ be a sequence of $m \times d$ matrices, and $a$ be a vector in $\mathbb R^m$. A vector valued LARCH(∞) model is a solution of the recurrence equation
$$X_t = \xi_t \Big( a + \sum_{j=1}^{\infty} a_j X_{t-j} \Big). \tag{3.4.1}$$
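As a quick illustration, here is a minimal simulation sketch of the GARCH(p, q) recursion recalled at the beginning of this section. The function name, Gaussian choice for $\xi_t$ and the parameter values are ours, not from the text; the chosen parameters satisfy the usual second-order stationarity condition $\sum_j \alpha_j \mathbb E\xi_0^2 + \sum_i \beta_i < 1$.

```python
import numpy as np

def simulate_garch(alpha0, alphas, betas, n, rng):
    """Simulate r_t = sigma_t * xi_t with
    sigma_t^2 = alpha0 + sum_i betas[i]*sigma_{t-1-i}^2 + sum_j alphas[j]*r_{t-1-j}^2."""
    p, q = len(betas), len(alphas)
    burn = 500                                  # discard transient
    sigma2 = np.full(burn + n, alpha0)
    r = np.zeros(burn + n)
    xi = rng.standard_normal(burn + n)
    for t in range(max(p, q), burn + n):
        sigma2[t] = (alpha0
                     + sum(betas[i] * sigma2[t - 1 - i] for i in range(p))
                     + sum(alphas[j] * r[t - 1 - j] ** 2 for j in range(q)))
        r[t] = np.sqrt(sigma2[t]) * xi[t]
    return r[burn:]

rng = np.random.default_rng(0)
r = simulate_garch(0.1, [0.1], [0.8], 20_000, rng)
# Here alpha_1 + beta_1 = 0.9 < 1, so E r_t^2 = alpha0 / (1 - 0.9) = 1.
print(np.mean(r ** 2))
```

The sample second moment should be close to the stationary value $\alpha_0/(1 - \sum\alpha_j - \sum\beta_i)$.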
Some examples of LARCH(∞) models are now provided. Even if standard LARCH(∞) models simply correspond to the case of real valued $X_t$ and $a_j$, general LARCH(∞) models include a large variety of models, such as:

1. Bilinear models, precisely addressed in the forthcoming Subsection 3.4.2. They are solutions of the equation
$$X_t = \zeta_t \Big( \alpha + \sum_{j=1}^{\infty} \alpha_j X_{t-j} \Big) + \beta + \sum_{j=1}^{\infty} \beta_j X_{t-j},$$
where the variables are real valued and $\zeta_t$ is the innovation. It is easy to see that such models take the previous form with $m = 2$ and $d = 1$: write
$$\xi_t = \begin{pmatrix} \zeta_t & 1 \end{pmatrix}, \qquad a = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \qquad a_j = \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \quad \text{for } j = 1, 2, \ldots
$$
2. GARCH(p, q) models, defined by
$$r_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 + \gamma + \sum_{j=1}^{q} \gamma_j r_{t-j}^2,$$
where $\gamma > 0$, $\gamma_j \ge 0$, $\beta_j \ge 0$ (and the variables $\varepsilon_t$ are centered). They may be written as bilinear models: for this, set $\alpha_0 = \gamma/(1 - \sum_i \beta_i)$ and $\sum_i \alpha_i z^i = \sum_i \gamma_i z^i / (1 - \sum_i \beta_i z^i)$ (see Giraitis et al. (2006) [93]).

3. ARCH(∞) processes, given by the equations
$$r_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \beta_0 + \sum_{j=1}^{\infty} \beta_j \varepsilon_{t-j}^2.$$
They may be written as bilinear models: set
$$\xi_t = \begin{pmatrix} \varepsilon_t & 1 \end{pmatrix}, \qquad a = \begin{pmatrix} \kappa \beta_0 \\ \lambda_1 \beta_0 \end{pmatrix}, \qquad a_j = \begin{pmatrix} \kappa \beta_j \\ \lambda_1 \beta_j \end{pmatrix},$$
with $\lambda_1 = \mathbb E(\varepsilon_0^2)$ and $\kappa^2 = \operatorname{Var}(\varepsilon_0^2)$.
4. Models with vector valued innovations,
$$\begin{cases} X_t = \zeta_t^1 \Big( \alpha^1 + \displaystyle\sum_{j=1}^{\infty} \alpha_j^1 X_{t-j} \Big) + \mu_t^1 \Big( \beta^1 + \displaystyle\sum_{j=1}^{\infty} \beta_j^1 X_{t-j} \Big) + \gamma^1 + \displaystyle\sum_{j=1}^{\infty} \gamma_j^1 X_{t-j}, \\[2mm] Y_t = \zeta_t^2 \Big( \alpha^2 + \displaystyle\sum_{j=1}^{\infty} \alpha_j^2 Y_{t-j} \Big) + \mu_t^2 \Big( \beta^2 + \displaystyle\sum_{j=1}^{\infty} \beta_j^2 Y_{t-j} \Big) + \gamma^2 + \displaystyle\sum_{j=1}^{\infty} \gamma_j^2 Y_{t-j}, \end{cases}$$
may clearly be written as LARCH(∞) models with now $m = d = 2$.
3.4.1 Chaotic expansion of LARCH(∞) models
We provide sufficient conditions for the following chaotic expansion:
$$X_t = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} a_{j_2} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big). \tag{3.4.2}$$
For a $k \times l$ matrix $M$, and $\|\cdot\|$ a norm on $\mathbb R^k$, let as usual
$$\|M\| = \sup_{x \in \mathbb R^l,\ \|x\| \le 1} \|Mx\|.$$
For a random matrix $M$ we set $\|M\|_p^p = \mathbb E(\|M\|^p)$.
Theorem 3.1. Assume that either $\sum_{j\ge 1} \|a_j\|^p\, \mathbb E\|\xi_0\|^p < 1$ for some $p \le 1$, or $\sum_{j\ge 1} \|a_j\| \big(\mathbb E\|\xi_0\|^p\big)^{1/p} < 1$ for $p > 1$. Then one stationary solution of eqn. (3.4.1) in $\mathbb L^p$ is given by (3.4.2).

In this section we set
$$A(x) = \sum_{j \ge x} \|a_j\|, \qquad A = A(1), \qquad \lambda_p = A\, \|\xi_0\|_p. \tag{3.4.3}$$
Proof. We first prove that expression (3.4.2) is well defined. Set
$$S = \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \big\| a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k} \big\|.$$
Clearly
$$S \le \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \|a_{j_1}\| \cdots \|a_{j_k}\|\, \|\xi_{t-j_1}\| \cdots \|\xi_{t-j_1-\cdots-j_k}\|.$$
Using that the sequence $(\xi_n)_{n\in\mathbb Z}$ is i.i.d., we obtain for $p \ge 1$,
$$\|S\|_p \le \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \|a_{j_1}\| \cdots \|a_{j_k}\|\, \|\xi_{t-j_1}\|_p \cdots \|\xi_{t-j_1-\cdots-j_k}\|_p \le \sum_{k=1}^{\infty} \big( \|\xi_0\|_p\, A \big)^k.$$
If $\lambda_p < 1$, the last series is convergent and $S$ belongs to $\mathbb L^p$. If $p < 1$ we conclude as for $p = 1$, using the bound
$$\Big\| \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k} \Big\|_p^p \le \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} \big\| a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k} \big\|_p^p.$$
Now we prove that the expression (3.4.2) is a solution of eqn. (3.4.1):
$$X_t = \xi_t \Big( a + \sum_{k \ge 1}\ \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big)$$
$$= \xi_t \Big( a + \sum_{j_1 \ge 1} a_{j_1} \xi_{t-j_1} a + \sum_{k \ge 2}\ \sum_{j_1 \ge 1} a_{j_1} \xi_{t-j_1} \sum_{j_2,\ldots,j_k \ge 1} a_{j_2} \xi_{(t-j_1)-j_2} \cdots a_{j_k} \xi_{(t-j_1)-j_2-\cdots-j_k}\, a \Big)$$
$$= \xi_t \Big( a + \sum_{j=1}^{\infty} a_j X_{t-j} \Big).$$
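In the particular scalar case with a single lag ($a_j = 0$ for $j \ge 2$), the chaotic expansion reduces to $X_t = \xi_t\, a\, (1 + \sum_{k\ge1} a_1^k \xi_{t-1}\cdots\xi_{t-k})$, and the algebra above can be checked numerically: a truncated expansion satisfies the recursion (3.4.1) up to truncation error. A small sketch, with names and values of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
a, a1 = 1.0, 0.4          # |a1| * ||xi_0||_1 < 1 guarantees convergence
T, K = 60, 40             # time horizon and chaos truncation order
xi = rng.uniform(-1.0, 1.0, size=T)

def chaos(t, K):
    """Truncated chaotic expansion of X_t for the one-lag scalar LARCH model."""
    s, prod = 1.0, 1.0
    for k in range(1, K + 1):
        if t - k < 0:
            break
        prod *= a1 * xi[t - k]
        s += prod
    return xi[t] * a * s

x_direct = chaos(T - 1, K)
x_recursed = xi[T - 1] * (a + a1 * chaos(T - 2, K - 1))  # plug into (3.4.1)
print(abs(x_direct - x_recursed))
```

Both quantities agree up to floating-point rounding, since the order-$K$ expansion at time $t$ is exactly the recursion applied to the order-$(K-1)$ expansion at time $t-1$.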
Theorem 3.2. Assume that $p \ge 1$ in the assumptions of Theorem 3.1, and assume that $\varphi = \sum_j \|a_j\|\, \|\xi_0\|_p < 1$. If a stationary solution $(Y_t)_{t\in\mathbb Z}$ to equation (3.4.1) exists (a.s.), and if $Y_t$ is independent of the σ-algebra generated by $\{\xi_s;\ s > t\}$ for each $t \in \mathbb Z$, then this solution is also in $\mathbb L^p$ and it is (a.s.) equal to the previous solution $(X_t)_{t\in\mathbb Z}$ defined by equation (3.4.2).

Proof. Step 1. We first prove that $\|Y_0\|_p < \infty$. From equation (3.4.1), the stationarity of $\{Y_t\}_{t\in\mathbb Z}$ and the independence assumption, we derive that
$$\|Y_0\|_p \le \|\xi_0\|_p \Big( \|a\| + \sum_{j=1}^{\infty} \|a_j\|\, \|Y_0\|_p \Big).$$
Hence, the first point in the theorem follows from
$$\|Y_0\|_p \le \frac{\|\xi_0\|_p\, \|a\|}{1 - \varphi} < \infty.$$
Step 2. As in Giraitis et al. (2000) [92] we write $Y_t = \xi_t \big( a + \sum_{j\ge 1} a_j Y_{t-j} \big) = X_t^m + S_t^m$ with
$$X_t^m = \xi_t \Big( a + \sum_{k=1}^{m} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big),$$
$$S_t^m = \xi_t \Big( \sum_{j_1,\ldots,j_{m+1} \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_m} \xi_{t-j_1-\cdots-j_m}\, a_{j_{m+1}} Y_{t-j_1-\cdots-j_{m+1}} \Big).$$
We have
$$\|S_t^m\|_p \le \|\xi_0\|_p \sum_{j_1,\ldots,j_{m+1} \ge 1} \|a_{j_1}\| \cdots \|a_{j_{m+1}}\|\, \|\xi_0\|_p^m\, \|Y_0\|_p = \|Y_0\|_p\, \varphi^{m+1}.$$
We recall the additive decomposition of the chaotic expansion $X_t$ in equation (3.4.2) as a finite expansion plus a negligible remainder that can be controlled: $X_t = X_t^m + R_t^m$, where
$$R_t^m = \xi_t \Big( \sum_{k > m} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big)$$
satisfies
$$\|R_t^m\|_p \le \|a\|\, \|\xi_0\|_p \sum_{k > m} \varphi^k \le \|a\|\, \|\xi_0\|_p\, \frac{\varphi^m}{1 - \varphi} \to 0.$$
Then the difference between those two solutions is controlled as a function of $m$: since $X_t - Y_t = R_t^m - S_t^m$,
$$\|X_t - Y_t\|_p \le \|R_t^m\|_p + \|S_t^m\|_p \le \|a\|\,\|\xi_0\|_p\, \frac{\varphi^m}{1-\varphi} + \|Y_0\|_p\, \varphi^m \le 2\, \frac{\varphi^m\, \|a\|\, \|\xi_0\|_p}{1 - \varphi};$$
thus $Y_t = X_t$ a.s.

Dependence properties. To our knowledge, there is no study of the weak dependence properties of ARCH or GARCH type models with an infinite number of coefficients. Mixing seems difficult to prove for such models except in the Markov case ($a_j = 0$ for large enough $j$), because this case is included in the Markov section, for which we already mentioned that additional assumptions are needed to derive mixing. This section refers to Doukhan, Teyssière and Winant (2005) [75], who improve on the special case of bilinear models precisely considered in Doukhan, Madré and Rosenbaum (2006) [69]. We use the notations given in (3.4.3).

Theorem 3.3. The solution (3.4.2) of eqn. (3.4.1) is θ-weakly dependent with
$$\theta(r) \le 2\, \mathbb E\|\xi_0\| \left( \mathbb E\|\xi_0\| \sum_{k=1}^{t-1} k\, \lambda^{k-1} A\!\Big(\frac{t}{k}\Big) + \frac{\lambda^{t}\, t}{1-\lambda} \right) \|a\| \qquad \text{for any } t \le r.$$

Proof. Consider $f : (\mathbb R^d)^u \to \mathbb R$ with $\|f\|_\infty < \infty$ and $g : (\mathbb R^d)^v \to \mathbb R$ with $\operatorname{Lip}(g) < \infty$, that is,
$$|g(x) - g(y)| \le \operatorname{Lip}(g)\, \big( \|x_1 - y_1\| + \cdots + \|x_v - y_v\| \big).$$
Let $i_1 < \cdots < i_u$ and $j_1 < \cdots < j_v$ be such that $j_1 - i_u \ge r$. To bound the weak dependence coefficients we use an approximation $\hat{\mathbf v}$ of the vector $\mathbf v = (X_{j_1}, \ldots, X_{j_v})$ in
such a way that, for each index $j \in \{j_1, \ldots, j_v\}$ and $s \le r$, the random variable $\hat X_j$ is independent of $X_{j-s}$. More precisely, let
$$\hat X_t = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1 + \cdots + j_k < s} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big).$$
Now, let $f(\mathbf u) = f(X_{i_1}, \ldots, X_{i_u})$ and $g(\mathbf v) = g(X_{j_1}, \ldots, X_{j_v})$. We have that
$$|\operatorname{Cov}(f(\mathbf u), g(\mathbf v))| = \big| \mathbb E\big( f(\mathbf u)(g(\mathbf v) - g(\hat{\mathbf v})) \big) - \mathbb E(f(\mathbf u))\, \mathbb E\big(g(\mathbf v) - g(\hat{\mathbf v})\big) \big| \le 2 \|f\|_\infty\, \mathbb E|g(\mathbf v) - g(\hat{\mathbf v})| \le 2 \|f\|_\infty \operatorname{Lip}(g) \sum_{k=1}^{v} \mathbb E\|X_{j_k} - \hat X_{j_k}\| \le 2 v\, \|f\|_\infty \operatorname{Lip}(g)\, \mathbb E\|X_0 - \hat X_0\|.$$
Hence $\theta(r) \le 2\, \mathbb E\|X_0 - \hat X_0\|$ for any $s \le r$, which implies the bound of the theorem. This bound is made explicit for simple decay rates. More precisely,
$$\theta(t) \le K t^{-b} \quad \text{under Riemannian decay } A(x) \le C x^{-b}, \qquad \theta(t) \le K (q \vee \lambda)^{\sqrt t} \quad \text{under geometric decay } A(x) \le C q^x.$$

Further approximations. In order to simulate those models, and also to better understand their behaviour, it is important to see how far they are from simple processes. Weak dependence was proved with the independent approximations
$$\hat X_t = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1 + \cdots + j_k < s} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big)$$
of the LARCH models; we make this approximation precise through coupling arguments, and we also prove below proximity to the Markov sequence obtained by truncating the series which defines them.

Coupling. The approximation $\hat X_t$ of $X_t$ does not have the same distribution as $X_t$. But we are in the case of causal Bernoulli shifts with independent inputs, so the method of Section 3.1.4 applies. Let $(\xi'_i)_{i\in\mathbb Z}$ be an independent copy of $(\xi_i)_{i\in\mathbb Z}$, and let $\tilde\xi_n = \xi_n$ if $n > 0$ and $\tilde\xi_n = \xi'_n$ for $n \le 0$. Finally, let
$$\tilde X_t = \tilde\xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{j_1,\ldots,j_k \ge 1} a_{j_1} \tilde\xi_{t-j_1} \cdots a_{j_k} \tilde\xi_{t-j_1-\cdots-j_k}\, a \Big).$$
Here $\tilde X_t$ has the same distribution as $X_t$ and is independent of the σ-algebra $\mathcal M_0 = \sigma(X_i,\ i \le 0)$. Consequently, if $\tilde\delta_{p,n}$ is a non increasing sequence satisfying
(3.2.2), we obtain the upper bounds $\tau_{p,\infty}(n) \le \tilde\delta_{p,n}$. Since for any $s \le n$ we have $\|X_n - \tilde X_n\|_p \le 2 \|X_0 - \hat X_0\|_p$, we obtain that
$$\tau_{p,\infty}(n) \le \inf_{t \le r}\ 2\, \|\xi_0\|_p \left( \|\xi_0\|_p \sum_{k=1}^{t-1} k\, \lambda_p^{k-1} A\!\Big(\frac{t}{k}\Big) + \frac{\lambda_p^{t}\, t}{1 - \lambda_p} \right) \|a\|.$$
For $p = 1$, we recover the upper bound of Theorem 3.3. If $d = 1$, we obtain the same upper bounds for $\tilde\alpha_k(n)$, $\tilde\beta_k(n)$ and $\tilde\phi_k(n)$ as in Section 3.1.4, by assuming that the distribution function of $X_0$ is continuous (similar bounds may be obtained for $d > 1$ by assuming that each component of $X_0$ has a continuous distribution function, see Lemma 5.1).

Markov approximation.
Consider equation (3.4.1) truncated at rank $N$:
$$X_t^N = \xi_t \Big( a + \sum_{j=1}^{N} a_j X_{t-j}^N \Big).$$
The previous solution rewrites as
$$X_t^N = \xi_t \Big( a + \sum_{k=1}^{\infty} \sum_{N \ge j_1,\ldots,j_k \ge 1} a_{j_1} \xi_{t-j_1} \cdots a_{j_k} \xi_{t-j_1-\cdots-j_k}\, a \Big).$$
The corresponding error is then bounded by
$$\mathbb E\|X_t - X_t^N\| \le \sum_{k=1}^{\infty} A(N)^k.$$
In the Riemannian decay case, the error is $\sum_{k=1}^{\infty} N^{-bk}$, and in the geometric decay case, the error is $q^N/(1 - q^N)$.

3.4.2 Bilinear models
Those models are defined through the recurrence relation
$$X_t = \zeta_t \Big( \alpha + \sum_{j=1}^{\infty} \alpha_j X_{t-j} \Big) + \beta + \sum_{j=1}^{\infty} \beta_j X_{t-j},$$
where the variables are real valued and $\zeta_t$ now denotes the innovation. To see that this is still a LARCH(∞) model, we set as before
$$\xi_t = \begin{pmatrix} \zeta_t & 1 \end{pmatrix}, \qquad a = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \qquad a_j = \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \quad \text{for } j \ge 1.$$
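A minimal simulation sketch of a finite-lag version of this bilinear recurrence (sums truncated at lag $J$; the decaying coefficient sequences and Gaussian innovation are our own illustrative choices, small enough for the contraction condition of Theorem 3.1 to hold):

```python
import numpy as np

rng = np.random.default_rng(2)
J = 5
alpha, beta = 1.0, 0.0                       # constant terms; beta = 0 as usual
alphas = 0.3 * 0.5 ** np.arange(1, J + 1)    # alpha_j, summable
betas = 0.2 * 0.5 ** np.arange(1, J + 1)     # beta_j, summable

n, burn = 5_000, 500
zeta = rng.standard_normal(n + burn)         # innovations
x = np.zeros(n + burn)
for t in range(J, n + burn):
    past = x[t - J:t][::-1]                  # X_{t-1}, ..., X_{t-J}
    x[t] = zeta[t] * (alpha + alphas @ past) + beta + betas @ past
x = x[burn:]
print(x.mean(), x.std())
```

With centered $\zeta_t$ and $\beta = 0$, taking expectations in the recursion gives $\mathbb E X_t = \sum_j \beta_j \mathbb E X_t$, so the stationary mean is 0.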
One usually takes $\beta = 0$. ARCH(∞) and GARCH(p, q) models are particular cases of the bilinear models. Giraitis and Surgailis (2002) [95] prove that, under some restrictions, there is a unique second order stationary solution for these models; this solution has a chaotic expansion. Assume that the power series
$$A(z) = \sum_{j=1}^{\infty} \alpha_j z^j \qquad \text{and} \qquad B(z) = \sum_{j=1}^{\infty} \beta_j z^j$$
exist for $|z| \le 1$, and let
$$G(z) = \frac{A(z)}{1 - B(z)} = \sum_{j=1}^{\infty} g_j z^j \qquad \text{and} \qquad H(z) = \frac{1}{1 - B(z)} = \sum_{j=0}^{\infty} h_j z^j;$$
we will write $\|h\|_p^p = \sum_{j=0}^{\infty} |h_j|^p$. Then:

Proposition 3.3 (Giraitis, Surgailis, 2002 [95]). If $(\zeta_t)_t$ is i.i.d. and $\|h\|_2 < 1$, then there is a unique second order stationary solution:
$$X_t = \alpha \sum_{k=1}^{\infty} \sum_{s_k < \cdots < s_1 \le t} g_{t-s_1} h_{s_1-s_2} \cdots h_{s_{k-1}-s_k}\, \zeta_{s_1} \cdots \zeta_{s_k}. \tag{3.4.4}$$
Assuming that β = 0, the expansion (3.4.2) writes as:
Xt = ζt α +
∞
sk <···<s1
(ζt αt−s1 + βt−s1 )×
× (ζs1 αs1 −s2 + βs1 −s2 ) · · · (ζsk−1 αsk−1 −sk + βsk−1 −sk )ζsk α or Xt = ζt α + S1 + S2 with S1
=
ζt αt−s1 (ζs1 αs1 −s2 + βs1 −s2 ) · · · (ζsk−1 αsk−1 −sk + βsk−1 −sk )ζsk α
k≥1 sk < · · · < s1 < t
S2
=
βt−s1 (ζs1 αs1 −s2 + βs1 −s2 ) · · · (ζsk−1 αsk−1 −sk + βsk−1 −sk )ζsk α
k≥1 sk < · · · < s1 < t
Under additional assumptions Giraitis and Surgailis prove the expansion: Xt = α
∞
gt−s1 hs1 −s2 · · · hsk−1 −sk ζs1 · · · ζsk
k=1 sk <···<s1 t
50
CHAPTER 3. MODELS
rewritten as Xt = ζt α + T1 + T2 with T1
∞
=
T2
ζt ht−s1 ζs1 hs1 −s2 ζs2 · · · hsk−1 −sk ζsk α
k=1sk <···<s1
gt−s1 hs1 −s2 ζs1 · · · hsk−1 −sk ζsk α
=
k=1sk <···<s1
The terms in the sum (T1 ) take a form: ζt (αt−i(1) βi(1) −i(1) · · · βi(1) −s1 )ζs1 (αs1 −i(2) βi(2) −i(2) · · · βi(2) −s2 ) · · · 1
1
2
1
p1
1
2
p2
· · · ζsk−1 (αs
(k) k−1 −i1
βi(k) −i(k) · · · βi(k) −s )ζsk α 1
2
pk
k
hence between each couple ζsp , ζsp+1 read from the left to the right one founds a term (αj ) followed with several (βj )’s in such a way that the sum of all indices equals sp+1 − sp . On the one hand, quote that expanding terms in the (S1 ) yields a sum of products of such terms proves that (S1 )’s is included in (T1 )’s. On the other hand, expand the term k¯ = k + p1 + · · · + pk (¯ sk¯ , s¯k−1 , . . . , s¯1 ) = ¯ (k) (k) (1) (1) (1) (sk , ipk , . . . , i1 , sk−1 , . . . , s1 , ip1 , . . . , i2 , i1 , t) in the sum (S1 ) yields the generic term in (T1 ) expansion. Hence (S1 ) and (T1 )’s expansions coincide. Analogously (T2 ) = (S2 ), by quoting that the generic term in (T2 ) writes: (βt−i(1) βi(1) −i(1) · · · βi(1) −s1 )ζs1 (αs1 −i(2) βi(2) −i(2) · · · βi(2) −s2 ) · · · 1
1
2
1
p1
1
· · · ζsk−1 (αs
2
(k) k−1 −i1
p2
βi(k) −i(k) · · · βi(k) −s )ζsk α. 1
2
pk
k
Conditional densities of bilinear models. Another, more specific feature of those models is the following result on the existence of conditional densities (see Doukhan et al. [69]). This is relevant for subsampling techniques and density estimation. We use the fact that the bilinear equation may be written
$$X_t = \tilde A_t \zeta_t + \tilde B_t \qquad \text{with} \qquad \tilde A_t = a + \sum_{j=1}^{\infty} a_j X_{t-j}, \qquad \tilde B_t = \sum_{j=1}^{\infty} b_j X_{t-j}.$$
Hence, conditionally on the past σ-algebra (the history of $\{\zeta_s,\ s < t\}$), the distribution of $X_t$ is as smooth as that of $\zeta_t$ whenever $\tilde A_t \ne 0$. If we are interested in higher order marginal distributions, we need a bit more work.
Split $\tilde A_t$ and $\tilde B_t$ into
$$\tilde A_t = \sum_{j=1}^{t-1} a_j X_{t-j} + A_t, \qquad A_t = a + \sum_{j=t}^{\infty} a_j X_{t-j},$$
$$\tilde B_t = \sum_{j=1}^{t-1} b_j X_{t-j} + B_t, \qquad B_t = \sum_{j=t}^{\infty} b_j X_{t-j},$$
where $A_t$ and $B_t$ are measurable with respect to the σ-algebra of the past, $\sigma\{\zeta_s,\ s \le 0\}$. This entails, for instance,
$$X_1 = \tilde A_1 \zeta_1 + \tilde B_1, \qquad X_2 = (a_1 X_1 + A_2) \zeta_2 + (b_1 X_1 + B_2).$$
Thus, conditionally on $\tilde A_1$, $\tilde B_1$, $A_2$ and $B_2$, the previous system is triangular, and it may be solved if $\tilde A_1$ and $a_1 X_1 + A_2$ do not vanish. The following result extends this observation.

Lemma 3.7 (Conditional densities, Doukhan et al. [69]). Assume that the random variables $(\zeta_t)_{t\in\mathbb Z}$ and the coefficients $a_j$ are non negative for $j = 1, 2, \ldots$. Also suppose that the $\zeta_i$ are independent random variables with density $f_{\zeta_i}$ for all $i \in \{1, \ldots, n\}$. Then, conditionally with respect to the past of the process $\sigma\{\zeta_s,\ s \le 0\}$, the random vector $(X_1, \ldots, X_n)$ admits the density $f_n(x_1, \ldots, x_n)$ defined by
$$f_n(x_1, \ldots, x_n) = \frac{1}{|\alpha_1 \alpha_2 \cdots \alpha_n|}\, f_{\zeta_1}\!\Big(\frac{\beta_1}{\alpha_1}\Big) \cdots f_{\zeta_n}\!\Big(\frac{\beta_n}{\alpha_n}\Big)$$
with $\beta_j = x_j - b_1 x_{j-1} - b_2 x_{j-2} - \cdots - b_{j-1} x_1 - B_j$ and $\alpha_j = a_1 x_{j-1} + a_2 x_{j-2} + \cdots + a_{j-1} x_1 + A_j$ for $1 \le j \le n$ (here $\alpha_1 = A_1$).

Corollary 3.1 (Density). Under the same assumptions as in Lemma 3.7, if $a > 0$ and the $(\zeta_t)_t$ are i.i.d. with a density $f$ bounded by $M$, then $f_n(x_1, \ldots, x_n) \le (M/a)^n$ for all $(x_1, \ldots, x_n)$.

Corollary 3.2 (Density of a couple). Under the same assumptions as in Lemma 3.7, if the $\zeta_t$ are i.i.d. with density $f$, then the density $g_i$ of the couple $(X_1, X_i)$ satisfies $\|g_i\|_\infty \le \|f\|_\infty^2 / A_1$ for all $i > 1$.

Remark 3.4. The asymptotic properties of a standard kernel density estimate rely on such bounds. Indeed, an expression of its variance follows jointly from weak dependence properties and such assumptions on the two dimensional distributions.
Proof of Lemma 3.7. We work conditionally on the infinite past before $X_1$. We write
$$M \begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_1 \end{pmatrix} = \begin{pmatrix} A_n \zeta_n + B_n \\ A_{n-1} \zeta_{n-1} + B_{n-1} \\ \vdots \\ A_1 \zeta_1 + B_1 \end{pmatrix},$$
where
$$M = \begin{pmatrix} 1 & -a_1\zeta_n - b_1 & -a_2\zeta_n - b_2 & \cdots & -a_{n-1}\zeta_n - b_{n-1} \\ 0 & 1 & -a_1\zeta_{n-1} - b_1 & \cdots & -a_{n-2}\zeta_{n-1} - b_{n-2} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -a_1\zeta_2 - b_1 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$
This may be rewritten as
$$\zeta_1 = \frac{X_1 - B_1}{A_1}, \qquad \zeta_2 = \frac{X_2 - b_1 X_1 - B_2}{a_1 X_1 + A_2}, \qquad \ldots, \qquad \zeta_n = \frac{X_n - b_1 X_{n-1} - b_2 X_{n-2} - \cdots - b_{n-1} X_1 - B_n}{a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_{n-1} X_1 + A_n},$$
where the coefficients $A_t$ and $B_t$ are deterministic in this conditional setting. Thus,
$$\mathbb E\, g(X_1, X_2, \ldots, X_n) = \int g\big(\phi^{-1}(u_1, \ldots, u_n)\big)\, f_{\zeta_1}(u_1) \cdots f_{\zeta_n}(u_n)\, du_1 \cdots du_n,$$
with $f_{\zeta_i}$ the density of $\zeta_i$ and $(\zeta_1, \ldots, \zeta_n) = \phi(X_1, \ldots, X_n)$. Here, put $(u_1, \ldots, u_n) = \phi(x_1, \ldots, x_n)$. The function $\phi$ has a triangular Jacobian matrix, hence its determinant is the product of the diagonal terms
$$\frac{\partial u_j}{\partial x_j} = \frac{1}{\alpha_j}$$
for $1 \le j \le n$, and the result follows.

Proof of the corollaries. The first corollary follows by integration from Lemma 3.7. We prove the result for the density of the couple $(X_1, X_4)$; the general result is proved in the same way. With
$$f_4(x_1, x_2, x_3, x_4) = \frac{1}{|\alpha_1 \cdots \alpha_4|}\, f\!\Big(\frac{\beta_1}{\alpha_1}\Big) \cdots f\!\Big(\frac{\beta_4}{\alpha_4}\Big),$$
an integration with respect to $x_2$ and $x_3$ implies that
$$g_4(x_1, x_4) = \int f_4(x_1, x_2, x_3, x_4)\, dx_2\, dx_3 \le \frac{\|f\|_\infty^2}{A_1} \int f\!\Big(\frac{\beta_2}{\alpha_2}\Big) f\!\Big(\frac{\beta_3}{\alpha_3}\Big) \frac{dx_2\, dx_3}{|\alpha_2 \alpha_3|}.$$
Hence, with
$$u = \frac{x_2 - b_1 x_1 - B_2}{a_1 x_1 + A_2}, \qquad v = \frac{x_3 - b_1 x_2 - b_2 x_1 - B_3}{a_1 x_2 + a_2 x_1 + A_3},$$
we write
$$x_2 = u\,(a_1 x_1 + A_2) + b_1 x_1 + B_2,$$
$$x_3 = v\,\big( a_1 (u\,(a_1 x_1 + A_2) + b_1 x_1 + B_2) + a_2 x_1 + A_3 \big) + b_1 \big( u\,(a_1 x_1 + A_2) + b_1 x_1 + B_2 \big) + b_2 x_1 + B_3.$$
The Jacobian matrix is triangular and the absolute value of its determinant equals $|(a_1 x_1 + A_2)(a_1 x_2 + a_2 x_1 + A_3)|$, and thus
$$g_4(x_1, x_4) \le \frac{\|f\|_\infty^2}{A_1} \int f(u) f(v)\, du\, dv \le \frac{\|f\|_\infty^2}{A_1}.$$
3.5 ζ-dependent models
In this section, we give some classes of ζ-dependent models: associated processes, Gaussian processes and interacting particle systems.
3.5.1 Associated processes
An analogous formula to (1.4.1) proves that associated random variables belong to the class of ζ-dependent models. Several associated models are obtained from nondecreasing transformations of independent variables. For Gaussian vectors, Pitt (1982) [146] gave a necessary and sufficient condition for association; we discuss Pitt's result in the sequel.

Theorem 3.4. Let $X = (X_1, \ldots, X_n)$ be a Gaussian vector with mean vector 0 and covariance matrix $\Sigma = (\sigma_{i,j} = \operatorname{Cov}(X_i, X_j))_{1\le i,j\le n}$. The condition
$$\operatorname{Cov}(X_i, X_j) \ge 0 \quad \text{for all } i, j = 1, \ldots, n \tag{3.5.1}$$
is necessary and sufficient for the variables to be associated.

Proof of Theorem 3.4. The method of proof below is due to Pitt (1982). Assuming Condition (3.5.1), the task is to prove that $\operatorname{Cov}(f(X), g(X)) \ge 0$ for all nondecreasing functions $f$ and $g$ defined on $\mathbb R^n$ (the converse implication being trivial). We suppose, without loss of generality, that $\Sigma$ is non-singular and that the
functions $f$ and $g$ are continuously differentiable with bounded derivatives $\frac{\partial f}{\partial x_i}$ and $\frac{\partial g}{\partial x_i}$, for $i = 1, \ldots, n$. Let $Z$ be an independent copy of $X$. For any $\lambda \in [0, 1]$, let $Y(\lambda)$ be the random vector defined by
$$Y(\lambda) = \lambda X + \sqrt{1 - \lambda^2}\, Z.$$
Clearly, for each fixed $\lambda$, $Y(\lambda)$ is a Gaussian vector with covariance matrix $\Sigma$ and $\operatorname{Cov}(X_i, Y_j(\lambda)) = \lambda \sigma_{i,j}$. Set
$$F(\lambda) = \mathbb E\big( f(X)\, g(Y(\lambda)) \big).$$
Clearly
$$\operatorname{Cov}(f(X), g(X)) = F(1) - F(0). \tag{3.5.2}$$
It is sufficient to show that $F'(\lambda)$ exists and $F'(\lambda) \ge 0$ for $0 \le \lambda \le 1$. To this end, let $\phi$ and $p$ be respectively the density of $X$ and the conditional density of $Y(\lambda)$ given $X = x$. We have
$$\phi(x) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\Big( -\frac{1}{2} \sum_{i,j=1}^{n} c_{i,j} x_i x_j \Big), \quad \text{with } \Sigma^{-1} = (c_{i,j})_{1\le i,j\le n},$$
and
$$p(\lambda; x, y) = \frac{1}{(1 - \lambda^2)^{n/2}}\, \phi\Big( \frac{\lambda x - y}{\sqrt{1 - \lambda^2}} \Big).$$
Hence
$$F(\lambda) = \int_{\mathbb R^n} \phi(x) f(x)\, g(\lambda; x)\, dx, \tag{3.5.3}$$
where $g(\lambda; x)$ is defined by
$$g(\lambda; x) = \int_{\mathbb R^n} p(\lambda; x, y)\, g(y)\, dy,$$
which is equal to
$$g(\lambda; x) = \int_{\mathbb R^n} g(\lambda x - y)\, \phi_\lambda(y)\, dy, \tag{3.5.4}$$
where
$$\phi_\lambda(x) = \frac{1}{(1 - \lambda^2)^{n/2}}\, \phi\Big( \frac{x}{\sqrt{1 - \lambda^2}} \Big).$$
Now Equation (3.5.4) proves that the partial derivatives $\frac{\partial g(\lambda; x)}{\partial x_i}$ exist and are bounded. Moreover, since $g$ is increasing and $\lambda$ is positive, we have
$$\frac{\partial g(\lambda; x)}{\partial x_i} \ge 0. \tag{3.5.5}$$
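Theorem 3.4 is easy to illustrate by Monte Carlo: for a Gaussian pair with nonnegative correlation, any pair of coordinatewise nondecreasing functions has nonnegative covariance. A small sketch (the choices of $\rho$ and of the monotone functions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
rho = 0.6
z1, z2 = rng.standard_normal((2, n))
x1 = z1
x2 = rho * z1 + np.sqrt(1 - rho**2) * z2      # Cov(x1, x2) = rho >= 0

# Two coordinatewise nondecreasing functions of (x1, x2).
f = np.tanh(x1) + np.tanh(x2)
g = np.minimum(x1, 1.0) + np.minimum(x2, 1.0)
cov = np.mean(f * g) - np.mean(f) * np.mean(g)
print(cov)
```

The empirical covariance is clearly positive, as the theorem predicts; with $\rho < 0$ it can become negative, which is why (3.5.1) is also necessary.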
Next, an explicit calculation based on the heat equations
$$\frac{\partial \phi}{\partial \sigma_{i,i}} = \frac{1}{2}\, \frac{\partial^2 \phi}{\partial x_i^2}, \qquad \frac{\partial \phi}{\partial \sigma_{i,j}} = \frac{\partial^2 \phi}{\partial x_i\, \partial x_j} \quad \text{for } i \ne j,$$
shows that
$$\frac{\partial p}{\partial \lambda} = \frac{p}{1 - \lambda^2} \Big( k\lambda - \sum_{i,j} x_i c_{ij} (\lambda x_j - y_j) - \frac{\lambda}{1 - \lambda^2} \sum_{i,j} (\lambda x_i - y_i) c_{ij} (\lambda x_j - y_j) \Big) = \frac{1}{\lambda} \Big( \sum_{i,j} \sigma_{i,j} \frac{\partial^2 p}{\partial x_i\, \partial x_j} - \sum_i x_i \frac{\partial p}{\partial x_i} \Big). \tag{3.5.6}$$
We obtain, combining (3.5.3), (3.5.4) and (3.5.6),
$$F'(\lambda) = \frac{1}{\lambda} \int_{\mathbb R^n} \phi(x) f(x) \Big( \sum_{i,j} \sigma_{i,j} \frac{\partial^2 g(\lambda; x)}{\partial x_i\, \partial x_j} - \sum_i x_i \frac{\partial g(\lambda; x)}{\partial x_i} \Big)\, dx.$$
An integration by parts gives
$$F'(\lambda) = \frac{1}{\lambda} \int_{\mathbb R^n} \phi(x) \Big( \sum_{i,j} \sigma_{i,j}\, \frac{\partial f(x)}{\partial x_i}\, \frac{\partial g(\lambda; x)}{\partial x_j} \Big)\, dx. \tag{3.5.7}$$
We have $\frac{\partial f(x)}{\partial x_i} \ge 0$, since $f$ is increasing. This fact, together with (3.5.5), (3.5.1) and (3.5.7), proves that $F'(\lambda) \ge 0$ for any $\lambda \in [0, 1]$. This conclusion together with (3.5.2) proves Theorem 3.4.
Remark 3.5. Stable processes have the same linear structure as normal processes since arbitrary linear combinations of stable variables are stable. Lee, Rachev and Samorodnitsky (1990) [118] gave necessary and sufficient conditions for a stable random vector to be associated.
3.5.2 Gaussian processes
Gaussian processes belong to the class of ζ-dependent models. This property is a consequence of the following lemma.

Lemma 3.8. Denote $X_C = (X_i)_{i\in C}$ if $C \subset \mathbb Z$. Let $(X_n)_{n\in\mathbb Z}$ be a centered Gaussian process. Then for all real-valued functions $h, k$ with bounded first partial derivatives, one has
$$|\operatorname{Cov}(h(X_A), k(X_B))| \le \sum_{i\in A,\, j\in B} \Big\| \frac{\partial h}{\partial x_i} \Big\|_\infty \Big\| \frac{\partial k}{\partial x_j} \Big\|_\infty |\operatorname{Cov}(X_i, X_j)|. \tag{3.5.8}$$
Remark 3.6. The proof of Lemma 3.8 follows the paper of Pitt (1982) [146]. For more details, we refer the reader to the proof of Lemma 19 in Doukhan and Louhichi (1999) [67].
3.5.3 Interacting particle systems
In this subsection, we develop an example of ζ-dependent interacting particle systems (cf. Proposition 3.4 below). Before stating the main result of this paragraph, we briefly recall the basic construction of general interacting particle systems, described in Sections I.3 and I.4 of Liggett's book (1985) [122]. Let $S$ be a countable set of sites, $W$ a finite set of states, and $\mathcal X = W^S$ the set of configurations, endowed with its product topology, which makes it a compact set. On each site the state evolves as a Markov chain, but we are interested in the case where the evolutions of neighbouring sites are linked. We define a Feller process on $\mathcal X$ by specifying the local transition rates: to a configuration $\eta$ and a finite set of sites $T$ is associated a nonnegative measure $c_T(\eta, \cdot)$ on $W^T$. Loosely speaking, we want the configuration to change on $T$ after an exponential time with parameter
$$c_{T,\eta} = \sum_{\zeta \in W^T} c_T(\eta, \zeta).$$
After that time, the configuration becomes equal to $\zeta$ on $T$, with probability $c_T(\eta, \zeta)/c_{T,\eta}$. Let $\eta^\zeta$ denote the new configuration, which is equal to $\zeta$ on $T$ and to $\eta$ outside $T$. The infinitesimal generator should be
$$\Omega f(\eta) = \sum_{T \subset S}\ \sum_{\zeta \in W^T} c_T(\eta, \zeta)\, \big( f(\eta^\zeta) - f(\eta) \big). \tag{3.5.9}$$
For $\Omega$ to generate a Feller semigroup acting on continuous functions from $\mathcal X$ into $\mathbb R$, some hypotheses have to be imposed on the transition rates $c_T(\eta, \cdot)$. The first condition is that the mapping $\eta \mapsto c_T(\eta, \cdot)$ should be continuous (and thus bounded, since $\mathcal X$ is compact). Let us denote by $c_T$ its supremum norm:
$$c_T = \sup_{\eta \in \mathcal X} c_{T,\eta}.$$
It is the maximal rate of change of a configuration on $T$. One essential hypothesis is that the maximal rate of change of a configuration at one given site is bounded:
$$B = \sup_{x \in S} \sum_{T \ni x} c_T < \infty. \tag{3.5.10}$$
If $f$ is a continuous function on $\mathcal X$, one defines $\Delta_f(x)$ as the degree of dependence of $f$ on $x$:
$$\Delta_f(x) = \sup\{\, |f(\eta) - f(\zeta)| \ :\ \eta, \zeta \in \mathcal X \text{ and } \eta(y) = \zeta(y)\ \forall\, y \ne x \,\}.$$
Since $f$ is continuous, $\Delta_f(x)$ tends to 0 as $x$ tends to infinity, and $f$ is said to be smooth if $\Delta_f$ is summable:
$$|f| = \sum_{x \in S} \Delta_f(x) < \infty.$$
It can be proved that if $f$ is smooth, then $\Omega f$ defined by (3.5.9) is indeed a continuous function on $\mathcal X$, and moreover $\|\Omega f\| \le B\, |f|$.

We also need to control the dependence of the transition rates on the configuration at other sites. If $y \in S$ is a site and $T \subset S$ is a finite set of sites, one defines
$$c_T(y) = \sup\big\{\, \| c_T(\eta_1, \cdot) - c_T(\eta_2, \cdot) \|_{tv} \ :\ \eta_1(z) = \eta_2(z)\ \forall\, z \ne y \,\big\},$$
where $\|\cdot\|_{tv}$ is the total variation norm:
$$\| c_T(\eta_1, \cdot) - c_T(\eta_2, \cdot) \|_{tv} = \frac{1}{2} \sum_{\zeta \in W^T} | c_T(\eta_1, \zeta) - c_T(\eta_2, \zeta) |.$$
If $x$ and $y$ are two sites such that $x \ne y$, the influence of $y$ on $x$ is defined as
$$\gamma(x, y) = \sum_{T \ni x} c_T(y).$$
We set $\gamma(x, x) = 0$ for all $x$. The influences $\gamma(x, y)$ are assumed to be summable:
$$M = \sup_{x \in S} \sum_{y \in S} \gamma(x, y) < \infty. \tag{3.5.11}$$
Under both hypotheses (3.5.10) and (3.5.11), it can be proved that the closure of $\Omega$ generates a Feller semigroup $\{S_t,\ t \ge 0\}$ (Theorem 3.9 p. 27 of Liggett (1985)). A generic process with semigroup $\{S_t,\ t \ge 0\}$ will be denoted by $\{\eta_t,\ t \ge 0\}$. The expectations with respect to its distribution, starting from $\eta_0 = \eta$, will be denoted by $\mathbb E_\eta$. For each continuous function $f$, one has
$$S_t f(\eta) = \mathbb E_\eta[f(\eta_t)] = \mathbb E[f(\eta_t) \mid \eta_0 = \eta].$$
We now have all the ingredients to control the covariance of $f(\eta_s)$ and $g(\eta_t)$ for a finite range interacting particle system when the underlying graph structure has bounded degree. Proposition 3.4 shows that if $f$ and $g$ are mainly located on two finite sets $R_1$ and $R_2$, then the covariance of $f$ and $g$ decays exponentially in the distance between $R_1$ and $R_2$. From now on, we assume that the set of sites $S$ is endowed with an undirected
graph structure, and we denote by $d$ the natural distance on the graph. We assume not only that the graph is locally finite, but also that the degree of each vertex is uniformly bounded:
$$\forall x \in S, \quad \#\{ y \in S \ :\ d(x, y) = 1 \} \le r,$$
where $\#$ denotes the cardinality of a finite set. Thus the size of the sphere or ball with center $x$ and radius $n$ is uniformly bounded in $x$, and increases at most geometrically in $n$:
$$\#\{ y \in S \ :\ d(x, y) = n \} \le \frac{r}{r-1}\, (r-1)^n, \qquad \#\{ y \in S \ :\ d(x, y) \le n \} \le \frac{r}{r-2}\, (r-1)^n.$$
Let $R$ be a finite subset of $S$. We shall use the following upper bounds for the number of vertices at distance $n$, or at most $n$, from $R$:
$$\#\{ x \in S \ :\ d(x, R) = n \} \le \#\{ x \in S \ :\ d(x, R) \le n \} \le 2 e^{n\rho}\, \#R, \tag{3.5.12}$$
with $\rho = \log(r-1)$. In the case of an amenable graph (e.g. a lattice on $\mathbb Z^d$), the ball sizes have subexponential growth. Therefore, for all $\varepsilon > 0$, there exists $c$ such that
$$\#\{ x \in S \ :\ d(x, R) = n \} \le \#\{ x \in S \ :\ d(x, R) \le n \} \le c\, e^{n\varepsilon}.$$
What follows is written in the general case, using (3.5.12); it applies to the amenable case, replacing $\rho$ by $\varepsilon$, for any $\varepsilon > 0$. We are going to deal with smooth functions depending weakly on coordinates away from a fixed finite set $R$. Indeed, it is not sufficient to consider functions depending only on coordinates in $R$: if $f$ is such a function, then for any $t > 0$, $S_t f$ may depend on all coordinates.

Definition 3.2. Let $f$ be a function from $\mathcal X$ into $\mathbb R$, and $R$ be a finite subset of $S$. The function $f$ is said to be mainly located on $R$ if there exist two constants $\alpha > 0$ and $\beta > \rho$ such that for all $x \in S$:
$$\Delta_f(x) \le \alpha e^{-\beta d(x, R)}. \tag{3.5.13}$$
Since $\beta > \rho$, the sum $\sum_x \Delta_f(x)$ is finite; therefore a function mainly located on a finite set is necessarily smooth.

The system we are considering will be supposed to have finite range interactions in the following sense (cf. Definition 4.17 p. 39 of Liggett (1985)).

Definition 3.3. A particle system defined by the rates $c_T(\eta, \cdot)$ is said to have finite range interactions if there exists $k > 0$ such that, whenever $d(x, y) > k$:

1. $c_T = 0$ for all $T$ containing both $x$ and $y$;
2. $\gamma(x, y) = 0$.

The first condition imposes that two coordinates cannot change simultaneously if their distance is larger than $k$. The second one says that the influence of a site on the transition rates of another site cannot be felt beyond distance $k$. Under these conditions, the following covariance inequality holds.

Proposition 3.4. Assume (3.5.10) and (3.5.11). Assume moreover that the process has finite range. Let $R_1$ and $R_2$ be two finite subsets of $S$, and let $\beta$ be a constant such that $\beta > \rho$. Let $f$ and $g$ be two functions mainly located on $R_1$ and $R_2$, in the sense that there exist positive constants $\kappa_f, \kappa_g$ such that
$$\Delta_f(x) \le \kappa_f\, e^{-\beta d(x, R_1)} \qquad \text{and} \qquad \Delta_g(x) \le \kappa_g\, e^{-\beta d(x, R_2)}.$$
Then for all positive reals $s, t$,
$$\sup_{\eta \in \mathcal X} \operatorname{Cov}_\eta\big( f(\eta_s), g(\eta_t) \big) \le C \kappa_f \kappa_g\, (\#R_1 \wedge \#R_2)\, e^{D(t+s)}\, e^{-(\beta-\rho)\, d(R_1, R_2)}, \tag{3.5.14}$$
where
$$D = 2M e^{(\beta+\rho)k} \qquad \text{and} \qquad C = \frac{2B e^{\beta k}}{D} \Big( 1 + \frac{e^{\rho k}}{1 - e^{-\beta+\rho}} \Big).$$

Proof. We refer the reader to the proof of Proposition 3.3 in Doukhan et al. (2005) [64].

Remark 3.7. Shashkin (2005) [176] obtains a similar inequality for random fields indexed by $\mathbb Z^d$. For transitive graphs, the covariance inequality stated in Proposition 3.4 was studied by Doukhan et al. (2005) [64] in order to derive a functional central limit theorem for interacting particle systems.
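The elementary counting bounds (3.5.12) behind Proposition 3.4 can be sanity-checked on the infinite $r$-regular tree, where the sphere of radius $n$ around a vertex has exactly $r(r-1)^{n-1}$ elements. A short sketch (function names are ours):

```python
# Sphere and ball sizes around a single vertex of the infinite r-regular tree,
# compared with the bounds #{d(x,.) = n} <= r/(r-1)*(r-1)^n and
# #{d(x,.) <= n} <= r/(r-2)*(r-1)^n stated in the text.
def sphere(r, n):
    return 1 if n == 0 else r * (r - 1) ** (n - 1)

def ball(r, n):
    return sum(sphere(r, m) for m in range(n + 1))

r = 4
for n in range(1, 8):
    assert sphere(r, n) <= r / (r - 1) * (r - 1) ** n   # equality in a tree
    assert ball(r, n) <= r / (r - 2) * (r - 1) ** n
print(ball(4, 3))  # 1 + 4 + 12 + 36 = 53
```

The tree saturates the sphere bound, which is why the constant $\rho = \log(r-1)$ cannot be improved in general.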
3.6 Other models

3.6.1 Random AR models
Assume here that $X_t = A_t X_{t-1} + \xi_t$, where the sequence $\xi_t \in \mathbb R^d$ is still i.i.d. but now $(A_t)$ is assumed to be a stationary sequence of random $d \times d$ matrices. A stationary solution of the previous equation has the formal expansion
$$X_t = \sum_{k=0}^{\infty} A_t \cdots A_{t-k+1}\, \xi_{t-k}.$$
A first and simple case ensuring convergence of this series is $\mathbb E\|A_0\|^p < 1$ for a suitable matrix norm, in the case where the sequence $(A_t)$ is i.i.d. and $A_t, A_{t-1}, \ldots$ are independent of the inputs $(\xi_t)$. For this we also assume $\mathbb E\|\xi_0\|^p < \infty$ for some $p \ge 1$. This condition* also implies convergence of the previous series in $\mathbb L^p$. For $d = 1$, a simple example of this situation is the bilinear model $A_t = a + b\xi_{t-1}$.

If now $A_t = \zeta_t + \sum_{j=1}^{J} b_j \xi_{t-j}$ for a stationary sequence $(\zeta_t)$ independent of $(\xi_t)$, the condition
$$\mathbb E \Big\| \zeta_0 + \sum_{j=1}^{J} b_j \xi_j \Big\|^{pJ} < 1$$
implies absolute convergence of the previous series in $\mathbb L^p$ through Hölder's inequality. The previous relation holds if
$$\|\zeta_0\|_{pJ} + \|\xi_0\|_{pJ} \sum_{j=1}^{J} \|b_j\| < 1.$$
Those models are also suitable for the previous section related to Markov chains, but a special case of this situation is provided if the sequence $(A_t)$ is stationary and independent of the sequence $(\xi_t)$. In this case the assumption
$$\sum_{k=0}^{\infty} \mathbb E\| A_k A_{k-1} \cdots A_0 \|^p < \infty$$
implies the convergence of the previous series in $\mathbb L^p$ if $\mathbb E\|\xi_0\|^p < \infty$.

Extensions of such models, solutions of the non Markov equation
$$X_t = \sum_{j \in A} \alpha_t^j X_{t-j} + \zeta_t, \tag{3.6.1}$$
are considered in Doukhan and Truquet (2006) [76] as random fields† with infinitely many interactions. If $b = \sum_{j\in A} \|\alpha_0^j\|_p < 1$, the solution of equation (3.6.1) writes, a.s. and in $\mathbb L^p$,
$$X_t = \zeta_t + \sum_{i=1}^{\infty} \sum_{j_1,\ldots,j_i \in A} \alpha_t^{j_1} \alpha_{t-j_1}^{j_2} \cdots \alpha_{t-j_1-\cdots-j_{i-1}}^{j_i}\, \zeta_{t-(j_1+\cdots+j_i)}.$$
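A quick simulation sketch of the simplest case, the scalar random coefficient AR(1) with the bilinear choice $A_t = a + b\xi_{t-1}$ mentioned above (the parameter values are our own, chosen so that $\mathbb E|A_0| < 1$):

```python
import numpy as np

rng = np.random.default_rng(5)
n, burn = 10_000, 200
a, b = 0.3, 0.3                    # E|A_0| <= a + b*E|xi_0| < 1 for Gaussian xi
xi = rng.standard_normal(n + burn)
A = a + b * np.roll(xi, 1)         # A_t = a + b*xi_{t-1}
x = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = A[t] * x[t - 1] + xi[t]
x = x[burn:]
print(x.mean())
```

Note that here $A_t$ and $\xi_t$ are not independent across the same index; the stationary mean is not 0 but $b/(1-a)$, because $\mathbb E[\xi_{t-1} X_{t-1}] = \mathbb E\xi_0^2 = 1$.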
* Existence of the model in this case also relies on the weaker assumption $\mathbb E \log\|A_0\| < 0$; in this case the previous series only converges a.s., and dependence conditions are not easy to derive; for this a concentration inequality is needed and a log transformation should be applied to the obtained coefficients.

† Innovations $\zeta_t$ are vectors of $\mathbb R^k$ and the coefficients $\alpha_t^j$ are $k \times k$ matrices; $\|\cdot\|$ is an algebra norm on the set $M_{k\times k}$ of $k \times k$ matrices, and $X$ will be an $E$-valued random field. Let $A \subset \mathbb Z^d \setminus \{0\}$; we assume that the i.i.d. random field $\xi = \big( (\alpha_t^j)_{j\in A},\, \zeta_t \big)_{t\in\mathbb Z^d}$ takes its values in $(M_{k\times k})^A \times E$.
3.6.2 Integer valued models
The idea of Galton-Watson models led Alain Latour (see [116], [117], [65]) to the construction of integer valued extensions of the standard econometric models. As they are discrete valued, no mixing condition may usually be expected from such models (see Section 1.5), and this is why they fit nicely in the weak dependence frame.

Definition 3.4 (Steutel and van Harn operator). Let $(Y_j)_{j\in\mathbb N}$ be a sequence of independent and identically distributed (i.i.d.) non-negative integer-valued variables with mean $\alpha$ and variance $\lambda$, independent of $X$, a non-negative integer-valued variable. The Steutel and van Harn operator $\alpha\circ$ is defined by
$$\alpha \circ X = \begin{cases} \displaystyle\sum_{i=1}^{X} Y_i, & \text{if } X \ne 0, \\ 0, & \text{otherwise.} \end{cases}$$
The sequence $(Y_i)_{i\in\mathbb N}$ is called a counting sequence. Note that, as indicated in Definition 3.4, the mean of the summands $Y_i$ associated with the operator $\alpha\circ$ is denoted by $\alpha$. Suppose that $\beta\circ$ is another Steutel and van Harn operator based on a counting sequence $(Y'_i)_{i\in\mathbb N}$. The operators $\alpha\circ$ and $\beta\circ$ are said to be independent if, and only if, the counting sequences $(Y_i)_{i\in\mathbb N}$ and $(Y'_i)_{i\in\mathbb N}$ are mutually independent.

One may first think of Poisson distributed variables $Y_i$ with parameter $a$. The first example, the Galton-Watson process with immigration
$$X_t = a \circ X_{t-1} + \xi_t, \tag{3.6.2}$$
was extended in various papers by Alain Latour (see e.g. [116] or [117]) to bilinear type extensions (see Doukhan, Latour and Oraichi, 2006 [65]). We would like to extend the integer-valued model class to give a non-negative integer-valued bilinear process, denoted by INBL(p, q, m, n), similar to the real-valued bilinear process. A time series $(X_t)_{t\in\mathbb N}$ is generated by a bilinear model if it satisfies the equation
$$X_t = \alpha + \sum_{i=1}^{p} a_i X_{t-i} + \sum_{j=1}^{q} c_j \varepsilon_{t-j} + \sum_{k=1}^{m} \sum_{\ell=1}^{n} b_{k\ell}\, (\varepsilon_{t-\ell} X_{t-k}) + \varepsilon_t, \tag{3.6.3}$$
where $(\varepsilon_t)_{t\in\mathbb N}$ is a sequence of i.i.d. random variables, usually but not always with zero mean, and where $\alpha$, $a_i$ ($i = 1, \ldots, p$), $c_j$ ($j = 1, \ldots, q$), and $b_{k\ell}$ ($k = 1, \ldots, m$, $\ell = 1, \ldots, n$) are real constants. In (3.6.3), we can "formally" substitute Steutel and van Harn operators for some of the parameters, giving an equation of the form
$$X_t = \sum_{i=1}^{p} a_i \circ X_{t-i} + \sum_{j=1}^{q} c_j \circ \varepsilon_{t-j} + \sum_{k=1}^{m} \sum_{\ell=1}^{n} b_{k\ell} \circ (\varepsilon_{t-\ell} X_{t-k}) + \varepsilon_t, \tag{3.6.4}$$
CHAPTER 3. MODELS
where the operators a_i∘ (i = 1,…,p), c_j∘ (j = 1,…,q), and b_{kℓ}∘ (k = 1,…,m, ℓ = 1,…,n) are mutually independent and (ε_t)_{t∈N} is a sequence of i.i.d. non-negative integer-valued random variables with finite mean μ and finite variance σ², independent of the operators. As in Latour et al., we restrict to the first-order bilinear model

X_t = a∘X_{t−1} + b∘(ε_{t−1} X_{t−1}) + ε_t    (3.6.5)

where the counting sequences involved in the operators a∘ and b∘ are respectively of mean a and b and of variance α and β. Y and Y′ denote generic variables used in a∘ and b∘, respectively. If a + b·μ < 1, Doukhan, Latour and Oraichi (2006) [65] prove that this model is strictly stationary in L¹; it is θ-weakly dependent with θ(r) ≤ 2(a + b·μ)^r E(X_0). If moreover ‖Y‖_p + ‖ε_0‖_p ‖Y′‖_p < 1, this solution belongs to L^p. Moment estimators thus yield √n-consistent estimators of the parameters in the previously cited paper. We finally mention that in the case of non-negative coefficients such models are also associated sequences.
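The first-order model (3.6.5) can also be simulated by thinning; this is a hypothetical illustration with Poisson counting sequences and Poisson(1) innovations, so that a + b·μ = 0.5 < 1. For this particular Poisson setup a direct mean computation (using E(ε_{t−1}X_{t−1}) = E X − μ + Eε²) gives E X = 2.4:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's inversion method for a Poisson(lam) draw.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def thin(mean_count, x, rng):
    # Steutel-van Harn thinning with a Poisson counting sequence of mean mean_count.
    return sum(poisson(mean_count, rng) for _ in range(x)) if x > 0 else 0

def simulate_inbl1(a, b, mu_eps, n, seed=1):
    # X_t = a∘X_{t-1} + b∘(eps_{t-1} X_{t-1}) + eps_t ; contraction if a + b*mu_eps < 1.
    rng = random.Random(seed)
    x = 0
    eps_prev = poisson(mu_eps, rng)
    path = []
    for _ in range(n):
        eps = poisson(mu_eps, rng)
        x = thin(a, x, rng) + thin(b, eps_prev * x, rng) + eps
        eps_prev = eps
        path.append(x)
    return path

path = simulate_inbl1(a=0.3, b=0.2, mu_eps=1.0, n=5000)
mean = sum(path[500:]) / len(path[500:])
# For this setup the stationary mean solves m = 0.3 m + 0.2 (m + 1) + 1, i.e. m = 2.4.
```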
3.6.3 Random fields
Analogously, one may define some simple stationary random fields. Let T be any group (in additive notation) endowed with some metric d; then Bernoulli shifts still write X_t = H((ξ_{s−t})_{s∈T}) for a function H : R^T → R. If (ξ_t)_{t∈T} is stationary, this is also the case of (X_t)_{t∈T}. In order to derive dependence properties of such models one better considers i.i.d. innovations, and we assume that

E|H((ξ_s)_{s∈T}) − H((ξ_s^{(r)})_{s∈T})| → 0 as r → ∞,

if we set ξ_s^{(r)} = ξ_s for d(s,0) < r and ξ_s^{(r)} = z otherwise, where z is a fixed point of the set of values of ξ. Another option is to use an i.i.d. sequence ξ′ = (ξ′_t)_{t∈T}, independent of ξ and with the same distribution, and to set ξ_s^{(r)} = ξ′_s for d(s,0) ≥ r. Here again linear random fields as well as Volterra random fields are simple to define. Standard sets T are Z^d and (Z/nZ)^d. It is less natural to work here with continuous time processes because i.i.d. white noises are discontinuous processes: they are thus less natural to define. A nice example of this situation is given in the next subsection.

LARCH(∞) random fields

Let (ξ_t)_{t∈Z^d} be a stationary sequence of random d × m-matrices, (a_j)_{j≠0} be a sequence of m × d matrices, and a be a vector in R^m. A vector valued
LARCH(∞) random field model is a solution of the recurrence equation

X_t = ξ_t (a + Σ_{j≠0} a_j X_{t−j}),  t ∈ Z^d.    (3.6.6)

Such LARCH(∞) models include a large variety of models, as those in § 3.4, but the main point here is that causality is no more assumed in general. The same proof as in Section 3.4 entails the

Proposition 3.5 (Doukhan, Teyssière, Winant, 2006 [75]). Assume that

‖ξ_0‖_∞ Σ_{j≠0} ‖a_j‖ < 1;

then one stationary solution of eqn. (3.6.6) in L^p is given as

X_t = ξ_t (a + Σ_{k=1}^∞ Σ_{j_1,…,j_k≠0} a_{j_1} ξ_{t−j_1} a_{j_2} ξ_{t−j_1−j_2} ⋯ a_{j_k} ξ_{t−j_1−⋯−j_k} a).    (3.6.7)
In the following of this section we set A(x) = Σ_{‖j‖≥x} ‖a_j‖, A = A(1) and λ = A‖ξ_0‖_∞, where ‖(j_1,…,j_k)‖ = |j_1| + ⋯ + |j_k|.

Approximations. We assume here that the random field (ξ_t)_{t∈Z^d} is i.i.d. One first approximates X_t by a random variable independent of X_0. Set

X̃_t = ξ_t (a + Σ_{k=1}^∞ Σ_{‖(j_1,…,j_k)‖ < t} a_{j_1} ξ_{t−j_1} ⋯ a_{j_k} ξ_{t−j_1−⋯−j_k} a).
Proposition 3.6. One bound for the error is given by:

E‖X_t − X̃_t‖ ≤ E‖ξ_0‖ ( Σ_{k=1}^{t−1} k λ^{k−1} A(t/k) + λ^t/(1−λ) ) ‖a‖.

We now specialize this result. Assume that b, C > 0 are constants; then there exists some constant K > 0 such that

E‖X_t − X̃_t‖ ≤ K/t^b,  under Riemannian decay A(x) ≤ C x^{−b},
E‖X_t − X̃_t‖ ≤ K (q ∨ λ)^{√t},  under geometric decay A(x) ≤ C q^x.
Markov approximation. Consider equation (3.6.6) truncated at rank N:

X_t^N = ξ_t (a + Σ_{0<‖j‖≤N} a_j X_{t−j}^N).

The previous solution rewrites as

X_t^N = ξ_t (a + Σ_{k=1}^∞ Σ_{0<‖j_1‖,…,‖j_k‖≤N} a_{j_1} ξ_{t−j_1} ⋯ a_{j_k} ξ_{t−j_1−⋯−j_k} a).

Then E‖X_t − X_t^N‖ ≤ Σ_{k=1}^∞ A(N)^k. This error has rate Σ_{k=1}^∞ N^{−bk} for Riemannian decays and q^N/(1 − q^N) in the geometric case. Moreover:

Theorem 3.5. The solution (3.6.7) of eqn. (3.6.6) is η-weakly dependent with

η(r) = ‖ξ_0‖_∞ ( Σ_{k=1}^{r−1} k λ^{k−1} A(r/k) + λ^r/(1−λ) ) ‖a‖.
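Since eqn. (3.6.6) is noncausal, it cannot be iterated forward in time; but under the contraction λ = ‖ξ_0‖_∞ Σ‖a_j‖ < 1 of Proposition 3.5, a Picard (fixed-point) iteration converges geometrically to the solution. A hypothetical scalar sketch on the torus Z/nZ (all names and parameter values are illustrative, not from the book):

```python
import random

def larch_fixed_point(n=200, N=5, a=1.0, rho=0.4, iters=60, seed=2):
    # Noncausal scalar LARCH on Z/nZ: X_t = xi_t (a + sum_{0<|j|<=N} a_j X_{t-j}),
    # with a_j = rho^{|j|}; here lam = ||xi||_inf * sum_j |a_j| < 1 (contraction).
    rng = random.Random(seed)
    xi = [rng.uniform(-0.5, 0.5) for _ in range(n)]       # ||xi||_inf <= 0.5
    coeffs = {j: rho ** abs(j) for j in range(-N, N + 1) if j != 0}
    lam = 0.5 * sum(abs(c) for c in coeffs.values())
    x = [0.0] * n
    resid = []
    for _ in range(iters):
        new = [xi[t] * (a + sum(c * x[(t - j) % n] for j, c in coeffs.items()))
               for t in range(n)]
        resid.append(max(abs(new[t] - x[t]) for t in range(n)))
        x = new
    return x, resid, lam

x, resid, lam = larch_fixed_point()
```

The sup-norm change between successive iterates shrinks roughly by the factor λ per iteration, mirroring the geometric error rates stated above.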
This bound may be made explicit for the decays considered previously.

Models with infinite memory

We also mention rapidly here truly non linear extensions of LARCH(∞) models, namely the chains with infinite memory from Doukhan and Wintenberger (2006) [78] and the random fields with infinite interactions from Doukhan and Truquet (2007) [55]; those models are respectively solutions of the equations‡

X_t = F(X_{t−1}, X_{t−2}, …; ξ_t),  t ∈ Z,
X_t = F((X_{t−j})_{j≠0}; ξ_t),  t ∈ Z^d,

those models being usually excited by i.i.d. inputs ξ. Even if no explicit chaotic solution seems to be available in general, such models are well defined and L^p stationary if

‖F(x; ξ_0) − F(y; ξ_0)‖_m ≤ Σ_{j≠0} α_j |x_j − y_j|,  a = Σ_{j≠0} α_j < 1;

in the previous inequality§ one should take m = p both for causal random processes and causal random fields (accurately defined in the above mentioned works) and m = ∞ else. Moreover the weak dependence coefficients are proved to follow analogous decays, with now α_i = a_i in theorem 3.4.2. More precisely, the respectively τ or η weak dependence coefficients have rates driven by the relation

inf_p ( a^{r/p} + Σ_{|i|≥p} α_i ).

‡ Here the function F(x; u) is perhaps not defined over all R^N × R or R^{Z^d} × R, but it is enough that it is defined on trajectories of the solution.
§ For respectively j ∈ N* and j ∈ Z^d \ {0}.
With the previous geometric decays, the bound is the same as in the paragraph related to theorem 3.3, and for Riemannian decays Σ_{|i|≥p} α_i ≤ C p^{−b} a log loss appears: τ_{1,∞}(r) ≤ K log^{b∨1} r · r^{−b}.
3.6.4 Continuous time
In the continuous time setting one better considers a process with independent increments, (Z_t)_{t∈R}; then a simple extension of linear processes is defined through the Wiener integral

X_t = ∫_{−∞}^{∞} f(t − s) dZ_s.

It is for example simple to define such integrals for a Brownian motion, but other possibilities are all the classes of Lévy processes. Among them, the SαS Lévy motion is described in Samorodnitsky and Taqqu (1994) [172]. Analogues of Volterra processes are now multiple stochastic integrals. A complete theory is developed by Major (1981) [126]. More examples are provided in the monograph by Doukhan, Oppenheim and Taqqu (2003) [72].
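For instance, with f(u) = e^{−u} 1_{u≥0} and Z a standard Brownian motion, X_t = ∫_{−∞}^t e^{−(t−s)} dZ_s is the stationary Ornstein-Uhlenbeck process, which admits an exact simulation recursion. A sketch (the parameter values are illustrative; the stationary variance is ∫_0^∞ e^{−2u} du = 1/2):

```python
import math
import random

def simulate_ou(n_steps=100000, dt=0.01, seed=3):
    # X_t = integral of e^{-(t-s)} dZ_s over s <= t, Z a Brownian motion.
    # Exact recursion: X_{t+dt} = e^{-dt} X_t + N(0, (1 - e^{-2dt})/2).
    rng = random.Random(seed)
    phi = math.exp(-dt)
    sd = math.sqrt((1.0 - phi * phi) / 2.0)
    x, path = 0.0, []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, sd)
        path.append(x)
    return path

path = simulate_ou()
mean = sum(path) / len(path)
var = sum(v * v for v in path) / len(path) - mean ** 2
# Empirical variance of a long path should be close to 1/2.
```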
Chapter 4
Tools for non causal cases

Moment inequalities are the main tools when using non causal weak dependence. A first section addresses the weak dependence properties of indicators of processes, useful both for moment inequalities and for the empirical process. After this, separate sections address variances of sums, (2+δ)-order moments and higher order moments. They yield both Rosenthal type and Marcinkiewicz-Zygmund inequalities. Finally, sums of cumulants are also considered as dependence coefficients, and they are used in order to derive sharp exponential inequalities. A last section is devoted to proving tightness criteria for empirical processes through suitable moment inequalities.
4.1 Indicators of weakly dependent processes
Define, for a positive real number x, the function g_x : R → R by g_x(y) = 1_{x≤y} − 1_{x≤−y}. Throughout this chapter we are interested in (I, Ψ)-dependent sequences, where

I = { Π_{i=1}^u g_{x_i} : x_i > 0, u ∈ N* },

and Ψ(f, g) = c(d_f, d_g) for some positive function c defined on N* × N*; in this case we will simply say that the sequence is (I, c)-dependent. Set

Λ_0 = { Π_{i=1}^u f_i : f_i ∈ Λ ∩ L^∞, f_i : R → R, i = 1,…,u, u ∈ N* }.

The following lemma relates η, κ or θ weak dependence to I weak dependence under additional concentration assumptions.
Lemma 4.1. Let (X_n)_{n∈N} be a sequence of r.v.'s. Suppose that, for some positive real constants C, α and all λ > 0,

sup_{x∈R} sup_{i∈N} P(x ≤ X_i ≤ x + λ) ≤ C λ^α.    (4.1.1)

(i) If the sequence (X_n) is (Λ_0, η)-dependent, then it is (I, c)-dependent with ε(r) = η(r)^{α/(1+α)} and c(u,v) = 2(8C)^{1/(1+α)} (u+v).

(ii) If the sequence (X_n) is (Λ_0, κ)-dependent, then it is (I, c)-dependent with ε(r) = κ(r)^{α/(2+α)} and c(u,v) = 2(8C)^{2/(2+α)} (u+v)^{2(1+α)/(2+α)}.

(iii) If the sequence (X_n) is (Λ_0, θ)-dependent, then it is (I, c)-dependent with ε(r) = θ(r)^{α/(1+α)} and c(u,v) = 2(8C)^{1/(1+α)} (u+v)^{1/(1+α)}.

(iv) If the sequence (X_n) is (Λ_0, λ)-dependent (with λ(r) ≤ 1), then it is (I, c)-dependent with c(u,v) = 2( (8C)^{1/(1+α)} (u+v) + (8C)^{2/(2+α)} (u+v)^{2(1+α)/(2+α)} ) and ε(r) = λ(r)^{α/(2+α)}.

Proof of Lemma 4.1. First recall that for all real numbers 0 ≤ x_i, y_i ≤ 1,

|x_1 ⋯ x_m − y_1 ⋯ y_m| ≤ Σ_{i=1}^m |x_i − y_i|.

Let then g, f ∈ I, i.e. g(y_1,…,y_u) = g_{x_1}(y_1) ⋯ g_{x_u}(y_u) and f(y_1,…,y_v) = g_{x′_1}(y_1) ⋯ g_{x′_v}(y_v) for some u, v ∈ N* and x_i, x′_j ≥ 0. For fixed x > 0 and a > 0 let

f_x(y) = 1_{y>x} − 1_{y≤−x} + (y/a − x/a + 1) 1_{x−a<y≤x} + (y/a + x/a − 1) 1_{−x<y≤−x+a},

a Lipschitz interpolation of g_x, and set

h(y_1,…,y_u) = f_{x_1}(y_1) ⋯ f_{x_u}(y_u),  k(y_1,…,y_v) = f_{x′_1}(y_1) ⋯ f_{x′_v}(y_v);

then Lip(h) ≤ a^{−1}, Lip(k) ≤ a^{−1}. Consider i_1 ≤ ⋯ ≤ i_u ≤ i_u + r ≤ j_1 ≤ ⋯ ≤ j_v and set Cov(h,k) := Cov(h(X_{i_1},…,X_{i_u}), k(X_{j_1},…,X_{j_v})).

(i) η-weak dependence ⇒ |Cov(h,k)| ≤ (u+v) η(r)/a.
(ii) κ-weak dependence ⇒ |Cov(h,k)| ≤ ((u+v)/a)² κ(r).
(iii) θ-weak dependence ⇒ |Cov(h,k)| ≤ v θ(r)/a.

Inequality (4.1.1) yields |Cov(g,f) − Cov(h,k)| ≤ 8C a^α (u+v), and hence

(i) |Cov(g,f)| ≤ 8C a^α (u+v) + (u+v) η(r)/a,
(ii) |Cov(g,f)| ≤ 8C a^α (u+v) + ((u+v)/a)² κ(r), or
(iii) |Cov(g,f)| ≤ 8C a^α (u+v) + (v/a) θ(r).

The lemma follows by setting respectively

(i) a = (η(r)/(8C))^{1/(1+α)},  (ii) a = ((u+v)κ(r)/(8C))^{1/(2+α)},  (iii) a = (θ(r)/(8C(u+v)))^{1/(1+α)}.

The case of λ-dependence is obtained by the summation of both cases (i) and (ii).
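The optimization behind case (i) simply balances the concentration term 8Ca^α(u+v) against the regularity term (u+v)η(r)/a; the stated a makes the two terms equal and attains the constant of the lemma. A quick numerical check (the values are arbitrary):

```python
def bound_i(a, C, alpha, eta, u, v):
    # Case (i) of the proof: 8C a^alpha (u+v) + (u+v) eta / a.
    return 8 * C * a ** alpha * (u + v) + (u + v) * eta / a

C, alpha, eta, u, v = 0.7, 0.5, 1e-3, 3, 4
a_star = (eta / (8 * C)) ** (1 / (1 + alpha))
term_conc = 8 * C * a_star ** alpha * (u + v)   # concentration term at a_star
term_reg = (u + v) * eta / a_star               # regularity term at a_star
# Closed-form value of the bound at a_star, matching c(u,v) * eta^{alpha/(1+alpha)}:
closed_form = 2 * (8 * C) ** (1 / (1 + alpha)) * eta ** (alpha / (1 + alpha)) * (u + v)
```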
4.2 Low order moments inequalities
Our proof for central limit theorems is based on a truncation method. For a truncation level T ≥ 1 we shall denote X̄_k = f_T(X_k) − E f_T(X_k), with f_T(X) = (X ∨ (−T)) ∧ T. Let us simply remark that X̄_k has moments of any order because it is bounded. Suppose that μ = E|X_0|^m is finite for some m > 0. Furthermore, for any a ≤ m, we control the difference E|f_T(X_0) − X_0|^a with the Markov inequality:

E|f_T(X_0) − X_0|^a ≤ E|X_0|^a 1_{|X_0|≥T} ≤ μ T^{a−m};

thus using the Jensen inequality yields

‖X̄_0 − X_0‖_a ≤ 2 μ^{1/a} T^{1−m/a}.    (4.2.1)
Deriving from this truncation, we are now able to control the limiting variance as well as the higher order moments.
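The Markov step above can be checked in closed form for a concrete heavy-tailed law. For a Pareto density c x^{−(c+1)} on [1, ∞), both sides of E|X_0|^a 1_{|X_0|≥T} ≤ μ T^{a−m} are explicit (the numbers below are an illustrative choice, not from the text):

```python
def pareto_tail_moment(c, a, T):
    # E |X|^a 1_{X >= T} for X Pareto with density c x^{-(c+1)} on [1, inf),
    # valid for T >= 1 and a < c.
    return c / (c - a) * T ** (a - c)

c = 3.5           # tail index: moments of order m exist for m < c
m, a = 3.0, 1.0
mu = pareto_tail_moment(c, m, 1.0)   # mu = E|X|^m = c/(c-m) = 7 here
checks = []
for T in [1.0, 2.0, 5.0, 10.0, 100.0]:
    lhs = pareto_tail_moment(c, a, T)     # E|X|^a 1_{|X|>=T}, exact
    rhs = mu * T ** (a - m)               # Markov bound from the text
    checks.append(lhs <= rhs + 1e-15)
```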
4.2.1 Variances
Lemma 4.2 (Variances). If one of the following conditions holds then the series Σ_{k≥0} |Cov(X_0, X_k)| is convergent:

Σ_{k=0}^∞ κ(k) < ∞,    (4.2.2)
Σ_{k=0}^∞ λ(k)^{(m−2)/(m−1)} < ∞.    (4.2.3)
Proof. Using the fact that X̄_0 = g_T(X_0) is a function of X_0 with Lip g_T = 1 and ‖g_T‖_∞ ≤ 2T, we derive, for T large enough,

|Cov(X̄_0, X̄_k)| ≤ κ(k),  or  ≤ (2T+1) λ(k) ≤ 4T λ(k), respectively.    (4.2.4)
In the κ-dependent case, truncation can thus be omitted and

|Cov(X_0, X_k)| ≤ κ(k);    (4.2.5)

we only consider λ-dependence below. Now we develop

Cov(X_0, X_k) = Cov(X̄_0, X̄_k) + Cov(X_0 − X̄_0, X_k) + Cov(X̄_0, X_k − X̄_k)

and, using a truncation T to be determined, we use the two previous bounds (4.2.1) and (4.2.4) with the Hölder inequality for the exponents 1/a + 1/m = 1 to derive

|Cov(X_0, X_k)| ≤ 4T λ(k) + 2‖X_0‖_m ‖X̄_0 − X_0‖_a ≤ 4T λ(k) + 4 μ^{1/a+1/m} T^{1−m/a} ≤ 4(T λ(k) + μ T^{2−m}).

Note that we used the relation 1 − m/a = 2 − m. Thus using the truncation T such that T^{m−1} = μ/λ(k) yields the bound

|Cov(X_0, X_k)| ≤ 8 μ^{1/(m−1)} λ(k)^{(m−2)/(m−1)}.    (4.2.6)

4.2.2 A (2 + δ)-order moment bound
Lemma 4.3. Assume that the stationary and centered process (X_i)_{i∈Z} satisfies E|X_0|^{2+ζ} < ∞, and that it is either κ-weakly dependent with κ(r) = O(r^{−κ}) or λ-weakly dependent with λ(r) = O(r^{−λ}). If κ > 2 + 1/ζ, or λ > 4 + 2/ζ respectively, then for all δ ∈ ]0, A ∧ B ∧ 1[ (where A and B are constants smaller than ζ, only depending on ζ and respectively κ or λ; see (4.2.10) and (4.2.11)) there exists C > 0 such that:

‖S_n‖_Δ ≤ C √n,  where Δ = 2 + δ.

Remarks.

• The constant C satisfies C > (5/(2^{δ/2} − 1))^{1/Δ} c, where c ≡ Σ_{k∈Z} |Cov(X_0, X_k)|; under the conditions of this lemma, Lemma 4.2 entails c < ∞.

• The result is sketched from Bulinski and Sashkin (2005) [33]; notice, however, that their condition of dependence is of a causal nature while ours is not, which explains a loss with respect to the exponents λ and κ. In their ζ-weak dependence setting the best possible value of the exponent is 1 while it is 2 for our non causal dependence.
Proof of lemma 4.3. Analogously to Bulinski and Sashkin (2005) [33], who sketch Ibragimov (1962) [110], we proceed by recurrence on K, for n ≤ 2^K, to prove the property:

‖1 + |S_n|‖_Δ ≤ C √n.    (4.2.7)

We then assume (4.2.7) for all n ≤ 2^{K−1}. We write N = 2^K and we want to bound ‖1 + |S_N|‖_Δ. We can always share the sum S_N in three blocks, the first two with the same size n ≤ 2^{K−1}, denoted A and B, and the third, V, placed between the first two and of size q < n. We then have ‖1 + |S_N|‖_Δ ≤ ‖1 + |A| + |B|‖_Δ + ‖V‖_Δ. The term ‖V‖_Δ is directly bounded with ‖1 + |V|‖_Δ and the recurrence property, i.e. by C√q. Writing q = N^b with b < 1, this term is of order strictly smaller than √N. For ‖1 + |A| + |B|‖_Δ, we have:

E(1 + |A| + |B|)^Δ ≤ E(1 + |A| + |B|)² (1 + |A| + |B|)^δ ≤ E(1 + 2|A| + 2|B| + (|A| + |B|)²)(1 + |A| + |B|)^δ.

An expansion yields the terms:

• E(1 + |A| + |B|)^δ ≤ 1 + ‖A‖_2^δ + ‖B‖_2^δ ≤ 1 + 2c^δ (√n)^δ,

• E|A|(1 + |A| + |B|)^δ ≤ E|A|((1 + |B|)^δ + |A|^δ) = E|A|(1 + |B|)^δ + E|A|^{1+δ}. The term E|A|^{1+δ} is bounded with ‖A‖_2^{1+δ}, and then with c^{1+δ}(√n)^{1+δ}. The term E|A|(1 + |B|)^δ is bounded using Hölder by ‖A‖_{1+δ/2} ‖1 + |B|‖_Δ^δ, and then is at most of order c C^δ (√n)^{1+δ}.

• We have the analogous bounds with B instead of A.

• E(|A| + |B|)²(1 + |A| + |B|)^δ. For this term, we use an inequality from Bulinski:

E(|A| + |B|)²(1 + |A| + |B|)^δ ≤ E|A|^Δ + E|B|^Δ + 5(E A²(1 + |B|)^δ + E B²(1 + |A|)^δ).

The term E|A|^Δ is bounded using (4.2.7) by C^Δ(√n)^Δ. The second term is the analogue with B. The third is treated with particular care in the following.

We now want to control E A²(1 + |B|)^δ and the analogue with B. For this, we introduce the weak dependence; we then have to truncate the variables. We denote by X̄ the variable (X ∨ (−T)) ∧ T for a real T that will be determined later. We then denote by extension Ā and B̄ the sums of the truncated variables X̄_i. Remarking that |B| − |B̄| ≥ 0, we have:

E A²(1 + |B|)^δ ≤ E A²(|B| − |B̄|)^δ + E(A² − Ā²)(1 + |B̄|)^δ + E Ā²(1 + |B̄|)^δ.
We begin by controlling E A²(|B| − |B̄|)^δ. Set m = 2 + ζ; then using the Hölder inequality with 2/m + 1/m′ = 1 yields:

E A²(|B| − |B̄|)^δ ≤ ‖A‖_m² ‖(|B| − |B̄|)^δ‖_{m′}.

‖A‖_Δ is bounded using (4.2.7), and we remark that:

(|B| − |B̄|)^{δm′} ≤ (|B| − |B̄| 1_{∀i, |X_i|≤T})^{δm′} ≤ |B|^{δm′} 1_{∃i, |X_i|>T} ≤ |B|^{δm′} 1_{|B|>T}.

We then bound 1_{|B|>T} ≤ (|B|/T)^α with α = m − δm′. Then

E ||B| − |B̄||^{δm′} ≤ E|B|^m T^{δm′−m}.

Then, by convexity and stationarity, we have E|B|^m ≤ n^m E|X_0|^m. Then:

E A²(|B| − |B̄|)^δ ≲ n^{2+m/m′} T^{δ−m/m′}.

Finally, remarking that m/m′ = m − 2, we obtain:

E A²(|B| − |B̄|)^δ ≲ n^m T^{Δ−m}.

We obtain the same bound for the second term:

E(A² − Ā²)(1 + |B̄|)^δ ≲ n^m T^{Δ−m}.

For the third term, we introduce a covariance term:

E Ā²(1 + |B̄|)^δ ≤ Cov(Ā², (1 + |B̄|)^δ) + E Ā² E(1 + |B̄|)^δ.

The last term is bounded with ‖A‖_2² ‖1 + |B|‖_2^δ ≤ c^Δ (√n)^Δ. The covariance is controlled by the weak-dependence notions:

• in the κ-dependent case: ≲ n² T κ(q),
• in the λ-dependent case: ≲ n³ T² λ(q).

We then choose either the truncation T^{m−δ−1} = n^{m−2}/κ(q) or T^{m−δ} = n^{m−3}/λ(q). Now the three terms of the decomposition have the same order:

E A²(1 + |B|)^δ ≲ (n^{3m−2Δ} κ(q)^{m−Δ})^{1/(m−δ−1)}  under κ-dependence,
E A²(1 + |B|)^δ ≲ (n^{5m−3Δ} λ(q)^{m−Δ})^{1/(m−δ)}  under λ-dependence.

Set q = N^b; since n ≤ N/2 this term has order N^{(3m−2Δ+bκ(Δ−m))/(m−δ−1)} under κ-weak dependence and N^{(5m−3Δ+bλ(Δ−m))/(m−δ)} under λ-weak dependence. Those terms are thus negligible with respect to N^{Δ/2} if:

κ > (3m − 2Δ − (Δ/2)(m−δ−1)) / (b(m−Δ)),  under κ-dependence,    (4.2.8)
λ > (5m − 3Δ − (Δ/2)(m−δ)) / (b(m−Δ)),  under λ-dependence.    (4.2.9)
Finally, using this assumption, b < 1 and n ≤ N/2, we derive the bound, for some suitable constants a₁, a₂ > 0:

E(1 + |S_N|)^Δ ≤ (2^{−δ/2} C^Δ + 5·2^{−δ/2} c^Δ + a₁ N^{−a₂}) (√N)^Δ.

Using the relation linking C and c, we conclude that (4.2.7) is also true at the step N if the constant C satisfies 2^{−δ/2} C^Δ + 5·2^{−δ/2} c^Δ + a₁ N^{−a₂} ≤ C^Δ. Choose

C > ( (5c^Δ + a₁ 2^{δ/2}) / (2^{δ/2} − 1) )^{1/Δ}  with  c = Σ_{k∈Z} |Cov(X_0, X_k)|;

then the previous relation holds. Finally, we use eqns. (4.2.8) and (4.2.9) to find a condition on δ. In the case of κ-weak dependence, we rewrite inequality (4.2.8) as:

0 > δ² + δ(2κ − 3 − ζ) − κζ + 2ζ + 1.

It leads to the following condition on δ:

δ < ( √((2κ−3−ζ)² + 4(κζ−2ζ−1)) + ζ + 3 − 2κ ) / 2 = A.    (4.2.10)

We do the same in the case of λ-weak dependence:

δ < ( √((2λ−6−ζ)² + 4(λζ−4ζ−2)) + ζ + 6 − 2λ ) / 2 = B.    (4.2.11)

Remark: those bounds are always smaller than ζ.
4.3 Combinatorial moment inequalities
Let (X_n)_{n∈N} be a sequence of centered r.v.s and let S_n = Σ_{i=1}^n X_i. In this section, we obtain bounds for |E(S_n^q)| when q ∈ N and q ≥ 2. Our main references here are Doukhan and Portal (1983) [73], Doukhan and Louhichi (1999) [67], and Rio (2000) [161]. We first introduce the following coefficient of weak dependence.

Definition 4.1. Let (X_n) be a sequence of centered r.v.s. For a positive integer r, we define the coefficients of weak dependence as the non-decreasing sequences (C_{r,q})_{q≥2} such that

C_{r,q} := sup |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,    (4.3.1)

where the supremum is taken over all {t_1,…,t_q} such that 1 ≤ t_1 ≤ ⋯ ≤ t_q and m, r satisfy t_{m+1} − t_m = r.
Below, we provide explicit bounds of C_{r,q} in order to obtain inequalities for moments of the partial sums S_n. We shall assume that either there exist constants C, M > 0 such that, for any convenient q-tuple {t_1,…,t_q} as in the definition,

C_{r,q} ≤ C M^q ε(r),    (4.3.2)

or, denoting by Q_X the quantile function of |X| (inverse of the tail function t ↦ P(|X| > t), see (2.2.14)),

C_{r,q} ≤ c(q) ∫_0^{ε(r)} Q_{X_{t_1}}(x) ⋯ Q_{X_{t_q}}(x) dx.    (4.3.3)

The bound (4.3.2) holds mainly for bounded sequences. E.g. if ‖X_n‖_∞ ≤ M and X is (Λ ∩ L^∞, Ψ)-weakly dependent, we have:

C_{r,q} ≤ max_{1≤m<q} Ψ(j^{⊗m}, j^{⊗(q−m)}, m, q−m) M^q ε(r),

where j(x) = x 1_{|x|≤1} + 1_{x>1} − 1_{x<−1}. If Ψ(h,k,u,v) = c(u,v) Lip(h) Lip(k), the bound becomes

C_{r,q} ≤ max_{1≤m<q} c(m, q−m) M^{q−2} ε(r).

The bound (4.3.3) holds for more general r.v.s, using moment or tail assumptions. With Lemma 4.1, we derive that if the concentration property (4.1.1) holds then η (resp. κ) weak dependence implies (I, c)-weak dependence. Now relation (4.3.3) follows from the following lemma.

Lemma 4.4. If the sequence (X_n)_{n∈N} is (I, c)-weakly dependent, then

|Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ C_q ∫_0^{ε(r)} Q_{t_1}(u) ⋯ Q_{t_q}(u) du,
where C_q = (max_{u+v≤q} c(u,v)) ∨ 2. The quantity Q_{t_i} denotes the inverse of the tail function t ↦ P(|X_{t_i}| > t) (inverses are defined in eqn. (2.2.14)).

Proof of Lemma 4.4. Let Y⁺ = 0 ∨ Y and Y⁻ = 0 ∨ (−Y); then

Y⁺ = ∫_0^{+∞} 1_{x≤Y⁺} dx = ∫_0^{+∞} 1_{x≤Y} dx,  and  Y⁻ = ∫_0^{+∞} 1_{x≤Y⁻} dx = ∫_0^{+∞} 1_{x≤−Y} dx.

The inclusion-exclusion formula entails

Y_1 ⋯ Y_q = Π_{i=1}^q (Y_i⁺ − Y_i⁻) = Σ′ (−1)^{q−r} Y_{i_1}⁺ ⋯ Y_{i_r}⁺ Y_{i_{r+1}}⁻ ⋯ Y_{i_q}⁻,
where Σ′ denotes a summation over all the permutations {i_1,…,i_q} of {1,…,q}. Using Fubini's theorem, the preceding integral representation yields

Y_1 ⋯ Y_q = Σ′ (−1)^{q−r} ∫_{R₊^q} 1_{x_1≤Y_{i_1}} ⋯ 1_{x_r≤Y_{i_r}} 1_{x_{r+1}≤−Y_{i_{r+1}}} ⋯ 1_{x_q≤−Y_{i_q}} dx_1 ⋯ dx_q
    = ∫_{R₊^q} Π_{i=1}^q (1_{x_i≤Y_i} − 1_{x_i≤−Y_i}) dx_1 ⋯ dx_q.

Again Fubini's theorem yields

E(Y_1 ⋯ Y_q) = ∫_{R₊^q} E( Π_{i=1}^q (1_{x_i≤Y_i} − 1_{x_i≤−Y_i}) ) dx_1 ⋯ dx_q.    (4.3.4)
Now, eqn. (4.3.4) applied with Y_i = X_{t_i} for i = 1,…,q, together with Fubini's theorem, implies

Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q}) = ∫_{R₊^q} Cov( Π_{i=1}^m f_i(X_{t_i}), Π_{i=m+1}^q f_i(X_{t_i}) ) dx_1 ⋯ dx_q,

where f_i(y) = 1_{x_i≤y} − 1_{x_i≤−y}. Define

B = Cov( Π_{i=1}^m f_i(X_{t_i}), Π_{i=m+1}^q f_i(X_{t_i}) ).    (4.3.5)

In the sequel, we give two bounds of the quantity B. The first bound does not use the dependence structure, only that |f_i(y)| = 1_{x_i≤|y|}. Thus

B ≤ 2 inf(Φ_{X_{t_1}}(x_1), …, Φ_{X_{t_q}}(x_q)),    (4.3.6)

with Φ_X(x) = P(|X| ≥ x). The second bound is deduced from the (I, c)-weak dependence property. In fact, we have (recall that r = t_{m+1} − t_m)

B ≤ c(u,v) ε(r).    (4.3.7)

The bound (4.3.7) together with (4.3.6) yields B ≤ C_q inf(ε(r), Φ_{X_{t_1}}(x_1), …, Φ_{X_{t_q}}(x_q)). Hence

|Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ (c_q ∨ 2) ∫_0^{+∞} ⋯ ∫_0^{+∞} inf(ε(r), Φ_{X_{t_1}}(x_1), …, Φ_{X_{t_q}}(x_q)) dx_1 ⋯ dx_q.
The proof of Theorem 1-1 in Rio (1993) [157] can be completely implemented here. We give it for completeness. Let U be a uniform-[0,1] r.v.; then

ε(r) ∧ min_{1≤j≤q} Φ_{X_{t_j}}(x_j) = P(U ≤ ε(r), U ≤ Φ_{X_{t_1}}(x_1), …, U ≤ Φ_{X_{t_q}}(x_q))
    = P(U ≤ ε(r), x_1 ≤ Q_{X_{t_1}}(U), …, x_q ≤ Q_{X_{t_q}}(U)).

We obtain, collecting the above results,

|Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ C_q E( Q_{X_{t_1}}(U) ⋯ Q_{X_{t_q}}(U) 1_{U≤ε(r)} ).

The lemma is thus proved.

In order to make it possible to use such bounds, it will be convenient to express bounds of the quantities

s_{a,b,N} = Σ_{r=0}^N (r+1)^a ∫_0^{ε(r)} Q^b(s) ds    (4.3.8)

under conditions of summability of the series, for suitable constants a ≥ 0, b > 0, N, and a quantile function Q of a random variable X such that E|X|^{b+δ} < ∞ for some δ > 0. Set for convenience A_r = Σ_{i=0}^r (i+1)^a for r ≥ 0 (and = 0 for r < 0), and B_r = ∫_0^{ε(r)} Q^b(s) ds for r ≥ 0 (= 0 for r < 0); then the expression s_{a,b,N} rewrites as follows; an Abel transform with the Hölder inequality for the conjugate exponents p = 1 + b/δ and q = 1 + δ/b implies the succession of inequalities
N
(Ar − Ar−1 )Br
r=0
=
N −1 r=0
1
=
Ar (Br − Br+1 ) + AN BN
N −1
0
≤ ≤
Ar 1](r+1),(r)](s) + AN 1[0,(N )] (s) Qb (s)ds
r=0 1
−1 N
0 −1 N
1/p ds
Apr 1](r+1),(r)](s) + ApN 1[0,(N )] (s)
r=0
1/p 1/q E|X|bq Apr ( (r) − (r + 1)) + ApN (N )
r=0
Hence sa,b,N ≤
N r=0
1/p (Apr
−
Apr−1 ) (r)
b/(b+δ) E|X|b+δ
0
1
Qbq (s)ds
1/q
Now we note that r^{a+1}/(a+1) ≤ A_r ≤ (r+1)^{a+1}/(a+1), so that

A_r^p − A_{r−1}^p ≤ ((r+1)^{p(a+1)} − (r−1)^{p(a+1)})/(a+1)^p ≤ 2p(r+1)^{p(a+1)−1}/(a+1)^{p−1};

hence, setting c = (2p)^{1/p}(a+1)^{1/p−1},

s_{a,b,N} ≤ c ( Σ_{r=0}^N (r+1)^{a+(a+1)b/δ} ε(r) )^{δ/(b+δ)} ‖X‖_{b+δ}^b.

We summarize this in the following lemma.

Lemma 4.5. Let a > 0, b ≥ 0 be arbitrary; then there exists a constant c = c(a,b) such that, for any real valued random variable X with quantile function Q_X, we have for any N > 0

Σ_{r=0}^N (r+1)^a ∫_0^{ε(r)} Q_X^b(s) ds ≤ c ( Σ_{r=0}^N (r+1)^{a+(a+1)b/δ} ε(r) )^{δ/(b+δ)} ‖X‖_{b+δ}^b.

4.3.1 Marcinkiewicz-Zygmund type inequalities
Our first result is the following Marcinkiewicz-Zygmund inequality.

Theorem 4.1. Let (X_n)_{n∈N} be a sequence of centered r.v.s fulfilling, for some fixed q ∈ N, q ≥ 2,

C_{r,q} = O(r^{−q/2}), as r → ∞.    (4.3.9)

Then there exists a positive constant B, not depending on n, for which

|E(S_n^q)| ≤ B n^{q/2}.    (4.3.10)

Proof of Theorem 4.1. For any integer q ≥ 2, let

A_q(n) = Σ_{1≤t_1≤⋯≤t_q≤n} |E(X_{t_1} ⋯ X_{t_q})|.    (4.3.11)

Clearly,

|E(S_n^q)| ≤ q! A_q(n).    (4.3.12)

Hence, in order to bound |E(S_n^q)|, it remains to bound A_q(n). For this, we argue as in Doukhan and Portal (1983) [73]. Clearly

A_q(n) ≤ Σ_{1≤t_1≤⋯≤t_q≤n} |E(X_{t_1} ⋯ X_{t_m}) E(X_{t_{m+1}} ⋯ X_{t_q})| + Σ_{1≤t_1≤⋯≤t_q≤n} |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|.
The first term on the right hand side of the last inequality is bounded by:

Σ_{1≤t_1≤⋯≤t_q≤n} |E(X_{t_1} ⋯ X_{t_m}) E(X_{t_{m+1}} ⋯ X_{t_q})| ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n).

Hence

A_q(n) ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n) + V_q(n),    (4.3.13)

with

V_q(n) = Σ_{(t_1,…,t_q)∈G_r} |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,    (4.3.14)

where G_r is the set of {t_1,…,t_q} fulfilling 1 ≤ t_1 ≤ ⋯ ≤ t_q ≤ n with r = t_{m+1} − t_m = max_{1≤i<q}(t_{i+1} − t_i). Hence

V_q(n) ≤ Σ_{t_1=1}^n Σ* |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,

where Σ* denotes a sum over such collections 1 ≤ t_1 ≤ ⋯ ≤ t_q ≤ n with fixed t_1, and r = t_{m+1} − t_m = max_{1≤i≤q−1}(t_{i+1} − t_i) ∈ [0, n−1]. Again

Σ* |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ Σ_{r=0}^{n−1} Σ** |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})|,

where Σ** denotes the (q−2)-dimensional sums, each over

{(t_1,…,t_q) : t_{i−1} ≤ t_i ≤ t_{i−1} + r, i ≠ 1, m+1}.

Hence Σ** 1 = (r+1)^{q−2}, and with |Cov(X_{t_1} ⋯ X_{t_m}, X_{t_{m+1}} ⋯ X_{t_q})| ≤ C_{r,q} we deduce that

V_q(n) ≤ Σ_{t_1=1}^n Σ_{r=0}^{n−1} (r+1)^{q−2} C_{r,q}.    (4.3.15)

We obtain, collecting inequalities (4.3.13) and (4.3.15),

A_q(n) ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n) + n Σ_{r=0}^{n−1} (r+1)^{q−2} C_{r,q}.    (4.3.16)

By induction on q, and using the last inequality together with condition (4.3.9), it is easy to check that A_q(n) ≤ K_q n^{q/2}. Theorem 4.1 follows from (4.3.12).
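The n^{q/2} growth of Theorem 4.1 can be eyeballed by Monte Carlo in the simplest case, i.i.d. centered signs (for which all C_{r,q} vanish for r ≥ 1): there E(S_n^4) = 3n(n−1) + n, so E(S_n^4)/n² → 3. A hypothetical check, not from the book:

```python
import random

def fourth_moment_ratio(n, reps, seed=4):
    # Estimate E(S_n^4)/n^2 for i.i.d. +-1 signs; the exact value is
    # (3n(n-1) + n)/n^2, which tends to 3 as n grows.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
        total += s ** 4
    return total / reps / n ** 2

ratio = fourth_moment_ratio(n=100, reps=20000)
```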
4.3.2 Rosenthal type inequalities
The following lemma, which is a variant of Lemma 1, page 195, in Billingsley (1968) [20], gives moment inequalities of order q ∈ {2, 4}.

Lemma 4.6. If (X_n)_{n∈N} is a sequence of centered r.v.s, then

E(S_n²) ≤ 2n Σ_{r=0}^{n−1} C_{r,2},   E(S_n⁴) ≤ 4! { (n Σ_{r=0}^{n−1} C_{r,2})² + n Σ_{r=0}^{n−1} (r+1)² C_{r,4} }.    (4.3.17)

Proof of Lemma 4.6. We take respectively q = 2 and q = 4 in the relation (4.3.16). The obtained formulas, together with (4.3.12) and the fact that A_1(n) = 0 for any positive integer n, prove Lemma 4.6.

The following theorem deals with higher order moments.

Theorem 4.2. Let q be some fixed integer not less than 2. Assume that the dependence coefficients C_{r,p} associated to the sequence (X_n) satisfy, for every nonnegative integer p ≤ q and for some positive constants C and M,

C_{r,p} ≤ C M^p ε(r).    (4.3.18)

Then, for any integer n ≥ 2,

|E(S_n^q)| ≤ ((2q−2)!/(q−1)!) { (C n M² Σ_{r=0}^{n−1} ε(r))^{q/2} ∨ C n M^q Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r) }.    (4.3.19)

Proof of Theorem 4.2. The relation (4.3.16) together with Condition (4.3.18) gives

A_q(n) ≤ Σ_{m=1}^{q−1} A_m(n) A_{q−m}(n) + C M^q n Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r).    (4.3.20)

In order to solve the above inductive relation, we need the following lemma.

Lemma 4.7. Let (U_q)_{q>0} and (V_q)_{q>0} be two sequences of positive real numbers satisfying, for all q ∈ N*,

U_q ≤ Σ_{m=1}^{q−1} U_m U_{q−m} + M^q V_q,    (4.3.21)
with U_1 = 0 ≤ V_1. Suppose that for every integers m, q fulfilling 2 ≤ m ≤ q−1, there holds

(V_2^{m/2} ∨ V_m)(V_2^{(q−m)/2} ∨ V_{q−m}) ≤ (V_2^{q/2} ∨ V_q).    (4.3.22)

Then, for any integer q ≥ 2,

U_q ≤ (M^q/q) \binom{2q−2}{q−1} (V_2^{q/2} ∨ V_q).    (4.3.23)

Remark 4.1. Note that a sufficient condition for (4.3.22) to hold is that for all p, q such that 2 ≤ p ≤ q−1,

V_p^{q−2} ≤ V_q^{p−2} V_2^{q−p}.    (4.3.24)

Proof of Lemma 4.7. Let (U_q)_{q>0} and (V_q)_{q>0} be two sequences of positive real numbers as in Lemma 4.7. We deduce from (4.3.21) and (4.3.22) that the sequence (Ũ_q), defined by Ũ_q = U_q / (M^q (V_2^{q/2} ∨ V_q)), satisfies the relation

Ũ_q ≤ Σ_{m=1}^{q−1} Ũ_m Ũ_{q−m} + 1,   Ũ_1 = 0.

In order to prove (4.3.23), it suffices to show that, for any integer q ≥ 2,

Ũ_q ≤ d_q := (1/q) \binom{2q−2}{q−1},    (4.3.25)

where d_q is called the q-th Catalan number. The proof of the last bound is done by induction on q. Clearly (4.3.25) is true for q = 2. Suppose now that (4.3.25) is true for every integer m less than q−1. The inductive hypothesis yields, with Ũ_1 = 0:

Ũ_q ≤ Σ_{m=2}^{q−2} d_m d_{q−m} + 1.    (4.3.26)

The last inequality, together with the identity d_q = Σ_{m=1}^{q−1} d_m d_{q−m} (cf. Comtet (1970) [39], page 64), implies Ũ_q ≤ d_q, proving (4.3.25) and thus Lemma 4.7.

We continue the proof of Theorem 4.2. We deduce from (4.3.20) that the sequence (A_q(n))_q satisfies (4.3.21) with

V_q := V_q(n) = C n Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r).

Hence, to prove Theorem 4.2, it suffices to check condition (4.3.22).
We have

(V_2^{m/2} ∨ V_m)(V_2^{(q−m)/2} ∨ V_{q−m}) ≤ V_2^{q/2} ∨ V_2^{m/2} V_{q−m} ∨ V_m V_2^{(q−m)/2} ∨ V_m V_{q−m}.
To control each of these terms, we use three inequalities. Let p be a positive integer, 2 ≤ p ≤ q−1. We deduce from

Σ_{r=0}^{n−1} (r+1)^{p−2} ε(r) ≤ ( Σ_{r=0}^{n−1} ε(r) )^{(q−p)/(q−2)} ( Σ_{r=0}^{n−1} (r+1)^{q−2} ε(r) )^{(p−2)/(q−2)}

that, for 2 ≤ p ≤ q−1,

V_p ≤ V_q^{(p−2)/(q−2)} V_2^{(q−p)/(q−2)}.    (4.3.27)

Define the discrete r.v. Z by P(Z = r+1) = ε(r)/Σ_{i=0}^{n−1} ε(i). The Jensen inequality implies ‖Z‖_{p−2} ≤ ‖Z‖_{q−2} if 1 ≤ p−2 ≤ q−2, so that

V_p ≤ V_q^{(p−2)/(q−2)}.    (4.3.28)

For 0 < α < 1,

V_2^{αq/2} V_q^{1−α} ≤ V_2^{q/2} ∨ V_q.    (4.3.29)

Using (4.3.28), we get

V_m V_2^{(q−m)/2} ≤ V_2^{(q−m)q/(2(q−2))} V_q^{(m−2)/(q−2)},
V_2^{m/2} V_{q−m} ≤ V_2^{mq/(2(q−2))} V_q^{(q−m−2)/(q−2)}.

From (4.3.27) we obtain

V_m V_{q−m} ≤ V_2^{q/(q−2)} V_q^{(q−4)/(q−2)}.

Now (4.3.29) implies that these three bounds are less than V_2^{q/2} ∨ V_q.
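The combinatorial engine behind Lemma 4.7 is the Catalan convolution identity d_q = Σ_{m=1}^{q−1} d_m d_{q−m}, which is easy to verify numerically:

```python
from math import comb

def catalan_d(q):
    # d_q = (1/q) * C(2q-2, q-1), the Catalan number entering Lemma 4.7.
    return comb(2 * q - 2, q - 1) // q

# The convolution identity d_q = sum_{m=1}^{q-1} d_m d_{q-m} used in the induction:
identity_holds = all(
    catalan_d(q) == sum(catalan_d(m) * catalan_d(q - m) for m in range(1, q))
    for q in range(2, 12)
)
```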
Theorem 4.3. If (X_n)_{n∈N} is a centered and (I, c)-weakly dependent sequence, then

|E(S_n^q)| ≤ ((2q−2)!/(q−1)!) { C_q Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du ∨ ( C_2 Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n) Q_i²(u) du )^{q/2} },

where ε^{−1}(u) is the generalized inverse of ε_{[u]} (see eqn. (2.2.14)).

Proof of Theorem 4.3. We begin from relation (4.3.16), and we try to evaluate the coefficient of dependence C_{r,q} for (I, c)-weakly dependent sequences. For this, we need the forthcoming lemma.

Lemma 4.8. If the sequence (X_n)_{n∈N} is (I, c)-weakly dependent, then

V_q(n) ≤ C_q Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du.
Proof of Lemma 4.8. We use Lemma 4.4. Arguing exactly as in Rio (1993) [157], we obtain

V_q(n) ≤ C_q Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈G_r} ∫_0^{ε(r)} Q_{i_1}(u) ⋯ Q_{i_q}(u) du
    ≤ (C_q/q) Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈G_r} Σ_{j=1}^q ∫_0^{ε(r)} Q_{i_j}^q(u) du
    ≤ (C_q/q) Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈∪_{k≤r}G_k} Σ_{j=1}^q ∫_{ε(r+1)}^{ε(r)} Q_{i_j}^q(u) du.

Now, fixing i_j and noting that the number of ways of completing (i_1,…,i_{j−1},i_{j+1},…,i_q) to get a sequence in ∪_{k≤r}G_k is less than (r+1)^{q−1}:

Σ_{r=0}^{n−1} Σ_{(i_1,…,i_q)∈∪_{k≤r}G_k} ∫_{ε(r+1)}^{ε(r)} Q_{i_j}^q(u) du ≤ Σ_{i_j=0}^n Σ_{r=0}^{n−1} (r+1)^{q−1} ∫_{ε(r+1)}^{ε(r)} Q_{i_j}^q(u) du
    ≤ Σ_{i=0}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du.

Lemma 4.8 is now completely proved.

We continue the proof of Theorem 4.3. We deduce from Lemma 4.8 and Inequality (4.3.16) that the sequence (A_q(n))_q fulfills relation (4.3.21). So, as in the proof of Theorem 4.2, Theorem 4.3 is proved if we prove that the sequence

Ṽ_p(n) := (c_p ∨ 2) Σ_{i=1}^n ∫_0^1 (ε^{−1}(u) ∧ n)^{p−1} Q_i^p(u) du

satisfies (4.3.22). We have

∫_0^1 (ε^{−1}(u) ∧ n)^{p−1} Q_i^p(u) du ≤ ( ∫_0^1 (ε^{−1}(u) ∧ n) Q_i²(u) du )^{(q−p)/(q−2)} ( ∫_0^1 (ε^{−1}(u) ∧ n)^{q−1} Q_i^q(u) du )^{(p−2)/(q−2)}.

The last bound proves that the sequence (Ṽ_p(n))_p fulfills the convexity inequality (4.3.27), which in turn implies (4.3.22).
4.3.3 A first exponential inequality

For any positive integers n and q ≥ 2, we consider the following assumption:

M_{q,n} = n Σ_{r=0}^{n−1} (r+1)^{q−2} C_{r,q} ≤ A_n q!/β^q,    (4.3.30)

where β is some positive constant and A_n is a sequence independent of q. As a consequence of Theorem 4.2, we obtain an exponential inequality.

Corollary 4.1. Suppose that (4.3.18) and (4.3.30) hold for some sequence A_n ≥ 1 for any n ≥ 2. Then for any positive real number x,

P(|S_n| ≥ x √A_n) ≤ A exp(−B √(βx)),    (4.3.31)

for universal positive constants A and B.

Remark 4.2.
• One may choose the explicit values A = e^{4+1/12} √(8π) and B = e^{−5/2}.

• Let us note that condition (4.3.30) holds if C_{r,q} ≤ C M^q e^{−br} for positive constants C, M, b. In such a case A_n is of order n. E.g. this holds if ‖X_n‖_∞ ≤ M and ‖X_n‖_2 ≤ σ under (Λ ∩ L^∞, Ψ)-weak dependence, if ε(r) = O(e^{−br}) and Ψ(h,k,u,v) ≤ e^{δ(u+v)} Lip(h) Lip(k) for some δ ≥ 0. For this, either compare the series Σ_r (r+1)^{q−2} e^{−ar} with integrals, or with derivatives of the function t ↦ 1/(1−t) = Σ_i t^i at the point t = e^{−a}.
• The use of combinatorics in those inequalities makes them relatively weak. For example, the Bernstein inequality, valid for independent sequences, allows one to replace the term $\sqrt{x}$ in the previous inequality by $x^2$ under the same assumption $n\sigma^2 \ge 1$; in the mixing cases analogous inequalities are obtained by using coupling arguments (not available here), e.g. the case of absolute regularity is studied in Doukhan (1994) [61].

Proof of Corollary 4.1. Theorem 4.2 written with $q = 2p$ yields
$$E(S_n^{2p}) \le \frac{(2p)!}{2p-1}\binom{4p-2}{2p}\bigl(M_{2p,n} \vee M_{2,n}^p\bigr). \qquad (4.3.32)$$
Hence inequality (4.3.32) together with condition (4.3.30) implies
$$E(S_n^{2p}) \le \frac{(4p-2)!}{(2p-1)!}\Bigl(A_n\frac{(2p)!}{\beta^{2p}} \vee \Bigl(\frac{2A_n}{\beta^2}\Bigr)^{p}\Bigr) \le \frac{(4p-2)!}{(2p-1)!}(A_n \vee A_n^p)\frac{(2p)!}{\beta^{2p}} \le (A_n \vee A_n^p)\frac{(4p)!}{\beta^{2p}}.$$
From the Stirling formula and the fact that $A_n \ge 1$ we obtain
$$\mathbb{P}(|S_n| \ge x) \le \frac{E(S_n^{2p})}{x^{2p}} \le \frac{A_n^p}{x^{2p}}\,e^{1/12-4p}\sqrt{8\pi p}\,\frac{(4p)^{4p}}{\beta^{2p}} \le e^{1/12}\sqrt{8\pi}\Bigl(\frac{16e^{-7/4}}{x\beta}\,p^2\sqrt{A_n}\Bigr)^{2p}.$$
CHAPTER 4. TOOLS FOR NON CAUSAL CASES
Now setting $h(y) = (C_ny)^{4y}$ with $C_n^2 = \frac{16e^{-7/4}}{x\beta}\sqrt{A_n}$, one obtains
$$\mathbb{P}(|S_n| \ge x) \le e^{1/12}\sqrt{8\pi}\,h(p).$$
Define the convex function $g(y) = \log h(y)$. Clearly
$$\inf_{y\in\mathbb{R}^+}g(y) = g\Bigl(\frac{1}{eC_n}\Bigr).$$
Suppose that $eC_n \le 1$ and let $p_0 = \bigl\lfloor\frac{1}{eC_n}\bigr\rfloor$; then
$$\mathbb{P}(|S_n| \ge x) \le e^{1/12}\sqrt{8\pi}\,h(p_0) \le e^{4+1/12}\sqrt{8\pi}\exp\Bigl(\frac{-4}{eC_n}\Bigr).$$
Suppose now that $eC_n \ge 1$; then $1 \le e^{4+1/12}\sqrt{8\pi}\exp\bigl(\frac{-4}{eC_n}\bigr)$. In both cases inequality (4.3.31) holds, and Corollary 4.1 is proved.
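As a sanity check (ours, not the authors'), the Stirling-type estimate used in the proof above, $(4p)! \le e^{1/12}\sqrt{8\pi p}\,(4p)^{4p}e^{-4p}$, can be verified numerically for small $p$:

```python
import math

# Numerical check of the Stirling-type bound used in the proof of
# Corollary 4.1: (4p)! <= e^(1/12) * sqrt(8*pi*p) * (4p)^(4p) * e^(-4p),
# which follows from n! <= sqrt(2*pi*n) (n/e)^n e^(1/(12n)) with n = 4p.
def stirling_bound(p: int) -> float:
    return math.exp(1/12) * math.sqrt(8 * math.pi * p) * (4*p)**(4*p) * math.exp(-4*p)

for p in range(1, 9):
    fact = math.factorial(4 * p)
    assert fact <= stirling_bound(p), p
    # the two sides stay within the factor e^(1/12) ~ 1.087 of each other
    assert stirling_bound(p) / fact <= math.exp(1/12) * 1.01
```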
Remark. More accurate applications of those inequalities are proposed in Louhichi (2003) [125] and Doukhan & Louhichi (1999) [67]. In particular, Section 3 of [67] checks conditions for (4.3.18), providing several other bounds for the coefficients $C_{r,q}$.
4.4 Cumulants
The main objective of this section is to reinterpret cumulants, which are classical expressions used to measure the dependence properties of a sequence.
4.4.1 General properties of cumulants
Let $Y = (Y_1,\dots,Y_k) \in \mathbb{R}^k$ be a random vector. Setting $\phi_Y(t) = E e^{it\cdot Y} = E\exp\bigl(i\sum_{j=1}^k t_jY_j\bigr)$ for $t = (t_1,\dots,t_k)\in\mathbb{R}^k$, we write $m_p(Y) = E\,Y_1^{p_1}\cdots Y_k^{p_k}$ for $p = (p_1,\dots,p_k)$ if $E(|Y_1|^s + \cdots + |Y_k|^s) < \infty$ and $|p| = p_1+\cdots+p_k = s$. Finally, if the previous moment condition holds for some $r\in\mathbb{N}^*$, then the function $\log\phi_Y(t)$ has a Taylor expansion
$$\log\phi_Y(t) = \sum_{|p|\le r}\frac{i^{|p|}\kappa_p(Y)}{p!}\,t^p + o(|t|^r), \quad\text{as } t\to0,$$
for some coefficients $\kappa_p(Y)$, called the cumulant of $Y$ of order $p\in\mathbb{N}^k$, defined if $|p|\le s$; here we set $p! = p_1!\cdots p_k!$ and $t^p = t_1^{p_1}\cdots t_k^{p_k}$ if $t = (t_1,\dots,t_k)\in\mathbb{R}^k$ and $p = (p_1,\dots,p_k)$.
In the case $p = (1,\dots,1)$, to which the others may be reduced, we denote $\kappa_{(1,\dots,1)}(Y) = \kappa(Y)$. Moreover, if $\mu = \{i_1,\dots,i_u\} \subset \{1,\dots,k\}$,
$$\kappa_\mu(Y) = \kappa(Y_{i_1},\dots,Y_{i_u}), \qquad m_\mu(Y) = m(Y_{i_1},\dots,Y_{i_u}).$$
Leonov and Shiryaev (1959) [119] (see also Rosenblatt, 1985, pages 33-34 [168]) obtained the following expressions:
$$\kappa(Y) = \sum_{u=1}^{k}(-1)^{u-1}(u-1)!\sum_{\mu_1,\dots,\mu_u}\prod_{j=1}^{u} m_{\mu_j}(Y), \qquad (4.4.1)$$
$$m(Y) = \sum_{u=1}^{k}\sum_{\mu_1,\dots,\mu_u}\prod_{j=1}^{u}\kappa_{\mu_j}(Y). \qquad (4.4.2)$$
The previous sums run over all partitions $\mu_1,\dots,\mu_u$ of the set $\{1,\dots,k\}$. We now recall some notions from Saulis and Statulevičius (1991) [173]; for this we reformulate their notation.

Definition 4.2. Centered moments of a random vector $Y = (Y_1,\dots,Y_k)$ are defined by setting $\bar{E}(Y_1,\dots,Y_l) = E\,Y_1\,c(Y_2,\dots,Y_l)$, where the centered random variables $c(Y_2,\dots,Y_l)$ are defined recursively by $c(\xi_1) = \bar\xi_1 = \xi_1 - E\xi_1$ and
$$c(\xi_j,\xi_{j-1},\dots,\xi_1) = \xi_j\,\overline{c(\xi_{j-1},\dots,\xi_1)} = \xi_j\bigl(c(\xi_{j-1},\dots,\xi_1) - E\,c(\xi_{j-1},\dots,\xi_1)\bigr).$$
We also write $Y_\mu = (Y_j : j\in\mu)$ as a $p$-tuple if $\mu\subset\{1,\dots,k\}$.
Note for comprehension that $\bar E(\xi) = 0$, $\bar E(\eta,\xi) = \mathrm{Cov}(\eta,\xi)$ and
$$\bar E(\zeta,\eta,\xi) = E(\zeta\eta\xi) - E(\zeta)E(\eta\xi) - E(\eta)E(\zeta\xi) - E(\xi)E(\zeta\eta).$$
A remarkable result from Saulis and Statulevičius (1991) will be informative.

Theorem 4.4 (Saulis, Statulevičius, 1991 [173]).
$$\kappa(Y_1,\dots,Y_k) = \sum_{u=1}^{k}(-1)^{u-1}\sum_{\mu_1,\dots,\mu_u}N_u(\mu_1,\dots,\mu_u)\prod_{j=1}^{u}\bar E\bigl(Y_{\mu_j}\bigr),$$
where the sums run over all partitions $\mu_1,\dots,\mu_u$ of the set $\{1,\dots,k\}$ and the integers $N_u(\mu_1,\dots,\mu_u)\in\bigl[0,\,(u-1)!\wedge\lfloor k/2\rfloor!\bigr]$, defined for any such partition, satisfy the relations
• $N(k,u) = \sum_{\mu_1,\dots,\mu_u}N_u(\mu_1,\dots,\mu_u) = \sum_{j=1}^{u-1}C_k^j\,(u-j)^{k-1}$,
• $\sum_{u=1}^{k}N(k,u) = (k-1)!$.
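The combinatorial formula (4.4.1) can be turned into a small numerical routine. The sketch below is our own illustration (empirical moments replace expectations, and all sample values are invented); it enumerates set partitions and checks that the order-2 joint cumulant reduces to the covariance:

```python
import itertools
import math

# Leonov-Shiryaev formula (4.4.1):
# kappa(Y_1,...,Y_k) = sum_u (-1)^(u-1) (u-1)! sum_{partitions} prod_j m_{mu_j}(Y).
# `samples` is a list of k equally long sample paths; moments are empirical
# means, so for k = 2 the joint cumulant must equal the empirical covariance.

def partitions(elements):
    """Enumerate all set partitions of `elements` (a list)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in partitions(rest):
        for i, block in enumerate(part):
            yield part[:i] + [[first] + block] + part[i+1:]
        yield [[first]] + part

def moment(samples, block):
    n = len(samples[0])
    return sum(math.prod(samples[j][i] for j in block) for i in range(n)) / n

def joint_cumulant(samples):
    k = len(samples)
    total = 0.0
    for part in partitions(list(range(k))):
        u = len(part)
        total += (-1)**(u-1) * math.factorial(u-1) * math.prod(moment(samples, b) for b in part)
    return total

y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [1.0, 0.0, 1.0, 2.0]
cov = sum((a - 2.5) * (b - 1.0) for a, b in zip(y1, y2)) / 4
assert abs(joint_cumulant([y1, y2]) - cov) < 1e-12
```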
Using this representation, the following bound will be useful.

Lemma 4.9. Let $Y_1,\dots,Y_k\in\mathbb{R}$ be centered random variables. For $k\ge1$, set $M_k = (k-1)!\,2^{k-1}\max_{1\le i\le k}E|Y_i|^k$; then
$$|\kappa(Y_1,\dots,Y_k)| \le M_k, \qquad (4.4.3)$$
$$M_k\,M_l \le M_{k+l}, \quad\text{if } k,l\ge2. \qquad (4.4.4)$$
Mention that a consequence of this lemma will be used in the following: for cumulants of respective orders $p_1,\dots,p_u$,
$$\prod_{i=1}^{u}|\kappa_{p_i}| \le M_{p_1+\cdots+p_u}. \qquad (4.4.5)$$
We shall use this inequality for components $Y_i = X_{k_i}^{(a_i)}$ of a stationary sequence of $\mathbb{R}^D$-valued random variables; hence $\max_{i\ge1}E|Y_i|^p \le \max_{1\le j\le D}E|X_0^{(j)}|^p$ and we may set
$$M_p = (p-1)!\,2^{p-1}\max_{1\le j\le D}E|X_0^{(j)}|^p. \qquad (4.4.6)$$
Proof of Lemma 4.9. The second point of this lemma follows from the elementary inequality $a!\,b! \le (a+b)!$, and the first one is a consequence of Theorem 4.4 and of the following lemma.

Lemma 4.10. For any $j,p\ge1$ and any real random variables $\xi_0,\xi_1,\xi_2,\dots$ with identical distribution,
$$\|c(\xi_j,\xi_{j-1},\dots,\xi_1)\|_p \le 2^j\max_{1\le i\le j}\|\xi_i\|_{pj}^{\,j},$$
with $\|\xi\|_q = E^{1/q}|\xi|^q$.

Proof of Lemma 4.10. For simplicity we omit suprema, replacing $\max_{i\le j}\|\xi_i\|_p$ by $\|\xi_0\|_p$. First of all, Hölder's inequality implies
$$\|c(\xi_1)\|_p \le \|\xi_1\|_p + |E\xi_1| \le 2\|\xi_1\|_p.$$
We now use recursion; setting $Z_j = c(\xi_j,\xi_{j-1},\dots,\xi_1)$ yields $Z_j = \xi_j(Z_{j-1} - EZ_{j-1})$, hence the Minkowski and Hölder inequalities entail
$$\|Z_j\|_p \le \|\xi_jZ_{j-1}\|_p + \|\xi_0\|_p\,|EZ_{j-1}| \le \|\xi_0\|_{pj}\|Z_{j-1}\|_q + \|\xi_0\|_{pj}\|Z_{j-1}\|_p \le 2^j\,\|\xi_0\|_{pj}\,\|\xi_0\|_{q(j-1)}^{\,j-1},$$
where $\frac{1}{q} + \frac{1}{pj} = \frac{1}{p}$; from $p\ge1$ we infer $q(j-1)\le pj$, which concludes the proof.
Proof of Lemma 4.9 (continued). Here again we omit suprema, replacing $\max_{j\le k}\|Y_j\|_p$ by $\|Y_0\|_p$. With Lemma 4.10 we deduce that $|\bar E(Y_\mu)| \le 2^{l-1}\|Y_0\|_l^l$ with $l = \#\mu$. Indeed, write $Z = c(Y_2,\dots,Y_l)$; then with $\frac1q + \frac1l = 1$ we get
$$|\bar E(Y_1,\dots,Y_l)| = |E\,Y_1Z| \le \|Y_0\|_l\,\|Z\|_q \le 2^{l-1}\|Y_0\|_l^l,$$
since $q(l-1) = l$. Hence Theorem 4.4 implies
$$|\kappa(Y)| \le \sum_{u=1}^{k}\sum_{\mu_1,\dots,\mu_u}N_u(\mu_1,\dots,\mu_u)\prod_{i=1}^{u}2^{\#\mu_i-1}\|Y_0\|_{\#\mu_i}^{\#\mu_i} \le \sum_{u=1}^{k}2^{k-u}N(k,u)\|Y_0\|_k^k \le 2^{k-1}\|Y_0\|_k^k\sum_{u=1}^{k}N(k,u) = 2^{k-1}(k-1)!\,\|Y_0\|_k^k.$$
The following lemmas were essentially proved in Doukhan and León (1989) [66] for real-valued sequences $(X_n)_{n\in\mathbb{Z}}$. Let now $(X_n)_{n\in\mathbb{Z}}$ denote a vector-valued stationary sequence (with values in $\mathbb{R}^D$). Extending Doukhan and Louhichi (1999)'s coefficients, we define*
$$c_{X,q}(r) = \sup\Bigl|\mathrm{Cov}\Bigl(X_{t_1}^{(a_1)}\cdots X_{t_l}^{(a_l)},\;X_{t_{l+1}}^{(a_{l+1})}\cdots X_{t_q}^{(a_q)}\Bigr)\Bigr|, \qquad (4.4.7)$$
where the supremum runs over $1\le l<q$, $1\le a_1,\dots,a_q\le D$ and $t_1\le\cdots\le t_q$ with $t_{l+1}-t_l\ge r$. For further convenience, we also define the decreasing coefficients
$$c^*_{X,q}(r) = \max_{1\le l\le q}c_{X,l}(r)\,\mu_{q-l}, \quad\text{with}\quad \mu_t = \max_{1\le d\le D}E|X_0^{(d)}|^t. \qquad (4.4.8)$$
In order to state the following results we set, for $1\le a_1,\dots,a_q\le D$,
$$\kappa^{(q)}_{(a_1,\dots,a_q)}(t_2,\dots,t_q) = \kappa_{(1,\dots,1)}\bigl(X_0^{(a_1)},X_{t_2}^{(a_2)},\dots,X_{t_q}^{(a_q)}\bigr).$$
The following decomposition lemma will be very useful. It explains how cumulants behave naturally as covariances. Precisely, it proves that a cumulant

*For $D = 1$ this coefficient was already defined in Definition 4.1 as $C_{r,q}$, but the present notation also recalls the underlying process.
$\kappa_Q(X_{k_1},\dots,X_{k_Q})$ is small if $k_{l+1}-k_l$ is large, $k_1\le\cdots\le k_Q$, and the process is weakly dependent. This is a natural extension of one essential property of cumulants, which states that such a cumulant vanishes if the index set can be partitioned into two strict subsets such that the random vectors determined this way are independent.

Definition 4.3. Let $t = (t_1,\dots,t_p)$ be any $p$-tuple in $\mathbb{Z}^p$ such that $t_1\le\cdots\le t_p$; we denote $r(t) = \max_{1\le l<p}(t_{l+1}-t_l)$ and set
$$\kappa_{X,p}(r) = \max_{\substack{t_1\le\cdots\le t_p\\ r(t_1,\dots,t_p)\ge r}}\;\max_{1\le a_1,\dots,a_p\le D}\Bigl|\kappa_p\bigl(X_{t_1}^{(a_1)},\dots,X_{t_p}^{(a_p)}\bigr)\Bigr|. \qquad (4.4.9)$$
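As an illustration of the simplest coefficient of (4.4.7) (our own example, with arbitrary numerical choices of the autoregression parameter, sample size and lags), the lag-$r$ covariance of a simulated AR(1) process decays geometrically in $r$:

```python
import random

# For a stationary AR(1) process X_t = a X_{t-1} + eps_t, the coefficient
# c_{X,2}(r) = |Cov(X_0, X_r)| equals a^r / (1 - a^2) and decays
# geometrically; the empirical lag-r covariances reproduce that decay.
random.seed(0)
a, n = 0.5, 200_000
x, xs = 0.0, []
for _ in range(n):
    x = a * x + random.gauss(0.0, 1.0)
    xs.append(x)

mean = sum(xs) / n

def emp_cov(r):
    return sum((xs[t] - mean) * (xs[t + r] - mean) for t in range(n - r)) / (n - r)

# the estimates shrink with the lag r, mirroring a^r / (1 - a^2)
assert emp_cov(1) > emp_cov(3) > emp_cov(6)
assert abs(emp_cov(6)) < 0.1
```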
Lemma 4.11. Let $(X_n)_{n\in\mathbb{Z}}$ be a stationary process, centered at expectation, with finite moments of any order. Then, if $Q\ge2$, we have, with the notation of Lemma 4.9,
$$\kappa_{X,Q}(r) \le c_{X,Q}(r) + \sum_{s=2}^{Q-2}M_{Q-s}\Bigl\lfloor\frac{Q}{2}\Bigr\rfloor^{Q-s+1}\kappa_{X,s}(r).$$
7 (a) (a ) Proof of lemma 4.11. We denote Xη = i∈η Xi i for any p−tuples η ∈ Zp and p a = (a1 , . . . , ap ) ∈ {1, . . . , D} (this way admits repetitions of the succession η). We assume that k1 ≤ · · · ≤ kQ satisfy kl+1 − kl = r = max1≤s
(a )
(a)
(a)
κ(Xk1 1 , . . . , XkQQ ) = Cov(Xη(k) , Xη(k) ) −
u
where Kμ,k =
κνμ (k) Kμ,k ,
(4.4.10)
{μ}
κμi (k) and where the previous sum extends to partitions
μi =νu
μ = {μ1 , . . . , μu } of {1, . . . , Q} such that there exists some 1 ≤ i ≤ u with μi ∩ν = ∅ and μi ∩ν = ∅. For simplicity the previous formulas omit the reference to indices (a1 , . . . , aQ ) which is implicit. We use the simple but essential remark that r νμ (k) ≥ r(k) to derive |κνμ (k) | ≤ κX,#νμ (r). We now use lemma 4.9 to deduce |Mμ | ≤ MQ−#μν as in eqn. (4.4.5). This
yields the bound
$$\Bigl|\kappa\bigl(X_{k_1}^{(a_1)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr| \le c_{X,Q}(r) + \sum_{u=2}^{[Q/2]}(u-1)!\sum_{\mu_1,\dots,\mu_u}M_{Q-\#\nu_\mu}\,\bigl|\kappa_{\nu_\mu(k)}(X^{(a)})\bigr|$$
$$\le c_{X,Q}(r) + \sum_{u=2}^{[Q/2]}(u-1)!\sum_{s=2}^{Q-2}\;\sum_{\substack{\mu_1,\dots,\mu_u\\ \#\nu_\mu=s}}M_{Q-s}\,\kappa_{X,s}(r)$$
$$\le c_{X,Q}(r) + \sum_{s=2}^{Q-2}\sum_{u=2}^{[Q/2]}(u-1)^{Q-s}\,M_{Q-s}\,\kappa_{X,s}(r)$$
$$\le c_{X,Q}(r) + \sum_{s=2}^{Q-2}\frac{1}{Q-s+1}\Bigl\lfloor\frac{Q}{2}\Bigr\rfloor^{Q-s+1}M_{Q-s}\,\kappa_{X,s}(r),$$
since the inequality $\sum_{u=1}^{U}(u-1)^p \le \frac{1}{p+1}U^{p+1}$ follows from a comparison between a sum and an integral.
Rewrite now Lemma 4.11 as
$$\kappa_{X,Q}(r) \le c_{X,Q}(r) + \sum_{s=2}^{Q-2}B_{Q,s}\,\kappa_{X,s}(r);$$
thus the following bounds follow:
$$\kappa_{X,2}(r) \le c_{X,2}(r),$$
$$\kappa_{X,3}(r) \le c_{X,3}(r),$$
$$\kappa_{X,4}(r) \le c_{X,4}(r) + B_{4,2}\,\kappa_{X,2}(r) \le c_{X,4}(r) + B_{4,2}\,c_{X,2}(r),$$
$$\kappa_{X,5}(r) \le c_{X,5}(r) + B_{5,3}\,\kappa_{X,3}(r) + B_{5,2}\,\kappa_{X,2}(r) \le c_{X,5}(r) + B_{5,3}\,c_{X,3}(r) + B_{5,2}\,c_{X,2}(r),$$
$$\kappa_{X,6}(r) \le c_{X,6}(r) + B_{6,4}\,\kappa_{X,4}(r) + B_{6,3}\,\kappa_{X,3}(r) + B_{6,2}\,\kappa_{X,2}(r) \le c_{X,6}(r) + B_{6,4}\bigl(c_{X,4}(r) + B_{4,2}\,c_{X,2}(r)\bigr) + B_{6,3}\,c_{X,3}(r) + B_{6,2}\,c_{X,2}(r) \le c_{X,6}(r) + B_{6,4}\,c_{X,4}(r) + B_{6,3}\,c_{X,3}(r) + (B_{6,2}+B_{6,4}B_{4,2})\,c_{X,2}(r).$$
A main corollary of Lemma 4.11 is the following; it is proved by induction.

Corollary 4.2. For any $Q\ge2$, there exists a constant $A_Q\ge0$, depending only on $Q$, such that $\kappa_{X,Q}(r) \le A_Q\,c^*_{X,Q}(r)$.
Remark 4.3. This corollary explains an equivalence between the coefficients $c_{X,Q}(r)$ and the $\kappa_Q(r)$, which may also be preferred as a dependence coefficient. A way to derive sharp bounds for the constants involved is, using Theorem 4.4, to decompose the corresponding sum of centered moments into two terms, the first of them involving the maximal covariance of a product. Section 12.3.2 will provide multivariate extensions devoted to spectral analysis. For completeness' sake, remark that formula (4.4.10) implies, with $B_{Q,Q} = 1$, that
$$c_{X,Q}(r) \le \sum_{s=2}^{Q}B_{Q,s}\,\kappa_{X,s}(r).$$
Hence there exists some constant $\tilde A_Q$ such that
$$c^*_{X,Q}(r) \le \tilde A_Q\,\kappa^*_{X,Q}(r), \qquad \kappa^*_{X,Q}(r) = \max_{2\le l\le Q}\kappa_{X,l}(r)\,\mu_{Q-l}.$$
Finally, we have proved that some constants $a_Q, A_Q > 0$ satisfy
$$a_Q\,c^*_{X,Q}(r) \le \kappa^*_{X,Q}(r) \le A_Q\,c^*_{X,Q}(r);$$
hence, for fixed $Q$, those coefficients are equivalent.

The previous formula (4.4.10) implies that the cumulant
$$\kappa\bigl(X_{k_1}^{(a_1)},\dots,X_{k_Q}^{(a_Q)}\bigr) = \sum_{\alpha,\beta}K_{\alpha,\beta,k}\,\mathrm{Cov}\bigl(X_{\alpha(k)}^{(a)},X_{\beta(k)}^{(a)}\bigr)$$
writes as a linear combination of covariances with $\alpha\subset\{1,\dots,l\}$ and $\beta\subset\{l+1,\dots,Q\}$, where the coefficients $K_{\alpha,\beta,k}$ are polynomials in the cumulants. For this, one replaces the $Q$-tuple $\bigl(X_{k_1}^{(a_1)},\dots,X_{k_Q}^{(a_Q)}\bigr)$ by $\bigl(X_i^{(a)}\bigr)_{i\in\nu_\mu(k)}$ for each partition $\mu$ in formula (4.4.10), and uses recursion. Such a representation is useful in particular if one knows such covariances precisely; let us mention the cases of Gaussian and associated processes, for which additional information is provided. The main attraction of cumulants with respect to covariances of products is that, if a sample $(X_{k_1},\dots,X_{k_q})$ is given, the behaviour of the cumulant is that of $c_{X,q}(r(k))$, which appears as a supremum over the position $l$ of the maximal lag in the sequence $k$. This means an automatic search for this maximal lag $r(k)$ may be performed with the help of cumulants.

Examples. The constants $A_Q$, which are not explicit, may be determined for some low orders. A careful analysis of the previous proof allows the sharp
bounds†
$$\kappa_{X,2}(r) = c_{X,2}(r),$$
$$\kappa_{X,3}(r) = c_{X,3}(r),$$
$$\kappa_{X,4}(r) \le c_{X,4}(r) + 3\mu_2\,c_{X,2}(r),$$
$$\kappa_{X,5}(r) \le c_{X,5}(r) + 10\mu_2\,c_{X,3}(r) + 10\mu_3\,c_{X,2}(r),$$
$$\kappa_{X,6}(r) \le c_{X,6}(r) + 15\mu_2\,c_{X,4}(r) + 20\mu_3\,c_{X,3}(r) + 150\mu_4\,c_{X,2}(r).$$
Our main application of those inequalities is the following.

Lemma 4.12. Set
$$\kappa_Q = \sum_{k_2=0}^{\infty}\cdots\sum_{k_Q=0}^{\infty}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr|. \qquad (4.4.11)$$
We use notation (4.4.8). For each $Q\ge2$, there exists a constant $B_Q$ such that
$$\kappa_Q \le B_Q\sum_{r=0}^{\infty}(r+1)^{Q-2}\,c^*_{X,Q}(r).$$
Proof of Lemma 4.12. We simply decompose the sums:
$$\kappa_Q \le (Q-1)!\sum_{k_2\le\cdots\le k_Q}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr| \equiv (Q-1)!\,\tilde\kappa_Q,$$
by considering the partition of the index set $E = \{k = (k_2,\dots,k_Q)\in\mathbb{N}^{Q-1} : k_2\le\cdots\le k_Q\}$ into $E_r = \{k\in E : r(k) = r\}$ for $r\ge0$; then
$$\tilde\kappa_Q = \sum_{r=0}^{\infty}\sum_{k\in E_r}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr|.$$
The previous lemma implies that there exists some constant $A_Q>0$ such that
$$\sum_{k\in E_r}\;\max_{1\le a_1,\dots,a_Q\le D}\Bigl|\kappa\bigl(X_0^{(a_1)},X_{k_2}^{(a_2)},\dots,X_{k_Q}^{(a_Q)}\bigr)\Bigr| \le A_Q\,\#E_r\,c^*_{X,Q}(r),$$
and the simple bound $\#E_r \le (Q-1)(r+1)^{Q-2}$ concludes the proof.

†To bound higher-order cumulants we shall prefer the rough bounds of Lemma 4.11 to those involving combinatorial coefficients.
Lemma 4.13. Assume now that $D = 1$ (the process is real-valued, and we omit the superindices $a_j$). If the series (4.4.11) is finite for each $Q\le p$, set $q = [p/2]$ ($q = p/2$ if $p$ is even and $q = (p-1)/2$ if $p$ is odd); then
$$\Bigl|E\Bigl(\sum_{j=1}^{n}X_j\Bigr)^{p}\Bigr| \le \sum_{u=1}^{q}n^u\,\gamma_u, \quad\text{where}\qquad (4.4.12)$$
$$\gamma_u = \sum_{\substack{p_1+\cdots+p_u=p\\ p_i\ge2}}\frac{p!}{p_1!\cdots p_u!}\,\kappa_{p_1}\cdots\kappa_{p_u}.$$
Proof. As in Doukhan and Louhichi (1999) [67], we bound
$$\bigl|E(X_1+\cdots+X_n)^p\bigr| = \Bigl|\sum_{1\le k_1,\dots,k_p\le n}E\,X_{k_1}\cdots X_{k_p}\Bigr| = |A_{p,n}|, \quad\text{with}\quad A_{p,n} \equiv \sum_{1\le k_1,\dots,k_p\le n}E\,X_{k_1}\cdots X_{k_p}.$$
Let now $\mu = \{i_1,\dots,i_v\}\subset\{1,\dots,p\}$ and $k = (k_1,\dots,k_p)$; we set for convenience
$$\mu(k) = (k_{i_1},\dots,k_{i_v})\in\mathbb{N}^v. \qquad (4.4.13)$$
In order to count the terms with their order of multiplicity, it is indeed not suitable to define the previous item as a set; cumulants and moments are defined equivalently in this case. Hence, as in Doukhan and León (1989) [66], we compute, using formula (4.4.2) and using all the partitions $\mu_1,\dots,\mu_u$ of $\{1,\dots,p\}$ with exactly $1\le u\le p$ elements,
$$A_{p,n} = \sum_{1\le k_1,\dots,k_p\le n}\sum_{u=1}^{p}\sum_{\mu_1,\dots,\mu_u}\prod_{j=1}^{u}\kappa_{\mu_j(k)}(X) = \sum_{u=1}^{p}\sum_{\mu_1,\dots,\mu_u}\sum_{1\le k_1,\dots,k_p\le n}\prod_{j=1}^{u}\kappa_{\mu_j(k)}(X)$$
$$= \sum_{r=1}^{p}\sum_{p_1+\cdots+p_r=p}\frac{p!}{p_1!\cdots p_r!}\prod_{u=1}^{r}\;\sum_{1\le k_1,\dots,k_{p_u}\le n}\kappa_{p_u}(X_{k_1},\dots,X_{k_{p_u}}), \qquad (4.4.14)$$
$$|A_{p,n}| \le \sum_{u=1}^{q}n^u\sum_{p_1+\cdots+p_u=p}\frac{p!}{p_1!\cdots p_u!}\prod_{j=1}^{u}\kappa_{p_j}. \qquad (4.4.15)$$
Identity (4.4.14) follows from a simple change of variables and takes into account that the number of partitions of $\{1,\dots,p\}$ into $u$ subsets with respective cardinalities $p_1,\dots,p_u$ is simply the corresponding multinomial coefficient. Remark that, for $\lambda\in\mathbb{N}$ and taking the stationarity of $X$ into account, we obtain
$$\sum_{1\le k_1,\dots,k_\lambda\le n}\bigl|\kappa_\lambda(X_{k_1},\dots,X_{k_\lambda})\bigr| \le n\,\kappa_\lambda.$$
Using this remark, the fact that cumulants of order 1 vanish, and the fact that the only non-zero terms are those for which $p_1,\dots,p_u\ge2$ (thus $2u\le p$, hence $u\le q$), we finally deduce inequality (4.4.15).

Remark 4.4. If $\kappa_s \le C^s$ for $s\le p$ and for a constant $C>0$, the bound (4.4.15) rewrites as $C^p\sum_{u=1}^{q}u^p\,n^u$ by using the multinomial identity.
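The multinomial identity behind Remark 4.4, $\sum_{p_1+\cdots+p_u=p}p!/(p_1!\cdots p_u!) = u^p$, can be checked directly (a small verification of ours, enumerating weak compositions):

```python
import itertools
import math

# Multinomial identity: summing p!/(p_1!...p_u!) over all compositions
# p_1 + ... + p_u = p with nonnegative parts gives u^p.
def multinomial_sum(p, u):
    total = 0
    # u - 1 sorted cut points in {0,...,p} enumerate each weak composition once
    for cuts in itertools.combinations_with_replacement(range(p + 1), u - 1):
        parts = [b - a for a, b in zip((0,) + cuts, cuts + (p,))]
        total += math.factorial(p) // math.prod(math.factorial(q) for q in parts)
    return total

for p in range(1, 7):
    for u in range(1, 5):
        assert multinomial_sum(p, u) == u ** p
```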
4.4.2 A second exponential inequality
This section is based on Doukhan and Neumann (2005) [71]. Here we are concerned with probability and moment inequalities for $S_n = X_1+\cdots+X_n$, where $X_1,\dots,X_n$ are zero-mean random variables which fulfill appropriate weak dependence conditions. We denote by $\sigma_n^2$ the variance of $S_n$. Results are stated here without proof; we refer the reader to [71]. The first result is a Bernstein-type inequality which generalizes and improves previous inequalities of this chapter.

Theorem 4.5. Suppose that $X_1,\dots,X_n$ are real-valued random variables defined on a probability space $(\Omega,\mathcal{A},\mathbb{P})$ with $EX_i = 0$ and $\mathbb{P}(|X_i|\le M) = 1$, for all $i = 1,\dots,n$ and some $M<\infty$. Let $\Psi:\mathbb{N}^2\to\mathbb{N}$ be one of the following functions:
(a) $\Psi(u,v) = 2v$, (b) $\Psi(u,v) = u+v$, (c) $\Psi(u,v) = uv$, or (d) $\Psi(u,v) = \alpha(u+v) + (1-\alpha)uv$, for some $\alpha\in(0,1)$.
We assume that there exist constants $K, L_1, L_2 < \infty$, $\mu\ge0$, and a nonincreasing sequence of real coefficients $(\rho(n))_{n\ge0}$ such that, for all $u$-tuples $(s_1,\dots,s_u)$ and all $v$-tuples $(t_1,\dots,t_v)$ with $1\le s_1\le\cdots\le s_u\le t_1\le\cdots\le t_v\le n$, the following inequality is fulfilled:
$$\bigl|\mathrm{Cov}\bigl(X_{s_1}\cdots X_{s_u},\,X_{t_1}\cdots X_{t_v}\bigr)\bigr| \le K^2M^{u+v-2}\,\Psi(u,v)\,\rho(t_1-s_u), \qquad (4.4.16)$$
where
$$\sum_{s=0}^{\infty}(s+1)^k\rho(s) \le L_1L_2^k(k!)^\mu, \qquad \forall k\ge0. \qquad (4.4.17)$$
Then
$$\mathbb{P}(S_n\ge t) \le \exp\Bigl(-\frac{t^2/2}{A_n + B_n^{1/(\mu+2)}\,t^{(2\mu+3)/(\mu+2)}}\Bigr), \qquad (4.4.18)$$
where $A_n$ can be chosen as any number greater than or equal to $\sigma_n^2$ and
$$B_n = 2(K\vee M)L_2\Bigl(\frac{2^{4+\mu}\,nK^2L_1}{A_n}\vee1\Bigr).$$

Remark 4.5. (i) Inequality (4.4.18) resembles the classical Bernstein inequality for independent random variables. Asymptotically, $\sigma_n^2$ is usually of order $O(n)$, and $A_n$ can be chosen equal to $\sigma_n^2$, while $B_n$ is usually $O(1)$ and hence negligible. In cases where $\sigma_n^2$ is very small, or where the knowledge of the value of $A_n$ is required for some statistical procedure, it might, however, be better to choose $A_n$ larger than $\sigma_n^2$. It follows from (4.4.16) and (4.4.17) that a rough bound for $\sigma_n^2$ is given by
$$\sigma_n^2 \le 2nK^2\Psi(1,1)L_1. \qquad (4.4.19)$$
Hence, taking $A_n = 2nK^2\Psi(1,1)L_1$, we obtain from (4.4.18) that
$$\mathbb{P}\Bigl(\sum_{i=1}^{n}X_i\ge t\Bigr) \le \exp\Bigl(-\frac{t^2}{C_1n + C_2\,t^{(2\mu+3)/(\mu+2)}}\Bigr), \qquad (4.4.20)$$
where $C_1 = 4K^2\Psi(1,1)L_1$ and $C_2 = 2B_n^{1/(\mu+2)}$ with $B_n = 2(K\vee M)L_2\bigl((2^{3+\mu}/\Psi(1,1))\vee1\bigr)$. Inequality (4.4.20) is then more of Hoeffding type.

(ii) In the causal case, we obtain in Theorem 5.2 of Chapter 5 a Bennett-type inequality for $\tau$-dependent random variables. This also implies a Bernstein-type inequality, however with different constants. In particular, the leading term in the denominator of the exponent differs from $\sigma_n^2$. This is a consequence of the method of proof, which consists of replacing weakly dependent blocks of random variables by independent ones according to some coupling device (an analogous argument is used in [61] for the case of absolute regularity).

(iii) Condition (4.4.16), in conjunction with (4.4.17), may be interpreted as a weak dependence condition, in the sense that the covariances on the left-hand side tend to zero as the time gap between the two blocks of observations increases. Note that the supremum of expression (4.4.16) over all $u$-tuples $(s_1,\dots,s_u)$ and all $v$-tuples $(t_1,\dots,t_v)$ with $1\le s_1\le\cdots\le s_u\le t_1\le\cdots\le t_v<\infty$ such that $t_1-s_u = r$ is denoted by $C_{u+v,r}$ in (4.3.1). Conditions (4.4.16) and (4.4.17) are typically fulfilled for truncated versions of random variables from many time series models; see also Proposition 4.1 below. The constant $K$ in (4.4.16) is included to possibly take advantage of a sparsity of data as it appears, for example, in nonparametric curve estimation.
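To get a feel for the shape of the bound (4.4.18), one can simply evaluate its right-hand side; all numerical constants below are made up for illustration (e.g. $A_n$ playing the role of $\sigma_n^2$), not taken from the text:

```python
import math

# Right-hand side of the Bernstein-type bound (4.4.18); sanity checks:
# it equals 1 at t = 0, decreases in t, and is close to the sub-Gaussian
# bound exp(-t^2 / (2 A_n)) when the A_n term dominates the denominator.
def bernstein_bound(t, A_n, B_n, mu):
    denom = A_n + B_n ** (1 / (mu + 2)) * t ** ((2 * mu + 3) / (mu + 2))
    return math.exp(-(t * t / 2) / denom)

A_n, B_n, mu = 100.0, 5.0, 1.0   # hypothetical values
values = [bernstein_bound(t, A_n, B_n, mu) for t in (0.0, 5.0, 10.0, 20.0, 40.0)]
assert values[0] == 1.0
assert all(a > b for a, b in zip(values, values[1:]))   # decreasing in t
# for small t the bound is close to the sub-Gaussian exp(-t^2 / (2 A_n))
assert abs(bernstein_bound(1.0, A_n, B_n, mu) - math.exp(-1 / (2 * A_n))) < 1e-2
```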
(iv) For unbounded random variables, the coefficients $C_{p,r}$ may still be bounded by an explicit function of the index $p$ under a weak dependence assumption; see Lemma 4.14 below. For example, assume that $E\exp(A|X_t|^a)\le L$ holds for some constants $A,a>0$, $L<\infty$. Since the inequality $u^p\le p!\,e^u$ ($p\in\mathbb{N}$, $u\ge0$) implies that
$$u^m = (Aa)^{-m/a}\bigl(Aau^a\bigr)^{m/a} \le (Aa)^{-m/a}(m!)^{1/a}e^{Au^a}, \qquad \forall m\in\mathbb{N},$$
we obtain that $E|X_t|^m \le L\,(m!)^{1/a}(Aa)^{-m/a}$ holds for all $m\in\mathbb{N}$. Lemma 4.14 below then provides appropriate estimates for $C_{p,r}$.

Note that the variance of the sum does not explicitly show up in the Rosenthal-type inequality given in Theorem 4.2. Using the formula of Leonov and Shiryaev (1959) [119], we are able to obtain a more precise inequality which resembles the Rosenthal inequality of the independent case (see Rosenthal (1970) [170] and Theorem 2.12 in Hall and Heyde (1980) [100] for the case of martingales).

Theorem 4.6. Suppose that $X_1,\dots,X_n$ are real-valued random variables on a probability space $(\Omega,\mathcal{A},\mathbb{P})$ with zero mean, and let $p$ be a positive integer. We assume that there exist finite constants $K$, $M$, and a nonincreasing sequence of real coefficients $(\rho(n))_{n\ge0}$ such that, for all $u$-tuples $(s_1,\dots,s_u)$ and all $v$-tuples $(t_1,\dots,t_v)$ with $1\le s_1\le\cdots\le s_u\le t_1\le\cdots\le t_v\le n$ and $u+v\le p$, condition (4.4.16) is fulfilled. Furthermore, we assume that $E|X_i|^{p-2}\le M^{p-2}$. Then, with $Z\sim\mathcal{N}(0,1)$,
$$\bigl|E\,S_n^p - \sigma_n^p\,E\,Z^p\bigr| \le B_{p,n}\sum_{1\le u\le p/2}A_{u,p}\,K^{2u}(M\vee K)^{p-2u}\,n^u,$$
where $B_{p,n} = (p!)^2\,2^p\max_{2\le k\le p}\{\rho_{k,n}^{p/k}\}$, $\rho_{k,n} = \sum_{s=0}^{n-1}(s+1)^{k-2}\rho(s)$, and
$$A_{u,p} = \frac{1}{u!}\sum_{k_1+\cdots+k_u=p,\;k_i\ge2\,\forall i}\frac{p!}{k_1!\cdots k_u!}.$$

Remark 4.6. For even $p$, the above result implies that
$$E\,S_n^p \le (p-1)(p-3)\cdots1\;\sigma_n^p + B_{p,n}\sum_{1\le u\le p/2}A_{u,p}\,K^{2u}(M\vee K)^{p-2u}\,n^u,$$
which resembles the classical Rosenthal inequality of the independent case. If $\sup_nB_{p,n}<\infty$ and $\sigma_n^2\asymp n$, then the first term on the right-hand side is asymptotically dominating as $n\to\infty$. This term is equal to the $p$-th moment of a Gaussian random variable with mean 0 and variance $\sigma_n^2$.
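The elementary inequality $u^p\le p!\,e^u$ invoked in Remark 4.5-(iv) above (the $p$-th Taylor term of $e^u$ is at most $e^u$ itself) is easy to confirm numerically:

```python
import math

# Check of u^p <= p! * e^u for p in N and u >= 0, used to bound moments
# under the exponential-moment condition E exp(A|X|^a) <= L.
for p in range(0, 10):
    for u in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 25.0]:
        assert u ** p <= math.factorial(p) * math.exp(u) + 1e-9
```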
Remark 4.7. The inequality of Theorem 4.6 is well suited for proving a central limit theorem via the method of moments. Assume first that the random variables $X_i$ are uniformly bounded, centered, stationary, and satisfy condition (4.4.16) with $\lim_{s\to\infty}s^p\rho(s) = 0$ for all $p>0$. Then
$$\lim_{n\to\infty}\frac{\sigma_n^2}{n} = \sigma^2 = \sum_{k=-\infty}^{\infty}E(X_0X_k)$$
is a convergent series, and thus the method of moments implies a central limit theorem,
$$\frac{1}{\sqrt n}\,S_n \xrightarrow{\;d\;} \sigma Z \quad\text{as } n\to\infty.$$
4.4.3 From weak dependence to the exponential bound
A large class of weakly dependent sequences satisfies assumption (4.4.16) of Theorem 4.5. If $\delta(x,y) = |x-y|$, we write $\Lambda^{(1)}$ instead of $\Lambda^{(1)}(\delta)$.

(i) Assume that $(X_t)_{t\in\mathbb{Z}}$ is an $\mathbb{R}^d$-valued stationary process which is $(\Lambda^{(1)},\Psi)$-weakly dependent. Then for any Lipschitz-continuous function $F:\mathbb{R}^d\to\mathbb{R}$ with $\|F\|_\infty = M<\infty$ and $\mathrm{Lip}\,F\le1$, the process $Y_t = F(X_t)$ is real-valued, stationary, and $\|Y_t\|_\infty\le M$. Moreover, it is also $(\Lambda^{(1)},\Psi)$-weakly dependent.

(ii) In the more general case when $\mathrm{Lip}\,F$ possibly exceeds 1 (e.g. if the function $F$ depends on the sample size in a statistical context), weak dependence still holds, where only $\Psi(a,b,u,v)$ has to be replaced by $\psi_Y(a,b,u,v) = \Psi(a\,\mathrm{Lip}\,F,\,b\,\mathrm{Lip}\,F,\,u,\,v)$. For the special cases of $\eta$, $\kappa$ and $\lambda$ weak dependence conditions, one may reformulate this as follows: $(Y_t)_{t\in\mathbb{Z}}$ is still an $\eta$, $\kappa$ or $\lambda$-weakly dependent sequence, but now with the respective coefficients
$$\eta_Y(r) = \mathrm{Lip}\,F\cdot\eta(r), \qquad \kappa_Y(r) = \mathrm{Lip}^2F\cdot\kappa(r), \qquad \lambda_Y(r) = \max\bigl(\mathrm{Lip}\,F,\,\mathrm{Lip}^2F\bigr)\,\lambda(r).$$

Now we relate the conditions of weak dependence to condition (4.4.16). Suppose that $(X_t)_{t\in\mathbb{Z}}$ is a stationary sequence of real-valued random variables with $\|X_t\|_\infty\le M$ which satisfies a weak dependence condition. To see the connection to (4.4.16), we consider the functions $g_1$ and $g_2$ with $g_1(x_1,\dots,x_u) = \prod_{i=1}^{u}f(x_i/M)$ and $g_2(x_1,\dots,x_v) = \prod_{i=1}^{v}f(x_i/M)$, where $f(u) = (u\vee(-1))\wedge1$. These functions satisfy $\mathrm{Lip}\,g_i\le1/M$ and $\|g_i\|_\infty\le1$. The covariance in the definition of weak dependence can be expressed as in equation (4.4.16), up to a factor $M^{u+v}$, since $g_1(X_{i_1},\dots,X_{i_u}) = X_{i_1}\cdots X_{i_u}/M^u$ and $g_2(X_{i_1},\dots,X_{i_v}) = X_{i_1}\cdots X_{i_v}/M^v$.
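The truncation $f(u) = (u\vee(-1))\wedge1$ and the product functions $g_1, g_2$ just described can be checked to be $1/M$-Lipschitz for the $\ell^1$ distance, as claimed; a small numerical sketch of ours (the value of $M$, the dimension, and the sampled points are arbitrary):

```python
import random

# f clips to [-1, 1]; g(x) = prod_i f(x_i / M) has all factors in [-1, 1],
# so |g(x) - g(y)| <= sum_i |f(x_i/M) - f(y_i/M)| <= (1/M) * sum_i |x_i - y_i|.
def f(u):
    return max(-1.0, min(1.0, u))

def g(x, M):
    out = 1.0
    for xi in x:
        out *= f(xi / M)
    return out

random.seed(1)
M, d = 3.0, 4
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(d)]
    y = [random.uniform(-10, 10) for _ in range(d)]
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    assert abs(g(x, M) - g(y, M)) <= l1 / M + 1e-12
```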
Proposition 4.1. Assume that the real-valued sequence $(X_t)_{t\in\mathbb{Z}}$ is $(\Lambda^{(1)},\Psi)$-weakly dependent and that $\|X_t\|_\infty\le M$. Then
$$\bigl|\mathrm{Cov}\bigl(X_{s_1}\cdots X_{s_u},\,X_{t_1}\cdots X_{t_v}\bigr)\bigr| \le M^{u+v}\,\Psi(M^{-1},M^{-1},u,v)\,\epsilon(t_1-s_u). \qquad (4.4.21)$$
Moreover, if $\epsilon(r) = e^{-ar}$ for some $a>0$, then we may choose in inequality (4.4.17) $\mu = 1$ and $L_1 = L_2 = 1/(1-e^{-a})$. If $\epsilon(r) = e^{-ar^b}$ for some $a>0$, $b\in(0,1)$, then we may choose $\mu = 1/b$ and $L_1, L_2$ appropriately, depending only on $a$ and $b$.

Remark 4.8. (i) Notice that Proposition 4.1 implies that $(\Lambda^{(1)},\Psi)$-dependence implies (4.4.16) with
(a) $\Psi(u,v) = 2v$, $K^2 = M$ and $\epsilon(r) = \theta(r)/2$, under $\theta$-dependence,
(b) $\Psi(u,v) = u+v$, $K^2 = M$ and $\epsilon(r) = \eta(r)$, under $\eta$-dependence,
(c) $\Psi(u,v) = uv$, $K = 1$ and $\epsilon(r) = \kappa(r)$, under $\kappa$-dependence,
(d) $\Psi(u,v) = (u+v+uv)/2$, $K^2 = M\vee1$ and $\epsilon(r) = 2\lambda(r)$, under $\lambda$-dependence.

(ii) Now if the vector-valued process $(X_t)_{t\in\mathbb{Z}}$ is an $\eta$, $\kappa$ or $\lambda$-weakly dependent sequence, then for any Lipschitz function $F:\mathbb{R}^d\to\mathbb{R}$ with $\|F\|_\infty = M<\infty$, the process $Y_t = F(X_t)$ is real-valued and relation (4.4.16) holds with
(a) $\Psi(u,v) = 2v$, $K^2 = M\,\mathrm{Lip}\,F$ and $\epsilon(r) = \theta(r)/2$, under $\theta$-dependence,
(b) $\Psi(u,v) = u+v$, $K^2 = M\,\mathrm{Lip}\,F$ and $\epsilon(r) = \eta(r)$, under $\eta$-dependence,
(c) $\Psi(u,v) = uv$, $K = \mathrm{Lip}\,F$ and $\epsilon(r) = \kappa(r)$, under $\kappa$-dependence,
(d) $\Psi(u,v) = (u+v+uv)/2$, $K^2 = (M\vee1)(\mathrm{Lip}^2F\vee\mathrm{Lip}\,F)$ and $\epsilon(r) = 2\lambda(r)$, under $\lambda$-dependence.
Those bounds allow one to use the Bernstein-type inequality of Theorem 4.5 for sums of functions of weakly dependent sequences. A last problem in this setting is to determine sharp bounds for the coefficients $C_{p,r}$. This is possible even when the variables $X_i$ are unbounded, as stated in the following lemma.

Lemma 4.14. Assume that the real-valued sequence $(X_t)_{t\in\mathbb{Z}}$ is $\eta$, $\kappa$ or $\lambda$-weakly dependent and that $E|X_i|^m\le M_m$, for any $m>p$. Then, according to the type of the weak dependence condition:
$$C_{p,r} \le 2^{p+3}p^2\,M_m^{\frac{p-1}{m-1}}\,\eta(r)^{1-\frac{p-1}{m-1}}, \qquad (4.4.22)$$
$$C_{p,r} \le 2^{p+3}p^4\,M_m^{\frac{p-2}{m-2}}\,\kappa(r)^{1-\frac{p-2}{m-2}}, \qquad (4.4.23)$$
$$C_{p,r} \le 2^{p+3}p^4\,M_m^{\frac{p-1}{m-1}}\,\lambda(r)^{1-\frac{p-1}{m-1}}. \qquad (4.4.24)$$
Remark 4.9. This lemma is the essential tool to provide a version of Theorem 4.6 which yields both a Rosenthal-type moment inequality and a rate of convergence for moments in the central limit theorem. We also note that this result does not require the random variables to be a.s. bounded. In fact, even the use of Theorem 4.5 does not really require such boundedness; see Remark 4.5-(iv).
4.5 Tightness criteria

Following Andrews and Pollard (1994) [5], we give in this section a general criterion based on Rosenthal-type inequalities and on chaining arguments. Let $d\in\mathbb{N}^*$ and let $(X_i)_{i\in\mathbb{Z}}$ be a stationary sequence with values in $\mathbb{R}^d$. Let $\mathcal{F}$ be a class of functions from $\mathbb{R}^d$ to $\mathbb{R}$. We define the empirical process $\{Z_n(f),\,f\in\mathcal{F}\}$ by
$$Z_n(f) := \sqrt n\,(P_n(f) - P(f)),$$
with $P$ the common law of the $(X_i)_{i\in\mathbb{Z}}$ and, for $f\in\mathcal{F}$,
$$P_n(f) := \frac1n\sum_{i=1}^{n}f(X_i), \qquad P(f) = \int_{\mathbb{R}^d}f(x)\,P(dx).$$
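A minimal numerical illustration (with invented choices of $f$, the law $P$, and the i.i.d. sampling scheme): for uniform observations on $[0,1]$ and $f(x)=x^2$, the empirical process $Z_n(f)$ stays of order one as $n$ grows, as the central limit theorem suggests:

```python
import math
import random

# Z_n(f) = sqrt(n) * (P_n(f) - P(f)) with f(x) = x^2 and X_i ~ U[0,1],
# so P(f) = 1/3 and Z_n(f) fluctuates like N(0, Var f(X)) with Var ~ 0.089.
random.seed(2)
f = lambda x: x * x
P_f = 1.0 / 3.0

def Z_n(n):
    sample = [random.random() for _ in range(n)]
    P_n_f = sum(f(x) for x in sample) / n
    return math.sqrt(n) * (P_n_f - P_f)

for n in (1_000, 10_000, 100_000):
    assert abs(Z_n(n)) < 5.0   # bounded fluctuations, not shrinking to 0
```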
We study the process $\{Z_n(f),\,f\in\mathcal{F}\}$ on the space $\ell^\infty(\mathbb{R}^d)$ of bounded functions from $\mathbb{R}^d$ to $\mathbb{R}$, equipped with the uniform norm $\|\cdot\|_\infty$. For more details on tightness on the non-separable space $\ell^\infty(\mathbb{R}^d)$, we refer to van der Vaart and Wellner (1996) [183]. In particular, we shall not discuss any measurability problems, which can be handled by using the outer probability. The process $\{Z_n(f),\,f\in\mathcal{F}\}$ is tight on $(\ell^\infty(\mathbb{R}^d),\|\cdot\|_\infty)$ as soon as there exists a semi-metric $\rho$ such that $(\mathcal{F},\rho)$ is totally bounded and for which
$$\lim_{\delta\to0}\limsup_{n\to\infty}\,\mathbb{P}\Bigl(\sup_{f,g\in\mathcal{F},\,\rho(f,g)\le\delta}|Z_n(f)-Z_n(g)| > \varepsilon\Bigr) = 0, \qquad \forall\varepsilon>0.$$
We recall the following definition of bracketing number.

Definition 4.4. Let $Q$ be a finite measure on a measurable space $\mathcal{X}$. For any measurable function $f$ from $\mathcal{X}$ to $\mathbb{R}$, let $\|f\|_{Q,r} = Q(|f|^r)^{1/r}$. If $\|f\|_{Q,r}$ is finite, one says that $f$ belongs to $\mathbb{L}^r_Q$. Let $\mathcal{F}$ be some subset of $\mathbb{L}^r_Q$. The number of brackets $N_{Q,r}(\varepsilon,\mathcal{F})$ is the smallest integer $N$ for which there exist some functions $f_1^-\le f_1,\dots,f_N^-\le f_N$ in $\mathcal{F}$ such that: for any integer $1\le i\le N$ we have $\|f_i-f_i^-\|_{Q,r}\le\varepsilon$, and for any function $f$ in $\mathcal{F}$ there exists an integer $1\le i\le N$ such that $f_i^-\le f\le f_i$.
Proposition 4.2. Let $(X_i)_{i\ge1}$ be a sequence of identically distributed random variables with values in a measurable space $\mathcal{X}$, with common distribution $P$. Let $P_n$ be the empirical measure $P_n = n^{-1}\sum_{i=1}^{n}\delta_{X_i}$, and let $Z_n$ be the normalized empirical process $Z_n = \sqrt n(P_n-P)$. Let $Q$ be any finite measure on $\mathcal{X}$ such that $Q-P$ is a positive measure. Let $\mathcal{F}$ be a class of functions from $\mathcal{X}$ to $\mathbb{R}$ and $\mathcal{G} = \{f-l : (f,l)\in\mathcal{F}\times\mathcal{F}\}$. Assume that there exist $r\ge2$, $p\ge1$ and $q>2$ such that for any function $g$ of $\mathcal{G}$ we have
$$\|Z_n(g)\|_p \le C\bigl(\|g\|_{Q,1}^{1/r} + n^{1/q-1/2}\bigr), \qquad (4.5.1)$$
where the constant $C$ depends neither on $g$ nor on $n$. If moreover
$$\int_0^1x^{(1-r)/r}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx < \infty \quad\text{and}\quad \lim_{x\to0}x^{p(q-2)/q}N_{Q,1}(x,\mathcal{F}) = 0,$$
then
$$\lim_{\delta\to0}\limsup_{n\to\infty}\,E\Bigl(\sup_{g\in\mathcal{G},\,\|g\|_{Q,1}\le\delta}|Z_n(g)|^p\Bigr) = 0. \qquad (4.5.2)$$
Proof of Proposition 4.2. It follows the lines of Andrews and Pollard (1994) [5] and Louhichi (2000) [124]. It is based on the following inequality: given $N$ real-valued random variables,
$$\Bigl\|\max_{1\le i\le N}|Z_i|\Bigr\|_p \le N^{1/p}\max_{1\le i\le N}\|Z_i\|_p. \qquad (4.5.3)$$
For any positive integer $k$, denote by $N_k = N_{Q,1}(2^{-k},\mathcal{F})$ and by $\mathcal{F}_k$ a family of functions $f_1^{k,-}\le f_1^k,\dots,f_{N_k}^{k,-}\le f_{N_k}^k$ in $\mathcal{F}$ such that $\|f_i^k-f_i^{k,-}\|_{Q,1}\le2^{-k}$ and, for any $f$ in $\mathcal{F}$, there exists an integer $1\le i\le N_k$ such that $f_i^{k,-}\le f\le f_i^k$.

First step. We shall construct a sequence $h_{k(n)}(f)$ belonging to $\mathcal{F}_{k(n)}$ such that
$$\lim_{n\to\infty}\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(h_{k(n)}(f))|\Bigr\|_p = 0. \qquad (4.5.4)$$
For any $f$ in $\mathcal{F}$, there exist two functions $g_k^-$ and $g_k^+$ in $\mathcal{F}_k$ such that $g_k^-\le f\le g_k^+$ and $\|g_k^+-g_k^-\|_{Q,1}\le2^{-k}$. Since $Q-P$ is a positive measure, we have
$$Z_n(f)-Z_n(g_k^-) \le Z_n(g_k^+)-Z_n(g_k^-) + \frac{1}{\sqrt n}\sum_{i=1}^{n}E\bigl((g_k^+-f)(X_i)\bigr) \le |Z_n(g_k^+)-Z_n(g_k^-)| + \sqrt n\,2^{-k}.$$
Since $g_k^-\le f$, we also have that $Z_n(g_k^-)-Z_n(f)\le\sqrt n\,2^{-k}$, which enables us to conclude that $|Z_n(f)-Z_n(g_k^-)|\le|Z_n(g_k^+)-Z_n(g_k^-)|+\sqrt n\,2^{-k}$. Consequently,
$$\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(g_k^-)| \le \max_{1\le i\le N_k}|Z_n(f_i^k)-Z_n(f_i^{k,-})| + \sqrt n\,2^{-k}. \qquad (4.5.5)$$
Combining (4.5.3) and (4.5.5), we obtain
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(g_k^-)|\Bigr\|_p \le N_k^{1/p}\max_{1\le i\le N_k}\bigl\|Z_n(f_i^k)-Z_n(f_i^{k,-})\bigr\|_p + \sqrt n\,2^{-k}. \qquad (4.5.6)$$
Starting from (4.5.6) and applying inequality (4.5.1), we obtain
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(g_k^-)|\Bigr\|_p \le C\bigl(N_k^{1/p}2^{-k/r} + N_k^{1/p}n^{1/q-1/2}\bigr) + \sqrt n\,2^{-k}. \qquad (4.5.7)$$
From the integrability condition on $N_{Q,1}(x,\mathcal{F})$, and since the function $x\mapsto x^{(1-r)/r}N_{Q,1}(x,\mathcal{F})^{1/p}$ is nonincreasing, we infer that $N_k^{1/p}2^{-k/r}$ tends to 0 as $k$ tends to infinity. Take $k(n)$ such that $2^{k(n)} = \sqrt n/a_n$ for some sequence $a_n$ decreasing to zero. Then $\sqrt n\,2^{-k(n)}$ tends to 0 as $n$ tends to infinity. It remains to control the second term on the right-hand side of (4.5.7). By definition of $N_{k(n)}$, we have
$$N_{k(n)}\,n^{p(1/q-1/2)} = N_{Q,1}\Bigl(\frac{a_n}{\sqrt n},\mathcal{F}\Bigr)\Bigl(\frac{1}{\sqrt n}\Bigr)^{p(q-2)/q}. \qquad (4.5.8)$$
Since $x^{p(q-2)/q}N_{Q,1}(x,\mathcal{F})$ tends to 0 as $x$ tends to zero, we can find a sequence $a_n$ such that the right-hand term in (4.5.8) converges to 0. The function $h_{k(n)}(f) = g_{k(n)}^-$ satisfies (4.5.4).

Second step. We shall prove that, for any $\epsilon>0$ and $n$ large enough, there exists a function $h_m(f)$ in $\mathcal{F}_m$ such that
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(h_m(f))-Z_n(h_{k(n)}(f))|\Bigr\|_p \le \epsilon. \qquad (4.5.9)$$
Given $h$ in $\mathcal{F}_k$, choose a function $T_{k-1}(h)$ in $\mathcal{F}_{k-1}$ such that $\|h-T_{k-1}(h)\|_{Q,1}\le2^{-k+1}$. Denote by $\pi_{k,k} = \mathrm{Id}$ and, for $l<k$, $\pi_{l,k}(h) = T_l\circ\cdots\circ T_{k-1}(h)$. We consider the function $h_m(f) = \pi_{m,k(n)}(h_{k(n)}(f))$. We have
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(h_m)-Z_n(h_{k(n)})|\Bigr\|_p \le \sum_{l=m+1}^{k(n)}\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(\pi_{l,k(n)}(h_{k(n)}))-Z_n(\pi_{l-1,k(n)}(h_{k(n)}))|\Bigr\|_p. \qquad (4.5.10)$$
Clearly,
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(\pi_{l,k(n)}(h_{k(n)}))-Z_n(\pi_{l-1,k(n)}(h_{k(n)}))|\Bigr\|_p \le \Bigl\|\max_{f\in\mathcal{F}_l}|Z_n(f)-Z_n(T_{l-1}(f))|\Bigr\|_p.$$
Applying inequality (4.5.1) to (4.5.10), we obtain
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(h_m)-Z_n(h_{k(n)})|\Bigr\|_p \le C\sum_{l=m+1}^{k(n)}\bigl(2^{1/r}N_l^{1/p}2^{-l/r} + N_l^{1/p}n^{1/q-1/2}\bigr).$$
Clearly,
$$\sum_{l=m+1}^{\infty}N_l^{1/p}2^{-l/r} \le 2\int_0^{2^{-m-1}}x^{(1-r)/r}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx,$$
which by assumption is as small as we wish. To control the second term, write
$$n^{1/q-1/2}\sum_{l=m+1}^{k(n)}N_l^{1/p} \le n^{1/q-1/2}\sum_{l=0}^{k(n)}2^l\,N_l^{1/p}\,2^{-l} \le 2\,n^{1/q-1/2}\int_{2^{-k(n)}}^{1}\frac{1}{x}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx.$$
It is easy to see that if $x^{p(q-2)/q}N_{Q,1}(x,\mathcal{F})$ tends to 0 as $x$ tends to 0, then
$$\lim_{x\to0}\,x^{(q-2)/q}\int_x^1\frac{1}{y}\bigl(N_{Q,1}(y,\mathcal{F})\bigr)^{1/p}dy = 0.$$
Consequently, we can choose the decreasing sequence $a_n$ such that
$$\lim_{n\to\infty}\Bigl(\frac{1}{\sqrt n}\Bigr)^{\frac{q-2}{q}}\int_{a_nn^{-1/2}}^{1}\frac{1}{x}\bigl(N_{Q,1}(x,\mathcal{F})\bigr)^{1/p}dx = 0.$$
The function $h_m(f) = \pi_{m,k(n)}(h_{k(n)}(f))$ satisfies (4.5.9).

Third step. From steps 1 and 2, we infer that for any $\epsilon>0$ and $n$ large enough there exists $h_m(f)$ in $\mathcal{F}_m$ such that
$$\Bigl\|\sup_{f\in\mathcal{F}}|Z_n(f)-Z_n(h_m(f))|\Bigr\|_p \le 2\epsilon.$$
Using the same argument as in Andrews and Pollard (1994) [5] (see the paragraph "Comparison of pairs", page 124), we obtain that, for any $f$ and $g$ in $\mathcal{F}$,
$$\Bigl\|\sup_{\|f-g\|_{Q,1}\le\delta}|Z_n(f)-Z_n(g)|\Bigr\|_p \le 8\epsilon + N_m^{2/r}\sup_{\|f-g\|_{Q,1}\le\delta}\|Z_n(f)-Z_n(g)\|_p.$$
We conclude the proof by noting that
$$\limsup_{\delta\to0}\limsup_{n\to\infty}\Bigl\|\sup_{g\in\mathcal{G},\,\|g\|_{Q,1}\le\delta}|Z_n(g)|\Bigr\|_p \le 8\epsilon.$$
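The elementary maximal inequality (4.5.3) on which the chaining argument rests holds pathwise for empirical $L^p$ norms, since $E\max_i|Z_i|^p\le\sum_iE|Z_i|^p\le N\max_iE|Z_i|^p$; a quick numerical check of ours (the distributions and sizes are arbitrary):

```python
import random

# Check of || max_i |Z_i| ||_p <= N^(1/p) max_i ||Z_i||_p with empirical
# L^p norms computed over a common sample of n_obs joint observations.
random.seed(3)
N, n_obs, p = 5, 400, 4.0
Z = [[random.gauss(0, 1 + i) for _ in range(n_obs)] for i in range(N)]

def lp_norm(values, p):
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1 / p)

max_abs = [max(abs(Z[i][j]) for i in range(N)) for j in range(n_obs)]
lhs = lp_norm(max_abs, p)
rhs = N ** (1 / p) * max(lp_norm(Z[i], p) for i in range(N))
assert lhs <= rhs + 1e-12
```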
Chapter 5
Tools for causal cases

The purpose of this chapter is to give several technical tools useful for deriving limit theorems in causal cases. The first section gives comparison results between different causal coefficients. The second section deals with covariance inequalities for the coefficients $\gamma_1$, $\tilde\beta$ and $\tilde\phi$ already defined in Chapter 2. Section 3 discusses a coupling result for $\tau_1$-dependent random variables; this coupling result is generalized to the case of variables with values in any Polish space. Section 4 gives various inequalities, mainly of Bennett, Fuk-Nagaev, Burkholder and Rosenthal type, for different dependent sequences. Finally, Section 5 gives a maximal inequality extending Doob's inequality for martingales.
5.1 Comparison results
Weak dependence conditions may be compared as in the case of mixing conditions.

Lemma 5.1. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space and let $(X_i)_{i\in\mathbb Z}$ be a sequence of real-valued random variables. We have the following comparison results:

1. For all $r\in\mathbb N^*$ and $k\ge0$,
\[
\tilde\alpha_r(k)\le\tilde\beta_r(k)\le\tilde\phi_r(k).
\]
Assume now that $X_i\in\mathbb L^1(\mathbb P)$ for all $i\in\mathbb Z$. If $Y$ is a real-valued random variable, $Q_Y$ denotes the generalized inverse of the tail function $t\mapsto\mathbb P(|Y|>t)$; see (2.2.14).

2. Let $Q_X\ge\sup_{k\in\mathbb Z}Q_{X_k}$. Then, for all $k\ge0$,
\[
\tau_{1,1}(k)\le 2\int_0^{\tilde\alpha_1(k)}Q_X(u)\,du. \qquad (5.1.1)
\]
Assume now that the sequence $(X_i)_{i\in\mathbb Z}$ takes values in $\mathbb R^d$ for some $d\in\mathbb N^*$ and that each $X_i$ belongs to $\mathbb L^1(\mathbb P)$. On $\mathbb R^d$, we put the distance $d_1(x,y)=\sum_{p=1}^d|x^{(p)}-y^{(p)}|$, where $x^{(p)}$ (resp. $y^{(p)}$) denotes the $p$th coordinate of $x$ (resp. $y$). Assume that for all $i\in\mathbb Z$, each component of $X_i$ has a continuous distribution function, and let $\omega$ be the supremum of the moduli of continuity, that is,
\[
\omega(x)=\sup_{i\in\mathbb Z}\ \max_{1\le k\le d}\ \sup_{|y-z|\le x}\big|F_{X_i^{(k)}}(y)-F_{X_i^{(k)}}(z)\big|,
\]
where for all $i\in\mathbb Z$, $X_i=(X_i^{(1)},\dots,X_i^{(d)})$. Define $g(x)=x\,\omega(x)$. Then we have

3. For all $r\in\mathbb N^*$ and $k\ge0$,
\[
\tilde\beta_r(k)\le\frac{2r\,\tau_{1,r}(k)}{g^{-1}\big(\tau_{1,r}(k)/d\big)}\,,\qquad
\tilde\phi_r(k)\le\frac{2r\,\tau_{\infty,r}(k)}{g^{-1}\big(\tau_{\infty,r}(k)/d\big)}\,,
\]
where $g^{-1}$ denotes the generalized inverse of $g$ defined in (2.2.14).

4. Assume now that there exists some positive constant $K$ such that for all $i\in\mathbb Z$, each component of $X_i$ has a density bounded by $K$. Then, for all $r\in\mathbb N^*$ and $k\ge0$,
\[
\tilde\alpha_r(k)\le 4r\sqrt{K\,d\,\theta_{1,r}(k)}.
\]

Remark 5.1. If each marginal distribution satisfies a concentration condition $|F_{X_i^{(k)}}(y)-F_{X_i^{(k)}}(z)|\le K|y-z|^a$ with $a\le1$ and $K>0$, then Item 3 yields
\[
\tilde\beta_r(k)\le 2r\,\tau_{1,r}(k)^{a/(1+a)}(Kd)^{1/(1+a)},\qquad
\tilde\phi_r(k)\le 2r\,\tau_{\infty,r}(k)^{a/(1+a)}(Kd)^{1/(1+a)}.
\]
If, e.g., for all $i\in\mathbb Z$ each component of $X_i$ has a density bounded by $K>0$, then those relations write more simply with $a=1$, as in Item 4.

In order to prove Lemma 5.1, we introduce some general comments related to the notation (2.2.14). Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $\mathcal M$ a $\sigma$-algebra of $\mathcal A$ and $X$ a real-valued random variable. Let $F_{\mathcal M}(t,\omega)=\mathbb P_{X|\mathcal M}(]-\infty,t],\omega)$ be a conditional distribution function of $X$ given $\mathcal M$. For any $\omega$, $F_{\mathcal M}(\cdot,\omega)$ is a distribution function, and for any $t$, $F_{\mathcal M}(t,\cdot)$ is an $\mathcal M$-measurable random variable. Hence for any $\omega$, define the generalized inverse $F_{\mathcal M}^{-1}(u,\omega)$ as in (2.2.14). Now, from the equality $\{\omega : t\ge F_{\mathcal M}^{-1}(u,\omega)\}=\{\omega : F_{\mathcal M}(t,\omega)\ge u\}$, we infer
that $F_{\mathcal M}^{-1}(u,\cdot)$ is $\mathcal M$-measurable. In the same way, $\{(t,\omega): F_{\mathcal M}(t,\omega)\ge u\}=\{(t,\omega): t\ge F_{\mathcal M}^{-1}(u,\omega)\}$, which implies that the mapping $(t,\omega)\mapsto F_{\mathcal M}(t,\omega)$ is measurable with respect to $\mathcal B(\mathbb R)\otimes\mathcal M$. The same arguments imply that the mapping $(u,\omega)\mapsto F_{\mathcal M}^{-1}(u,\omega)$ is measurable with respect to $\mathcal B([0,1])\otimes\mathcal M$. Denote by $F_{\mathcal M}(t)$ (resp. $F_{\mathcal M}^{-1}(u)$) the random variable $F_{\mathcal M}(t,\cdot)$ (resp. $F_{\mathcal M}^{-1}(u,\cdot)$), and let $F_{\mathcal M}(t-0)=\sup_{s<t}F_{\mathcal M}(s)$.
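The identity $\{t\ge F^{-1}(u)\}=\{F(t)\ge u\}$ used above can be checked numerically for an empirical distribution function. The sketch below is ours, not from the book (the helper names `ecdf` and `gen_inv` are illustrative); it implements the generalized inverse $F^{-1}(u)=\inf\{t : F(t)\ge u\}$ for a small sample and verifies the equivalence on a grid.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function t -> F(t) of a sample."""
    s = np.sort(sample)
    return lambda t: np.searchsorted(s, t, side="right") / len(s)

def gen_inv(sample, u):
    """Generalized inverse F^{-1}(u) = inf{t : F(t) >= u} for the empirical F."""
    s = np.sort(sample)
    k = int(np.ceil(u * len(s))) - 1   # smallest index with F(s[k]) >= u
    return s[max(k, 0)]

sample = np.array([3.0, 1.0, 2.0, 2.0])
F = ecdf(sample)
# Check the equivalence {t >= F^{-1}(u)} = {F(t) >= u} on a grid of (u, t).
for u in (0.25, 0.5, 0.75, 1.0):
    q = gen_inv(sample, u)
    for t in np.linspace(0.0, 4.0, 41):
        assert (t >= q) == (F(t) >= u)
```

The assertions pass because, for a right-continuous $F$, the sublevel condition $F(t)\ge u$ is exactly the half-line $t\ge F^{-1}(u)$; this is the measurability argument above in computational form.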
Proof of Lemma 5.1.
• Item 1 follows from the definitions of $\tilde\alpha(\mathcal M,X)$, $\tilde\beta(\mathcal M,X)$ and $\tilde\phi(\mathcal M,X)$.
• Let us now prove Item 2. We first prove that if $\mathcal M$ is a $\sigma$-algebra of $\mathcal A$ and $X$ is a real-valued random variable in $\mathbb L^1(\mathbb P)$, then
\[
\tau(\mathcal M,X)\le 2\int_0^{\tilde\alpha(\mathcal M,X)}Q_X(u)\,du. \qquad (5.1.2)
\]
The proof follows from arguments in Peligrad (2002) [140]. Denote $X^+=\sup(X,0)$ and $X^-=\sup(-X,0)$. Let $F$ denote the distribution function of $X$ and $F_{\mathcal M}(t,\omega)$ the conditional distribution function of $X$ given $\mathcal M$. Assume that there exists a random variable $\delta$, uniformly distributed over $[0,1]$ and independent of the $\sigma$-algebra generated by $X$ and $\mathcal M$. Then
\[
U=F_{\mathcal M}(X-0)+\delta\big(F_{\mathcal M}(X)-F_{\mathcal M}(X-0)\big)
\]
is uniformly distributed over $[0,1]$ conditionally on $\mathcal M$. So $U$ is independent of $\mathcal M$ and uniformly distributed over $[0,1]$ (see Major (1978) [126] and also Rio (2000) [161]). Hence
\[
X^*=F^{-1}(U) \qquad (5.1.3)
\]
is independent of $\mathcal M$ and distributed as $X$. Moreover, $F_{\mathcal M}^{-1}(U)=X$, $\mathbb P$-almost surely. It yields
\[
\|X-X^*\|_1=\mathbb E\int_0^1\big|F_{\mathcal M}^{-1}(u)-F^{-1}(u)\big|\,du. \qquad (5.1.4)
\]
We start with equality (5.1.4):
\[
\|X-X^*\|_1=\mathbb E\int_0^1\big|F_{\mathcal M}^{-1}(u)-F^{-1}(u)\big|\,du
=\mathbb E\int_{\mathbb R}\big|F_{\mathcal M}(t)-F(t)\big|\,dt
=\mathbb E\int_0^\infty\big|\mathbb P(X^+>u)-\mathbb P(X^+>u\mid\mathcal M)\big|\,du+\mathbb E\int_0^\infty\big|\mathbb P(X^->u)-\mathbb P(X^->u\mid\mathcal M)\big|\,du.
\]
Now, by definition of $\tilde\alpha$, we have the inequalities
\[
\mathbb E\big|\mathbb P(X^+>u)-\mathbb P(X^+>u\mid\mathcal M)\big|\le\tilde\alpha(\mathcal M,X^+)\wedge 2\mathbb P(X^+>u),
\]
\[
\mathbb E\big|\mathbb P(X^->u)-\mathbb P(X^->u\mid\mathcal M)\big|\le\tilde\alpha(\mathcal M,X^-)\wedge 2\mathbb P(X^->u).
\]
It is clear that $\sup\big(\tilde\alpha(\mathcal M,X^+),\tilde\alpha(\mathcal M,X^-)\big)\le\tilde\alpha(\mathcal M,X)$. Define $H_X(x)=\mathbb P(|X|>x)$. Next, using the inequality $a\wedge b+c\wedge d\le(a+c)\wedge(b+d)$, we obtain
\[
\mathbb E|X-X^*|\le 2\int_0^\infty\tilde\alpha(\mathcal M,X)\wedge H_X(u)\,du\le 2\int_0^\infty\!\!\int_0^{\tilde\alpha(\mathcal M,X)}\mathbf 1_{t<H_X(u)}\,dt\,du.
\]
Then, since $\mathbb P(|X|>u)>t$ if and only if $u<Q_X(t)$, and applying the Fubini Theorem, we get Inequality (5.1.2). Let us now prove (5.1.1). For $i\in\mathbb Z$, let $\mathcal M_i=\sigma(X_j,\ j\le i)$. For $i+k\le j$, we infer from Inequality (5.1.2) that
\[
\tau_1(\mathcal M_i,X_j)\le 2\int_0^{\tilde\alpha_1(k)}Q_{X_j}(u)\,du\le 2\int_0^{\tilde\alpha_1(k)}Q_X(u)\,du,
\]
and the result follows from the definition of $\tau_{1,1}(k)$.
• For Item 3 we write the proof for $r=1,2$; the generalization to $r$ points is straightforward. We will make use of the following proposition:

Proposition 5.1. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $X=(X_1,\dots,X_d)$ and $Y=(Y_1,\dots,Y_d)$ two random variables with values in $\mathbb R^d$, and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. If $(X^*,Y^*)$ is distributed as $(X,Y)$ and independent of $\mathcal M$ then, assuming that each component $X_k$ and $Y_k$ has a continuous distribution function $F_{X_k}$ and $F_{Y_k}$, we get, for any $x_1,\dots,x_d,y_1,\dots,y_d$ in $[0,1]$,
\[
\tilde\beta(\mathcal M,X)\le\sum_{k=1}^d\Big(x_k+\big\|\mathbb P\big(|F_{X_k}(X_k^*)-F_{X_k}(X_k)|>x_k\,\big|\,\mathcal M\big)\big\|_1\Big), \qquad (5.1.5)
\]
\[
\tilde\beta(\mathcal M,X,Y)\le\sum_{k=1}^d\Big(x_k+\big\|\mathbb P\big(|F_{X_k}(X_k^*)-F_{X_k}(X_k)|>x_k\,\big|\,\mathcal M\big)\big\|_1\Big)
+\sum_{k=1}^d\Big(y_k+\big\|\mathbb P\big(|F_{Y_k}(Y_k^*)-F_{Y_k}(Y_k)|>y_k\,\big|\,\mathcal M\big)\big\|_1\Big). \qquad (5.1.6)
\]
We prove Proposition 5.1 at the end of this section and continue with the proof of Item 3. Starting from (5.1.5) with $x_1=\dots=x_d=\omega(x)$ and applying Markov's inequality, we infer that
\[
\tilde\beta(\mathcal M,X)\le d\,\omega(x)+\frac1x\Big\|\mathbb E\Big(\sum_{k=1}^d|X_k-X_k^*|\,\Big|\,\mathcal M\Big)\Big\|_1.
\]
Now, from Proposition 6 in Rüschendorf (1985) [171] (see also Equality (5.3.5) of Lemma 5.3), one can choose $X^*$ such that
\[
\Big\|\mathbb E\Big(\sum_{k=1}^d|X_k-X_k^*|\,\Big|\,\mathcal M\Big)\Big\|_1=\tau_1(\mathcal M,X).
\]
Hence
\[
\tilde\beta(\mathcal M,X)\le d\,\omega(x)+\frac{\tau_1(\mathcal M,X)}{x},
\]
and Item 3 follows for $r=1$ by noting that $d\,x\,\omega(x)=\tau_1(\mathcal M,X)$ for $x=g^{-1}\big(\tau_1(\mathcal M,X)/d\big)$. Starting now from (5.1.6), Item 3 may be proved in the same way for $r=2$. Proposition 5.1 is still true when replacing $\tilde\beta$ by $\tilde\phi$ and $\|\cdot\|_1$ by $\|\cdot\|_\infty$ (see Proposition 7 in Dedecker and Prieur, 2006 [49]). The proof for $\tilde\phi$ then follows the same lines and is therefore omitted here.
• It remains to prove Item 4. We write the proof for $r=1$ and $d=1$, and then generalize to any dimension $d\ge1$ and any $r\ge1$.

Case $r=1$, $d=1$. For $i\in\mathbb Z$, let $\mathcal M_i=\sigma(X_j,\ j\le i)$ and let $i+k\le j_1$. We know from Definition 2.5 of Chapter 2 that
\[
\tilde\alpha(\mathcal M_i,X_{j_1})=\sup_{t\in\mathbb R}\big\|L_{X_{j_1}|\mathcal M_i}(t)-L_{X_{j_1}}(t)\big\|_1.
\]
Define $\varphi_t(x)=\mathbf 1_{x\le t}-\mathbb P(X_{j_1}\le t)$. We want to smooth the function $\varphi_t$, which is not Lipschitz. Let $\varepsilon>0$, and consider the following Lipschitz function $\varphi_t^\varepsilon$ smoothing $\varphi_t$:
\[
\varphi_t^\varepsilon(x)=\varphi_t(x)\ \ \text{for }x\in\,]-\infty,t]\cup[t+\varepsilon,+\infty[\,,\qquad
\varphi_t^\varepsilon(x)=\varphi_t(x)+\frac{t-x+\varepsilon}{\varepsilon}\,\mathbf 1_{t<x<t+\varepsilon}\ \ \text{for }t<x<t+\varepsilon.
\]
Then $\|\varphi_t^\varepsilon\|_\infty\le1$ and $\mathrm{Lip}(\varphi_t^\varepsilon)\le1/\varepsilon$, so that
\[
\big\|L_{X_{j_1}|\mathcal M_i}(t)-L_{X_{j_1}}(t)\big\|_1\le 2K\varepsilon+\frac1\varepsilon\,\theta_{1,1}(k)+2K\varepsilon.
\]
Then, if we take $\varepsilon=\sqrt{\theta_{1,1}(k)/(4K)}$ and consider the supremum in $t$, we get
\[
\tilde\alpha_1(k)\le 4\sqrt{K\,\theta_{1,1}(k)}\,.
\]
Case $r\ge1$, $d\ge1$. The proof follows essentially the same lines. Let $i+k\le j_1<\dots<j_r$ and let $X=(X_{j_1},\dots,X_{j_r})$, with values in $(\mathbb R^d)^r$. We know from Definition 2.5 of Chapter 2 that
\[
\tilde\alpha(\mathcal M_i,X)=\sup_{t\in(\mathbb R^d)^r}\big\|L_{X|\mathcal M_i}(t)-L_X(t)\big\|_1.
\]
For $t=(t_1,\dots,t_r)\in(\mathbb R^d)^r$ and $x=(x_1,\dots,x_r)\in(\mathbb R^d)^r$, define
\[
\varphi_t(x)=\prod_{i=1}^r\big(\mathbf 1_{x_i\le t_i}-\mathbb P(X_{j_i}\le t_i)\big)=\prod_{i=1}^r\varphi_{t_i}(x_i).
\]
We want to smooth the function $\varphi_t$, which is not Lipschitz. Let $\varepsilon>0$. We first smooth each of the functions $\varphi_{t_i}$. In the following, if $x$ is in $\mathbb R^d$, $x^{(p)}$ denotes its $p$th coordinate. We consider the following Lipschitz function $\varphi_{t_i}^\varepsilon$ smoothing $\varphi_{t_i}$: $\varphi_{t_i}^\varepsilon$ is equal to $\varphi_{t_i}$ on $]-\infty,t_i]\cup[t_i+\varepsilon,+\infty[$, and for $x_i\notin\,]-\infty,t_i]\cup[t_i+\varepsilon,+\infty[$,
\[
\varphi_{t_i}^\varepsilon(x_i)=\prod_{j=1}^d\Big(\mathbf 1_{x_i^{(j)}\le t_i^{(j)}}+\frac{t_i^{(j)}-x_i^{(j)}+\varepsilon}{\varepsilon}\,\mathbf 1_{t_i^{(j)}<x_i^{(j)}<t_i^{(j)}+\varepsilon}\Big)-\mathbb P(X_{j_i}\le t_i),
\]
so that $\varphi_{t_i}^\varepsilon$ is $1/\varepsilon$-Lipschitz with respect to the distance $d_1(x,y)=\sum_{j=1}^d|x^{(j)}-y^{(j)}|$. We then smooth $\varphi_t$ by $\varphi_t^\varepsilon(x)=\prod_{i=1}^r\varphi_{t_i}^\varepsilon(x_i)$. We have $\|\varphi_t^\varepsilon\|_\infty\le1$ and $\mathrm{Lip}(\varphi_t^\varepsilon)\le1/\varepsilon$. We get
\[
\big\|L_{X|\mathcal M_i}(t)-L_X(t)\big\|_1=\mathbb E\big|\mathbb E\big(\varphi_t(X)-\mathbb E\varphi_t(X)\,\big|\,\mathcal M_i\big)\big|\le 2rKd\varepsilon+\frac1\varepsilon\,r\,\theta_{1,r}(k)+2rKd\varepsilon.
\]
We conclude by taking the supremum in $t\in(\mathbb R^d)^r$ and $\varepsilon=\sqrt{\theta_{1,r}(k)/(4Kd)}$.
Proof of Proposition 5.1. Let $Z$ be a random variable with values in $\mathbb R^m$ and let $f:\mathbb R^m\to\mathbb R$ be such that
\[
|f(z_1,\dots,z_i,\dots,z_m)-f(z_1,\dots,z_i',\dots,z_m)|\le|\mathbf 1_{z_i\le a_i}-\mathbf 1_{z_i'\le a_i}|
\]
for some real numbers $a_1,\dots,a_m$. Let $\mathcal U$ be a $\sigma$-algebra and let $Z^*$ be a random variable distributed as $Z$ and independent of $\mathcal U$. Then
\[
|f(Z)-f(Z^*)|=\Big|\sum_{k=1}^m f(Z_1,\dots,Z_k,Z_{k+1}^*,\dots,Z_m^*)-f(Z_1,\dots,Z_{k-1},Z_k^*,\dots,Z_m^*)\Big|\le\sum_{k=1}^m|\mathbf 1_{Z_k\le a_k}-\mathbf 1_{Z_k^*\le a_k}|.
\]
Hence
\[
|\mathbb E(f(Z)\mid\mathcal U)-\mathbb E(f(Z))|\le\mathbb E\big(|f(Z)-f(Z^*)|\,\big|\,\mathcal U\big)\le\sum_{k=1}^m\mathbb E\big(|\mathbf 1_{Z_k\le a_k}-\mathbf 1_{Z_k^*\le a_k}|\,\big|\,\mathcal U\big). \qquad (5.1.7)
\]
Let $t\in\mathbb R^d$. We first apply (5.1.7) to $Z=X$, $Z^*=X^*$, $\mathcal U=\mathcal M$, and $f(z)=\mathbf 1_{z\le t}$, with $a_1=t_1,\dots,a_d=t_d$. Since $F_{X_k}^{-1}(F_{X_k}(X_k))=X_k$ almost surely, we obtain
\[
|\mathbb E(\mathbf 1_{X\le t}\mid\mathcal M)-\mathbb P(X\le t)|\le\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{X_k\le t_k}-\mathbf 1_{X_k^*\le t_k}|\,\big|\,\mathcal M\big) \qquad (5.1.8)
\]
\[
\le\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{F_{X_k}(X_k)\le F_{X_k}(t_k)}-\mathbf 1_{F_{X_k}(X_k^*)\le F_{X_k}(t_k)}|\,\big|\,\mathcal M\big).
\]
Define $T_k=F_{X_k}(X_k)$ and $T_k^*=F_{X_k}(X_k^*)$. We have
\[
\mathbb E\big(|\mathbf 1_{T_k\le F_{X_k}(t_k)}-\mathbf 1_{T_k^*\le F_{X_k}(t_k)}|\,\big|\,\mathcal M\big)\le\max\Big(F_{T_k|\mathcal M}(F_{X_k}(t_k))-F_{X_k}(t_k),\ \big(1-F_{T_k|\mathcal M}(F_{X_k}(t_k))\big)-\big(1-F_{X_k}(t_k)\big)\Big).
\]
Now, for any $y$ in $[0,1]$,
\[
F_{T_k|\mathcal M}(y)=\int\mathbf 1_{u\le y}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv)
\le\int\mathbf 1_{v\le y+x_k}\,\mathbb P_{T_k^*|\mathcal M}(dv)+\int\mathbf 1_{v-u>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv)
\le y+x_k+\int\mathbf 1_{v-u>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv).
\]
In the same way,
\[
1-F_{T_k|\mathcal M}(y)\le 1-(y-x_k)+\int\mathbf 1_{u-v>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv).
\]
Consequently, taking $y=F_{X_k}(t_k)$,
\[
\mathbb E\big(|\mathbf 1_{T_k\le F_{X_k}(t_k)}-\mathbf 1_{T_k^*\le F_{X_k}(t_k)}|\,\big|\,\mathcal M\big)\le x_k+\int\mathbf 1_{|u-v|>x_k}\,\mathbb P_{T_k,T_k^*|\mathcal M}(du,dv), \qquad (5.1.9)
\]
and Inequality (5.1.5) follows from (5.1.9) and (5.1.8) by taking the supremum in $t$ and the expectation. Let $s,t\in\mathbb R^d$. In the same way, applying (5.1.7) to $Z=(Z^{(1)},Z^{(2)})=(X,Y)$, $Z^*=(X^*,Y^*)$, $\mathcal U=\mathcal M$ and
\[
f(z^{(1)},z^{(2)})=\big(\mathbf 1_{z^{(1)}\le s}-F_X(s)\big)\big(\mathbf 1_{z^{(2)}\le t}-F_Y(t)\big),
\]
we obtain that
\[
\big|\mathbb E\big((\mathbf 1_{X\le s}-F_X(s))(\mathbf 1_{Y\le t}-F_Y(t))\,\big|\,\mathcal M\big)-\mathbb E\big((\mathbf 1_{X\le s}-F_X(s))(\mathbf 1_{Y\le t}-F_Y(t))\big)\big|
\le\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{X_k\le s_k}-\mathbf 1_{X_k^*\le s_k}|\,\big|\,\mathcal M\big)+\sum_{k=1}^d\mathbb E\big(|\mathbf 1_{Y_k\le t_k}-\mathbf 1_{Y_k^*\le t_k}|\,\big|\,\mathcal M\big),
\]
and we conclude the proof of (5.1.6) by using the same arguments as for (5.1.5).
5.2 Covariance inequalities

In this section we present some covariance inequalities for the coefficients $\gamma_1$, $\tilde\beta$ and $\tilde\phi$. We begin with the weakest of those coefficients.
5.2.1 A covariance inequality for $\gamma_1$

Definition 5.1. Let $X,Y$ be real-valued random variables. Denote by
– $Q_X$ the generalized inverse of the tail function $H_X:x\mapsto\mathbb P(|X|>x)$;
– $G_X$ the inverse of $x\mapsto\int_0^x Q_X(u)\,du$;
– $H_{X,Y}$ the generalized inverse of $x\mapsto\mathbb E(|X|\mathbf 1_{|Y|>x})$.

Proposition 5.2. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. Let $X$ be an integrable random variable and $Y$ an $\mathcal M$-measurable random variable such that $|XY|$ is integrable. The following inequalities hold:
\[
|\mathbb E(YX)|\le\int_0^{\|\mathbb E(X|\mathcal M)\|_1}H_{X,Y}(u)\,du\le\int_0^{\|\mathbb E(X|\mathcal M)\|_1}Q_Y\circ G_X(u)\,du. \qquad (5.2.1)
\]
If furthermore $Y$ is integrable, then
\[
|\mathrm{Cov}(Y,X)|\le\int_0^{\gamma_1(\mathcal M,X)}Q_Y\circ G_{X-\mathbb E(X)}(u)\,du\le 2\int_0^{\gamma_1(\mathcal M,X)/2}Q_Y\circ G_X(u)\,du. \qquad (5.2.2)
\]
Combining Proposition 5.2 with the comparison results (2.2.13), (2.2.18) and Item 2 of Lemma 5.1, we easily derive covariance inequalities for $\theta_1$, $\tau_1$ and $\tilde\alpha$.

Proof of Proposition 5.2. We start from the inequality
\[
|\mathbb E(YX)|\le\mathbb E\big(|Y\,\mathbb E(X|\mathcal M)|\big)=\mathbb E\int_0^\infty|\mathbb E(X|\mathcal M)|\,\mathbf 1_{|Y|>t}\,dt.
\]
Clearly we have that $\mathbb E\big(|\mathbb E(X|\mathcal M)|\,\mathbf 1_{|Y|>t}\big)\le\|\mathbb E(X|\mathcal M)\|_1\wedge\mathbb E(|X|\mathbf 1_{|Y|>t})$. Hence
\[
|\mathbb E(YX)|\le\int_0^\infty\!\!\int_0^{\|\mathbb E(X|\mathcal M)\|_1}\mathbf 1_{u<\mathbb E(|X|\mathbf 1_{|Y|>t})}\,du\,dt\le\int_0^{\|\mathbb E(X|\mathcal M)\|_1}\!\!\int_0^\infty\mathbf 1_{t<H_{X,Y}(u)}\,dt\,du,
\]
and the first inequality in (5.2.1) is proved. In order to prove the second one, we use Fréchet's inequality (1957) [88]:
\[
\mathbb E(|X|\mathbf 1_{|Y|>t})\le\int_0^{\mathbb P(|Y|>t)}Q_X(u)\,du. \qquad (5.2.3)
\]
We infer from (5.2.3) that $H_{X,Y}(u)\le Q_Y\circ G_X(u)$, which yields the second inequality in (5.2.1). We now prove (5.2.2). The first inequality in (5.2.2) follows directly from (5.2.1). To prove the second one, note that $Q_{X-\mathbb E(X)}\le Q_X+\|X\|_1$ and consequently
\[
\int_0^x Q_{X-\mathbb E(X)}(u)\,du\le\int_0^x Q_X(u)\,du+x\|X\|_1. \qquad (5.2.4)
\]
Set $R(x)=\int_0^x Q_X(u)\,du-x\|X\|_1$. Clearly, $R'$ is non-increasing over $]0,1]$, $R'(\varepsilon)\ge0$ for $\varepsilon$ small enough and $R'(1)\le0$. We infer that $R$ is first non-decreasing and next non-increasing, and that for any $x\in[0,1]$, $R(x)\ge\min(R(0),R(1))$. Since $\int_0^1 Q_X(u)\,du=\|X\|_1$, we have that $R(1)=R(0)=0$, and we infer from (5.2.4) that
\[
\int_0^x Q_{X-\mathbb E(X)}(u)\,du\le\int_0^x Q_X(u)\,du+x\|X\|_1\le 2\int_0^x Q_X(u)\,du.
\]
This implies $G_{X-\mathbb E(X)}(u)\ge G_X(u/2)$, which concludes the proof of (5.2.2).

Combining Proposition 5.2 and the comparison results (2.2.13), (2.2.18) and Item 2 of Lemma 5.1, we easily derive covariance inequalities for $\theta_1$, $\tau_1$ and $\tilde\alpha$.

Corollary 5.1. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space. Let $X$ and $Y$ be real-valued integrable random variables, and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. We have that
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^{\tilde\alpha(\mathcal M,X)}Q_Y(u)Q_X(u)\,du. \qquad (5.2.5)
\]
Proof of Corollary 5.1. To prove (5.2.5), put $z=G_X(u)$ in the second integral of (5.2.2), and use the comparison results (2.2.13), (2.2.18) and Item 2 of Lemma 5.1.
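The two facts used in the proof of (5.2.2), namely $Q_{X-\mathbb E(X)}\le Q_X+\|X\|_1$ and the doubling bound $\int_0^x Q_{X-\mathbb E(X)}\le 2\int_0^x Q_X$, can be sanity-checked numerically on an empirical law. The sketch below is ours (the helper `tail_quantile` is an illustrative implementation of the tail quantile $Q$, not the book's code):

```python
import numpy as np

def tail_quantile(sample, u):
    """Q(u) = inf{t >= 0 : P(|X| > t) <= u} for the empirical law of the sample."""
    d = np.sort(np.abs(sample))[::-1]          # |x_i| in decreasing order
    n = len(d)
    if u >= 1:
        return 0.0
    return d[min(int(np.floor(u * n)), n - 1)]

x = np.array([1.0, 2.0, 3.0, 10.0])
xc = x - x.mean()                              # X - E(X)
grid = np.linspace(0.0, 1.0, 401)[1:-1]        # avoid the endpoints 0 and 1
Qx  = np.array([tail_quantile(x,  u) for u in grid])
Qxc = np.array([tail_quantile(xc, u) for u in grid])

# Pointwise bound Q_{X-E(X)} <= Q_X + ||X||_1 ...
assert np.all(Qxc <= Qx + np.abs(x).mean() + 1e-12)
# ... and the doubling bound on partial integrals (Riemann sums on the grid).
cum_xc = np.cumsum(Qxc) / len(grid)
cum_x  = np.cumsum(Qx)  / len(grid)
assert np.all(cum_xc <= 2 * cum_x + 1e-12)
```

Both assertions hold for any integrable sample, in line with (5.2.4) and the step that follows it.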
5.2.2 A covariance inequality for $\tilde\beta$ and $\tilde\phi$

Proposition 5.3. Let $X$ and $Y$ be two real-valued random variables on the probability space $(\Omega,\mathcal A,\mathbb P)$. Let $F_{X|Y}:t\mapsto\mathbb P_{X|Y}(]-\infty,t])$ be a distribution function of $X$ given $Y$ and let $F_X$ be the distribution function of $X$. Define the random variable $b(\sigma(Y),X)=\sup_{x\in\mathbb R}|F_{X|Y}(x)-F_X(x)|$. For any conjugate exponents $p$ and $q$, we have the inequalities
\[
|\mathrm{Cov}(Y,X)|\le 2\big\{\mathbb E\big(|X|^p\,b(\sigma(X),Y)\big)\big\}^{1/p}\big\{\mathbb E\big(|Y|^q\,b(\sigma(Y),X)\big)\big\}^{1/q} \qquad (5.2.6)
\]
\[
\le 2\,\tilde\phi(\sigma(X),Y)^{1/p}\,\tilde\phi(\sigma(Y),X)^{1/q}\,\|X\|_p\,\|Y\|_q. \qquad (5.2.7)
\]
Remark 5.2. Inequality (5.2.6) is a weak version of that of Delyon (1990) [57] (see also Viennet (1997) [185], Lemma 4.1), in which two random variables $b_1(\sigma(Y),\sigma(X))$ and $b_2(\sigma(X),\sigma(Y))$ appear, each having mean $\beta(\sigma(Y),\sigma(X))$. Inequality (5.2.7) is a weak version of that of Peligrad (1983) [139], where the dependence coefficients are $\phi(\sigma(Y),\sigma(X))$ and $\phi(\sigma(X),\sigma(Y))$.

Proof of Proposition 5.3. We start from the equality
\[
\mathrm{Cov}(Y,X)=\int_0^\infty\!\!\int_0^\infty\mathrm{Cov}\big(\mathbf 1_{X>x}-\mathbf 1_{X<-x},\,\mathbf 1_{Y>y}-\mathbf 1_{Y<-y}\big)\,dx\,dy. \qquad (5.2.8)
\]
Since $\|b(\sigma(Y),X)\|_\infty=\tilde\phi(\sigma(Y),X)$, we only need to prove (5.2.6). Here, note that the value of $b(\sigma(Y),X)$ does not change if we replace $F_{X|Y}(x)$ and $F_X(x)$ by $\mathbb P_{X|Y}(]-\infty,x[)$ and $\mathbb P_X(]-\infty,x[)$ respectively. Consequently, the following inequalities hold:
\[
\mathbb E\big(\mathbf 1_{X>x}\mathbf 1_{Y>y}-\mathbb P(X>x)\mathbb P(Y>y)\big)\le\mathbb E\big(\mathbf 1_{Y>y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X>x}\,b(\sigma(X),Y)\big),
\]
\[
\mathbb E\big(\mathbf 1_{X<-x}\mathbf 1_{Y>y}-\mathbb P(X<-x)\mathbb P(Y>y)\big)\le\mathbb E\big(\mathbf 1_{Y>y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X<-x}\,b(\sigma(X),Y)\big),
\]
\[
\mathbb E\big(\mathbf 1_{X>x}\mathbf 1_{Y<-y}-\mathbb P(X>x)\mathbb P(Y<-y)\big)\le\mathbb E\big(\mathbf 1_{Y<-y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X>x}\,b(\sigma(X),Y)\big),
\]
\[
\mathbb E\big(\mathbf 1_{X<-x}\mathbf 1_{Y<-y}-\mathbb P(X<-x)\mathbb P(Y<-y)\big)\le\mathbb E\big(\mathbf 1_{Y<-y}\,b(\sigma(Y),X)\big)\wedge\mathbb E\big(\mathbf 1_{X<-x}\,b(\sigma(X),Y)\big).
\]
Since $a_1\wedge b_1+a_1\wedge b_2+a_2\wedge b_1+a_2\wedge b_2\le 2(a_1+a_2)\wedge(b_1+b_2)$, we infer from (5.2.8) that
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^\infty\!\!\int_0^\infty\mathbb E\big(\mathbf 1_{|X|>x}\,b(\sigma(X),Y)\big)\wedge\mathbb E\big(\mathbf 1_{|Y|>y}\,b(\sigma(Y),X)\big)\,dx\,dy. \qquad (5.2.9)
\]
Let $\beta_1=\tilde\beta(\sigma(X),Y)$ and $\beta_2=\tilde\beta(\sigma(Y),X)$. Note that $\beta_1$ (or $\beta_2$) is 0 if and only if $X$ is independent of $Y$, and (5.2.6) is true in that case. Otherwise, let $\mathbb P_{\beta_1}$, $\mathbb P_{\beta_2}$ be the probabilities with densities $b(\sigma(X),Y)/\beta_1$ and $b(\sigma(Y),X)/\beta_2$ with respect to $\mathbb P$. Inequality (5.2.9) writes
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^\infty\!\!\int_0^\infty\beta_1\,\mathbb P_{\beta_1}(|X|>x)\wedge\beta_2\,\mathbb P_{\beta_2}(|Y|>y)\,dx\,dy. \qquad (5.2.10)
\]
Let $Q_{\beta_1,X}$ and $Q_{\beta_2,Y}$ be the generalized inverses of $x\mapsto\mathbb P_{\beta_1}(|X|>x)$ and $y\mapsto\mathbb P_{\beta_2}(|Y|>y)$. Starting from (5.2.10), we have successively
\[
|\mathrm{Cov}(Y,X)|\le 2\int_0^\infty\!\!\int_0^\infty\!\!\int_0^{\beta_1\wedge\beta_2}\mathbf 1_{u<\beta_1\mathbb P_{\beta_1}(|X|>x)}\,\mathbf 1_{u<\beta_2\mathbb P_{\beta_2}(|Y|>y)}\,du\,dx\,dy
\]
\[
\le 2\int_0^\infty\!\!\int_0^\infty\!\!\int_0^{\beta_1\wedge\beta_2}\mathbf 1_{Q_{\beta_1,X}(u/\beta_1)>x}\,\mathbf 1_{Q_{\beta_2,Y}(u/\beta_2)>y}\,du\,dx\,dy
\le 2\int_0^{\beta_1\wedge\beta_2}Q_{\beta_1,X}(u/\beta_1)\,Q_{\beta_2,Y}(u/\beta_2)\,du.
\]
Applying Hölder's inequality, and setting $s=u/\beta_1$ and $t=u/\beta_2$, we obtain
\[
|\mathrm{Cov}(Y,X)|\le 2\Big(\int_0^1\beta_1\,Q_{\beta_1,X}^p(s)\,ds\Big)^{1/p}\Big(\int_0^1\beta_2\,Q_{\beta_2,Y}^q(t)\,dt\Big)^{1/q}.
\]
To complete the proof of (5.2.6), note that, by definition of $Q_{\beta_1,X}$,
\[
\int_0^1\beta_1\,Q_{\beta_1,X}^p(s)\,ds=\mathbb E\big(|X|^p\,b(\sigma(X),Y)\big).
\]
Corollary 5.2. Let $f_1,f_2,g_1,g_2$ be four increasing functions, and let $f=f_1-f_2$ and $g=g_1-g_2$. For any random variable $Z$, let $\Delta_p(Z)=\inf_{a\in\mathbb R}\|Z-a\|_p$ and $\Delta_{p,\sigma(X),Y}(Z)=\inf_{a\in\mathbb R}\big(\mathbb E(|Z-a|^p\,b(\sigma(X),Y))\big)^{1/p}$. For any conjugate exponents $p$ and $q$, we have the inequalities
\[
|\mathrm{Cov}(g(Y),f(X))|\le 2\big(\Delta_{p,\sigma(X),Y}(f_1(X))+\Delta_{p,\sigma(X),Y}(f_2(X))\big)\times\big(\Delta_{q,\sigma(Y),X}(g_1(Y))+\Delta_{q,\sigma(Y),X}(g_2(Y))\big),
\]
\[
|\mathrm{Cov}(g(Y),f(X))|\le 2\,\phi(\sigma(X),Y)^{1/p}\,\phi(\sigma(Y),X)^{1/q}\times\big(\Delta_p(f_1(X))+\Delta_p(f_2(X))\big)\big(\Delta_q(g_1(Y))+\Delta_q(g_2(Y))\big).
\]
In particular, if $\mu$ is a signed measure with total variation $\|\mu\|$ and $f(x)=\mu(]-\infty,x])$, we have
\[
|\mathrm{Cov}(Y,f(X))|\le\|\mu\|\,\mathbb E\big(|Y|\,b(\sigma(Y),X)\big)\le\tilde\phi(\sigma(Y),X)\,\|\mu\|\,\|Y\|_1. \qquad (5.2.11)
\]
Proof of Corollary 5.2. For the first two inequalities, note that, for any $a_1,a_2,b_1,b_2$,
\[
|\mathrm{Cov}(g(Y),f(X))|\le|\mathrm{Cov}(g_1(Y)-b_1,f_1(X)-a_1)|+|\mathrm{Cov}(g_1(Y)-b_1,f_2(X)-a_2)|+|\mathrm{Cov}(g_2(Y)-b_2,f_1(X)-a_1)|+|\mathrm{Cov}(g_2(Y)-b_2,f_2(X)-a_2)|.
\]
The functions $f_1-a_1$, $f_2-a_2$, $g_1-b_1$, $g_2-b_2$ being nondecreasing, we infer that $b(\sigma(f_i(X)),g_j(Y)-b_j)\le\mathbb E\big(b(\sigma(X),Y)\,\big|\,\sigma(f_i(X))\big)$ almost surely. Now apply (5.2.6) and (5.2.7), and take the infimum over $a_1,b_1,a_2,b_2$. To show (5.2.11), take $q=1$ and $p=\infty$. Let $\mu=\mu^+-\mu^-$ be the Jordan decomposition of $\mu$. We have $f(x)=f_1(x)-f_2(x)$, with $f_1(x)=\mu^+(]-\infty,x])$ and $f_2(x)=\mu^-(]-\infty,x])$. To conclude, apply the preceding inequalities and note that $\|\mu\|=\mu^+(\mathbb R)+\mu^-(\mathbb R)$, $\Delta_{1,\sigma(Y),X}(Y)\le\mathbb E(|Y|\,b(\sigma(Y),X))$, $2\Delta_\infty(f_1(X))\le\mu^+(\mathbb R)$, and $2\Delta_\infty(f_2(X))\le\mu^-(\mathbb R)$.
5.3 Coupling

There exist several methods to obtain limit theorems for sequences of dependent random variables. One of the most popular and useful is the coupling of the initial sequence with an independent one. The main result in Section 5.3.1 is a coupling result (Dedecker and Prieur (2004) [45]) allowing one to replace a sequence of $\tau_1$-dependent random variables by an independent one having the same marginals. Moreover, each variable in the newly constructed sequence is independent of the past of the initial one and is close, in the $\mathbb L^1$ norm, to the variable having the same rank. The price to pay for replacing the initial sequence by an independent one depends on the $\tau_1$-dependence properties of the initial sequence. Various approaches to coupling have been developed by different authors; we refer to a recent paper by Merlevède and Peligrad (2002) [130] for a survey of the various coupling methods and their applications. The approach used in this chapter relies on the quantile transform of Major (1978) [126]. It has been used for strongly mixing sequences by Rio (1995) [159] and Peligrad (2002) [140] to obtain a coupling result in $\mathbb L^1$. In Section 5.3.2, the coupling result of Section 5.3.1 is generalized to the case of variables with values in any Polish space $\mathcal X$.
5.3.1 A coupling result for real valued random variables

The main result of this section is a coupling result for the coefficient $\tau_1$, which appears to be the appropriate coefficient for coupling in $\mathbb L^1$. We first recall the nice coupling properties known for the usual mixing coefficients. Berbee (1979) [16] and Goldstein (1979) [96] proved: if $\Omega$ is rich enough, there exists a random variable $X^*$ distributed as $X$ and independent of $\mathcal M$ such that $\mathbb P(X\ne X^*)=\beta(\mathcal M,\sigma(X))$. For the mixing coefficient $\alpha(\mathcal M,\sigma(X))$, Bradley (1983) [29] proved the following result: if $\Omega$ is rich enough, then for each $1\le p\le\infty$ and each $\lambda<\|X\|_p$, there exists $X^*$ distributed as $X$ and independent of $\mathcal M$ such that
\[
\mathbb P(|X-X^*|\ge\lambda)\le 18\Big(\frac{\|X\|_p}{\lambda}\Big)^{p/(2p+1)}\big(\alpha(\mathcal M,\sigma(X))\big)^{2p/(2p+1)}.
\]
For the weaker coefficient $\tilde\alpha(\mathcal M,X)$, Rio (1995, 2000) [159, 160] obtained the following upper bound, which is not directly comparable to Bradley's: if $X$ belongs to $[a,b]$ and if $\Omega$ is rich enough, there exists $X^*$ independent of $\mathcal M$ and distributed as $X$ such that $\|X-X^*\|_1\le(b-a)\,\tilde\alpha(\mathcal M,X)$. This result has then been extended by Peligrad (2002) [140] to the case of unbounded variables. Recall that the random variable $X^*$ appearing in the results of Rio (1995, 2000) [159, 160] and Peligrad (2002) [140] is based on Major's quantile transformation (1978) [126]. $X^*$ has the following remarkable property: $\|X-X^*\|_1$ is the infimum of $\|X-Y\|_1$ over all $Y$ independent of $\mathcal M$ and distributed as $X$. Starting from the exact expression of $X^*$, Dedecker and Prieur (2004) [45] proved that $\tau_1(\mathcal M,X)$ is the appropriate coefficient for the coupling in $\mathbb L^1$ (see Lemma 5.2 below).

Lemma 5.2. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $X$ an integrable real-valued random variable, and $\mathcal M$ a $\sigma$-algebra of $\mathcal A$. Assume that there exists a random variable $\delta$ uniformly distributed over $[0,1]$, independent of the $\sigma$-algebra generated by $X$ and $\mathcal M$. Then there exists a random variable $X^*$, measurable with respect to $\mathcal M\vee\sigma(X)\vee\sigma(\delta)$, independent of $\mathcal M$ and distributed as $X$, such that
\[
\|X-X^*\|_1=\tau_1(\mathcal M,X). \qquad (5.3.1)
\]
This coupling result is a useful tool to obtain suitable inequalities, to prove various limit theorems and to obtain upper bounds for $\tilde\beta(\mathcal M,X)$ (see Dedecker and Prieur (2004) [48]).

Remark 5.3. From Berbee's lemma and Lemma 5.2 above, we see that both $\beta(\mathcal M,\sigma(X))$ and $\tau_1(\mathcal M,X)$ have a property of optimality: they are equal to the infimum of $\mathbb E(d_0(X,Y))$ over all $Y$ independent of $\mathcal M$ and distributed as $X$, for the distances $d_0(x,y)=\mathbf 1_{x\ne y}$ and $d_0(x,y)=|x-y|$ respectively.
Proof of Lemma 5.2. We first construct $X^*$ using the conditional quantile transform of Major (1978); see (5.1.3) in the proof of Lemma 5.1. The choice of this transform is due to its property of minimizing the distance between $X$ and $X^*$ in $\mathbb L^1(\mathbb R)$. We recall equality (5.1.4):
\[
\|X-X^*\|_1=\mathbb E\int_0^1\big|F_{\mathcal M}^{-1}(u)-F^{-1}(u)\big|\,du. \qquad (5.3.2)
\]
For two distribution functions $F$ and $G$, denote by $M(F,G)$ the set of all probability measures on $\mathbb R\times\mathbb R$ with marginals $F$ and $G$. Define
\[
d(F,G)=\inf\Big\{\int|x-y|\,\mu(dx,dy)\ \Big|\ \mu\in M(F,G)\Big\},
\]
and recall that (see Dudley (1989) [80], Section 11.8, Problems 1 and 2, page 333)
\[
d(F,G)=\int_{\mathbb R}|F(t)-G(t)|\,dt=\int_0^1|F^{-1}(u)-G^{-1}(u)|\,du. \qquad (5.3.3)
\]
On the other hand, Kantorovich and Rubinstein (Theorem 11.8.2 in Dudley (1989) [80]) have proved that
\[
d(F,G)=\sup\Big\{\int f\,dF-\int f\,dG\ \Big|\ f\in\Lambda^{(1)}(\mathbb R)\Big\}. \qquad (5.3.4)
\]
Combining (5.1.4), (5.3.3) and (5.3.4), we have that
\[
\|X-X^*\|_1=\mathbb E\,\sup\Big\{\int f\,dF_{\mathcal M}-\int f\,dF\ \Big|\ f\in\Lambda^{(1)}(\mathbb R)\Big\},
\]
and the proof of Lemma 5.2 is complete.
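The identity (5.3.3), which makes the quantile coupling optimal, can be illustrated numerically: for two empirical distributions (of equal sample size) the quantile-coupling cost $\int_0^1|F^{-1}(u)-G^{-1}(u)|\,du$ is the mean absolute difference of the sorted samples, and it coincides with $\int|F(t)-G(t)|\,dt$. The sketch below is our own NumPy illustration (sample sizes, seed and grid are arbitrary choices):

```python
import numpy as np

# Two samples of equal size; their empirical laws play the roles of F and G.
rng = np.random.default_rng(0)
a = np.sort(rng.normal(size=500))
b = np.sort(rng.normal(loc=1.0, size=500))

# Quantile-coupling cost: int_0^1 |F^{-1}(u) - G^{-1}(u)| du = mean |a_(i) - b_(i)|.
w_quantile = float(np.mean(np.abs(a - b)))

# CDF formulation: int |F(t) - G(t)| dt, via a Riemann sum on a fine grid.
grid = np.linspace(-6.0, 8.0, 20001)
Fa = np.searchsorted(a, grid, side="right") / len(a)
Fb = np.searchsorted(b, grid, side="right") / len(b)
w_cdf = float(np.sum(np.abs(Fa - Fb)[:-1] * np.diff(grid)))

# The two expressions of d(F, G) agree up to the grid discretization error.
assert abs(w_quantile - w_cdf) < 1e-2
```

For empirical laws of equal size the equality is exact; the small tolerance only absorbs the Riemann-sum error of the step-function integral.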
5.3.2 Coupling in higher dimension

We show that the coupling properties of $\tau_1$ described in the previous section remain valid when $X$ takes values in any Polish space $\mathcal X$. This is due to a conditional version of the Kantorovich-Rubinstein Theorem (Theorem 5.1).

Lemma 5.3. Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, $\mathcal M$ a $\sigma$-algebra of $\mathcal A$ and $X$ a random variable with values in a Polish space $(\mathcal X,d)$. Assume that $\int d(x,x_0)\,\mathbb P_X(dx)$ is finite for some (and therefore any) $x_0\in\mathcal X$. Assume that there exists a random variable $\delta$ uniformly distributed over $[0,1]$, independent of the $\sigma$-algebra generated by $X$ and $\mathcal M$. Then there exists a random variable $X^*$, measurable with respect to $\mathcal M\vee\sigma(X)\vee\sigma(\delta)$, independent of $\mathcal M$ and distributed as $X$, such that
\[
\tau_p(\mathcal M,X)=\big\|\mathbb E\big(d(X,X^*)\,\big|\,\mathcal M\big)\big\|_p. \qquad (5.3.5)
\]
We first need some notation. We denote by $\mathcal P(\Omega)$ the set of probability measures on a space $\Omega$ and define
\[
\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X)=\big\{\mu\in\mathcal P(\Omega\times\mathcal X,\ \mathcal A\otimes\mathcal B_{\mathcal X})\ :\ \forall A\in\mathcal A,\ \mu(A\times\mathcal X)=\mathbb P(A)\big\}.
\]
Recall that every $\mu\in\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X)$ is disintegrable, that is, there exists a (unique, up to $\mathbb P$-a.s. equality) $\mathcal A^*_\mu$-measurable mapping $\omega\mapsto\mu_\omega$ from $\Omega$ to $\mathcal P(\mathcal X)$ such that
\[
\mu(f)=\int_\Omega\int_{\mathcal X}f(\omega,x)\,d\mu_\omega(x)\,d\mathbb P(\omega)
\]
for every measurable $f:\Omega\times\mathcal X\to[0,+\infty]$ (see Valadier (1973) [184]). Moreover, the mapping $\omega\mapsto\mu_\omega$ can be chosen $\mathcal A$-measurable. If $\mathcal X$ is endowed with the distance $d$, let
\[
\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)=\Big\{\mu\in\mathcal Y\ :\ \int_{\Omega\times\mathcal X}d(x,x_0)\,d\mu(\omega,x)<\infty\Big\},
\]
where $x_0$ is some fixed element of $\mathcal X$ (this definition is independent of the choice of $x_0$). For any $\mu,\nu\in\mathcal Y$, let $D(\mu,\nu)$ be the set of probability laws $\pi$ on $\Omega\times\mathcal X\times\mathcal X$ such that $\pi(\cdot\times\cdot\times\mathcal X)=\mu$ and $\pi(\cdot\times\mathcal X\times\cdot)=\nu$. We now define the parametrized versions of $\Delta^{(d)}_{KR}$ and $\Delta^{(d)}_{L}$. Set, for $\mu,\nu\in\mathcal Y^{d,1}$,
\[
\Delta^{(d)}_{KR}(\mu,\nu)=\inf_{\pi\in D(\mu,\nu)}\int_{\Omega\times\mathcal X\times\mathcal X}d(x,y)\,d\pi(\omega,x,y).
\]
Let also $\Lambda^{(1)}$ denote the set of measurable integrands $f:\Omega\times\mathcal X\to\mathbb R$ such that $f(\omega,\cdot)\in\Lambda^{(1)}\cap\mathbb L^\infty$ for every $\omega\in\Omega$. We denote
\[
\Delta^{(d)}_{L}(\mu,\nu)=\sup_{f\in\Lambda^{(1)}}\big(\mu(f)-\nu(f)\big).
\]
We now state the parametrized Kantorovich-Rubinstein Theorem of Dedecker, Prieur and Raynaud De Fitte (2004) [47], which is the main tool to prove the coupling result of Lemma 5.3. The proof of that theorem mainly relies on ideas contained in Rüschendorf (1985) [171].

Theorem 5.1. (Parametrized Kantorovich-Rubinstein Theorem) Let $\mu,\nu\in\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)$ and let $\omega\mapsto\mu_\omega$ and $\omega\mapsto\nu_\omega$ be disintegrations of $\mu$ and $\nu$ respectively.

1. Let $G:\omega\mapsto\Delta^{(d)}_{KR}(\mu_\omega,\nu_\omega)=\Delta^{(d)}_{L}(\mu_\omega,\nu_\omega)$ and let $\mathcal A^*$ be the universal completion of $\mathcal A$. There exists an $\mathcal A^*$-measurable mapping $\omega\mapsto\lambda_\omega$ from $\Omega$ to $\mathcal P(\mathcal X\times\mathcal X)$ such that $\lambda_\omega$ belongs to $D(\mu_\omega,\nu_\omega)$ and
\[
G(\omega)=\int_{\mathcal X\times\mathcal X}d(x,y)\,d\lambda_\omega(x,y).
\]
2. The following equalities hold:
\[
\Delta^{(d)}_{KR}(\mu,\nu)=\int_{\Omega\times\mathcal X\times\mathcal X}d(x,y)\,d\lambda(\omega,x,y)=\Delta^{(d)}_{L}(\mu,\nu),
\]
where $\lambda$ is the element of $\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X\times\mathcal X)$ defined by $\lambda(A\times B\times C)=\int_A\lambda_\omega(B\times C)\,d\mathbb P(\omega)$ for any $A$ in $\mathcal A$, $B$ and $C$ in $\mathcal B_{\mathcal X}$. In particular, $\lambda$ belongs to $D(\mu,\nu)$, and the infimum in the definition of $\Delta^{(d)}_{KR}(\mu,\nu)$ is attained for this $\lambda$.

Remark 5.4. This theorem is proved in a more general frame in Dedecker, Prieur and Raynaud De Fitte (2004) [47]. It also allows more general coupling results when working with more general cost functions than the metric $d$ of $\mathcal X$.

Proof of Lemma 5.3. First notice that the assumption that $\int d(x,x_0)\,\mathbb P_X(dx)<\infty$ for some $x_0\in\mathcal X$ means that the unique measure of $\mathcal Y(\Omega,\mathcal A,\mathbb P;\mathcal X)$ with disintegration $\mathbb P_{X|\mathcal M}(\cdot,\omega)$ belongs to $\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)$. We now prove that if $Q$ is any element of $\mathcal Y^{d,1}(\Omega,\mathcal A,\mathbb P;\mathcal X)$, there exists a $\sigma(\delta)\vee\sigma(X)\vee\mathcal M$-measurable random variable $Y$ such that $Q_\bullet$ is a regular conditional probability of $Y$ given $\mathcal M$, and
\[
\mathbb E\big(d(X,Y)\,\big|\,\mathcal M\big)=\sup_{f\in\Lambda^{(1)}\cap\mathbb L^\infty}\Big(\int f(x)\,\mathbb P_{X|\mathcal M}(dx)-\int f(x)\,Q_\bullet(dx)\Big)\quad\mathbb P\text{-a.s.} \qquad (5.3.6)
\]
Applying (5.3.6) with $Q=\mathbb P\otimes\mathbb P_X$, we get the result of Lemma 5.3.

Proof of Equation (5.3.6). We apply Theorem 5.1 to the probability space $(\Omega,\mathcal M,\mathbb P)$ and to the disintegrated measures $\mu_\omega(\cdot)=\mathbb P_{X|\mathcal M}(\cdot,\omega)$ and $\nu_\omega=Q_\omega$. From point 1 of Theorem 5.1 we infer that there exists a mapping $\omega\mapsto\lambda_\omega$ from $\Omega$ to $\mathcal P(\mathcal X\times\mathcal X)$, measurable for $\mathcal M^*$ and $\mathcal B_{\mathcal P(\mathcal X\times\mathcal X)}$, such that $\lambda_\omega$ belongs to $D(\mathbb P_{X|\mathcal M}(\cdot,\omega),Q_\omega)$ and $G(\omega)=\int_{\mathcal X\times\mathcal X}d(x,y)\,\lambda_\omega(dx,dy)$. On the measurable space $(M,\mathcal T)=(\Omega\times\mathcal X\times\mathcal X,\ \mathcal M^*\otimes\mathcal B_{\mathcal X}\otimes\mathcal B_{\mathcal X})$ we put the probability
\[
\pi(A\times B\times C)=\int_A\lambda_\omega(B\times C)\,\mathbb P(d\omega).
\]
If $I=(I_1,I_2,I_3)$ is the identity on $M$, we see that a regular conditional distribution of $(I_2,I_3)$ given $I_1$ is $\mathbb P_{(I_2,I_3)|I_1=\omega}=\lambda_\omega$. Since $\mathbb P_{X|\mathcal M}(\cdot,\omega)$ is the first margin of $\lambda_\omega$, a regular conditional probability of $I_2$ given $I_1$ is $\mathbb P_{I_2|I_1=\omega}(\cdot)=\mathbb P_{X|\mathcal M}(\cdot,\omega)$. Let $\lambda_{\omega,x}=\mathbb P_{I_3|I_1=\omega,I_2=x}$ be a regular conditional distribution of $I_3$ given $(I_1,I_2)$, so that $(\omega,x)\mapsto\lambda_{\omega,x}$ is measurable for $\mathcal M^*\otimes\mathcal B_{\mathcal X}$ and $\mathcal B_{\mathcal P(\mathcal X)}$. From the uniqueness (up to $\mathbb P$-a.s. equality) of regular conditional probabilities, it follows that
\[
\lambda_\omega(B\times C)=\int_B\lambda_{\omega,x}(C)\,\mathbb P_{X|\mathcal M}(dx,\omega)\quad\mathbb P\text{-a.s.} \qquad (5.3.7)
\]
Assume that we can find a random variable $\tilde Y$ from $\Omega$ to $\mathcal X$, measurable for $\sigma(\delta)\vee\sigma(X)\vee\mathcal M^*$ and $\mathcal B_{\mathcal X}$, such that $\mathbb P_{\tilde Y|\sigma(X)\vee\mathcal M^*}(\cdot,\omega)=\lambda_{\omega,X(\omega)}(\cdot)$. Since $\omega\mapsto\mathbb P_{X|\mathcal M}(\cdot,\omega)$ is measurable for $\mathcal M^*$ and $\mathcal B_{\mathcal P(\mathcal X)}$, one can check that $\mathbb P_{X|\mathcal M}$ is a regular conditional probability of $X$ given $\mathcal M^*$. For $A$ in $\mathcal M^*$, $B$ and $C$ in $\mathcal B_{\mathcal X}$, we thus have
\[
\mathbb E\big(\mathbf 1_A\mathbf 1_{X\in B}\mathbf 1_{\tilde Y\in C}\big)=\mathbb E\Big(\mathbf 1_A\,\mathbb E\big(\mathbf 1_{X\in B}\,\mathbb E(\mathbf 1_{\tilde Y\in C}\mid\sigma(X)\vee\mathcal M^*)\,\big|\,\mathcal M^*\big)\Big)
=\int_A\int_B\lambda_{\omega,x}(C)\,\mathbb P_{X|\mathcal M}(dx,\omega)\,\mathbb P(d\omega)
=\int_A\lambda_\omega(B\times C)\,\mathbb P(d\omega).
\]
We infer that $\lambda_\omega$ is a regular conditional probability of $(X,\tilde Y)$ given $\mathcal M^*$. By definition of $\lambda_\omega$, we obtain that
\[
\mathbb E\big(d(X,\tilde Y)\,\big|\,\mathcal M^*\big)=\sup_{f\in\Lambda^{(1)}\cap\mathbb L^\infty}\Big(\int f(x)\,\mathbb P_{X|\mathcal M}(dx)-\int f(x)\,Q_\bullet(dx)\Big)\quad\mathbb P\text{-a.s.} \qquad (5.3.8)
\]
As $\mathcal X$ is Polish, there exists a $\sigma(\delta)\vee\sigma(X)\vee\mathcal M$-measurable modification $Y$ of $\tilde Y$, so that (5.3.8) still holds for $\mathbb E(d(X,Y)\mid\mathcal M^*)$. We obtain (5.3.6) by noting that $\mathbb E(d(X,Y)\mid\mathcal M^*)=\mathbb E(d(X,Y)\mid\mathcal M)$ $\mathbb P$-a.s.

It remains to build $\tilde Y$. Since $\mathcal X$ is Polish, there exists a one-to-one map $f$ from $\mathcal X$ to a Borel subset of $[0,1]$, such that $f$ and $f^{-1}$ are measurable for $\mathcal B([0,1])$ and $\mathcal B_{\mathcal X}$. Define $F(t,\omega)=\lambda_{\omega,X(\omega)}(f^{-1}([0,t]))$. The map $F(\cdot,\omega)$ is a distribution function with generalized inverse $F^{-1}(\cdot,\omega)$, and the map $(u,\omega)\mapsto F^{-1}(u,\omega)$ is $\mathcal B([0,1])\otimes(\mathcal M^*\vee\sigma(X))$-measurable. Let $T(\omega)=F^{-1}(\delta(\omega),\omega)$ and $\tilde Y=f^{-1}(T)$. It remains to see that $\mathbb P_{\tilde Y|\sigma(X)\vee\mathcal M^*}(\cdot,\omega)=\lambda_{\omega,X(\omega)}(\cdot)$. For any $A$ in $\mathcal M^*$, $B$ in $\mathcal B_{\mathcal X}$ and $t$ in $\mathbb R$, we have
\[
\mathbb E\big(\mathbf 1_A\mathbf 1_{X\in B}\mathbf 1_{\tilde Y\in f^{-1}([0,t])}\big)=\int_A\mathbf 1_{X(\omega)\in B}\,\mathbf 1_{\delta(\omega)\le F(t,\omega)}\,\mathbb P(d\omega).
\]
Since $\delta$ is independent of $\sigma(X)\vee\mathcal M$, it is also independent of $\sigma(X)\vee\mathcal M^*$. Hence
\[
\mathbb E\big(\mathbf 1_A\mathbf 1_{X\in B}\mathbf 1_{\tilde Y\in f^{-1}([0,t])}\big)=\int_A\mathbf 1_{X(\omega)\in B}\,F(t,\omega)\,\mathbb P(d\omega)=\int_A\mathbf 1_{X(\omega)\in B}\,\lambda_{\omega,X(\omega)}(f^{-1}([0,t]))\,\mathbb P(d\omega).
\]
Since $\{f^{-1}([0,t])\ :\ t\in[0,1]\}$ is a separating class, the result follows.

5.4 Exponential and moment inequalities
The first theorem of this section extends Bennett's inequality for independent sequences to the case of $\tau_1$-dependent sequences. For any positive integer $q$, we obtain an upper bound involving two terms: the first one is the classical Bennett bound at level $\lambda$ for a sum $\sum_{i=1}^n\xi_i$ of independent variables $\xi_i$ such that $\mathrm{Var}(\sum_{i=1}^n\xi_i)=v_q$ and $\|\xi_i\|_\infty\le qM$, and the second one is equal to $n\lambda^{-1}\tau_{1,q}(q+1)$. Using Item 2 of Lemma 5.1, we obtain the same inequalities as those established by Rio (2000) [161] for strongly mixing sequences. This is not surprising: we follow the proof of Rio, using Lemma 5.2 instead of Rio's coupling lemma. Note that the same approach has previously been used by Bosq (1993) [26], starting from Bradley's coupling lemma (1983) [29]. Theorem 5.2 and Theorem 5.3 below are due to Dedecker and Prieur (2004) [45].
5.4.1 Bennett-type inequality
Theorem 5.2. Let $(X_i)_{i>0}$ be a sequence of real-valued random variables such that $\|X_i\|_\infty\le M$, and let $\mathcal M_i=\sigma(X_k,\ 1\le k\le i)$. Let $S_k=\sum_{i=1}^k(X_i-\mathbb E(X_i))$ and $\overline S_n=\max_{1\le k\le n}|S_k|$. Let $q$ be some positive integer, $v_q$ some nonnegative number such that
\[
v_q\ge\|X_{q[n/q]+1}+\dots+X_n\|_2^2+\sum_{i=1}^{[n/q]}\|X_{(i-1)q+1}+\dots+X_{iq}\|_2^2,
\]
and let $h$ be the function defined by $h(x)=(1+x)\log(1+x)-x$.

1. For $\lambda>0$,
\[
\mathbb P(|S_n|\ge3\lambda)\le 4\exp\Big(-\frac{v_q}{(qM)^2}\,h\Big(\frac{\lambda qM}{v_q}\Big)\Big)+\frac n\lambda\,\tau_{1,q}(q+1).
\]
2. For $\lambda\ge Mq$,
\[
\mathbb P\big(\overline S_n\ge(\mathbf 1_{q>1}+3)\lambda\big)\le 4\exp\Big(-\frac{v_q}{(qM)^2}\,h\Big(\frac{\lambda qM}{v_q}\Big)\Big)+\frac n\lambda\,\tau_{1,q}(q+1).
\]
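The bound of Item 1 is easy to evaluate numerically. The sketch below is ours (parameter values are arbitrary, chosen only to exercise the formula); it codes the function $h$ and the two-term bound, and checks that the bound decreases in $\lambda$, since $h$ is increasing and the coupling term is $n\lambda^{-1}\tau_{1,q}(q+1)$.

```python
import math

def h(x):
    """h(x) = (1 + x) log(1 + x) - x, the function in Bennett's inequality."""
    return (1.0 + x) * math.log1p(x) - x

def bennett_tau_bound(lam, n, q, M, vq, tau):
    """Upper bound of Theorem 5.2(1) for P(|S_n| >= 3*lam): a Bennett term for
    blocks of length q, plus the coupling price (n / lam) * tau_{1,q}(q+1)."""
    bennett = 4.0 * math.exp(-(vq / (q * M) ** 2) * h(lam * q * M / vq))
    coupling = (n / lam) * tau
    return bennett + coupling

# Larger lam gives a smaller bound: both terms are decreasing in lam.
b1 = bennett_tau_bound(lam=5.0,  n=1000, q=10, M=1.0, vq=200.0, tau=1e-4)
b2 = bennett_tau_bound(lam=50.0, n=1000, q=10, M=1.0, vq=200.0, tau=1e-4)
assert b2 < b1
assert h(0.0) == 0.0
```

In practice one would balance $q$ against the decay of $\tau_{1,q}$: larger blocks make the Bennett term worse but pay a smaller coupling price per block.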
Proof of Theorem 5.2. We proceed as in Rio (2000) [161], page 83. For $1\le i\le[n/q]$, define the variables $U_i=S_{iq}-S_{iq-q}$, and let $U_{[n/q]+1}=S_n-S_{q[n/q]}$. Let $(\delta_i)_{1\le i\le[n/q]+1}$ be independent random variables, uniformly distributed over $[0,1]$ and independent of $(U_i)_{1\le i\le[n/q]+1}$. We apply Lemma 5.2: for any $1\le i\le[n/q]+1$, there exists a measurable function $F_i$ such that $U_i^*=F_i(U_1,\dots,U_{i-2},U_i,\delta_i)$ satisfies the conclusions of Lemma 5.2 with $\mathcal M=\sigma(U_l,\ l\le i-2)$. The sequence $(U_i^*)_{1\le i\le[n/q]+1}$ has the following properties:
a. For any $1\le i\le[n/q]+1$, the random variable $U_i^*$ is distributed as $U_i$.
b. The variables $(U_{2i}^*)_{2\le 2i\le[n/q]+1}$ are independent, and so are the variables $(U_{2i-1}^*)_{1\le 2i-1\le[n/q]+1}$.
c. Moreover, $\|U_i-U_i^*\|_1\le\tau_1(\sigma(U_l,\ l\le i-2),U_i)$.
Since for $1\le i\le[n/q]$ we have $\tau_1(\sigma(U_l,\ l\le i-2),U_i)\le q\,\tau_{1,q}(q+1)$, we infer that, for $1\le i\le[n/q]$,
\[
\|U_i-U_i^*\|_1\le q\,\tau_{1,q}(q+1) \qquad (5.4.1)
\]
and
\[
\|U_{[n/q]+1}-U_{[n/q]+1}^*\|_1\le(n-q[n/q])\,\tau_{1,n-q[n/q]}(q+1).
\]
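The blocking step above can be sketched in a few lines. The code below is our own illustration (not the book's): it forms the blocks $U_i=S_{iq}-S_{(i-1)q}$ of $q$ consecutive centered variables plus the shorter remainder block, and checks that the blocks partition $S_n$ exactly.

```python
import numpy as np

# Blocking used in the proof: U_i = S_{iq} - S_{(i-1)q} is the sum of the i-th
# block of q consecutive (centered) variables; the last, shorter block is
# U_{[n/q]+1} = S_n - S_{q[n/q]}.
rng = np.random.default_rng(1)
n, q = 23, 5
x = rng.standard_normal(n)
x = x - x.mean()                               # centered, so S_k = sum of first k terms

S = np.concatenate(([0.0], np.cumsum(x)))      # S[k] = S_k, with S[0] = 0
blocks = [S[i * q] - S[(i - 1) * q] for i in range(1, n // q + 1)]
blocks.append(S[n] - S[q * (n // q)])          # remainder block of length n - q[n/q]

# The block sums telescope back to S_n.
assert abs(sum(blocks) - S[n]) < 1e-12
assert len(blocks) == n // q + 1
```

In the proof each even-indexed (resp. odd-indexed) block is then replaced by its coupled copy $U_i^*$, which is why the two alternating subsequences become independent.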
Proof of 1. Clearly
\[
|S_n|\le\sum_{i=1}^{[n/q]+1}|U_i-U_i^*|+\Big|\sum_{i=1}^{([n/q]+1)/2}U_{2i}^*\Big|+\Big|\sum_{i=1}^{[n/q]/2+1}U_{2i-1}^*\Big|. \qquad (5.4.2)
\]
Combining (5.4.1) with the fact that $\tau_{1,n-q[n/q]}(q+1)\le\tau_{1,q}(q+1)$, we obtain
\[
\mathbb P\Big(\sum_{i=1}^{[n/q]+1}|U_i-U_i^*|\ge\lambda\Big)\le\frac n\lambda\,\tau_{1,q}(q+1). \qquad (5.4.3)
\]
The result follows by applying Bennett's inequality to the two other sums in (5.4.2). The proof of the second item is omitted; it is similar to the proof of Theorem 6.1 in Rio (2000) [161], page 83, for $\alpha$-mixing sequences.

Proceeding as in Theorem 5.2, we establish Fuk-Nagaev type inequalities (see Fuk and Nagaev (1971) [89]) for sums of $\tau_1$-dependent sequences. Applying Item 2 of Lemma 5.1, we obtain the same inequalities (up to some numerical constant) as those established by Rio (2000) [161] for strongly mixing sequences.

Notations 5.1. For any non-increasing sequence $(\delta_i)_{i\ge0}$ of nonnegative numbers, define $\delta^{-1}(u)=\sum_{i\ge0}\mathbf 1_{u<\delta_i}=\inf\{k\in\mathbb N\ :\ \delta_k\le u\}$. Note that $\delta^{-1}$ is the generalized inverse (see (2.2.14)) of the càdlàg function $x\mapsto\delta_{[x]}$, $[\cdot]$ denoting the integer part.

Theorem 5.3. Let $(X_i)_{i>0}$ be a sequence of centered and square-integrable random variables, and define $(\mathcal M_i)_{i>0}$ and $\overline S_n$ as in Theorem 5.2. Let $X$ be some positive random variable such that $Q_X\ge\sup_{k\ge1}Q_{|X_k|}$ and
\[
s_n^2=\sum_{i=1}^n\sum_{j=1}^n|\mathrm{Cov}(X_i,X_j)|.
\]
Let $R=\big((\tau/2)^{-1}\circ G_X^{-1}\big)Q_X$ and $S=R^{-1}$. For any $\lambda>0$ and $r\ge1$,
\[
\mathbb P(\overline S_n\ge5\lambda)\le 4\Big(1+\frac{\lambda^2}{r\,s_n^2}\Big)^{-r/2}+\frac{4n}\lambda\int_0^{S(\lambda/r)}Q_X(u)\,du. \qquad (5.4.4)
\]
CHAPTER 5. TOOLS FOR CAUSAL CASES
Proof of Theorem 5.3. Let q be any positive integer, and M > 0. As for the proof of Theorem 5.2 we define Ui = Siq − Siq−q , for 1 ≤ i ≤ [n/q]. We also define U i = (Ui ∧ qM ) ∨ (−qM ). Let ϕM (x) = (|x| − M )+ . Following the proof of Rio (2000) [161] for strongly mixing sequences, we first prove that Sn ≤
max
1≤j≤[n/q]
|
j
U i | + qM +
i=1
n
ϕM (Xk ) .
(5.4.5)
k=1
To prove (5.4.5), we just have to notice that, if S n = Sk0 , then for j0 = [k0 /q], Sn ≤ |
j0
U i| +
i=1
j0
|Ui − Ui | +
k0
|Xk | ,
(5.4.6)
k=j0 +1
i=1
and then, as ϕM is convex, that j0
|Ui − U i | ≤
i=1
qj0
ϕM (Xk ),
(5.4.7)
k=1
and, by definition of ϕM , that k0
k0
|Xk | ≤ (k0 − qj0 )M +
k=j0 +1
ϕM (Xk ).
(5.4.8)
k=j0 +1
Now, to be able to apply Theorem 5.2, we need to center the variables U i . Then as the random variables Ui are centered, we get max
1≤j≤[n/q]
|
j
U i| ≤
i=1
≤
max
1≤j≤[n/q]
max
1≤j≤[n/q]
| |
j
[n/q]
(U i − E(U i ))| +
i=1
i=1
j
n
(U i − E(U i ))| +
i=1
E(|Ui − U i |) E(ϕM (Xk )) ,
k=1
using the convexity of ϕM . Hence we have proved that Sn ≤
max
1≤j≤[n/q]
|
j i=1
(U i − E(U i ))| + qM +
n
(E(ϕM (Xk )) + ϕM (Xk )). (5.4.9)
k=1
Let us now choose the size q of the blocks and the constant of truncation M . Let v = S(λ/r), q = (τ /2)−1 ◦ G−1 X (v) and M = QX (v). Clearly, we have that qM = R(v) = R(S(λ/r)) ≤ λ/r. Since M = QX (v), n 2n v P (E(ϕM (Xk )) + ϕM (Xk )) ≥ λ ≤ QX (u)du . (5.4.10) λ 0 k=1
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
123
j We are now interested in the term P max1≤j≤[n/q] | i=1 (U i − E(U i ))| ≥ 3λ . We apply Theorem 5.2 to the sequence (U i − E(U i ))i∈Z with n = [n/q] and q = 1. We have U i = h(Ui ) where h : R → R is a Lipschitz function such that Lip (h) ≤ 1. Hence, we have τ1 (σ(U l , l ≤ i − 2), U i ) ≤ qτ1,q (q + 1). Since s2n ≥ U 1 22 + · · · + U [n/q] 22 we obtain: P
j λ2 −r/2 n + τ1,∞ (q +1). (5.4.11) max (U i −E(U i )) ≥ 3λ ≤ 4 1+ 2 1≤j≤[n/q] rsn λ i=1
To conclude, let us notice that the choice of q implies that τ1,∞ (q + 1) ≤ v 2 0 QX (u)du. Hence, since qM ≤ λ, combining (5.4.11), (5.4.10) and (5.4.9), we get the result.
5.4.2
Burkholder’s inequalities
The next result extends Theorem 2.5 of Rio (2000) [161] to non-stationary sequences. Proposition 5.4. Let (Xi )i∈N be a sequence of centered and square integrable random variables, and Mi = σ(Xj , 0 ≤ j ≤ i). Define Sn = X1 + · · · + Xn and l
bi,n = max Xi E(Xk |Mi )
i≤l≤n
k=i
. p/2
For any p ≥ 2, the following inequality holds n 1/2 Sn p ≤ 2p bi,n .
(5.4.12)
i=1
Proof. We proceed as in Rio (2000) [161] pages 46-47. For any t in [0, 1] and p ≥ 2, let hn (t) = Sn−1 + tXn pp . Our induction hypothesis at step n − 1 is the following: for any k < n hk (t) ≤ (2p)p/2
k−1
p/2 bi,k + tbk,k
.
i=1
This assumption is true at step 1. Assuming that it holds for n − 1, we have
n−1 to check it at step n. Setting G(i, n, t) = Xi (tE(Xn | Mi ) + k=i E(Xk | Mi )) and applying Theorem (2.3) in Rio (2000) with ψ(x) = |x|p , we get t n−1 hn (t) 1 p−2 ≤ E |Si−1 +sXi | G(i, n, t) ds+ E(|Sn−1 +sXn |p−2 Xn2 )ds . p2 0 0 i=1 (5.4.13)
124
CHAPTER 5. TOOLS FOR CAUSAL CASES
Note that the function t → E(|G(i, n, t)|p/2 ) is convex, so that for any t in p/2 [0, 1], E(|G(i, n, t)|p/2 ) ≤ E(|G(i, n, 0)|p/2 ) ∨ E(|G(i, n, 1)|p/2 ) ≤ bi,n . Applying H¨older’s inequality, we obtain E |Si−1 +sXi |p−2 G(i, n, t) ≤ (hi (s))(p−2)/p G(i, n, t)p/2 ≤ (hi (s))(p−2)/p bi,n . This bound together with (5.4.13) and the induction hypothesis yields 1 t n−1 hn (t) ≤ p2 bi,n (hi (s))(p−2)/p ds + bn,n (hn (s))(p−2)/p ds 0
i=1
≤ p2
n−1
0
p
(2p) 2 −1 bi,n
i=1
1 i 0
bj,n + sbi,n
t p2 −1 2 ds + bn,n (hn (s))1− p ds . 0
j=1
Integrating with respect to s we find 1 i i i−1 p2 −1 p2 p2 2 2 bj,n + sbi,n ds = bj,n − bj,n , bi,n p j=1 p j=1 0 j=1 and summing in j we finally obtain t p2 n−1 2 hn (t) ≤ 2p bj,n + p2 bn,n (hn (s))1− p ds .
(5.4.14)
0
j=1
Clearly the function u(t) = (2p)p/2 (b1,n + · · · + tbn,n )p/2 solves the equation associated to Inequality (5.4.14). A classical argument ensures that hn (t) ≤ u(t) which concludes the proof. Corollary 5.3. Let (Xi )i∈N and (Mi )i∈N be as in Proposition 5.4. Define ˜ ˜ γ1,i = sup γ1 (Mk , Xi+k ), α ˜ i = sup α(M ˜ k , Xi+k ) and φi = sup φ(Mk , Xi+k ). k≥0
k≥0
k≥0
1. Let X be any random variable such that QX ≥ supk≥1 QXk , and let
n
n −1 γ1,n (u) = k=0 1u≤λ1,k and α ˜ −1 ˜ k . For p ≥ 2 we have n (u) = k=0 1u≤α the inequalities X1 1/p # −1 Sn p ≤ 2pn (γ1,n (u))p/2 Qp−1 X ◦ GX (u)du 0
≤
1 1/p # p/2 p 2pn (˜ α−1 QX du . n (u)) 0
2. Let Mq = supk≥1 Xi q . For q ≥ p ≥ 2 we have the inequality n−1 1/2 (q−1)/q Sn p ≤ 2 pMq Mqp/(2q−p) (n − k)φ˜k . k=0
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
125
Proof of 1. Let r = p/(p − 2). By duality there exists an Mi -mesurable Y such that Y r = 1, and n E(|Y Xi E(Xk |Mi )|) . bi,n ≤ k=i
Let λi = GX (γ1,i ). Applying (5.2.1) and Fr´echet’s inequality (1957) [88], we obtain n γ1,k−i n λk−i QY Xi ◦ GX (u)du ≤ QY (u)Q2X (u)du . bi,n ≤ k=i
0
k=i
0
Using the duality once more, we get p/2
bi,n ≤
0
1
n
p/2 1u≤λk
QpX (u)du =
k=0
X1
0
−1 (γ1,n (u))p/2 Qp−1 X ◦ GX (u)du .
The first inequality follows. To prove the second one, note that λk ≤ α ˜k . Proof of 2. First, note that bi,n ≤
n
Xi E(Xk | Mi )p/2 . Let r = p/(p − 2).
k=i
By duality, there exist a Mi -measurable variable Y such that Y r = 1 and Xi E(Xk | Mi )p/2 = |Cov(Y Xi , Xk )| . Applying inequality (5.2.7), and next H¨ older’s inequality, we obtain that (q−1)/q
Xi E(Xk | Mi )p/2 ≤ 2φ˜k−i
(q−1)/q
Y Xi q/(q−1) Xk q ≤ 2Mq Mqp/(2q−p) φ˜k−i
.
The result follows.
5.4.3
Rosenthal inequalities using Rio techniques
We suppose that the sequence (Xn )n∈N fulfills the following covariance inequality, |Cov(f (X1 , . . . , Xn ), g(X1 , . . . , Xn ))| ≤
∂f ∂g ∞ ∞ |Cov(Xi , Xj )| , ∂xi ∂xj i∈I j∈J
(5.4.15) for all real valued functions f and g defined on Rn having bounded first differentials and depending respectively on (xi )i∈I and on (xi )i∈J where I and J are ∂f disjoints subsets of N. Sequences fulfilling (5.4.15) with sup ∞ < ∞ and ∂x i∈I i ∂g ∞ are κ-dependent with κ(r) = sup |Cov(Xi , Xj )| , they are also sup ∂x j j∈J |i−j|≥r
126
CHAPTER 5. TOOLS FOR CAUSAL CASES
ζ-dependent with ζ(r) = sup i∈N
|Cov(Xi , Xj )| .
{j/ |i−j|≥r}
Our task in this section, is to extend Doukhan and Portal (1983) [73] method in order to provide moment bounds of order r, where r is any positive real number not less than two. The main result of this paragraph is the following theorem. Theorem 5.4. Let r > 2 be a fixed real number. Let (Xn ) be a strictly stationary sequence of centered r.v’s fulfilling (5.4.15). Suppose moreover that this sequence is bounded by M . Then there exists a positive constant Cr depending only on r, such that n k−1 r r r−2 r−2 M (i + 1) |Cov(X1 , X1+i )| , (5.4.16) E|Sn | ≤ Cr sn + k=1 i=0
where s2n := n
n i=0
|Cov(X1 , X1+i )|.
Theorem 5.4 gives, in particular, a unifying Rosenthal-type inequality for at least two models: associated or negatively associated processes. An immediate consequence of Theorem 5.4 is the following MarcinkiewiczZygmund bound. Corollary 5.4. Let r > 2 be a fixed real number. Let (Xn ) be a strictly stationary sequence of centered r.v’s bounded by M and fulfilling (5.4.15). Suppose that (5.4.17) |Cov(X1 , X1+i )| = O(i−r/2 ), as i → +∞. Then E|Sn |r = O(nr/2 ).
(5.4.18)
For bounded associated sequences, condition (5.4.17) is shown to be optimal for the Marcinkiewicz-Zygmund bound (5.4.18) (cf. Birkel (1988) [22]). Proof of Theorem 5.4. The method is a generalization of the Lindeberg decomposition to an order r > 2. This method was first developed by Rio (1995) [158] for mixing sequences and for r ∈]2, 3]. The restriction to sequences fulfilling the bound (5.4.15) is only for the sake of clarity and the method can be adapted successfully to other dependent sequences (cf. Proposition 5.5 below). We give here the great lines of the proof and we refer to Louhichi (2003) [125] for more details. Let p ≥ 2 be a fixed integer. Let Φp be the class of functions φ : R+ → R+ such that φ(0) = φ (0) = · · · = φ(p−1) (0) = 0 and φ(p) is non decreasing and concave. Let φ be a function of the set Φp . Theorem 5.4 is proved if we suitably control Eφ(|Sn |) (since the function x → xr , for r ∈]p, p + 1] is one of those functions φ). Such a control will be done into the following steps.
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
127
Step 1. The purpose of Step 1 is to reduce the control of Eφ(|Sn |) to that of a suitable polynomial function of |Sn |. For this, define gp : R+ × R → R+ by, gp (t, x) :=
3 p+1 4 1 x 10≤x≤t + (xp+1 − (x − t)p+1 )1t<x , (p + 1)!
(5.4.19)
for any x ≥ 0 and gp (t, x) = gp (t, −x). The following lemma is a generalization of Equality (4.3) of Rio (1995) [158], which was written for p = 2. Lemma 5.4. Let p ≥ 2 be a fixed integer. Let φ ∈ Φp . Suppose that φ(p+1) exists and that limx→∞ φ(p+1) (x) = 0. Then φ(x) = 0
+∞
gp (t, x)νp (dt), (p+1)
where νp is the Stieltjes measure of −φp
defined by νp (dt) = −dφ(p+1) (t).
Lemma 5.4 reduces then the estimation of Eφ(|Sn |) to that of Egp (t, Sn ). Step 2. The purpose of Step 2 is then to give bounds of Ef (Sn ), for real-valued functions f belonging to a suitable set containing the functions x → gp (t, x). For this, we denote by Cp the class of real-valued, p times continuously differentiable functions f such that f (0) = · · · = f (p) (0) = 0. Let Fp (b1 , b2 ) be the subclass of Cp+1 such that f (p) ∞ ≤ b1 and that f (p+1) ∞ ≤ b2 , where f (i) ∞ = supx∈R |f (i) (x)| and f (i) is the differential of order i of f . In this step, we give an estimation of Ef (Sn ), for f ∈ Fp (b1 , b2 ). Let us note that the function gp as defined by (5.4.19) belongs to the set Fp (t, 1) We first exhibit the mains terms of our calculations. Notations. We denote by (p−2) the sum over i1 , . . . , ip−2 such that 0 := i0 ≤
i1 ≤ · · · ≤ ip−2 ≤ k − 1, that is 0≤i1 ≤···≤ip−2 ≤k−1 . We define Ep−2,k (Δf ) =
EXk Xk−i1 · · · Xk−ip−2 Δp−2,k (f, u) ,
sup
0≤u≤1
(p−2)
where 3 4 Δp−2,k (f ) := Δp−2,k (f, u) = f (Sk−ip−2 −1 + uXk−ip−2 ) − f (Sk−ip−2 −1 ) 1 = uXk−ip−2 f Sk−ip−2 −1 + uvXk−ip−2 dv. 0
We set for p ≥ 2, Ep−2,k (f ) =
(p−2)
|EXk Xk−i1 · · · Xk−ip−2 f (Sk−ip−2 −1 )|.
128
CHAPTER 5. TOOLS FOR CAUSAL CASES
Finally, recall that i0 = 0, E0,k (f ) = |EXk f (Sk−1 )| and that E0,k (Δf ) = sup |EXk Δ0,k (f, u)|. 0≤u≤1
For any real-valued function f of the set Fp (b1 , b2 ), the quantity |E(f (Sn ))| is evaluated by means of the main terms Ep−2,k (f (p−1) ) and Ep−2,k (Δf (p−1) ) as shows the following lemma. Lemma 5.5. Let p ≥ 2 be a fixed integer. Let (Xn ) be a sequence of r.v’s fulfilling (5.4.15), centered and bounded by M . There exists a positive constant Cp depending only on p, such that for any f ∈ Fp (b1 , b2 ), $ |E(f (Sn ))|
≤ Cp
spn (b1
∧ b2 sn ) + (b1 ∧ b2 M )M
p−2
k−1 n
|Cov(Xk , Xk−i )|
k=1 i=0
+
n
Ep−2,k (f
k=1
(p−1)
)+
n
9
Ep−2,k (Δf
(p−1)
) .
k=1
From now Cp denotes a positive constant depending only on p and that will be different from line to line. Evaluation of the main terms Ep−2,k (f ) and Ep−2,k (Δf ). The object of this step is to evaluate the main terms Ep−2,k (f ) and Ep−2,k (Δf ) of Lemma 5.5. This evaluation involves the following covariance quantities: Mm,n := M m−2
k−1 n (i + 1)m−2 |Cov(X1 , Xi+1 )|, for 2 ≤ m ≤ p, (5.4.20) k=1 i=0
Mm,n (b1 , b2 ) :=
n k−1
(b1 ∧ b2 (r + 1)M )(r + 1)m−2 M m−2 |Cov(X1 , Xr+1 )|.
k=1 r=0
(5.4.21) Let us note that M2,n is close to Var Sn and that M2,n = nVar X1 in the i.i.d. case. Those covariance quantities satisfy the following analogous of H¨ older’s inequality: (r−4)/(r−2) Mr,n ≤ srn + Mr,n , Mm,n Mr−m,n ≤ s2r/(r−2) n
(5.4.22)
for any r > 4, 2 < m < r. We now state the basic technical lemma of the proof of Theorem 5.4. Lemma 5.6. Let f be a real valued function of the set F1 (b1 , b2 ). Let (Xn ) be a centered sequence of random variables fulfilling (5.4.15). Suppose that (Xn )
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
129
is uniformly bounded by M . Then, for any integer p ≥ 2, there exists a positive constant Cp depending only on p, for which n
Ep−2,k (Δf ) +
k=1
Ep−2,k (f )
(5.4.23)
k=1
$ ≤ Cp
n
spn (b1
∧ b 2 sn ) +
p−2
9 Mm,n Mp−m,n (b1 , b2 ) + Mp,n (b1 , b2 ) ,
m=2
where the sum
p−2 m=2
equals to 0, whenever p ∈ {2, 3}.
End of the proof of Theorem 5.4. Finally, we combine the three previous steps in order to finish the proof of Theorem 5.4. Let us explain. We first make use of Lemma 5.4, together with Fubini’s theorem, to obtain, +∞ Egp (t, Sn ) νp (dt). (5.4.24) Eφ(|Sn |) = 0
(p−1)
We recall that the functions x → gp (t, x) and x → gp (t, x) belong respectively to Fp (t, 1) and to F1 (t, 1) (in fact if f ∈ Fp (t, 1), then f (p−1) ∈ F1 (t, 1)). Hence we deduce, applying Lemma 5.5 to the function x → gp (t, x) and Lemma (p−1) (t, x), 5.6 to the function x → gp 9 $ p−2 p Egp (t, Sn ) ≤ Cp Mm,n Mp−m,n (t, 1) + Mp,n (t, 1) + sn (t ∧ sn ) . (5.4.25) m=2
Taking into account Lemma 5.4 and the fact g (p) (x) = x ∧ t, we deduce that +∞ φ(p) (x) = (t ∧ x)νp (dt). (5.4.26) 0
Inequalities (5.4.24), (5.4.25) and (5.4.26) yield: Eφ(|Sn |) ≤ Cp spn φ(p) (sn ) +
+
n k−1 (M (i + 1))p−2 φ(p) (M (i + 1))|Cov(X1 , X1+i )| k=1 i=0 p−2
Mm,n
m=2
n k−1
(5.4.27)
(M (i + 1))p−m−2 φ(p) (M (i + 1))|Cov(X1 , X1+i )| .
k=1 i=0
Now, we use the concavity property of the function φ(p) , 1 1 t(1 − t)p−1 xp dt, (1 − t)p−1 φ(p) (tx)dt ≥ xp φ(p) (x) φ(x) = (p − 1)! 0 (p − 1)! 0
130
CHAPTER 5. TOOLS FOR CAUSAL CASES
to deduce that
xp φ(p) (x) ≤ Cp φ(x).
(5.4.28)
We conclude, combining inequalities (5.4.27) and (5.4.28), $ Eφ(|Sn |)
≤
Cp
n k−1 k=1 i=0
p−2
+
(M (i + 1))−2 φ(M (i + 1))|Cov(X1 , X1+i )| + φ(sn )
Mm,n
m=2
9 k−1 n −m−2 (M (i + 1)) φ(M (i + 1))|Cov(X1 , X1+i )| k=1 i=0
The last inequality applied to φ(x) = xr , $ E|Sn | ≤ Cr r
for r ∈]p, p + 1], leads to
p−2 n k−1 (M (i + 1))r−2 |Cov(X1 , X1+i )| + srn + Mm,n Mr−m,n k=1 i=0
9 .
m=2
The proof of Theorem 5.4 is now complete, using the last inequality together with (5.4.22) (recall that M2,n ≤ s2n ). In the case where the sequence (Xn )n∈N∗ is θ−dependent, the proof of the inequalities of Theorem 5.4 can be adapted. We then get a variation of the technical proof of Theorem 5.4 written just above. Hence it will be omitted here. Let us state the inequalities we obtained in the case where (Xn )n∈N∗ is θ−dependent. Proposition 5.5. Let r be a fixed real number > 2. Let (Xn ) be a stationary sequence of θ1,∞ −dependent centered random variables. Suppose moreover that this sequence is bounded by 1. Let Sn := X1 + X2 + · · · + Xn , for n ≥ 1 and S0 = X0 = 0. Then there exists a positive constant Cr depending only on r, such that srn + Mr,n ) , (5.4.29) E|Sn |r ≤ Cr (˜
n−1
n−1 where Mr,n := n i=0 (i + 1)r−2 θ(i), and s˜2n := M2,n = n i=0 θ(i). We refer to Prieur (2002) [155] for a detailed proof of Proposition 5.5.
5.4.4
Rosenthal inequalities for τ1 -dependent sequences
We give here a corollary of Theorem 5.3. Corollary 5.5. Let (Xi )i>0 be a sequence of centered random variables belonging to Lp for some p ≥ 2. Define (Mi )i>0 , S n , QX and sn as in Theorem 5.3. Recall that τ −1 has been defined in Notations 5.1. We have S n pp ≤ ap spn + nbp
0
X1
((τ /2)−1 (u))p−1 Qp−1 X ◦ GX (u)du ,
5.4. EXPONENTIAL AND MOMENT INEQUALITIES
131
where ap = 4p5p (p + 1)p/2 and (p − 1)bp = 4p5p (p + 1)p−1 . Moreover we have that s2n ≤ 4n
X1
0
(τ /2)−1 (u)QX ◦ GX (u)du .
Proof of Corollary 5.5. It suffices to integrate (5.4.4) (as done in Rio (2000) [161] page 88) and to note that
1
Q(u)(R(u))p−1 (u)du = 0
0
X1
((τ /2)−1 (u))p−1 Qp−1 X ◦ GX (u)du .
The bound for s2n holds with θ1,1 instead of τ1,∞ (see Dedecker and Doukhan (2002) [43]).
5.4.5
Rosenthal inequalities under projective conditions
We recall two moment inequalities given in Dedecker (2001) [42]. We use these inequalities in Chapter 10 to prove the tightness of the empirical process for α, ˜ β˜ and φ˜ dependent sequences. Proposition 5.6. Let (Xi )i∈Z be a stationary sequence of centered and square integrable random variables and let Sn = X1 + · · ·+ Xn . Let Mi = σ(Xj , j ≤ i). The following upper bound holds 1/3 Sn p ≤ (pnV∞ )1/2 + 3p2 n X03 p/3 + M1 (p) + M2 (p) + M3 (p) , where VN = E(X02 ) + 2
N
|E(X0 Xk )| and
k=1
M1 (p)
M2 (p)
M3 (p)
=
=
=
l−1 +∞ l=1 m=0 +∞ +∞
X0 Xm E(Xl+m | Mm )p/3 X0 E(Xm Xl+m − E(Xm Xl+m )| M0 )p/3
l=1 m=l +∞
1 2
X0 E(Xk2 − E(Xk2 )| M0 )p/3 .
k=1
Proposition 5.7. We keep the same notations as in Proposition 5.6. For any positive integer N , the following upper bound holds 1/3 ˜2 (p)+M3 (p) Sn p ≤ (pn (VN −1 + 2M0 (p)))1/2 + 3p2 n X03 p/3 + M˜1 (p)+ M ,
132
CHAPTER 5. TOOLS FOR CAUSAL CASES
where M0 (p) =
+∞
X0 E(Xl | M0 )p/2
l=N
˜1 (p) = M
N −1 l−1
X0 Xm E(Xl+m | Mm )p/3
l=1 m=0
˜2 (p) = M
+∞ N −1
X0 E(Xm Xl+m − E(Xm Xl+m )| M0 )p/3 .
l=1 m=l
5.5
Maximal inequalities
Our first result is an extension of Doob’s inequality for martingales. This maximal inequality is stated in the nonstationary case. Proposition 5.8. Let (Xi )i∈Z be a sequence of square-integrable and centered random variables, adapted to a nondecreasing filtration (Fi )i∈Z . Let λ be any nonnegative real number and Gk = (Sk∗ > λ). We have E((Sn∗ − λ)2+ ) ≤ 4
n
E(Xk2 1Gk ) + 8
k=1
n−1
Xk 1Gk E(Sn − Sk |Fk )1 .
k=1
Proof of Proposition 5.8. We proceed as in Garsia (1965) [90]: (Sn∗ − λ)2+ =
n
∗ ((Sk∗ − λ)2+ − (Sk−1 − λ)2+ ).
(5.5.1)
k=1
Since the sequence (Sk∗ )k≥0 is nondecreasing, the summands in (5.5.1) are nonnegative. Now ∗ ∗ − λ)+ )((Sk∗ − λ)+ + (Sk−1 − λ)+ ) > 0 ((Sk∗ − λ)+ − (Sk−1 ∗ . In that case Sk = Sk∗ , whence if and only if Sk > λ and Sk > Sk−1 ∗ ∗ − λ)2+ ≤ 2(Sk − λ)((Sk∗ − λ)+ − (Sk−1 − λ)+ ). (Sk∗ − λ)2+ − (Sk−1
Consequently (Sn∗ − λ)2+
≤
2
n
(Sk − λ)(Sk∗ − λ)+ − 2
k=1
≤
2(Sn − λ)+ (Sn∗ − λ)+ − 2
n
∗ ((Sk − λ)(Sk−1 − λ)+ )
k=1 n
∗ (Sk−1 − λ)+ Xk .
k=1
5.5. MAXIMAL INEQUALITIES
133
Noting that 2(Sn − λ)+ (Sn∗ − λ)+ ≤ (Sn∗
−
λ)2+
≤ 4(Sn −
1 ∗ (S − λ)2+ + 2(Sn − λ)2+ , we infer 2 n
λ)2+
−4
n
∗ (Sk−1 − λ)+ Xk .
(5.5.2)
k=1
In order to bound (Sn − λ)2+ , we adapt the decomposition (5.5.1) and next we apply Taylor’s formula: (Sn − λ)2+
n
=
((Sk − λ)2+ − (Sk−1 − λ)2+ )
k=1 n
=
2
(Sk−1 − λ)+ Xk + 2
k=1
n
Xk2
1
0
k=1
(1 − t)1Sk−1 +tXk >λ dt.
Since 1Sk−1 +tXk >λ ≤ 1Sk∗ >λ , it follows that (Sn − λ)2+ ≤ 2
n
(Sk−1 − λ)+ Xk +
k=1
n
Xk2 1Sk∗ >λ .
(5.5.3)
k=1
Hence, by (5.5.2) and (5.5.3) (Sn∗ − λ)2+ ≤ 4
n
∗ (2(Sk−1 − λ)+ − (Sk−1 − λ)+ )Xk + 4
k=1
n
Xk2 1Sk∗ >λ .
k=1
Let D0 = 0 and Dk = 2(Sk − λ)+ − (Sk∗ − λ)+ for k > 0. Clearly Dk−1 Xk =
k−1
(Di − Di−1 )Xk .
i=1
Hence (Sn∗
−
λ)2+
≤4
n−1
n
i=1
k=1
(Di − Di−1 )(Sn − Si ) + 4
Xk2 1Sk∗ >λ .
(5.5.4)
Since the random variables Di − Di−1 are Fi -measurable, we have: E((Di − Di−1 )(Sn − Si )) = ≤
E((Di − Di−1 )E(Sn − Si | Fi )) E|(Di − Di−1 )E(Sn − Si | Fi )|. (5.5.5)
∗ − λ)+ , then It remains to bound |Di − Di−1 |. If (Si∗ − λ)+ = (Si−1
|Di − Di−1 | = 2|(Si − λ)+ − (Si−1 − λ)+ | ≤ 2|Xi |1Si∗ >λ ,
134
CHAPTER 5. TOOLS FOR CAUSAL CASES
because Di − Di−1 = 0 whenever Si ≤ λ and Si−1 ≤ λ. Otherwise Si = Si∗ > λ ∗ and Si−1 ≤ Si−1 < Si , which implies that ∗ Di − Di−1 = (Si − λ)+ + (Si−1 − λ)+ − 2(Si−1 − λ)+ .
Hence Di − Di−1 belongs to [0, 2((Si − λ)+ − (Si−1 − λ)+ ) ]. In any case |Di − Di−1 | ≤ 2|Xi |1Si∗ >λ , which together with (5.5.4) and (5.5.5) implies Proposition 5.8. Consider the projection operators Pi : for any f in L2 , Pi (f ) = E(f | Fi ) − E(f | Fi−1 ). Combining Proposition 5.8 and a decomposition due to McLeish, we obtain the following maximal inequality. Proposition 5.9. Let (Xi )i∈Z be a sequence of square-integrable and centered filtration. Define the σrandom variables, : and (Fi )i∈Z be any nondecreasing : algebras F−∞ = i∈Z Fi and F∞ = σ( i∈Z Fi ). Define the random variables Sn = X1 + · · · + Xn and Sn∗ = max{0, S1 , . . . , Sn }. For any i in Z, let
j ∗ (Yi,j )j≥1 be the martingale Yi,j = k=1 Pk−i (Xk ) and Yi,n = max{0, Yi,1 , . . . , ∗ > λ}. AsYi,n }. Let λ be any nonnegative real number and G(i, k, λ) = {Yi,k sume that the sequence is regular: for any integer k, E(Xk |F−∞ ) = 0 and E(Xk |F∞ ) = Xk . For any two sequences of nonnegative numbers (ai )i≥0 and is finite and bi = 1 we have (bi )i≥0 such that K = a−1 i ∞ n 2 E (Sn∗ − λ)2+ ≤ 4K ai E(Pk−i (Xk )1G(i,k,bi λ) ) . i=0
k=1
Proof of Proposition 5.9. Since the sequence is regular, we decompose Xk =
+∞
Pk−i (Xk ).
i=−∞
Consequently Sj =
Yi,j and therefore: (Sj −λ)+ ≤
i∈Z
(Yi,j −bi λ)+ . Applying
i∈Z
H¨older inequality and taking the maximum on both sides, we get ∗ (Sn∗ − λ)2+ ≤ K ai (Yi,n − bi λ)2+ . i∈Z
Taking the expectation and applying Proposition 5.8 to the martingale (Yi,n )n≥1 , we obtain Proposition 5.9.
Chapter 6
Applications of strong laws of large numbers We consider in this chapter a stochastic algorithm with weakly dependent input noise (according to Definition 2.2). In particular, the case of γ1 -dependence is considered. The ODE (ordinary differential equation) method is generalized to such situation. For this, we use tools for causal and non causal sequences developed in the previous chapters. Illustrations to the linear regression frame and to the law of large numbers for triangular arrays of weighted dependent random variables are also given.
6.1
Stochastic algorithms with non causal dependent input
We consider the Rd -valued stochastic algorithm, defined on a probability space (Ω, A, P) and driven by the recurrence equation Zn+1 = Zn + γn h(Zn ) + ζn+1 , where • h is a continuous function from an open set G ⊆ Rd to Rd , • (γn ) a decreasing to zero deterministic real sequence satisfying γn = ∞.
(6.1.1)
(6.1.2)
n≥0
• (ζn ) is a “small” stochastic disturbance. The ordinary differential equation (ODE) method associates (we refer for instance to Benveniste et al. (1987) [15], Duflo (1996) [82], Kushner and Clark 135
136
CHAPTER 6. APPLICATIONS OF SLLN
(1978) [114]) the possible limit sets of (6.1.1) with the properties of the associated ODE dz = h(z). dt
(6.1.3)
These sets are compact connected invariant and “chain-recurrent” in the Bena¨ım sense for the ODE (cf. Bena¨ım (1996) [14]). These sets are more or less complicated. Various situations may then happen. The most simple case is an equilibrium : z is a solution of h(z) = 0, but equilibria cycle, or a finite set of equilibria is linked to the ODE’s trajectories, connected sets of equilibria or, periodic cycles for the ODE may also happen. . . In order to use the ODE method, we suppose that (Zn ) is a.s. bounded and ζn+1
=
cn (ξn+1 + rn+1 ),
where (cn ) denotes a nonnegative deterministic sequence such that γn = O(cn ), c2n < ∞,
(6.1.4)
(6.1.5)
(ξn ) and (rn ) are Rd -valued sequences, defined on (Ω, A, P), and adapted with respect to an increasing sequence of σ-fields (Fn )n≥0 and satisfying almost surely (a.s.) on A ⊂ Ω, ∞
cn ξn+1 < ∞
a.s.
and
(6.1.6)
n=0
lim rn = 0 a.s.
n→∞
(6.1.7)
The classical theory of algorithms is related to a noise (ξn ) which is a martingale difference sequence. Our aim is to replace this condition about the noise by weakly dependence conditions as being introduced in Chapter 2. In Section 6.1.1, we suppose that the sequence (ξn ) is (Λ(1) ∩ L∞ , Ψ)-dependent according to Definition 2.2, where Ψ(f, g) = C(df , dg )(Lip(f ) + Lip(g)), for some function C : N∗ × N∗ → R. Various examples of this situation may be found in Chapter 3, they include general Bernoulli shifts, stable Markov chains such as, ξt = G(ξt−1 , . . . , ξt−p ) +
ζt , ξt = a0 + j≥1 aj ξt−j ζt generated by some i.i.d. sequence (ζt ), or ARCH(∞) models. In Section 6.1.2 below, we consider a weakly dependent noise in the sense of the γ1 -weak coefficients of Dedecker and Doukhan (2003) [43] defined by (2.2.17).
6.1. STOCHASTIC ALGORITHMS
137
Note that (cf. Remark 2.5) a causal version of (F , G, Ψ)-dependence implies γ1 -dependence, where the left hand side in definition of weak dependence writes ≤ C(v)Lip(g) (r). Counter-examples of γ1 -dependent sequences which are not θ-dependent may also be found there. The notion of γ1 -dependence is generalized to Rd valued sequences. Proposition 6.1. The two following assertions are equivalent: (i) A Rd -valued sequence (Xn ) is γ1 -dependent, (ii) Each component (Xn ) ( = 1, . . . , d) of (Xn ) is γ1 - dependent. Proof. Clearly, E(Xn+r −E(Xn+r )|Fn )1 ≤ E(Xn+r −E(Xn+r )|Fn )1 , hence (i) implies (ii). The second implication follows from,
; < d < 2 E(Xn+r E(Xn+r − E(Xn+r )|Fn )1 = E= − E(Xn+r )|Fn ) .
=1
The two forthcoming sections are devoted to provide moment inequalities of the Marcinkiewicz-Zygmund type adapted to deduce the relation (6.1.6) in those two frames. The following sections are devoted to study the examples of RobbinsMonro and Kiefer-Wolfowitz algorithms and to obtain sufficient conditions for the complete convergence of triangular arrays, extending on Chow (1966) [37]. Finally, the last section is devoted to the specific of the linear regression algorithm with dependent entries. In [36], Chen (1985) has also studies this topic. He works in a more general matrix valued framework. Assuming only the stationarity and the ergodicity of entries, he derives the a.s. convergence of the algorithm. We get the same result with a γ1 -dependence assumption, but this assumption, more restrictive, allow us to reach, thanks to a moment technic, a precise n−1/2 -convergence rate.
6.1.1
Weakly dependent noise
n Let (ξn ) be a sequence of centered random variables. Let Sn be the sum i=1 ξi and Cq = maxu+v≤q C(u, v). Suppose that an analogous of the bounds (4.3.2) and (4.3.3) are satisfied by the process ξ: sup |Cov(ξt1 · · · ξtm , ξtm+1 · · · ξtq )| ≤ Cq q γ M q−2 (r),
(6.1.8)
where the supremum is taken over all {t1 , . . . , tq } such that 1 ≤ t1 ≤ · · · ≤ tq , and 1 ≤ m < q such that tm+1 − tm = r, or |Cov(ξt1 · · · ξtm , ξtm+1 · · · ξtq )| ≤ (Cq ∨ 2)
0
(r)∧1
Qξt1 (x) · · · Qξtq (x)dx. (6.1.9)
138
CHAPTER 6. APPLICATIONS OF SLLN
Those bounds respectively imply moment inequalities in Theorems 4.2 and
n 4.3. Denoting Σn = i=1 ci−1 ξi , and using similar techniques as in § 4.3 (see Doukhan and Louhichi (1999) [67]) one has, Proposition 6.2. Let p ≥ 2 be some fixed integer and let (ξn ) be a centered sequence of real random variables such that (6.1.8) holds for all q ≤ p. Then for n ≥ 2, |EΣpn |
(2p − 2)! ≤ (p − 1)!
$ γ
Cp p M
p−2
n
cpi−1
i=1
n−1
p−2
(r + 1)
(r)
r=0
∨
C2 2γ
n i=1
c2i−1
n−1 r=0
p/2 ⎫ ⎬
(r) . ⎭
This result is mainly adapted to bounded sequences. Proof of proposition 6.2. The proof is done in Brandi`ere and Doukhan (2004) [32]. We have, using arguments from Doukhan and Louhichi’s (1999) [67] as done for the proof of Theorem 4.2, p n ci ξi ≤ p! ct1 · · · ctp |E(ξt1 · · · ξtp )|. (6.1.10) E 1≤t1 ≤···≤tp ≤n
i=1
Denote Ap (n) = tp−1 , Ap (n)
≤
1≤t1 ≤···≤tp ≤n ct1
· · · ctp |E(ξt1 · · · ξtp )|, so for any t2 ≤ tm ≤
ct1 · · · ctp |E(ξt1 · · · ξtm )E(ξtm+1 · · · ξtp )|
1≤t1 ≤···≤tp ≤n
+
ct1 · · · ctp |Cov(ξt1 · · · ξtm , ξtm+1 · · · ξtp )|.
1≤t1 ≤···≤tp ≤n
Denote A1p (n)
=
ct1 · · · ctp |E(ξt1 · · · ξtm )E(ξtm+1 . . . ξtp )|,
1≤t1 ≤···≤tp ≤n
A2p (n)
=
ct1 · · · ctp |cov(ξt1 · · · ξtm , ξtm+1 · · · ξtp )|.
1≤t1 ≤···≤tp ≤n
Since the sequence (cn ) is decreasing to 0, we deduce, as in Doukhan and Louhichi (1999) [67], A1p (n) ≤ Am (n)Ap−m (n).
(6.1.11)
6.1. STOCHASTIC ALGORITHMS
139
n p n−1 γ p−2 (r + 1)p−2 (r), and By (6.1.8) we obtain A2p (n) ≤ t1 =1 ct1 r=0 Cp p M
n p n−1 γ p−2 the expression (r + 1)p−2 (r) = Vp (n), verifies, for i=1 ci r=0 Cp p M q−2
p−q
any integer 2 ≤ q ≤ p − 1 : Vq (n) ≤ Vpp−2 (n)V2p−2 (n). Now, Lemma 4.7 (see 12 of Doukhan and Louhichi (1999) [67]) leads to Ap (n) ≤ also Lemma p 2p − 2 1 (V22 (n) ∨ Vp (n)), hence p p−1 n p (2p − 2)! p2 (V (n) ∨ Vp (n)). E ci ξi ≤ (p − 1)! 2 i=1 This ensures the result. The following result is appropriate to more general real-valued random variables but require a moment assumption and a tail condition. Proposition 6.3. Let p > 2 be a fixed integer and (ξn ) be a centered sequence of random variables. Assume that for all 2 < q ≤ p, Inequality (6.1.9) holds with Mq
q−2
≤
p−q
Mpp−2 M2p−2
(6.1.12)
and there exists a constant c > 0 such that ∃k > p, Then for n ≥ 2, |EΣpn |
≤
∀i ≥ 0 :
P(|ξi | > t) ≤
c . tk
(6.1.13)
$ n n−1 k−p (2p − 2)! 1/k p p−2 c ci−1 (r + 1) (r) k Mp (p − 1)! r=0 i=1 p/2 ⎫ n−1 n ⎬ k−2 (6.1.14) ∨ M2 c2i−1
(r) k ⎭ r=0
i=1
Note that (6.1.13) holds as soon as the ξn ’s have a k-th order moment such that supi≥0 E|ξi |k ≤ c. Now we argue as in Billingsley (1968) [20]: if (6.1.8) holds for some p such that $ γ
Cp p M
p−2
n i=1
cpi−1
n−1
p−2
(r + 1)
r=0
(r) ⎛
∨ ⎝ C2 2γ
n i=1
c2i−1
n−1 r=0
p/2 ⎫ ⎬
(r) <∞ ⎭
140
CHAPTER 6. APPLICATIONS OF SLLN
then for any t > 0, limn→∞ P(supk≥1 |Σn+k − Σn | > t) = 0. Thus (Σn ) is a.s. a Cauchy sequence, hence it converges. In the same way, if (6.1.9) holds for some p such that n p/2 n n−1 ∞ p k−p k−2 p−2 2 ci−1 (r + 1) (r) k ci−1
(r) k < ∞, (6.1.15) ∨ i=1
r=0
i=1
r=0
then (Σn ) converges with probability 1. Proof of proposition 6.3. by (6.1.9)
Using the same notations as in the previous proof,
Vp (n) ≤ Mp
n
cpi
1
0
i=1
( −1 (u) ∧ n)p−1 Qpi (u)du,
where (u) = [u] ([u] denotes the integer part of u). Denote Wp (n) = Mp
n
cpi
i=1
1
0
( −1 (u) ∧ n)p−1 Qpi (u)du.
If (6.1.12) is verified, then q−2
p−q
Wq (n) ≤ Wpp−2 (n)W2p−2 (n), which completes the proof.
6.1.2
γ1 −dependent noise
Let (ξn )n≥0 be a sequence of integrable real-valued random variables, and (γ1 (r))r≥0 be the associated mixingale-coefficients defined in (2.2.17). Then the following moment inequality holds. Proposition 6.4. Let p > 2 and (ξn )n∈N be a sequence of centered random variables such that (6.1.13) holds. Then for any n ≥ 2, ⎞p/2 ⎛ n 2(k−p) n−1 2(k−p) 2− ci p(k−1) γ1 (j) p(k−1) ⎠ , (6.1.16) |EΣpn | ≤ ⎝2pK1 i=1
j=0
where K1 depends on r, p and c. Notice that here p ∈ R, and is not necessarily an integer. If now (6.1.16) holds for some p such that ∞ i=1
c2−m i
∞ j=0
γ1 (j)m < ∞,
(6.1.17)
6.1. STOCHASTIC ALGORITHMS
141
where m = 2(k−p) p(k−1) < 1, then (Σn ) converges with probability 1. The result extends to Rd . Indeed, if we consider a centered Rd -valued and γ1 -dependent sequence (ξn )n≥0 , one has as previously EΣn = E p
n d =1
p ci ξi
,
i=1
and if each component (ξn )n≥0 ( = 1, . . . , d) is γ1 -dependent and verifies (6.1.13) and (6.1.17), EΣn p < ∞ and we conclude as before that (Σn )n≥0 converges a.s. Proof of proposition 6.4. we deduce
Proceeding as in Dedecker and Doukhan (2003) [43], we deduce
$$|\mathbb{E}(\Sigma_n^p)| \le \Big(2p\sum_{i=1}^{n} b_{i,n}\Big)^{p/2},$$
where
$$b_{i,n} = \max_{i\le \ell\le n}\Big\| c_i\xi_i \sum_{k=0}^{\ell-i} \mathbb{E}(c_{i+k}\xi_{i+k}\mid\mathcal{F}_i)\Big\|_{p/2}.$$
Let $q = p/(p-2)$; then there exists $Y$ such that $\|Y\|_q = 1$. Applying Proposition 1 of Dedecker and Doukhan (2003) [43], we obtain
$$b_{i,n} \le \sum_{k=0}^{n-i}\int_0^{\gamma_1(k)} Q_{Yc_i\xi_i}\circ G_{c_{i+k}\xi_{i+k}}(u)\,du,$$
where $G_X$ is the inverse of $x\mapsto\int_0^x Q_X(u)\,du$. Since $G_{c_i\xi_i}(u) = G_{\xi_i}(u/c_i)$, we get
$$b_{i,n} \le \sum_{k=0}^{n-i}\int_0^{\gamma_1(k)} Q_{Yc_i\xi_i}\circ G\Big(\frac{u}{c_{i+k}}\Big)du \le \sum_{k=0}^{n-i} c_{i+k}\int_0^{\gamma_1(k)/c_{i+k}} Q_{Yc_i\xi_i}\circ G(u)\,du,$$
and the Fréchet inequality (1957) [88] yields
$$b_{i,n} \le \sum_{k=0}^{n-i} c_{i+k}\int_0^{G(\gamma_1(k)/c_{i+k})} Q_Y(u)\,Q_{c_i\xi_i}(u)\,Q(u)\,du \le c_i\sum_{k=0}^{n-i} c_{i+k}\int_0^1 \mathbf{1}_{\{u\le G(\gamma_1(k)/c_{i+k})\}}\, Q^2(u)\,Q_Y(u)\,du,$$
where $Q = Q_{\xi_i}$. Using Hölder's inequality, we also obtain
$$b_{i,n} \le c_i\sum_{k=0}^{n-i} c_{i+k}\Big(\int_0^1 \mathbf{1}_{\{u\le G(\gamma_1(k)/c_{i+k})\}}\, Q^p(u)\,du\Big)^{2/p}.$$
By (6.1.13), $Q(u)\le c^{1/r}u^{-1/r}$; setting $K = \frac{r-1}{r}\,c^{-1/r}$, this yields
$$b_{i,n} \le c_i\sum_{k=0}^{n-i} c_{i+k}\Big(\int_0^1 \mathbf{1}_{\{u\le G(\gamma_1(k)/c_{i+k})\}}\, c^{p/r}u^{-p/r}\,du\Big)^{2/p} \le K^{\frac{2(r-p)}{p(r-1)}}\, c_i\sum_{k=0}^{n-i} c_{i+k}^{\,1-\frac{2(r-p)}{p(r-1)}}\,\gamma_1(k)^{\frac{2(r-p)}{p(r-1)}}.$$
Noting that $(c_n)_{n\ge 0}$ is decreasing, the result follows with $K_1 = K^{\frac{2(r-p)}{p(r-1)}}$.

Equip $\mathbb{R}^d$ with its $p$-norm $\|(x_1,\dots,x_d)\|_p^p = |x_1|^p + \cdots + |x_d|^p$. Let $(\xi_n)_{n\ge 0}$ be an $\mathbb{R}^d$-valued and $(\mathcal{F},\Psi)$-dependent sequence. Set $\xi_n = (\xi_n^1,\dots,\xi_n^d)$; then $\big\|\sum_{i=1}^n c_i\xi_i\big\|_p^p = \sum_{\ell=1}^d \big|\sum_{i=1}^n c_i\xi_i^{\ell}\big|^p$. If each component $(\xi_n^{\ell})_{n\ge 0}$ is $(\mathcal{F},\Psi)$-dependent and such that a relation like (6.1.15) holds, then $\mathbb{E}\|\Sigma_n\|_p^p < \infty$. Arguing as before, we deduce that the sequence $(\Sigma_n)_{n\ge 0}$ converges with probability 1.
6.2 Examples of application

6.2.1 Robbins-Monro algorithm

The Robbins-Monro algorithm is used for dosage, to obtain the level $a$ of a function $f$ which is usually unknown. It is also used in mechanics, for adjustments, as well as in statistics to fit a median (Duflo (1996) [82], page 50). It writes
$$Z_{n+1} = Z_n - c_n\big(f(Z_n) - a\big) + c_n\xi_{n+1}, \qquad (6.2.1)$$
with $\sum c_n = \infty$ and $\sum c_n^2 < \infty$. It is usually assumed that the prediction errors $(\xi_n)$ form an independent and identically distributed sequence, but this assumption does not look natural: weak dependence seems more reasonable. Hence the previous results ensure the a.s. convergence of this algorithm, under the usual assumptions and the conditions yielding the a.s. convergence of $\sum_{n\ge 0} c_n\xi_{n+1}$. Under the assumptions of Proposition 6.2, if for some integer $p > 2$
$$\sum_{r=0}^{\infty}(r+1)^{p-2}\,\epsilon(r) < \infty, \qquad (6.2.2)$$
then the algorithm (6.2.1) converges a.s. If the assumptions of Proposition 6.3 hold, then the a.s. convergence of the algorithm (6.2.1) is ensured as soon as, for some $p > 2$,
$$\sum_{r=0}^{\infty}(r+1)^{p-2}\,\epsilon(r)^{\frac{k-p}{k}} \ \vee\ \sum_{r=0}^{\infty}\epsilon(r)^{\frac{k-2}{k}} < \infty.$$
Under the assumptions of Proposition 6.4, as soon as (6.1.17) is satisfied, the algorithm (6.2.1) converges with probability 1.
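To fix ideas, here is a small simulation of the recursion (6.2.1). It is our illustrative sketch, not taken from the book: the target function tanh, the AR(1) form of the weakly dependent prediction error, and all numeric choices are assumptions.

```python
import math, random

def robbins_monro(f, a, z0, n_iter=20000, g=1.0, phi=0.5, seed=0):
    """Robbins-Monro recursion Z_{n+1} = Z_n - c_n (f(Z_n) - a) + c_n xi_{n+1},
    with steps c_n = g/n (so sum c_n = inf and sum c_n^2 < inf) and an AR(1)
    prediction error xi: a weakly dependent, non-i.i.d., perturbation."""
    rng = random.Random(seed)
    z, xi = z0, 0.0
    for n in range(1, n_iter + 1):
        c = g / n
        xi = phi * xi + rng.uniform(-1.0, 1.0)  # centered, bounded AR(1) noise
        z = z - c * (f(z) - a) + c * xi
    return z

# Dosage example: find the level z* with f(z*) = a for an increasing f.
z_hat = robbins_monro(f=math.tanh, a=0.5, z0=0.0)
print(z_hat)  # close to atanh(0.5) ~ 0.5493
```

Despite the serial correlation of the innovations, the iterates settle near the root, which is exactly what the weak dependence conditions above guarantee for $\sum_n c_n\xi_{n+1}$.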
6.2.2 Kiefer-Wolfowitz algorithm

It is also a dosage algorithm. Here we want to reach the minimum $z^*$ of a real function $V$ which is $C^2$ on an open set $G$ of $\mathbb{R}^d$. The Kiefer-Wolfowitz algorithm (Duflo (1996) [82], page 53) is stated as:
$$Z_{n+1} = Z_n - 2c_n\nabla V(Z_n) - \zeta_{n+1}, \qquad (6.2.3)$$
where $\zeta_{n+1} = \frac{c_n}{b_n}\xi_{n+1} + c_nb_n^2\,q(n, Z_n)$ with $\|q(n, Z_n)\| \le K$ (for some $K > 0$), $\sum c_n = \infty$, $\sum_n c_nb_n^2 < \infty$ and $\sum_n(c_n/b_n)^2 < \infty$ (for instance, $c_n = \frac1n$ and $b_n = n^{-b}$ with $0 < b < \frac12$). Usually, the prediction error $(\xi_n)$ is assumed to be i.i.d., centered, square integrable and independent of $Z_0$. The previous results weaken this assumption to weakly dependent innovations. It is now enough to ensure the a.s. convergence of $\sum\frac{c_n}{b_n}\xi_{n+1}$. The weak dependence assumptions are the same as for the Robbins-Monro algorithm. Concerning the $\gamma_1$-weak dependence, the condition (6.1.17) is replaced by
$$\sum_{i=1}^{\infty}\Big(\frac{c_i}{b_i}\Big)^{2-m}\ \sum_{i=1}^{\infty}\gamma_1(i)^m < \infty.$$
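A minimal simulation sketch of the recursion (6.2.3) follows (ours, not from the book). The gradient of the unobserved $V$ is replaced by a noisy two-point finite difference; the quadratic test function and the numeric tunings are assumptions.

```python
import random

def kiefer_wolfowitz(V, z0, n_iter=20000, g=0.5, b_exp=0.25, seed=1):
    """Kiefer-Wolfowitz recursion: descend a smooth V observed only through
    noisy evaluations, via two-point finite differences, with c_n = g/n and
    b_n = n^{-b}, 0 < b < 1/2, so that sum c_n b_n^2 < inf and
    sum (c_n / b_n)^2 < inf."""
    rng = random.Random(seed)
    z = z0
    for n in range(1, n_iter + 1):
        c, b = g / n, n ** (-b_exp)
        noisy = lambda x: V(x) + rng.uniform(-0.1, 0.1)   # noisy evaluation
        grad = (noisy(z + b) - noisy(z - b)) / (2.0 * b)  # ~ V'(z) + xi/b
        z = z - 2.0 * c * grad
    return z

# Reach the minimum z* = 2 of V(z) = (z - 2)^2.
z_hat = kiefer_wolfowitz(lambda z: (z - 2.0) ** 2, z0=0.0)
print(z_hat)  # close to 2
```

The effective noise entering the recursion is $\xi_{n+1}/b_n$, which is why the condition on $\sum(c_n/b_n)^2$ replaces the plain $\sum c_n^2$ of Robbins-Monro.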
6.3 Weighted dependent triangular arrays

In this section, we consider a sequence $(\xi_i)_{i\ge 1}$ of random variables and a triangular array of non-negative constant weights $\{(c_{ni})_{1\le i\le n};\ n\ge 1\}$. Let
$$U_n = \sum_{i=1}^{n} c_{ni}\xi_i.$$
If the $\xi_i$'s are i.i.d., Chow (1966) [37] has established the following complete convergence result.

Theorem (Chow (1966) [37]). Let $(\xi_i)_i$ be independent and identically distributed random variables with $\mathbb{E}\xi_i = 0$ and $\mathbb{E}|\xi_i|^q < \infty$ for some $q \ge 2$. If for some constant $K$ (not depending on $n$), $n^{1/q}\max_{1\le i\le n}|c_{ni}| \le K$ and $\sum_{i=1}^{n} c_{ni}^2 \le K$, then
$$\forall t > 0, \qquad \sum_{n=1}^{\infty}\mathbb{P}\big(n^{-1/q}|U_n| \ge t\big) < \infty.$$

The last inequality is a result of complete convergence of $n^{-1/q}|U_n|$ to 0. This notion was introduced by Hsu and Robbins (1947) [109]; complete convergence implies almost sure convergence by the Borel-Cantelli lemma. Li et al. (1995) [120] extend this result to arrays $(c_{ni})_{n\ge 1,\ i\in\mathbb{Z}}$ for $q = 2$. Quote also Yu (1990) [196], who obtains a result analogous to Chow's for martingale differences. Ghosal and Chandra (1998) [91] extend the previous results and prove results similar to those of Li et al. (1995) [120] for martingale differences. As in [120], their main tool is the Hoffmann-Jørgensen inequality (Hoffmann-Jørgensen (1974) [107]). Peligrad and Utev (1997)
[143] propose a central limit theorem for partial sums $U_n = \sum_{i=1}^{n} c_{ni}\xi_i$, where $\sup_n\sum_i c_{ni}^2 < \infty$, $\max_{1\le i\le n}|c_{ni}| \to 0$ as $n\to\infty$, and the $\xi_i$'s are, in turn, pairwise mixing martingale differences, mixing sequences or associated sequences. McLeish (1975) [128], de Jong (1996) [56] and, more recently, Shixin (1999) [177] extend the previous results to the case of $L^q$-mixingale arrays. Those results have various applications; for instance, they are used to prove strong convergence of kernel estimators. The results of Li et al. (1995) [120] are now extended to our weakly dependent frame. A straightforward consequence of Proposition 6.3 is the following.

Corollary 6.1. Under the assumptions of Proposition 6.3, suppose that $q$ is an even integer such that $k > q > p$ and that, for some constant $K$ not depending on $n$, $\sum_{i=1}^{n} c_{n,i-1}^2 < K$. If $\epsilon(r) = O(r^{-\alpha})$ with $\alpha > \frac{(q-1)k}{k-q}$, or if $\epsilon(r) = O(e^{-r})$, then for all positive real numbers $t$, $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| \ge t\big) < \infty$.

Proof. Proposition 6.3 implies
$$\mathbb{E}|U_n|^q \le \frac{(2q-2)!}{(q-1)!}\left(M_q^{1/k}\sum_{i=1}^{n} c_{n,n-i}^q\sum_{r=0}^{n-1}(r+1)^{q-2}\,\epsilon(r)^{(k-q)/k}\ \vee\ \Big(M_2\sum_{i=1}^{n} c_{n,n-i}^2\sum_{r=0}^{n-1}\epsilon(r)^{(k-2)/k}\Big)^{q/2}\right).$$
If $\sum_{i=1}^{n} c_{n,i-1}^2 < K$ and $\epsilon(r) = O(r^{-\alpha})$ with $\alpha > \frac{(q-1)k}{k-q}$, then there is some $K_1 > 0$ with $\mathbb{E}|U_n|^q < K_1$, and the result follows from
$$\mathbb{P}\big(n^{-1/p}|U_n| > t\big) \le \frac{\mathbb{E}|U_n|^q}{t^q\,n^{q/p}}.$$
If $\sum_{i=1}^{n} c_{n,i-1}^2 < K$ and $\epsilon(r) = O(e^{-\theta r})$, then $\mathbb{E}|U_n|^q < K_2$ for a real constant $K_2$,
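A Monte-Carlo illustration of complete convergence in the simplest setting is easy to write (ours, not from the book): with the assumed weights $c_{ni} = n^{-1/2}$, both of Chow's conditions hold with $K = 1$ for $q = 2$, and the tail probabilities collapse rapidly in $n$.

```python
import random

def tail_prob(n, t, reps=2000, seed=2):
    """Monte-Carlo estimate of P(n^{-1/q} |U_n| >= t) for q = 2, where
    U_n = sum_i c_ni xi_i with c_ni = n^{-1/2} (so n^{1/q} max|c_ni| = 1 and
    sum_i c_ni^2 = 1, i.e. Chow's two conditions) and i.i.d. Gaussian xi_i."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n ** 0.5  # U_n
        if abs(u) / n ** 0.5 >= t:                                 # n^{-1/2} |U_n|
            hits += 1
    return hits / reps

# The probabilities decay sharply with n, fast enough to be summable over n,
# which is the content of complete convergence.
p_small, p_large = tail_prob(50, 0.2), tail_prob(500, 0.2)
print(p_small, p_large)
```

Here $n^{-1/2}U_n$ has the law of $n^{-1/2}\mathcal{N}(0,1)$, so the tail probability is Gaussian in $t\sqrt n$ and the series over $n$ converges for every $t > 0$.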
and $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| > t\big) < \infty$.

The following corollary is a direct consequence of Proposition 6.4.

Corollary 6.2. Suppose that all the assumptions of Proposition 6.4 are satisfied. If $q > p$, $k > q > 1$, and $\sum_{i=1}^{\infty} c_{ni}^{2-m}\sum_{j=0}^{\infty}\gamma_1(j)^m < \infty$, where $m = \frac{2(k-q)}{q(k-1)}$, then for any positive real number $t$, $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| \ge t\big) < \infty$.

Proof. We have from Proposition 6.4,
$$\mathbb{E}|U_n|^q \le \Big(2qK_1\sum_{i=1}^{n} c_{ni}^{2-m}\sum_{j=0}^{n-1}\gamma_1(j)^m\Big)^{q/2}.$$
Now the relation $\sum_{i=1}^{\infty} c_{ni}^{2-m}\sum_{j=0}^{\infty}\gamma_1(j)^m < \infty$ implies $\sum_n\mathbb{P}\big(n^{-1/p}|U_n| > t\big) < \infty$. This concludes the proof.
6.4 Linear regression

We observe a stationary bounded sequence $(y_n, x_n)\in\mathbb{R}\times\mathbb{R}^d$, defined on a probability space $(\Omega,\mathcal{A},\mathbb{P})$. We look for the vector $Z^*$ which minimizes the linear prediction error of $y_n$ by $x_n$. We identify the $\mathbb{R}^d$-vector $x_n$ and its column matrix in the canonical basis, so
$$Z^* = \arg\min_{Z\in\mathbb{R}^d}\mathbb{E}\big[(y_n - x_n^TZ)^2\big].$$
This problem leads to the study of the gradient algorithm
$$Z_{n+1} = Z_n + c_n\big(y_{n+1} - x_{n+1}^TZ_n\big)x_{n+1},$$
where $c_n = \frac gn$ with $g > 0$ (so $(c_n)$ verifies (6.1.2) and (6.1.5)). Setting $C_{n+1} = x_{n+1}x_{n+1}^T$, we obtain:
$$Z_{n+1} = Z_n + c_n\big(y_{n+1}x_{n+1} - C_{n+1}Z_n\big). \qquad (6.4.1)$$
Denote $U = \mathbb{E}(y_{n+1}x_{n+1})$, $C = \mathbb{E}(C_{n+1})$, $Y_n = Z_n - C^{-1}U$ and $h(Y) = -CY$; then (6.4.1) becomes:
$$Y_{n+1} = Y_n + c_nh(Y_n) + c_n\zeta_{n+1}, \quad\text{with} \qquad (6.4.2)$$
$$\zeta_{n+1} = \big(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\big) + (C - C_{n+1})Y_n. \qquad (6.4.3)$$
Note that the solutions of (6.1.3) are the trajectories $z(t) = z_0e^{-Ct}$,
so every trajectory converges to 0, the unique equilibrium point of the differentiable function $h$ ($Dh(0) = -C$, and 0 is an attractive zero of $h$). Denoting $\mathcal{F}_n = \sigma(Y_i,\ i\le n)$, we also define the following assumption:

A-lr: $C$ is non-singular, and $(C_n)$ and $(y_nx_n)$ are $\gamma_1$-dependent sequences with $\gamma_1(r) = O(a^r)$ for some $a < 1$.

Note that if $(y_n, x_n)_{n\in\mathbb{N}}$ is $\theta_{1,1}$-dependent (see Definition 2.3) then A-lr is satisfied. First, note that if an $\mathbb{R}^d$-valued sequence $(X_n)$ is $\theta_{1,1}$-dependent, then any $\mathbb{R}^j$-valued sequence ($j = 1,\dots,d-1$) $(Y_n) = (X_n^{t_1},\dots,X_n^{t_j})$ is $\theta_{1,1}$-dependent. So, if $(y_n, x_n)$ is $\theta_{1,1}$-dependent, then so are $(y_n)$ and $(x_n^j)$ ($j = 1,\dots,d$). Let $f$ be a bounded 1-Lipschitz function defined on $\mathbb{R}$ and let $g$ be the function defined on $\mathbb{R}^2$ by $g(x,y) = f(xy)$. It is enough to prove that $g$ is a Lipschitz function on $\mathbb{R}^2$:
$$\frac{|g(x,y) - g(x',y')|}{|x-x'| + |y-y'|} \le \frac{|xy - x'y'|}{|x-x'| + |y-y'|} \le \frac{|x|\,|y-y'| + |y'|\,|x-x'|}{|x-x'| + |y-y'|} \le \max(|x|, |y'|),$$
so $g$ is Lipschitz as soon as $x$ and $y$ are bounded. Thus, since $(x_n)$ and $(y_n)$ are bounded, the result follows. Denoting $M = \sup_n\|x_n\|^2$, one has:

Proposition 6.5. Under Assumption A-lr, $(Y_n)$ is a.s. bounded, and the perturbation $(\zeta_n)$ of algorithm (6.4.2) splits into three terms, two of which are $\gamma_1$-dependent while the third one is a remainder tending to zero. So the ODE method ensures the a.s. convergence of $Y_n$ to zero (hence $Z_n\to Z^* = C^{-1}U$). Moreover, if $g < \frac{1}{2M}$, then
$$\sqrt{n}\,Y_n = O(1) \quad\text{a.s.} \qquad (6.4.4)$$
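Before the proof, here is a self-contained simulation sketch of the recursion (6.4.1) with a bounded, weakly dependent regressor (our toy model, not from the book; the AR(1) driver, the gain and the step shift are assumptions, and the larger gain used here only speeds up the visible convergence, it is not the $g < \frac{1}{2M}$ regime of the rate claim in Proposition 6.5):

```python
import random

def gradient_algorithm(data, g=2.0, d=2):
    """Stochastic gradient recursion (6.4.1):
    Z_{n+1} = Z_n + c_n (y_{n+1} - x_{n+1}^T Z_n) x_{n+1},  here c_n = g/(n+10).
    (Gain and shift are illustrative tuning choices.)"""
    z = [0.0] * d
    for n, (y, x) in enumerate(data, start=1):
        c = g / (n + 10.0)
        err = y - sum(xj * zj for xj, zj in zip(x, z))
        z = [zj + c * err * xj for zj, xj in zip(z, x)]
    return z

# Bounded, weakly dependent regressors driven by an AR(1) sequence:
rng, s, data = random.Random(3), 0.0, []
for _ in range(50000):
    s = 0.6 * s + rng.uniform(-1.0, 1.0)
    x = (1.0, max(-2.0, min(2.0, s)))                        # bounded in R^2
    y = 1.5 * x[0] - 0.7 * x[1] + 0.05 * rng.uniform(-1, 1)  # y_n = x_n^T Z* + noise
    data.append((y, x))
z_hat = gradient_algorithm(data)
print(z_hat)  # close to Z* = C^{-1} U = (1.5, -0.7)
```

The iterates approach $Z^* = C^{-1}U$ even though the regressors are serially dependent, which is the point of Assumption A-lr.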
Proof of Proposition 6.5. To start with, we prove that $Y_n\to 0$ a.s. by assuming that $(Y_n)$ is a.s. bounded; then we justify this assumption, and finally we prove (6.4.4). The perturbation $\zeta_{n+1}$ splits into two terms: $(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U)$ and $(C - C_{n+1})Y_n$. The first term is centered and obviously $\gamma_1$-dependent with a dependence coefficient $\gamma_1(r)$; now $\gamma_1(r) = O(a^r)$ thanks to Assumption A-lr. It remains to study $(C - C_{n+1})Y_n$.

Study of $(C - C_{n+1})Y_n$: write $(C - C_{n+1})Y_n = \xi_{n+1} + r_{n+1}$ with $\xi_{n+1} = (C - C_{n+1})Y_n - \mathbb{E}[(C - C_{n+1})Y_n]$ and $r_{n+1} = \mathbb{E}[(C - C_{n+1})Y_n]$. We will prove that the sequence $(\xi_n)$ is $\gamma_1$-dependent with an appropriate dependence coefficient and that $\lim_{n\to\infty}r_n = 0$. Notice that
$$r_{n+1} = \mathbb{E}\Big[(C - C_{n+1})\sum_{j=n/2}^{n-1}(Y_{j+1} - Y_j)\Big] + \mathbb{E}\big[(C - C_{n+1})Y_{n/2}\big],$$
and since $Y_{j+1} - Y_j = -c_jC_{j+1}Y_j + c_j(y_{j+1}x_{j+1} - C_{j+1}C^{-1}U)$, we obtain
$$r_{n+1} = \mathbb{E}(C - C_{n+1})Y_{n/2} - \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jC_{j+1}Y_j + \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j\big(y_{j+1}x_{j+1} - C_{j+1}C^{-1}U\big).$$
If $\frac n2$ is not an integer, we replace it by $\frac{n-1}2$. In the same way, in the first sum we replace $Y_j$ by $\sum_{i=j/2}^{j-1}(Y_{i+1} - Y_i) + Y_{j/2}$, with the same remark as above if $j/2$ is not an integer. So
$$r_{n+1} = \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j\big(y_{j+1}x_{j+1} - C_{j+1}C^{-1}U\big) + \mathbb{E}(C - C_{n+1})Y_{n/2}$$
$$\quad - \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jC_{j+1}\Big[\sum_{i=j/2}^{j-1}\big(-c_iC_{i+1}Y_i + c_i(y_{i+1}x_{i+1} - C_{i+1}C^{-1}U)\big) + Y_{j/2}\Big].$$
Taking expectations conditionally with respect to $\mathcal{F}_{j+1}$ in each term of the second sum and with respect to $\mathcal{F}_{n/2}$ in the last term gives, by assuming that $(Y_n)$ is bounded:
$$\|r_{n+1}\| \le A + K_1\sum_{j=n/2}^{n-1}c_j\gamma_1(n+1-j)\gamma_1(j+1) + K_2\gamma_1(n/2+1), \qquad (6.4.5)$$
where $A$ denotes the last sum of the previous representation of $r_{n+1}$ and the $K_i$ (for $i = 1, 2, \dots$) are non-negative constants. Moreover,
$$A = -\sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j(C_{j+1} - C)\Big[\sum_{i=j/2}^{j-1}\big(-c_iC_{i+1}Y_i + c_i(y_{i+1}x_{i+1} - C_{i+1}C^{-1}U)\big)\Big] - \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_j(C_{j+1} - C)Y_{j/2}$$
$$\quad + \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jC\Big[\sum_{i=j/2}^{j-1}\big(-c_iC_{i+1}Y_i + c_i(y_{i+1}x_{i+1} - C_{i+1}C^{-1}U)\big)\Big] + \sum_{j=n/2}^{n-1}\mathbb{E}(C - C_{n+1})c_jCY_{j/2}.$$
Taking expectations conditionally, successively with respect to $\mathcal{F}_{j+1}$ and $\mathcal{F}_{i+1}$ in each term of the first and third sums, and with respect to $\mathcal{F}_{j+1}$ then $\mathcal{F}_{j/2}$ in the second and fourth sums, gives, by assuming that $(Y_n)$ is bounded,
$$A \le K_3\sum_{j=n/2}^{n-1}c_j\gamma_1(n-j)\sum_{i=j/2}^{j-1}c_i\gamma_1(j-i) + K_4\sum_{j=n/2}^{n-1}c_j\gamma_1(n-j)\gamma_1(j/2). \qquad (6.4.6)$$
Since $c_j = \frac gj$, (6.4.5) and (6.4.6) imply that $r_n = O(n^{-2})$, so $r_n$ converges to zero. On the other hand, for $r \ge 6$:
$$\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n) = \mathbb{E}\big[(C - C_{n+r})Y_{n+r-1}\mid\mathcal{F}_n\big] - r_{n+r} = \sum_{j=n+r/2}^{n+r-2}\mathbb{E}\big[(C - C_{n+r})(Y_{j+1} - Y_j)\mid\mathcal{F}_n\big] + \mathbb{E}\big[(C - C_{n+r})Y_{n+r/2}\mid\mathcal{F}_n\big] - r_{n+r}.$$
Note also that if $\frac r2$ is not an integer, we replace it by $\frac{r+1}2$. Using the same techniques as above, we obtain
$$\mathbb{E}\big\|\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n)\big\| \le K_5\Big(\sum_{j=n+r/2}^{n+r-2}c_j\gamma_1(n+r-j-1)\sum_{i=j/2}^{j-1}c_i\gamma_1(j-i) + \gamma_1(r/2)\Big) + O\big((n+r)^{-2}\big),$$
hence
$$\mathbb{E}\big\|\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n)\big\| \le O\Big(\big(n + \tfrac r2\big)^{-2}\Big) + K_5\gamma_1\big(\tfrac r2\big) + O\big((n+r)^{-2}\big),$$
and $\|\mathbb{E}(\xi_{n+r}\mid\mathcal{F}_n)\|_1 = \gamma_1(r)$ with $\gamma_1(r) = O(r^{-2})$. So $(\xi_n)$ is $\gamma_1$-dependent and, since (6.1.17) is satisfied, the ODE method may be used and $Y_n$ converges to 0 a.s.

Now we prove that $Y_n$ is a.s. bounded. Let $V(Y) = Y^TCY = \|\sqrt{C}\,Y\|^2$. Since $C$ is non-singular, $V$ is a Lyapunov function and $\nabla V(Y) = 2CY$ is a Lipschitz function, so we have
$$V(Y_{n+1}) \le V(Y_n) + (Y_{n+1} - Y_n)^T\nabla V(Y_n) + K_6\|Y_{n+1} - Y_n\|^2.$$
Furthermore,
$$\|Y_{n+1} - Y_n\|^2 \le 2c_n^2\big\|y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\big\|^2 + 2c_n^2\|C_{n+1}Y_n\|^2.$$
Since $(y_n, x_n)$ is bounded, $(C_n)$ and $\|y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\|^2$ are also bounded. Moreover,
$$\|C_{n+1}Y_n\|^2 \le K_7\|Y_n\|^2 \le \frac{K_7V(Y_n)}{\lambda_{\min}(C)},$$
where $\lambda_{\min}(C)$ is the smallest eigenvalue of $C$. So
$$V(Y_{n+1}) \le V(Y_n)\big(1 + K_8c_n^2\big) + K_9c_n^2 + 2(Y_{n+1} - Y_n)^TCY_n.$$
The last term becomes
$$2(Y_{n+1} - Y_n)^TCY_n = -2c_n\|CY_n\|^2 + 2c_n\big(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U\big)^TCY_n + 2c_nY_n^T(C - C_{n+1})CY_n$$
$$\le -2c_n\|CY_n\|^2 + c_nK_{10}\|CY_n\| + 2c_nY_n^T(C - C_{n+1})CY_n \le -2c_n\|CY_n\|^2 + c_nK_{10}\|CY_n\| + 2c_nu_{n+1}V(Y_n),$$
where $u_n = \max\{|X_i^T(C - C_n)X_i|,\ 1\le i\le d\}$ and $\{X_1,\dots,X_d\}$ is an orthogonal basis of unit eigenvectors of $C$. We now obtain
$$V(Y_{n+1}) \le V(Y_n)\big(1 + K_8c_n^2 + 2c_nu_{n+1}\big) + K_9c_n^2 - c_n\big(2\|CY_n\|^2 - K_{10}\|CY_n\|\big). \qquad (6.4.7)$$
Note that under Assumption A-lr, $(u_n)$ is a $\gamma_1$-dependent sequence with a weak dependence coefficient $\gamma_1(r) = O(a^r)$, and $\sum_{n=0}^{\infty}c_nu_{n+1} < \infty$. If $V(Y_n) \ge \frac{K_{10}^2}{4\lambda_{\min}(C)}$, then $\|CY_n\| \ge K_{10}/2$ and $-(2\|CY_n\|^2 - K_{10}\|CY_n\|) \le 0$.

Denote $T = \inf\{n\ /\ V(Y_n) \le \frac{K_{10}^2}{4\lambda_{\min}(C)}\}$. By the Robbins-Siegmund theorem, $V(Y_n)$ converges a.s. to a finite limit on $\{T = +\infty\}$, so $(Y_n)$ is bounded since $V$ is a Lyapunov function. On $\{\liminf_nV(Y_n) \le \frac{K_{10}^2}{4\lambda_{\min}(C)}\}$, $V(Y_n)$ does not converge to $\infty$ and, using Theorem 2 of Delyon (1996) [58], we deduce that $V(Y_n)$ converges to a finite limit as soon as:
$$\forall k > 0,\quad \sum_nc_n^2\big\|h(Y_n) + \zeta_{n+1}\big\|^2\,\mathbf{1}_{\{V(Y_n)<k\}} < \infty \ \text{a.s.}, \qquad (6.4.8)$$
$$\forall k > 0,\quad \sum_nc_n\big\langle\zeta_{n+1},\nabla V(Y_n)\big\rangle\,\mathbf{1}_{\{V(Y_n)<k\}} \ \text{converges a.s.} \qquad (6.4.9)$$
Using the relation $\sum c_n^2 < \infty$ and the fact that on $\{V(Y_n) < k\}$, $\|h(Y_n) + \zeta_{n+1}\|^2$ is bounded, we deduce (6.4.8). To prove (6.4.9), it is enough, by Proposition 6.4, to check its assumptions for $e_{n+1} = \langle\zeta_{n+1},\nabla V(Y_n)\rangle\,\mathbf{1}_{\{V(Y_n)<k\}}$. To apply Proposition 6.4 it is necessary to center $e_{n+1}$. So we are going to prove that $\sum c_n|\mathbb{E}e_{n+1}| < \infty$ and that $(e_{n+1} - \mathbb{E}e_{n+1})$ is a $\gamma_1$-dependent sequence with a dependence coefficient $\gamma_1(r)$ equal to $O(r^{-2})$.

Study of $\mathbb{E}(e_{n+1})$. First of all, we must note a few elements. Denoting by $I$ the unit matrix of $\mathbb{R}^d$,
$$Y_n = (I - c_{n-1}C_n)Y_{n-1} + c_{n-1}\big(x_ny_n - C_nC^{-1}U\big).$$
Note that $\lambda_{\max}(C_n) = \|x_n\|^2 \le M$, where $\lambda_{\max}(C_n)$ denotes the largest eigenvalue of $C_n$. For $n$ large enough, $c_{n-1}M < 1$ and $(I - c_{n-1}C_n)$ is non-singular. So, if $M_1 = \sup_n\|x_ny_n - C_nC^{-1}U\|$, then we obtain
$$\|Y_{n-1}\| \le \frac{1}{1 - c_{n-1}M}\big(\|Y_n\| + c_{n-1}M_1\big) \le (1 + bc_{n-1})\big(\|Y_n\| \vee M_1\big),$$
where $b \ge 0$ does not depend on $n$. Moreover,
$$V(Y_n) < k \implies \|Y_n\|^2 < \frac{k}{\lambda_{\min}(C)}, \qquad\text{and}\qquad \|Y_n\| < k \implies V(Y_n) < \lambda_{\max}(C)\,k^2.$$
So that, iterating the previous bound,
$$\mathbf{1}_{\{V(Y_n)<k\}}\,\|Y_{n-j}\| \le k_{n-j} := (1 + bc_{n-1})\cdots(1 + bc_{n-j})\Big(\sqrt{\tfrac{k}{\lambda_{\min}(C)}}\ \vee\ M_1\Big).$$
And since $c_n = \frac gn$, for any $0\le j\le n$, $(1 + bc_{n-1})^j$ is bounded independently of $n$, and so is $k_{n-j}$. Now
$$\mathbb{E}(e_{n+1}) = \mathbb{E}\big[(x_{n+1}y_{n+1} - C_{n+1}C^{-1}U)^TCY_n\,\mathbf{1}_{\{V(Y_n)<k\}}\big] + \mathbb{E}\big[Y_n^T(C - C_{n+1})CY_n\,\mathbf{1}_{\{V(Y_n)<k\}}\big],$$
and, expanding $Y_n$ as before,
$$\mathbb{E}(e_{n+1}) = \sum_{j=n/2}^{n-1}\mathbb{E}\big[(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U)^TC(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\big] + \mathbb{E}\big[(y_{n+1}x_{n+1} - C_{n+1}C^{-1}U)^TCY_{n/2}\big]$$
$$\quad + \sum_{j=n/2}^{n-1}\mathbb{E}\big[(Y_{j+1} - Y_j)^T(C - C_{n+1})C(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\big] + 2\sum_{j=n/2}^{n-1}\sum_{i=j+1}^{n-2}\mathbb{E}\big[(Y_{j+1} - Y_j)^T(C - C_{n+1})C(Y_{i+1} - Y_i)\,\mathbf{1}_{\{V(Y_j)<k\}}\big]$$
$$\quad + 2\sum_{j=n/2}^{n-1}\mathbb{E}\big[Y_{n/2}^T(C - C_{n+1})C(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\big] + \mathbb{E}\big[Y_{n/2}^T(C - C_{n+1})CY_{n/2}\,\mathbf{1}_{\{V(Y_{n/2})<k\}}\big].$$
Note that if $\frac n2$ is not an integer, we replace it by $\frac{n-1}2$. Using always the same technique, we obtain
$$\mathbb{E}e_{n+1} = O(n^{-2}) + O(a^{n/2}) + O(n^{-2}) + O(n^{-2}) + O(a^{n/2}),$$
hence $\sum c_n|\mathbb{E}e_n| < \infty$.

Study of $(e_n - \mathbb{E}e_n)$. We now prove that this sequence is $\gamma_1$-dependent with a relevant dependence coefficient. Write $\mathbb{E}(e_{n+r} - \mathbb{E}e_{n+r}\mid\mathcal{F}_n) = D_{n+r} + G_{n+r} - \mathbb{E}e_{n+r}$, with
$$D_{n+r} = \mathbb{E}\big[(y_{n+r}x_{n+r} - C_{n+r}C^{-1}U)^TCY_{n+r-1}\,\mathbf{1}_{\{V(Y_{n+r-1})<k\}}\mid\mathcal{F}_n\big],$$
$$G_{n+r} = \mathbb{E}\big[Y_{n+r-1}^T(C - C_{n+r})CY_{n+r-1}\,\mathbf{1}_{\{V(Y_{n+r-1})<k\}}\mid\mathcal{F}_n\big].$$
As above,
$$D_{n+r} = \sum_{j=n+r/2}^{n+r-2}\mathbb{E}\big[(y_{n+r}x_{n+r} - C_{n+r}C^{-1}U)^TC(Y_{j+1} - Y_j)\,\mathbf{1}_{\{V(Y_j)<k\}}\mid\mathcal{F}_n\big] + \mathbb{E}\big[(y_{n+r}x_{n+r} - C_{n+r}C^{-1}U)^TCY_{n+r/2}\,\mathbf{1}_{\{V(Y_{n+r/2})<k\}}\mid\mathcal{F}_n\big].$$
Here again, if $\frac r2$ is not an integer, we replace it by $\frac{r-1}2$. Again, the same techniques as for $r_n$ give
$$\mathbb{E}\|D_{n+r}\| = O\big((n+r)^{-2}\big) + O\big(a^{n+r/2}\big).$$
We study $G_{n+r}$ in the same way and $\mathbb{E}\|G_{n+r}\| = O\big((n+r)^{-2}\big)$; since $\mathbb{E}e_{n+r} = O\big((n+r)^{-2}\big)$, (6.1.17) is satisfied and the result is proved.

Proof of (6.4.4). For $n > N$, denote $\Pi_n^N = (I - c_nC_{n+1})\cdots(I - c_NC_{N+1})$. Since $g < \frac{1}{2M}$, for $N \ge 1$, $\Pi_n^N$ is non-singular and
$$Y_{n+1} = \Pi_n^NY_N + \sum_{j=N}^{n}c_j\,\Pi_n^N\big(\Pi_j^N\big)^{-1}\xi_{j+1}^1,$$
where $\xi_{j+1}^1 = y_{j+1}x_{j+1} - C_{j+1}C^{-1}U$. We obtain, since $Y_n\to 0$,
$$-Y_N = \sum_{j=N}^{\infty}c_j\big(\Pi_j^N\big)^{-1}\xi_{j+1}^1.$$
Now $(\Pi_j^N)^{-1} = (I - c_NC_{N+1})^{-1}\cdots(I - c_jC_{j+1})^{-1}$ and
$$\big\|(\Pi_j^N)^{-1}\big\| \le \prod_{i=N}^{j}\frac{1}{1 - c_iM}.$$
Hence
$$\big\|(\Pi_j^N)^{-1}\big\| = O\Big(\exp\Big(M\sum_{i=N}^{j}c_i\Big)\Big) = O\Big(\Big(\frac jN\Big)^{gM}\Big), \qquad (6.4.10)$$
and
$$\sqrt{N}\,Y_N = -\sum_{j=N}^{\infty}\frac{g\sqrt{N}}{j}\big(\Pi_j^N\big)^{-1}\xi_{j+1}^1.$$
Since $g < \frac{1}{2M}$, (6.4.10) implies that the sum converges: indeed, (6.1.17) is verified with $k = 5$ and $p = 3$ (so $m = \frac13$), since $\xi_{j+1}^1$ is $\gamma_1$-dependent with a mixingale coefficient $\gamma_1(r) = O(a^r)$. Hence the result is proved.
Chapter 7

Central Limit Theorem

In this chapter, we give sufficient conditions for the central limit theorem in the non-causal and causal contexts. In Sections 7.1 and 7.2, we give sufficient conditions for $\kappa$- and $\lambda$-dependent sequences, for random variables having moments of order $2 + \zeta$. The proof is based on a decomposition which combines Bernstein blocks with the Lindeberg method. In Section 7.3, we prove a central limit theorem for random fields, under an exponential decay of the covariance of Lipschitz functions of the variables. The proof is based on Stein's method, as described in Bolthausen (1982) [24]. In Section 7.4, we focus on the causal case: in Theorem 7.5, we give necessary and sufficient conditions for the conditional central limit theorem. This notion is more precise than convergence in distribution and implies stable convergence in the sense of Rényi (1963) [156]. In the last Section 7.5, we give some applications of Theorem 7.5: in particular, we give sufficient conditions for the central limit theorem for $\gamma$-, $\tilde\alpha$- and $\tilde\phi$-dependent sequences.
7.1 Non causal case: stationary sequences

In all this section, we consider a centered and stationary real-valued sequence $(X_n)_{n\in\mathbb{Z}}$ such that
$$\mu = \mathbb{E}|X_0|^m < \infty, \quad\text{for a real number } m = 2 + \zeta > 2. \qquad (7.1.1)$$
We also define
$$\sigma^2 = \sum_{k\in\mathbb{Z}}\mathrm{Cov}(X_0, X_k),$$
whenever it exists. Let $S_n = X_1 + \cdots + X_n$. The following results come from Doukhan and Wintenberger (2005) [77].
Theorem 7.1 ($\kappa$-dependence). Assume that the $\kappa$-weakly dependent stationary process satisfies (7.1.1) and that $\kappa(r) = O(r^{-\kappa})$ for $\kappa > 2 + \frac1\zeta$. Then $\sigma^2$ is well defined and $n^{-1/2}S_n$ converges in distribution to the normal distribution with mean 0 and variance $\sigma^2$.

Remark 7.1. Under the more restrictive $\zeta$-dependence condition, Bulinski and Shashkin (2005) [33] obtain the central limit theorem with the sharper assumption $\zeta(r) = O(r^{-\kappa})$ for $\kappa > 1 + 1/(m-2)$ (beware of the notations in this remark). The difference between the two conditions is natural, since it may be proved for $\zeta$-weakly dependent sequences that $\zeta(r) \ge \sum_{s\ge r}\kappa(s)$. This simple bound, checked from the definitions, explains the loss in the rate of convergence of $\kappa(r)$ to 0.

The following result relaxes the previous dependence assumptions at the cost of a faster decay of the dependence coefficients.

Theorem 7.2 ($\lambda$-dependence). Assume that the $\lambda$-weakly dependent stationary process satisfies (7.1.1) and that $\lambda(r) = O(r^{-\lambda})$ for $\lambda > 4 + \frac2\zeta$. Then the conclusion of Theorem 7.1 holds.

As stressed in Section 3.1.3, the coefficient $\lambda$ is very useful to work out the case of Bernoulli shifts with weakly dependent innovations $(\xi_i)_i$.

Theorem 7.3. Denote by $\lambda_\xi(r)$ the $\lambda$-dependence coefficients of the sequence $(\xi_i)_i$. Assume that $H:\mathbb{R}^{\mathbb{Z}}\to\mathbb{R}$ satisfies the condition (3.1.11) for some $m > 2$ and some $l \le m - 1$, with $\mathbb{E}|\xi_0|^m < \infty$, and some sequence $b_i \ge 0$ such that $\sum_i|i|b_i < \infty$. Then $X_n = H(\xi_{n-i},\ i\in\mathbb{Z})$ satisfies the central limit theorem in the following cases:
• Geometric case: $b_r = O\big(e^{-rb}\big)$ and $\lambda_\xi(r) = O\big(e^{-rc}\big)$.
• Mixed case: $b_r = O\big(e^{-rb}\big)$ and $\lambda_\xi(r) = O\big(r^{-c}\big)$ with $c > 4 + 2/(m-2)$.
• Riemanian case: $b_r = O\big(r^{-b}\big)$ for some $b > 2$ and $\lambda_\xi(r) = O\big(r^{-c}\big)$ with
$$c > \frac{(10-4m)\,b\,(m-1)}{(2-m)(b-2)(m-1-l)}.$$
The constants $b > 0$ and $c > 0$ obtained are different for each case. This theorem is useful to derive the weak invariance principle in many cases (see again Section 3.1.3). We now look in more detail at the following example.
Example. Consider the two-sided sequence $X_t = \sum_{i=-\infty}^{\infty}a_i\xi_{t-i}$ with ARCH($\infty$) innovations:
$$\xi_t = \tilde\xi_t\Big(a + \sum_{j=1}^{\infty}a_j'\xi_{t-j}\Big),$$
where the process $(\tilde\xi_i)_i$ is i.i.d. Under Riemanian decays ($a_r = O(r^{-a})$ and $a_r' = O(r^{-a'})$), we infer from Theorem 7.3 that the central limit theorem holds as soon as:
$$a' > \frac{(10-4m)\,a\,(m-1)}{(2-m)(a-2)(m-1-l)} + 1.$$

Remark 7.2. The technique of the proofs is based on the Lindeberg method, and we prove in fact that $|\mathbb{E}(f(S_n/\sqrt n) - f(\sigma N))| \le Cn^{-c^*}$ for $f(x) = e^{itx}$, where the constants $c^*, C > 0$ depend only on the parameters $\zeta$ and $\kappa$ or $\lambda$ respectively, and where $c^* < \frac12$ (see Proposition 7.1 in Section 7.2.2 for more details). When $\kappa$ or $\lambda$ tends to infinity, we have $c^* = \zeta/(4+\zeta)$; for $\zeta \ge 2$ and $\kappa$ or $\lambda$ tending to infinity, we notice that $c^*\to\frac13$. Using a smoothing lemma, this also yields an analogue bound for the uniform distance in the real case ($d = 1$):
$$\sup_{t\in\mathbb{R}}\Big|\mathbb{P}\Big(\frac{S_n}{\sqrt n}\le t\Big) - \mathbb{P}(\sigma N\le t)\Big| \le Cn^{-c}.$$
A first and easy way to control $c$ is to set $c = c^*/4$, but the corresponding rate is really a bad one. Petrov (1995) [144] obtains the exponent $\frac12$ in the i.i.d. case and Rio (2000) [161] reaches the exponent $\frac13$ for strongly mixing sequences. In Section 7.2.2, we achieve $c = c^*/3$. Analogous bad convergence rates have been settled in the case of weakly dependent random fields in [63]. The following subsections are devoted to the proofs. We first describe in detail the Lindeberg method with Bernstein blocks in Section 7.2 (another version of the Lindeberg method will be presented in the causal framework in Section 7.4.3). The main tools are the controls of the variance of $S_n$ and of $\|S_n\|_{2+\delta}$ obtained in Lemmas 4.2 and 4.3 of Section 4.2. Rates of convergence for the central limit theorem are obtained in Section 7.2.2.
7.2 Lindeberg method

Let $x_1,\dots,x_k$ be random variables with values in $\mathbb{R}^d$ (equipped with the Euclidean norm $\|(x_1,\dots,x_d)\| = \sqrt{x_1^2+\cdots+x_d^2}$), centered at expectation and such that for some $0 < \delta \le 1$:
$$\sum_{i=1}^{k}\mathbb{E}\|x_i\|^{2+\delta} \le A < \infty. \qquad (7.2.1)$$
We consider independent random variables $y_1,\dots,y_k$, independent of the variables $x_1,\dots,x_k$ and such that $y_i\sim\mathcal{N}_d(0,\mathrm{Var}\,x_i)$. Denote by $C_b^3$ the set of bounded functions from $\mathbb{R}^d$ to $\mathbb{R}$ with bounded and continuous partial derivatives up to order 3. The Lindeberg method relies on the following lemma.

Lemma 7.1 (Bardet et al., 2006 [10]). For any $f\in C_b^3$, let
$$\Delta = \big|\mathbb{E}\big(f(x_1+\cdots+x_k) - f(y_1+\cdots+y_k)\big)\big| \qquad (7.2.2)$$
and
$$T_j = \sum_{i=1}^{k}\mathrm{Cov}\big(f_i^{(j)}(x_1+\cdots+x_{i-1}),\,x_i^j\big), \qquad j = 1, 2, \qquad (7.2.3)$$
where $f_i(t) = \mathbb{E}f(t + y_{i+1}+\cdots+y_k)$ and
$$\mathrm{Cov}\big(f'(x),\,y\big) = \sum_{\ell=1}^{d}\mathrm{Cov}\Big(\frac{\partial f}{\partial x_\ell}(x),\,y_\ell\Big), \qquad \mathrm{Cov}\big(f''(x),\,y^2\big) = \sum_{k'=1}^{d}\sum_{\ell=1}^{d}\mathrm{Cov}\Big(\frac{\partial^2f}{\partial x_{k'}\partial x_\ell}(x),\,y_{k'}y_\ell\Big),$$
where by convention the empty sums are equal to 0. The following upper bound holds:
$$\Delta \le |T_1| + \frac12|T_2| + 4\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}A.$$

Remark 7.3. If $k$ tends to $\infty$, we denote the variables by $(x_{i,k})_{1\le i\le k}$ and we set $A = A(k)$, $T_j = T_j(k)$. Assume that $A(k)$ and $T_j(k)$ tend to 0 as $k$ tends to $\infty$, and assume moreover that $\sigma_k^2 = \sum_{i=1}^{k}\mathbb{E}\|x_{i,k}\|^2$ converges to $\sigma^2$ as $k$ tends to $\infty$. Then
$$S_k = \sum_{i=1}^{k}x_{i,k}\ \longrightarrow\ \mathcal{N}(0,\sigma^2) \quad\text{in distribution, as } k\to\infty.$$
The condition $A(k)\to 0$ implies the usual Lindeberg condition, the condition $\sigma_k^2\to\sigma^2$ is only the convergence of variances, while the conditions $T_j(k)\to 0$ are the only ones related to dependence.

Examples. Assume that $(X_t)_{t\in\mathbb{Z}}$ is a stationary time series; the following examples are widely developed in [10].
• The first example of application of such a situation is the Bernstein block method used below for proving the central limit theorem.
• Functional estimation also enters this frame. In that case we write $x_{i,k} = f_k(X_i)$ for functions $f_k$ which approximate the Dirac distribution; see Chapter 11.
• A final example is provided by subsampling, in which $x_{i,k} = X_{im_n}$ for $1\le im_n\le n$ and $1\ll m_n\ll n$; this example fits Chapter 13, but we refer the reader to [10] for shortness.

Proof of Lemma 7.1. We first notice that
$$\Delta \le \Delta_1+\cdots+\Delta_k, \qquad (7.2.4)$$
where
$$\Delta_i = \big|\mathbb{E}\big(f_i(w_i+x_i) - f_i(w_i+y_i)\big)\big|, \qquad w_i = x_1+\cdots+x_{i-1}, \qquad i = 1,\dots,k. \qquad (7.2.5)$$
Let $x, w\in\mathbb{R}^d$. The Taylor formula writes in the two following distinct ways (for suitable values $w_1, w_2$):
$$f(w+x) = f(w) + xf'(w) + \frac12f''(w_1)(x,x)$$
$$\phantom{f(w+x)} = f(w) + xf'(w) + \frac12f''(w)(x,x) + \frac16f'''(w_2)(x,x,x);$$
here $f^{(j)}(x)(y_1,\dots,y_j)$ stands for the value of the symmetric $j$-linear form $f^{(j)}(x)$ at $(y_1,\dots,y_j)$. Let $\|f^{(j)}\|_\infty = \sup_x\|f^{(j)}(x)\|$ with
$$\|f^{(j)}(x)\| = \sup_{\|y_1\|,\dots,\|y_j\|\le 1}\big|f^{(j)}(x)(y_1,\dots,y_j)\big|.$$
Thus for $w, x, y\in\mathbb{R}^d$ we may write:
$$f(w+x) - f(w+y) = f'(w)(x-y) + \frac12f''(w)(x^2-y^2) + \frac{\big(f''(w_1) - f''(w)\big)(x,x) - \big(f''(w_1') - f''(w)\big)(y,y)}{2}$$
$$\phantom{f(w+x) - f(w+y)} = f'(w)(x-y) + \frac12f''(w)(x^2-y^2) + \frac{f'''(w_2)(x,x,x) - f'''(w_2')(y,y,y)}{6}.$$
Thus
$$T = f(w+x) - f(w+y) - f'(w)(x-y) - \frac12f''(w)(x^2-y^2)$$
satisfies
$$|T| \le 2\big(\|x\|^2+\|y\|^2\big)\|f''\|_\infty \,\wedge\, \frac{\|x\|^3+\|y\|^3}{3}\|f'''\|_\infty \le 2\|f''\|_\infty\Big(\|x\|^2\Big(1\wedge\frac{\|f'''\|_\infty\|x\|}{3\|f''\|_\infty}\Big) + \|y\|^2\Big(1\wedge\frac{\|f'''\|_\infty\|y\|}{3\|f''\|_\infty}\Big)\Big)$$
$$\le \frac{2}{3^\delta}\,\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}\big(\|x\|^{2+\delta}+\|y\|^{2+\delta}\big),$$
the last inequality following from the inequality $1\wedge a \le a^\delta$, valid for $a\ge 0$ and $\delta\in[0,1[$. Here we set $f''(w)(x^2-y^2) = f''(w)(x,x) - f''(w)(y,y)$ for notational convenience. This relation, together with the decomposition (7.2.4) and the upper bound $\|f_i^{(j)}\|_\infty \le \|f^{(j)}\|_\infty$ (valid for $1\le i\le k$ and $0\le j\le 3$), entails
$$\Delta \le |T_1| + \frac12|T_2| + \frac{2}{3^\delta}\,\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}\sum_{i=1}^{k}\big(\mathbb{E}\|x_i\|^{2+\delta} + \mathbb{E}\|y_i\|^{2+\delta}\big) \le |T_1| + \frac12|T_2| + 4\|f''\|_\infty^{1-\delta}\|f'''\|_\infty^{\delta}A,$$
where we have used the bound $\mathbb{E}\|y_i\|^{2+\delta} \le \big(\mathbb{E}\|y_i\|^4\big)^{(2+\delta)/4} \le \big(3(\mathbb{E}\|x_i\|^2)^2\big)^{\frac12+\frac\delta4}$.
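A small numerical illustration of the lemma (ours, not from the book): for independent summands the covariance terms $T_j$ vanish, so $\mathbb{E}f(x_1+\cdots+x_k)$ should be close to its Gaussian counterpart, governed by the $A$ term alone. For $f = \cos$, the Gaussian side is available in closed form, $\mathbb{E}\cos(\mathcal{N}(0,s^2)) = e^{-s^2/2}$; the uniform summands and all sizes below are assumptions.

```python
import math, random

def lindeberg_gap(k=200, reps=10000, seed=4):
    """Monte-Carlo estimate of Delta = |E f(x_1+...+x_k) - E f(N(0, s^2))|
    for f(x) = cos(x) (a C_b^3 function) and i.i.d. centered uniform summands
    scaled so that Var(x_1+...+x_k) = 1/3."""
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    sigma2 = k * scale ** 2 / 3.0       # variance of the sum
    acc = 0.0
    for _ in range(reps):
        s = sum(rng.uniform(-scale, scale) for _ in range(k))
        acc += math.cos(s)
    return abs(acc / reps - math.exp(-sigma2 / 2.0))

gap = lindeberg_gap()
print(gap)  # small: the two expectations nearly coincide
```

The observed gap is of the order of the Monte-Carlo error plus the $O(A)$ bound of the lemma, both far below the raw magnitude $e^{-1/6}\approx 0.85$ of the expectation itself.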
7.2.1 Proof of the main results

In this section we first prove Theorems 7.1 and 7.2, and then we give rates for these central limit results. Some useful moment inequalities, proved in Section 4.1, are essential in the following proof.

Proof of Theorems 7.1 and 7.2. Let $S = \frac{1}{\sqrt n}S_n$ and consider $p = p(n)$ and $q = q(n)$ in such a way that
$$\lim_{n\to\infty}\frac{1}{q(n)} = \lim_{n\to\infty}\frac{q(n)}{p(n)} = \lim_{n\to\infty}\frac{p(n)}{n} = 0.$$
Let $k = k(n) = \lfloor n/(p(n)+q(n))\rfloor$ and
$$Z = \frac{1}{\sqrt n}\big(U_1+\cdots+U_k\big), \qquad\text{with}\quad U_j = \sum_{i\in B_j}X_i,$$
where $B_j = \big](p+q)(j-1),\ (p+q)(j-1)+p\big]\cap\mathbb{N}$ is a subset of $p$ successive integers from $\{1,\dots,n\}$ such that, for $j\ne j'$, $B_j$ and $B_{j'}$ are at least distant of $q = q(n)$. We denote by $B_j'$ the block between $B_j$ and $B_{j+1}$, and set $V_j = \sum_{i\in B_j'}X_i$; $V_k$ is the last block of $X_i$'s between the end of $B_k$ and $n$. Let $\sigma_p^2 = \mathrm{Var}(U_1)/p$ and
$$Y = \frac{V_1^*+\cdots+V_k^*}{\sqrt n}, \qquad V_j^*\sim\mathcal{N}\big(0,\ p\,\sigma_p^2\big),$$
where the Gaussian variables $V_j^*$ are independent and independent of the sequence $(X_n)_{n\in\mathbb{Z}}$. We fix $t\in\mathbb{R}^d$ and define $f:\mathbb{R}^d\to\mathbb{C}$ by $f(x) = e^{it\cdot x}$. Then:
$$\mathbb{E}f(S) - \mathbb{E}f(\sigma N) = \mathbb{E}\big(f(S) - f(Z)\big) + \mathbb{E}\big(f(Z) - f(Y)\big) + \mathbb{E}\big(f(Y) - f(\sigma N)\big).$$
The Lindeberg method consists in proving that this expression converges to 0 as $n\to\infty$. The first and the last terms are referred to as auxiliary terms in this Bernstein-Lindeberg method: they come from the replacement of the individual initial random variables, non-Gaussian and Gaussian respectively. The second term is analogous to that obtained with decoupling and turns the proof of the central limit theorem into the independent case; it is referred to as the main term, and following the proof under independence it will be bounded above by using a Taylor expansion. Because of the dependence structure, some additional covariance terms will appear in the corresponding bounds.

Auxiliary terms. Using Taylor expansions up to the second order, we bound:
$$\big|\mathbb{E}\big(f(S) - f(Z)\big)\big| \le \frac{\|f''\|_\infty}{2}\,\mathbb{E}|S - Z|^2 \qquad\text{and}\qquad \big|\mathbb{E}\big(f(Y) - f(\sigma N)\big)\big| \le \frac{\|f''\|_\infty}{2}\,\mathbb{E}|Y - \sigma N|^2.$$
Firstly,
$$\mathbb{E}|Z - S|^2 = \frac1n\,\mathbb{E}\big(V_1+\cdots+V_k\big)^2.$$
Note that the set of indices of the $X_i$'s appearing in $V_1+\cdots+V_k$ has cardinality smaller than $(k+1)q + p$. Using the bounds (4.2.6) and (4.2.5), we infer that under conditions (4.2.3) and (4.2.2) respectively,
$$\mathbb{E}|Z - S|^2 \lesssim \frac{(k+1)q + p}{n}.$$
We notice that $Y$ follows the distribution of $\sqrt{\frac{kp}{n}}\,\sigma_pN$; then, working with Gaussian random variables,
$$\mathbb{E}|Y - \sigma N|^2 \le \Big|\frac{kp}{n} - 1\Big|\sigma_p^2 + \big|\sigma_p^2 - \sigma^2\big|.$$
Since $|kp/n - 1| \lesssim q/p$, we need to bound
$$\big|\sigma_p^2 - \sigma^2\big| \le \sum_{|i|\le p}\frac{|i|}{p}\,\big|\mathbb{E}(X_0X_i)\big| + \sum_{|i|>p}\big|\mathbb{E}(X_0X_i)\big|.$$
Let $a_i = |\mathbb{E}(X_0X_i)|$. Under conditions (4.2.3) and (4.2.2) respectively, the series $\sum_{i=0}^{\infty}a_i$ converges and $s_j = \sum_{i=j}^{\infty}a_i$ converges to 0 as $j$ converges to infinity. Consequently,
$$\big|\sigma_p^2 - \sigma^2\big| \le 2\sum_{i=0}^{p-1}\frac ip\,a_i + 2s_p \le \frac2p\sum_{i=0}^{p-1}s_i + 2s_p.$$
The Cesàro lemma implies that this term converges to 0. Hence $|\mathbb{E}f(S) - \mathbb{E}f(Z)| + |\mathbb{E}f(Y) - \mathbb{E}f(\sigma N)|$ tends to 0 as $n\uparrow\infty$. To make the rate of convergence precise, we now assume that $a_i = O(i^{-\alpha})$ with $\alpha > 1$; then $|\sigma_p^2 - \sigma^2| \lesssim p^{1-\alpha}$. The convergence rate is then given by
$$\frac qp + \frac pn + p^{1-\alpha} \qquad\text{if } \mathbb{E}(X_0X_i) = O(i^{-\alpha}).$$
Since $\mathbb{E}(X_0X_i) = \mathrm{Cov}(X_0, X_i)$, we then use equations (4.2.5) and (4.2.6) and find $\alpha = \kappa$ or $\alpha = \lambda(m-2)/(m-1)$, depending on the weak dependence setting. With $p = n^a$, $q = n^b$, those bounds become:
$$n^{b-a} + n^{a-1} + n^{a(1-\kappa)} \quad\text{in the $\kappa$-weak dependence setting},$$
$$n^{b-a} + n^{a-1} + n^{a(1-\lambda(m-2)/(m-1))} \quad\text{under $\lambda$-weak dependence}.$$

Main terms. It remains to control the expression $|T_1| + \frac12|T_2|$ and $A$ in Lemma 7.1 with now, for $1\le j\le k$:
$$x_j = \frac{1}{\sqrt n}\,U_j, \qquad y_j = \frac{1}{\sqrt n}\,V_j^*.$$
• The terms $T_j$ are bounded by using the weak dependence properties. The expressions of these bounds are obtained by rewriting $|\mathrm{Cov}(F(X_m,\ m\in B_i,\ i<j),\ G(X_m,\ m\in B_j))|$. Note that $\|F\|_\infty\le 1$ and that we can compute a bound for $\mathrm{Lip}\,F$ with $F(x_1,\dots,x_{kp}) = f\big(\frac{1}{\sqrt n}\sum_{i<j}x_i\big)$ (with possible repetitions in the sequence $(x_1,\dots,x_{kp})$):
$$\Big|f\Big(\frac{1}{\sqrt n}\sum_{i<j}x_i\Big) - f\Big(\frac{1}{\sqrt n}\sum_{i<j}y_i\Big)\Big| \le \Big|1 - \exp\Big(it\cdot\frac{1}{\sqrt n}\sum_{i<j}(y_i - x_i)\Big)\Big| \le \frac{\|t\|_2}{\sqrt n}\sum_{i=1}^{kp}\|y_i - x_i\|_2.$$
For $G(x_1,\dots,x_p) = f\big(\sum_{i=1}^{p}x_i/\sqrt n\big)$, we have $\|G\|_\infty\le 1$ and $\mathrm{Lip}\,G \lesssim 1/\sqrt n$. We then distinguish the two cases, noting that the gap between blocks is at least $q$. Since the bounds of the different covariances do not depend on $j$, we then obtain the controls:
– in the $\kappa$-dependence setting: $|T_1| + \frac12|T_2| \lesssim kp\cdot\kappa(q)$;
– in the $\lambda$-dependence setting: $|T_1| + \frac12|T_2| \lesssim kp\big(1 + \sqrt{k/p}\big)\cdot\lambda(q)$.
Reminding that $p = n^a$, $q = n^b$, $\kappa(r) = O(r^{-\kappa})$ or $\lambda(r) = O(r^{-\lambda})$, the bounds become $n^{1-\kappa b}$ or $n^{1+(1/2-a)_+-\lambda b}$ in the $\kappa$ or $\lambda$ context respectively.
• Using the stationarity of the sequence $(X_n)$ we obtain:
$$|A| \lesssim k\,n^{-1-\frac\delta2}\big(\mathbb{E}|S_p|^{2+\delta} \vee p^{1+\frac\delta2}\big).$$
We then use the result of Lemma 4.3 to bound the moment $\mathbb{E}|S_p|^{2+\delta}$. If $\kappa(r) = O(r^{-\kappa})$ or $\lambda(r) = O(r^{-\lambda})$ with $\kappa > 2 + \frac1\zeta$ or $\lambda > 4 + \frac2\zeta$, then there exist $\delta\in]0,\zeta\wedge1[$ and $C > 0$ such that:
$$\mathbb{E}|S_p|^{2+\delta} \le Cp^{1+\delta/2}.$$
We then obtain $A \lesssim k(p/n)^{1+\delta/2}$. Reminding that $p = n^a$, the bound is of order $n^{(a-1)\delta/2}$ in both the $\kappa$- and $\lambda$-weak dependence settings.
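The block construction used in this proof can be made concrete. The following helper is ours, with the exponents $a$ and $b$ as free parameters; it builds big blocks $B_j$ of length $p = \lfloor n^a\rfloor$ separated by gaps of length $q = \lfloor n^b\rfloor$:

```python
def bernstein_blocks(n, a=0.4, b=0.2):
    """Big blocks B_j of p = floor(n^a) consecutive indices, separated by
    gaps of length q = floor(n^b), mirroring the choice p = n^a, q = n^b
    with 1/q, q/p and p/n all tending to 0."""
    p, q = int(n ** a), int(n ** b)
    blocks, start = [], 1
    while start + p - 1 <= n:
        blocks.append(range(start, start + p))  # B_j: p successive integers
        start += p + q                          # skip a gap of length q
    return p, q, blocks

p, q, blocks = bernstein_blocks(10**6)
print(p, q, len(blocks))
# distinct blocks are separated by at least q indices:
gaps_ok = all(blocks[j + 1][0] - blocks[j][-1] - 1 >= q
              for j in range(len(blocks) - 1))
print(gaps_ok)
```

With $n = 10^6$ this gives $p = 251$, $q = 15$ and about $n/(p+q)$ blocks, so $q/p$ and $p/n$ are indeed small while $k p/n \to 1$.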
7.2.2 Rates of convergence

We present two propositions that give rates of convergence in the central limit theorem.

Proposition 7.1. Assume that the weakly dependent stationary process $(X_n)_n$ satisfies (7.1.1). Then the difference between the characteristic functions is bounded by
$$\big|\mathbb{E}\big(f(S_n/\sqrt n) - f(\sigma N)\big)\big| \le Cn^{-c^*},$$
where $C$ is some positive constant and $c^*$ depends on the weak dependence coefficients as follows:
• in the $\lambda$-dependence case with $\lambda(r) = O(r^{-\lambda})$ for $\lambda > 4 + \frac2\zeta$,
$$c^* = \frac{A\,(2\lambda-1)}{2(2+A)(\lambda+1)}, \qquad\text{where}\quad A = \frac{\sqrt{(2\lambda-6-\zeta)^2 + 4(\lambda\zeta-4\zeta-2)}\ +\ \zeta + 6 - 2\lambda}{2}\ \wedge\ 1;$$
• in the $\kappa$-dependence case with $\kappa(r) = O(r^{-\kappa})$ for $\kappa > 2 + \frac1\zeta$,
$$c^* = \frac{(\kappa-1)B}{\kappa(2+B)}, \qquad\text{where}\quad B = \frac{\sqrt{(2\kappa-3-\zeta)^2 + 4(\kappa\zeta-2\zeta-1)}\ +\ \zeta + 3 - 2\kappa}{2}\ \wedge\ 1.$$

In the case $d = 1$, we use Theorem 5.1 of Petrov (1995) [144] to obtain:

Proposition 7.2 (A rate in the Berry-Esseen bound). Assume that the real weakly dependent stationary process $(X_n)_n$ satisfies the same assumptions as in Proposition 7.1. We obtain:
$$\sup_x\big|F_n(x) - \Phi(x)\big| = O\big(n^{-c^*/4}\big),$$
where $c^*$ is defined in Proposition 7.1.

Proof of Proposition 7.1. In the previous section, we have expressed the rates of the different terms; let us recall them.
• In the $\lambda$-dependence case, we finally only have to consider the three largest rates: $(a-1)\delta/2$, $1 + (1/2-a)_+ - \lambda b$ and $b - a$. The optimal choice of $a^*$ below is smaller than $1/2$, so we have to consider the rate $3/2 - a - \lambda b$ and not $1 - \lambda b$. We find:
$$a^* = \frac{(1+\lambda)\delta + 3}{(2+\delta)(\lambda+1)}\in\Big]0,\frac12\Big[, \qquad b^* = \frac{3}{2(\lambda+1)}\in\,]0,a^*[.$$
Finally, we obtain the rate $n^{-c^*}$.
• In the $\kappa$-dependence case:
– auxiliary terms: $b - a$, $a - 1$ and $a(1-\kappa)$;
– main terms: $1 - \kappa b$ and $(a-1)\delta/2$.
The idea is to choose carefully $a^*$ and $b^*\in]0,1[$ such that the main rates are equal. Because $\delta < 1$ and $a > b$, we directly see that $(a-1)\delta/2 > a - 1$ and $1 - \kappa b > a(1-\kappa)$, so that the only rate of the auxiliary terms that remains to be considered is $b - a$. We obtain
$$a^* = 1 - \frac{2\kappa-2}{(2+\delta)\kappa+\delta}\in\,]0,1[, \qquad b^* = a^*\,\frac{2+2\delta}{2+\delta+\delta\kappa}\in\,]0,a^*[.$$
Finally, we obtain the proposed rate. Proof of proposition 7.2. We have seen that for t fix, we control the distance between the characteristic functions of S and σN by a term proportional to ∗ t2 n−c . Here, t2 appear because |t| was included in the constants (not depending of n) of the bound of the Lipschitz coefficients. Let Φ be the distribution function of σN and Fn be the distribution function S. Theorem 5.1 p. 142 in Petrov (1995) [144] gives, for every T > 0: ∗
  sup_x |F_n(x) − Φ(x)| ≲ n^{−c*} T³ + 1/T.
We optimize in T (taking T = n^{c*/4}, which balances the two terms) to obtain the rate n^{−c*/4} of Proposition 7.2.
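This balancing can be sketched numerically. In the bound g(T) = n^{−c}T³ + 1/T, the grid minimizer sits at T = n^{c/4}/3^{1/4} and the minimal value is of order n^{−c/4}; the value c = 0.5 below is an arbitrary stand-in for the unknown exponent c*:

```python
# Minimize g(T) = n^{-c} T^3 + 1/T by grid search and compare with the
# balancing choice T = n^{c/4}.  c = 0.5 is an arbitrary stand-in for c*.
c = 0.5
n = 10_000

def g(T):
    return n ** (-c) * T ** 3 + 1.0 / T

# Log-spaced grid for T between 0.1 and 1000.
grid = [10 ** (k / 400.0) for k in range(-400, 1201)]
T_star = min(grid, key=g)
T_balance = n ** (c / 4)

# Exact calculus gives T_star = n^{c/4} / 3^{1/4}, and then
# g(T_star) = (3^{-3/4} + 3^{1/4}) n^{-c/4}, i.e. the rate n^{-c/4}.
print(T_star, T_balance, g(T_star))
```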
7.3 Non causal random fields

Let (B_n)_{n∈N} be an increasing sequence of finite subsets of Z^d fulfilling

  Z^d = ∪_{n=1}^∞ B_n,   lim_{n→+∞} #∂B_n / #B_n = 0,   (7.3.1)

where ∂B_n = {i ∈ B_n / ∃ j ∉ B_n, d(i, j) = 1} and, for i = (i(l))_{1≤l≤d} and j = (j(l))_{1≤l≤d} in Z^d, d(i, j) = max_{1≤l≤d} |i(l) − j(l)|. Let (Y_i)_{i∈Z^d} be a real valued random field. Suppose that (Y_i)_{i∈Z^d} satisfies the following two assumptions:

A) A covariance inequality. Recall that for a real valued function h defined on R^n,

  Lip(h) = sup_{x≠y} |h(x) − h(y)| / Σ_{i=1}^n |x_i − y_i|.

Let R_1 and R_2 be two disjoint finite subsets of Z^d, and let f and g be two real valued functions defined respectively on R^{#R_1} and R^{#R_2} such that
Lip(f) < ∞ and Lip(g) < ∞. We suppose that for any positive real number δ, there exists a constant C_δ (not depending on f, g, R_1 and R_2) such that

  |Cov(f(Y_i, i ∈ R_1), g(Y_i, i ∈ R_2))| ≤ C_δ Lip(f) Lip(g) (#R_1 + #R_2) exp(−δ d(R_1, R_2)),   (7.3.2)

where d(R_1, R_2) = min_{i∈R_1, j∈R_2} d(i, j). We refer the reader to the book of Liggett (1985) [122] for some interacting particle models fulfilling such a covariance inequality.

B) Weak stationarity. Suppose that for any i, j ∈ Z^d,

  Cov(Y_i, Y_j) = Cov(Y_0, Y_{j−i}).   (7.3.3)
Theorem 7.4 gives a central limit theorem for the random field (Y_i)_{i∈Z^d}.

Theorem 7.4. Let (B_n)_{n∈N} be an increasing sequence of finite subsets of Z^d fulfilling (7.3.1). Let (Y_i)_{i∈Z^d} be a real valued random field satisfying (7.3.2) and (7.3.3). Suppose that, for any i ∈ Z^d, EY_i = 0 and sup_{i∈Z^d} ‖Y_i‖_∞ < M. Let S_n = Σ_{i∈B_n} Y_i. Then Σ_{k∈Z^d} |Cov(Y_0, Y_k)| < ∞ and (#B_n)^{−1/2} S_n converges in distribution to a centered normal law with variance σ² = Σ_{k∈Z^d} Cov(Y_0, Y_k).

Proof of Theorem 7.4. The proof of Theorem 7.4 follows from Proposition 7.3 and Proposition 7.4 below.

Proposition 7.3. Let (Y_i)_{i∈Z^d} be a real valued random field such that EY_i = 0 and EY_i² < ∞ for any i ∈ Z^d. Suppose that, for any i, j ∈ Z^d, (7.3.3) holds and that, for any positive real number δ, there exists a positive constant C_δ such that

  |Cov(Y_i, Y_j)| ≤ C_δ e^{−δ d(i,j)}.   (7.3.4)
Let (B_n)_n be a sequence of finite and increasing sets of Z^d fulfilling (7.3.1). Let S_n = Σ_{i∈B_n} Y_i. Then

  Σ_{k∈Z^d} |Cov(Y_0, Y_k)| < ∞

and

  lim_{n→+∞} (1/#B_n) Var S_n = Σ_{k∈Z^d} Cov(Y_0, Y_k).
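Both conclusions of Proposition 7.3 can be verified exactly for a finite-order moving-average field, for which Var S_n is a finite sum of squared coefficients. The field below, Y_i = Z_i + Z_{i+e₁} + Z_{i+e₂} on Z² with (Z_j) i.i.d. standard normal, is an arbitrary illustrative choice; for it, Σ_k Cov(Y_0, Y_k) = 3² = 9:

```python
from collections import Counter

# Y_i = sum of Z over the taps i + a, a in A, with (Z_j) i.i.d. N(0,1).
# Then sum_k Cov(Y_0, Y_k) = (#A)^2 and, by Proposition 7.3,
# Var S_n / #B_n should tend to (#A)^2 = 9 for B_n = [0, n)^2.
A = [(0, 0), (1, 0), (0, 1)]  # taps of the moving average (illustrative)

def var_ratio(n):
    # S_n = sum_{i in B_n} Y_i = sum_j c_j Z_j, hence Var S_n = sum_j c_j^2.
    coeff = Counter()
    for x in range(n):
        for y in range(n):
            for ax, ay in A:
                coeff[(x + ax, y + ay)] += 1
    return sum(v * v for v in coeff.values()) / (n * n)

print([round(var_ratio(n), 3) for n in (5, 20, 80)])
```

The ratio approaches 9 from below, the deficit being a boundary effect of order #∂B_n/#B_n, exactly as in the proof below.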
Proof of Proposition 7.3. The first conclusion of Proposition 7.3 follows from
the bound (7.3.4), together with the following elementary calculations:

  Σ_{k∈Z^d} |Cov(Y_0, Y_k)| ≤ C Σ_{k∈Z^d} exp(−δ d(0,k))
    ≤ C Σ_{k∈Z^d} Σ_{r=0}^∞ exp(−δr) 1_{r ≤ d(0,k) < r+1}
    ≤ C Σ_{r=0}^∞ exp(−δr) #{k ∈ Z^d / d(0,k) < r + 1}
    ≤ C Σ_{r=0}^∞ exp(−δr) (r + 1)^d,   (7.3.5)
where C is a positive constant depending only on δ and d. We now prove the second part of Proposition 7.3. Thanks to (7.3.1), we can find a sequence u = (u_n) of positive real numbers such that

  lim_{n→∞} u_n = +∞,   lim_{n→+∞} (#∂B_n/#B_n) exp(u_n) = 0.   (7.3.6)
Let (∂_u B_n)_n be the sequence of subsets of Z^d defined by ∂_u B_n = {s ∈ B_n / d(s, ∂B_n) < u_n}. The bound #∂_u B_n ≤ C_d (#∂B_n) u_n^d, together with the suitable choice of the sequence (u_n) and the limit (7.3.1), ensures

  lim_{n→∞} #∂_u B_n / #B_n = 0;   (7.3.7)

we shall use this fact below without further comment. Let B_n^u = B_n \ ∂_u B_n. We decompose the quantity Var S_n as in Newman (1980) [135]:

  (1/#B_n) Var S_n = (1/#B_n) Σ_{i∈B_n} Σ_{j∈B_n} Cov(Y_i, Y_j) = T_{1,n} + T_{2,n} + T_{3,n},
where

  T_{1,n} = (1/#B_n) Σ_{i∈B_n^u} Σ_{j∈B_n: d(i,j)≥u_n} Cov(Y_i, Y_j),
  T_{2,n} = (1/#B_n) Σ_{i∈B_n^u} Σ_{j∈B_n: d(i,j)<u_n} Cov(Y_i, Y_j),
  T_{3,n} = (1/#B_n) Σ_{i∈∂_u B_n} Σ_{j∈B_n} Cov(Y_i, Y_j).
Control of T_{1,n}. Since #B_n^u ≤ #B_n, applying (7.3.4) we have

  |T_{1,n}| ≤ sup_{i∈Z^d} Σ_{j∈Z^d: d(i,j)≥u_n} |Cov(Y_i, Y_j)| ≤ C_δ sup_{i∈Z^d} Σ_{j∈Z^d: d(i,j)≥u_n} e^{−δ d(i,j)}.   (7.3.8)

For any fixed i ∈ Z^d, we argue as for (7.3.5) and we obtain

  Σ_{j∈Z^d: d(i,j)≥u_n} e^{−δ d(i,j)} ≤ C Σ_{r=[u_n]}^∞ e^{−δr} r^d ≤ C e^{−δ[u_n]} u_n^d.   (7.3.9)

Collecting (7.3.8), (7.3.9) together with the first limit in (7.3.6), we obtain

  lim_{n→+∞} T_{1,n} = 0.   (7.3.10)
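Both (7.3.5) and (7.3.9) rest on the same counting fact: in the sup-norm metric, the ball {k ∈ Z^d : d(0,k) ≤ r} contains exactly (2r+1)^d lattice points, so the exponential factor e^{−δr} dominates the polynomial volume growth. A direct check (d = 2 and δ = 0.5 are arbitrary illustrative values):

```python
import math
from itertools import product

# Count lattice points k in Z^d with sup-norm distance d(0, k) <= r.
def ball_count(r, d):
    return sum(1 for k in product(range(-r, r + 1), repeat=d)
               if max(abs(c) for c in k) <= r)

d, delta = 2, 0.5  # illustrative dimension and decay rate

for r in range(6):
    assert ball_count(r, d) == (2 * r + 1) ** d

# The series sum_r e^{-delta r} (2r+1)^d therefore converges: its partial
# sums stabilize, which is the summability used in (7.3.5) and (7.3.9).
partial_60 = sum(math.exp(-delta * r) * (2 * r + 1) ** d for r in range(60))
partial_120 = sum(math.exp(-delta * r) * (2 * r + 1) ** d for r in range(120))
print(partial_60, partial_120)
```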
Control of T_{3,n}. Using (7.3.3) and (7.3.4), we obtain

  |T_{3,n}| ≤ (#∂_u B_n / #B_n) Σ_{k∈Z^d} |Cov(Y_0, Y_k)|.   (7.3.11)

The last bound, together with the limit (7.3.7) and the first conclusion of Proposition 7.3, gives

  lim_{n→∞} T_{3,n} = 0.   (7.3.12)
Control of T_{2,n}. Using the following implication: if i ∈ B_n^u and j does not belong to B_n, then d(i, j) ≥ u_n, we deduce that

  T_{2,n} = (1/#B_n) Σ_{i∈B_n^u} Σ_{j∈Z^d: d(i,j)<u_n} Cov(Y_i, Y_j).

Equality (7.3.3) ensures

  Σ_{j∈Z^d: d(i,j)<u_n} Cov(Y_i, Y_j) = Σ_{k∈Z^d: d(0,k)<u_n} Cov(Y_0, Y_k).

Hence

  T_{2,n} = (#B_n^u / #B_n) Σ_{k∈Z^d: d(0,k)<u_n} Cov(Y_0, Y_k).

The last equality, together with the first limit in (7.3.6) and (7.3.7), implies that

  lim_{n→∞} T_{2,n} = Σ_{k∈Z^d} Cov(Y_0, Y_k).   (7.3.13)
The second conclusion of Proposition 7.3 is proved by collecting the limits (7.3.10), (7.3.12) and (7.3.13).

Proposition 7.4. Let (B_n)_{n∈N} be an increasing sequence of finite subsets of Z^d such that #B_n tends to infinity as n goes to infinity. Let (Y_i)_{i∈Z^d} be a real valued random field satisfying (7.3.2). Suppose that EY_i = 0 for any i ∈ Z^d, and that sup_{i∈Z^d} ‖Y_i‖_∞ < M. Let S_n = Σ_{i∈B_n} Y_i. If there exists a finite real number σ² such that

  lim_{n→∞} Var S_n / #B_n = σ²,   (7.3.14)

then (#B_n)^{−1/2} S_n converges in distribution to a centered normal law with variance σ².

Proof of Proposition 7.4. We need the following notation. Let (m_n) be a sequence of positive integers to be fixed later. We suppose for the moment that lim_{n→∞} m_n = ∞. For any i ∈ Z^d, we define a neighborhood V_i of i in B_n as

  V_i = B(i, m_n) ∩ B_n,   (7.3.15)

where B(i, m_n) = {j ∈ Z^d / d(i, j) < m_n}. Let V_i^c denote the complement of V_i in B_n, i.e. V_i^c = B_n \ V_i. For I a subset of B_n, we denote

  S(I) = Σ_{i∈I} Y_i,   S(I^c) = Σ_{i∈B_n\I} Y_i.

Hence S(B_n) = Σ_{i∈B_n} Y_i =: S_n for the sake of brevity. Finally, we denote by F(b_2, b_3) the set of real valued functions h defined on R, three times differentiable, such that h(0) = 0, b_2 := ‖h''‖_∞ < +∞ and b_3 := ‖h^{(3)}‖_∞ < +∞. We also need the following proposition.
Proposition 7.5. Let h be a fixed function of the set F(b_2, b_3). Let (B_n)_{n∈N} be a sequence of finite subsets of Z^d. For any i ∈ B_n, let V_i be the set defined by (7.3.15). Let (Y_i)_{i∈Z^d} be a real valued random field. Suppose that EY_i = 0 and EY_i² < +∞ for any i ∈ Z^d. Let S_n = Σ_{i∈B_n} Y_i. Then

  |E(h(S_n)) − Var S_n ∫_0^1 t E(h''(tS_n)) dt|
    ≤ Σ_{i∈B_n} ∫_0^1 |Cov(Y_i, h'(tS(V_i^c)))| dt + 2 Σ_{i∈B_n} E|Y_i||S(V_i)| (b_2 ∧ b_3|S(V_i)|)
    + b_2 E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))| + b_2 Σ_{i∈B_n} |Cov(Y_i, S(V_i^c))|.   (7.3.16)

Remark 7.4. For a random field (Y_i)_{i∈Z^d} of independent random variables such that sup_{i∈Z^d} EY_i⁴ < ∞, Proposition 7.5 applied with V_i = {i} implies that

  |E(h(S_n)) − Var S_n ∫_0^1 t E(h''(tS_n)) dt| ≤ 2 Σ_{i∈B_n} E|Y_i|² (b_2 ∧ b_3|Y_i|) + b_2 √(#B_n) sup_{i∈Z^d} ‖Y_i²‖_2.
Proof of Proposition 7.5. We have

  h(S_n) = S_n ∫_0^1 h'(tS_n) dt = ∫_0^1 Σ_{i∈B_n} Y_i h'(tS_n) dt
    = ∫_0^1 Σ_{i∈B_n} Y_i h'(tS(V_i^c)) dt
    + ∫_0^1 Σ_{i∈B_n} Y_i (h'(tS_n) − h'(tS(V_i^c)) − tS(V_i) h''(tS_n)) dt
    + Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i))) ∫_0^1 t h''(tS_n) dt
    + Σ_{i∈B_n} E(Y_i S(V_i)) ∫_0^1 t h''(tS_n) dt − Σ_{i∈B_n} E(Y_i S_n) ∫_0^1 t h''(tS_n) dt
    + Σ_{i∈B_n} E(Y_i S_n) ∫_0^1 t h''(tS_n) dt.   (7.3.17)
We take the expectation in the equality (7.3.17). The obtained formula, together
with the following estimations, proves Proposition 7.5:

  |h'(tS_n) − h'(tS(V_i^c)) − tS(V_i) h''(tS_n)|
    ≤ |h'(tS_n) − h'(tS(V_i^c)) − tS(V_i) h''(tS(V_i^c))| + |S(V_i)| |h''(tS_n) − h''(tS(V_i^c))|
    ≤ 2|S(V_i)| (b_2 ∧ b_3|S(V_i)|).

The purpose now is to control the right hand side of the bound (7.3.16) for a random field (Y_i)_{i∈Z^d} fulfilling the covariance inequality (7.3.2) and the requirements of Proposition 7.4.

Corollary 7.1. Let h be a fixed function of the set F(b_2, b_3). Let (B_n)_{n∈N} be a sequence of finite subsets of Z^d. For any i ∈ B_n, let V_i be the set defined by (7.3.15). Let (Y_i)_{i∈Z^d} be a real valued random field fulfilling the covariance inequality (7.3.2). Suppose that, for any i ∈ Z^d, EY_i = 0 and sup_{i∈Z^d} ‖Y_i‖_∞ < M. Let S_n = Σ_{i∈B_n} Y_i. Then, for any positive real number δ, there exists a positive constant C(δ, M, d) independent of n, such that

  sup_{h∈F(b_2,b_3)} |E(h(S_n)) − Var S_n ∫_0^1 t E(h''(tS_n)) dt|
    ≤ C(δ, M, d) { b_2 (#B_n)² e^{−δm_n} + b_3 m_n^d #B_n
    + b_2 m_n^d √(#B_n) ( Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)} )^{1/2} + b_2 √(#B_n) m_n^d ( Σ_{k=1}^{3m_n} e^{−δk} k^d )^{1/2} }.
Proof of Corollary 7.1. From now on, we denote by C a positive constant that may differ from line to line, independent of n and depending, possibly, on M, δ and d. We have V_i^c = {j ∈ Z^d / d(i, j) ≥ m_n} ∩ B_n. Hence d({i}, V_i^c) ≥ m_n. The last bound, together with (7.3.2), proves that

  Σ_{i∈B_n} |Cov(Y_i, h'(tS(V_i^c)))| ≤ C b_2 Σ_{i∈B_n} (#V_i^c + 1) e^{−δ d({i},V_i^c)} ≤ C b_2 (#B_n)² e^{−δm_n}.   (7.3.18)

In the same way, we prove that

  b_2 Σ_{i∈B_n} |Cov(Y_i, S(V_i^c))| ≤ C b_2 (#B_n)² e^{−δm_n}.   (7.3.19)
Since #V_i ≤ #B(0, m_n) ≤ C m_n^d and sup_{j∈Z^d} Σ_{k∈Z^d} |Cov(Y_j, Y_k)| < ∞, we deduce that

  Σ_{i∈B_n} E|Y_i||S(V_i)| (b_2 ∧ b_3|S(V_i)|) ≤ b_3 M #B_n sup_{i∈Z^d} E|S(V_i)|² ≤ C b_3 #B_n m_n^d Σ_{k∈Z^d} |Cov(Y_0, Y_k)|.   (7.3.20)
It remains to control E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))|. For this, we argue as in Bolthausen (1982) [24]. We have

  E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))|² = Var( Σ_{i∈B_n} Y_i S(V_i) ) = Σ_{i∈B_n} Σ_{j∈B_n} Cov(Y_i S(V_i), Y_j S(V_j)).

Hence

  E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))|² ≤ Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} |Cov(Y_i Y_{i'}, Y_j Y_{j'})|.   (7.3.21)
Next, we have

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| ≤ |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≥3m_n} + |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n}.   (7.3.22)

We begin with the first term. The covariance inequality (7.3.2), together with some elementary estimations, implies that

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≥3m_n} ≤ Σ_{k=3m_n}^∞ |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{k≤d(i,j)≤k+1}
    ≤ C Σ_{k=3m_n}^∞ e^{−δ d({i,i'},{j,j'})} 1_{k≤d(i,j)≤k+1}
    ≤ C Σ_{k=3m_n}^∞ e^{−δ(k−2m_n)} 1_{d(i,j)≤k+1}.
To obtain the last bound, note that for any i' ∈ B(i, m_n) and j' ∈ B(j, m_n), we have

  d({i,i'}, {j,j'}) + 2m_n ≥ d({i,i'}, {j,j'}) + d(i, i') + d(j, j') ≥ d(i, j).

Hence,

  Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≥3m_n}
    ≤ C m_n^{2d} Σ_{k=3m_n}^∞ Σ_{i∈B_n} Σ_{j∈B_n} e^{−δ(k−2m_n)} 1_{j∈B(i,k+1)}
    ≤ C #B_n m_n^{2d} Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)}.   (7.3.23)
We now control the second term in (7.3.22). Inequality (7.3.2), and the fact that d({i}, {i', j, j'}) ≤ d({i}, {i'}), imply that

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} ≤ |Cov(Y_i, Y_{i'} Y_j Y_{j'})| 1_{d(i,j)≤3m_n} + |Cov(Y_i, Y_{i'})| |Cov(Y_j, Y_{j'})| 1_{d(i,j)≤3m_n}
    ≤ C e^{−δ d({i},{i',j,j'})} 1_{d(i,j)≤3m_n}.

Using the last bound, we infer that

  |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} ≤ Σ_{k=1}^{3m_n} |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} 1_{k−1≤d({i},{i',j,j'})≤k}
    ≤ C Σ_{k=1}^{3m_n} e^{−δ(k−1)} 1_{d(i,j)≤3m_n} 1_{d({i},{i',j,j'})≤k}.   (7.3.24)
Next, we have

  1_{d({i},{i',j,j'})≤k} ≤ 1_{d(i,i')≤k} + 1_{d(i,j)≤k} + 1_{d(i,j')≤k},

and consequently

  Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} 1_{d(i,j)≤3m_n} 1_{d({i},{i',j,j'})≤k} ≤ C #B_n m_n^{2d} k^d.   (7.3.25)
Combining (7.3.24) and (7.3.25), we obtain that

  Σ_{i∈B_n} Σ_{i'∈B(i,m_n)} Σ_{j∈B_n} Σ_{j'∈B(j,m_n)} |Cov(Y_i Y_{i'}, Y_j Y_{j'})| 1_{d(i,j)≤3m_n} ≤ C #B_n m_n^{2d} Σ_{k=1}^{3m_n} e^{−δk} k^d.   (7.3.26)
We collect the bounds (7.3.21), (7.3.23) and (7.3.26), and we obtain

  E|Σ_{i∈B_n} (Y_i S(V_i) − E(Y_i S(V_i)))| ≤ C √(#B_n) m_n^d { ( Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)} )^{1/2} + ( Σ_{k=1}^{3m_n} e^{−δk} k^d )^{1/2} }.   (7.3.27)
Finally, the bounds (7.3.18), (7.3.19), (7.3.20), (7.3.27), together with Proposition 7.5, prove Corollary 7.1.

End of the proof of Proposition 7.4. We apply Corollary 7.1 to the real and imaginary parts of the function x ↦ exp(iux/√(#B_n)) − 1. Those functions belong to the set F(b_2, b_3), with

  b_2 = u²/#B_n   and   b_3 = (|u|/√(#B_n))³.

Denoting by φ_n the characteristic function of the normalized sum S_n/√(#B_n), we obtain

  |φ_n(u) − 1 + (Var S_n/#B_n) u² ∫_0^1 t φ_n(tu) dt|
    ≤ C(δ, M, d, u) { #B_n e^{−δm_n} + m_n^d/√(#B_n)
    + (m_n^d/√(#B_n)) ( Σ_{k=3m_n}^∞ k^d e^{−δ(k−2m_n)} )^{1/2} + (m_n^d/√(#B_n)) ( Σ_{k=1}^{3m_n} e^{−δk} k^d )^{1/2} }
    ≤ C(δ, M, d, u) { #B_n e^{−δm_n} + m_n^d/√(#B_n) + (m_n^{3d/2}/√(#B_n)) e^{−δm_n/2} }.

For a suitable choice of the sequence (m_n) (for example we can take m_n = (2/δ) log #B_n), the right hand side of the last bound tends to 0 as n goes to infinity:

  lim_{n→∞} |φ_n(u) − 1 + (Var S_n/#B_n) u² ∫_0^1 t φ_n(tu) dt| = 0.   (7.3.28)
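With m_n = (2/δ) log #B_n, each term of the final bound indeed vanishes: the first equals 1/#B_n exactly, and the other two decay polylogarithmically over √(#B_n). A numerical sketch (δ = 0.5 and d = 2 are arbitrary illustrative values):

```python
import math

# The three terms of the bound, evaluated at m_n = (2/delta) * log(N),
# where N plays the role of #B_n.  delta and d are illustrative values.
delta, d = 0.5, 2

def bound_terms(N):
    m = (2.0 / delta) * math.log(N)
    return (N * math.exp(-delta * m),                       # = 1/N exactly
            m ** d / math.sqrt(N),
            m ** (1.5 * d) * math.exp(-delta * m / 2) / math.sqrt(N))

for N in (10 ** 2, 10 ** 4, 10 ** 6):
    print(N, bound_terms(N))
```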
In order to finish the proof of Proposition 7.4, we need the following lemma.
Lemma 7.2. Let σ² be a positive real number. Let (X_n) be a sequence of real valued random variables such that sup_{n∈N} EX_n² < +∞. Let φ_n be the characteristic function of X_n. Suppose that for any u ∈ R,

  lim_{n→∞} |φ_n(u) − 1 + σ² ∫_0^u t φ_n(t) dt| = 0.   (7.3.29)

Then, for any u ∈ R,

  lim_{n→∞} φ_n(u) = e^{−u²σ²/2}.
Proof. Lemma 7.2 is a variant of Lemma 2 in Bolthausen (1982) [24], which is an adaptation of Stein's method. The Markov inequality, together with the condition sup_{n∈N} EX_n² < ∞, implies that the sequence (μ_n)_{n∈N} of the laws of (X_n) is tight. Theorem 25.10 in Billingsley (1995) [21] proves the existence of a subsequence (μ_{n_k}) and a probability measure μ such that μ_{n_k} converges weakly to μ as k tends to infinity. Let φ denote the characteristic function of μ. We deduce from (7.3.29) that, for any u ∈ R,

  φ(u) − 1 + σ² ∫_0^u t φ(t) dt = 0,

or equivalently, for any u ∈ R, φ'(u) + σ²uφ(u) = 0. Integrating this differential equation, we obtain

  φ(u) = exp(−σ²u²/2)

for any u ∈ R. The proof of Lemma 7.2 is complete by the use of Theorem 25.10 in Billingsley (1995) [21] and its corollary. The proof of Proposition 7.4 follows from (7.3.14), (7.3.28) and Lemma 7.2.
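The differential equation solved in the proof is the Stein characterization of the normal law: φ'(u) + σ²uφ(u) = 0 with φ(0) = 1. One can check numerically that φ(u) = exp(−σ²u²/2) also satisfies the integrated form appearing in (7.3.29) (σ = 1.3 is an arbitrary value):

```python
import math

sigma = 1.3  # arbitrary illustrative value

def phi(t):
    return math.exp(-sigma ** 2 * t ** 2 / 2)

def stein_residual(u, steps=20000):
    # Trapezoidal approximation of phi(u) - 1 + sigma^2 * int_0^u t phi(t) dt,
    # which vanishes identically since phi'(t) = -sigma^2 t phi(t).
    h = u / steps
    integral = 0.0
    for k in range(steps):
        t1, t2 = k * h, (k + 1) * h
        integral += (t1 * phi(t1) + t2 * phi(t2)) * h / 2
    return phi(u) - 1 + sigma ** 2 * integral

print([stein_residual(u) for u in (0.5, 1.0, 2.0)])
```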
7.4 Conditional central limit theorem (causal)

Let S_n be the partial sums of a triangular array with stationary rows. In this section, we give necessary and sufficient conditions for S_n to satisfy the conditional central limit theorem. These conditions imply the weak convergence of n^{−1/2}S_n to a mixture of normal distributions, but they also imply stable convergence in the sense of Rényi (1963) [156] (see Section 7.5.1). The main result of this section (Theorem 7.5) is due to Dedecker and Merlevède (2002) [44].
Definition 7.1. Let (Ω, A, P) be a probability space, and T : Ω → Ω be a bijective bimeasurable transformation preserving the probability P. An element A of A is said to be invariant if T(A) = A. We denote by I the σ-algebra of all invariant sets. The probability P is ergodic if each element of I has measure 0 or 1. Finally, let H be the space of continuous real functions ϕ such that x ↦ |(1 + x²)^{−1} ϕ(x)| is bounded.

Theorem 7.5. For each positive integer n, let M_{0,n} be a σ-algebra of A satisfying M_{0,n} ⊆ T^{−1}(M_{0,n}). Define the nondecreasing filtration (M_{i,n})_{i∈Z} by M_{i,n} = T^{−i}(M_{0,n}), and M_{i,inf} = σ(∪_{n=1}^∞ ∩_{k=n}^∞ M_{i,k}). Let X_{0,n} be a M_{0,n}-measurable and square integrable random variable, and define the sequence (X_{i,n})_{i∈Z} by X_{i,n} = X_{0,n} ∘ T^i. Finally, for any t in [0, 1], write S_n(t) = X_{1,n} + ··· + X_{[nt],n}. Suppose that n^{−1/2}X_{0,n} converges in probability to zero as n tends to infinity. The following statements are equivalent:

S1 There exists a nonnegative M_{0,inf}-measurable random variable η such that, for any ϕ in H, any t in [0, 1] and any positive integer k,

  S1(ϕ):  lim_{n→∞} ‖ E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ‖_1 = 0,

where g is the density of a standard normal.

S2 (a) lim_{t→0} limsup_{n→∞} E[ (S_n²(t)/(nt)) (1 ∧ |S_n(t)|/√n) ] = 0.

   (b) lim_{t→0} limsup_{n→∞} (1/(t√n)) ‖E(S_n(t)|M_{0,n})‖_1 = 0.

   (c) There exists a nonnegative M_{0,inf}-measurable random variable η such that

  lim_{t→0} limsup_{n→∞} ‖ E( S_n²(t)/(nt) − η | M_{0,n} ) ‖_1 = 0.

Moreover the random variable η satisfies η = η ∘ T almost surely.

We now give the proof of this theorem. The fact that S1 implies S2 is obvious. In the next sections, we focus on the consequences of condition S2. We start with some preliminary results.
7.4.1 Definitions and preliminary lemmas

Definition 7.2. Let μ be a signed measure on a metric space (S, B(S)). Denote by |μ| the total variation measure of μ, and by ‖μ‖ = |μ|(S) its norm. We say that a family Π of signed measures on (S, B(S)) is tight if for every positive ε there exists a compact set K such that |μ|(K^c) < ε for any μ in Π. Denote by C(S) the set of continuous and bounded functions from S to R. We say that a sequence of signed measures (μ_n)_{n>0} converges weakly to a signed measure μ if, for any ϕ in C(S), μ_n(ϕ) tends to μ(ϕ) as n tends to infinity.

Lemma 7.3. Let (μ_n)_{n>0} be a sequence of signed measures on (R^d, B(R^d)), and set μ̂_n(t) = μ_n(exp(i⟨t, ·⟩)). Assume that the sequence (μ_n)_{n>0} is tight and that sup_{n>0} ‖μ_n‖ < ∞. The following statements are equivalent:

1. the sequence (μ_n)_{n>0} converges weakly to the null measure;
2. for any t in R^d, μ̂_n(t) tends to zero as n tends to infinity.

Proof of Lemma 7.3. 1 ⇒ 2 is obvious. It remains to prove that 2 ⇒ 1. We proceed in 3 steps.

Step 1. Let D(R^d) be the space of functions from R^d to C which are infinitely differentiable with compact support. Let ϕ be any element of D(R^d) and set ϕ̄(t) = ϕ̂(−t). From the Plancherel equality, we have μ_n(ϕ) = (2π)^{−d} ∫ μ̂_n(t) ϕ̄(t) dt. The function ϕ̄ being infinitely differentiable and rapidly decreasing, it belongs to L¹(λ). Since μ̂_n converges to zero everywhere and is bounded by sup_{n>0} ‖μ_n‖, the dominated convergence theorem implies that ∫ μ̂_n(t) ϕ̄(t) dt tends to zero as n tends to infinity. Consequently, for any ϕ in D(R^d), μ_n(ϕ) converges to zero as n tends to infinity.

Step 2. Let ϕ be any function from R^d to R, continuous and with compact support. For any positive ε, there exists ϕ_ε in D(R^d) such that ‖ϕ − ϕ_ε‖_∞ ≤ ε. Since furthermore sup_{n>0} ‖μ_n‖ is finite, we infer from Step 1 that μ_n(ϕ) tends to zero as n tends to infinity.

Step 3. For any positive integer k, let f_k be a positive and continuous function from R^d to R satisfying: ‖f_k‖_∞ ≤ 1, f_k(x) = 1 for any x in [−k, k]^d, and f_k(x) = 0 for any x in ([−k − 1, k + 1]^d)^c. For any continuous bounded function ϕ, write

  |μ_n(ϕ)| ≤ |μ_n(ϕ f_k)| + ‖ϕ‖_∞ |μ_n|(([−k, k]^d)^c).

From Step 2, the first term on the right hand side tends to zero as n tends to infinity. Since the sequence (μ_n)_{n>0} is tight, the second term on the right hand side is as small as we wish by choosing k large enough. This completes the proof of 1.

Definition 7.3. Define the set R(M_{k,n}) of Rademacher M_{k,n}-measurable random variables: R(M_{k,n}) = {2·1_A − 1 / A ∈ M_{k,n}}. Recall that g is the N(0, 1) density. For the random variable η introduced in Theorem 7.5 and any bounded random variable Z, let

1. ν_n[Z] be the image measure of Z.P by the variable n^{−1/2}S_n(t);
2. ν[Z] be the image measure of g.λ ⊗ Z.P by the variable φ from R × Ω to R defined by φ(x, ω) = x√(tη(ω)).

Lemma 7.4. Let μ_n[Z_n] = ν_n[Z_n] − ν[Z_n]. For any ϕ in H, the statement S1(ϕ) is equivalent to:

S3(ϕ): for any Z_n in R(M_{k,n}), the sequence μ_n[Z_n](ϕ) tends to zero as n tends to infinity.

Proof of Lemma 7.4. For Z_n in R(M_{k,n}) and ϕ in H, we have

  |μ_n[Z_n](ϕ)| = |E( Z_n ( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx ) )|
    ≤ ‖ E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ‖_1.

Consequently S1(ϕ) implies S3(ϕ). Now, to prove that S3(ϕ) implies S1(ϕ), choose

  A(n, ϕ) = { E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ≥ 0 },

and Z_n^ϕ = 2·1_{A(n,ϕ)} − 1. Obviously

  μ_n[Z_n^ϕ](ϕ) = ‖ E( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx | M_{k,n} ) ‖_1,

and S3(ϕ) implies S1(ϕ).
7.4.2 Invariance of the conditional variance
We first prove that if S2 holds, the random variable η satisfies η = η ∘ T almost surely (or, equivalently, that η is measurable with respect to the P-completion of I). From S2(c) and both the facts that (X_{i,n})_{i∈Z} is strictly stationary and M_{0,n} ⊆ M_{1,n}, we have for any t in ]0, 1],

  lim_{n→∞} ‖ E( η ∘ T − (S_n²(t) ∘ T)/(nt) | M_{0,n} ) ‖_1 = 0.   (7.4.1)

On the other hand, defining ψ(x) = x²(1 − (1 ∧ |x|)) and using the fact that T preserves P, we have

  ‖ S_n²(t)/(nt) − (S_n²(t) ∘ T)/(nt) ‖_1 ≤ 2 ‖ (S_n²(t)/(nt)) (1 ∧ |S_n(t)|/√n) ‖_1 + (1/t) ‖ ψ(S_n(t)/√n) − ψ((S_n(t) ∘ T)/√n) ‖_1.   (7.4.2)
To control the second term on the right hand side, note that the function ψ is 3-Lipschitz and bounded by 1. It follows that, for each positive ε,

  ‖ ψ(S_n(t)/√n) − ψ((S_n(t) ∘ T)/√n) ‖_1 ≤ 3ε + 2P(|X_{0,n} − X_{[nt],n}| > ε√n).

Using that n^{−1/2}X_{0,n} converges in probability to 0, we derive that

  lim_{n→∞} ‖ ψ(S_n(t)/√n) − ψ((S_n(t) ∘ T)/√n) ‖_1 = 0,

and the second term on the right hand side in (7.4.2) tends to 0 as n tends to infinity. This fact, together with inequality (7.4.2) and Condition S2(a), yields

  lim_{t→0} limsup_{n→∞} ‖ S_n²(t)/(nt) − (S_n²(t) ∘ T)/(nt) ‖_1 = 0,

which together with S2(c) implies that

  lim_{t→0} limsup_{n→∞} ‖ E( η − (S_n²(t) ∘ T)/(nt) | M_{0,n} ) ‖_1 = 0.   (7.4.3)

Combining (7.4.1) and (7.4.3), it follows that lim_{n→∞} ‖E(η − η ∘ T | M_{0,n})‖_1 = 0, which implies that

  lim_{n→∞} ‖ E( η − η ∘ T | ∩_{k≥n} M_{0,k} ) ‖_1 = 0.

Applying the martingale convergence theorem, we obtain

  lim_{n→∞} ‖ E( η − η ∘ T | ∩_{k≥n} M_{0,k} ) ‖_1 = ‖ E(η − η ∘ T | M_{0,inf}) ‖_1 = 0.   (7.4.4)

According to S2(c), the random variable η is M_{0,inf}-measurable. Therefore, (7.4.4) implies that E(η ∘ T | M_{0,inf}) = η. The fact that η ∘ T = η almost surely is a direct consequence of the following elementary result.

Lemma 7.5. Let (Ω, A, P) be a probability space, X an integrable random variable, and M a σ-algebra of A. If the random variable E(X|M) has the same law as X, then E(X|M) = X almost surely.

Proof of Lemma 7.5. For any real number m, consider A_1 = {X ≤ m}, A_2 = {E(X|M) ≤ m}, C_1 = A_1 ∩ A_2^c and C_2 = A_2 ∩ A_1^c. Since by assumption the random variables X and E(X|M) are identically distributed, it follows that P(A_1) = P(A_2), P(C_1) = P(C_2) and E(X1_{A_1}) = E(X1_{A_2}). This implies in
particular that E((X − m)1_{C_1}) = E((X − m)1_{C_2}). These terms having opposite signs, they are zero. Since X − m is positive on C_2, it follows that C_2, and consequently A_1ΔA_2, have probability zero (Δ denoting the symmetric difference). Now, it is easily seen that

  E((E(X|M))² 1_{A_1}) = E((E(X|M))² 1_{A_2}) = E(X² 1_{A_1})   (7.4.5)

and

  E(X E(X|M) 1_{A_1}) = E(X E(X|M) 1_{A_2}) = E((E(X|M))² 1_{A_2}).   (7.4.6)

According to (7.4.5) and (7.4.6), we obtain

  ‖(X − E(X|M)) 1_{A_1}‖_2² = ‖X 1_{A_1}‖_2² + ‖E(X|M) 1_{A_1}‖_2² − 2E(X E(X|M) 1_{A_1}) = 0.   (7.4.7)

Since (7.4.7) is true for any real m, it follows that X = E(X|M) almost surely, and Lemma 7.5 is proved.
7.4.3 End of the proof

First, note that we can restrict ourselves to bounded functions of H: if S2 implies S1(h) for any continuous and bounded function h, then we easily infer from S2(c) that n^{−1}S_n²(t) is uniformly integrable for any t in [0, 1], which implies that S1 extends to the whole space H.

Definition 7.4. Let B₁³(R) be the class of three times continuously differentiable functions from R to R such that max(‖h^{(i)}‖_∞, i ∈ {0, 1, 2, 3}) ≤ 1.

Suppose now that S1(h) holds for any h in B₁³(R). Applying Lemma 7.4, this is equivalent to saying that S3(h) holds for any h in B₁³(R), which obviously implies that S3(h) holds for h_t : x ↦ e^{itx}. Using that the probability ν_n[1] is tight (since it converges weakly to ν[1]) and that |μ_n[Z_n]| ≤ ν_n[1] + ν[1], we infer that (μ_n[Z_n]) is tight, and Lemma 7.3 implies that S3(h) (and therefore S1(h)) holds for any continuous bounded function h.

On the other hand, from the asymptotic negligibility of n^{−1/2}X_{0,n} we infer that, for any positive integer k, n^{−1/2}(S_n(t) − S_n(t) ∘ T^k) converges in probability to zero. Consequently, since any function h belonging to B₁³(R) is 1-Lipschitz and bounded, we have

  lim_{n→∞} ‖ h(n^{−1/2}S_n(t)) − h(n^{−1/2}S_n(t) ∘ T^k) ‖_1 = 0,

and S1(h) is equivalent to

  lim_{n→∞} ‖ E( h(n^{−1/2}S_n(t) ∘ T^k) − ∫ h(x√(tη)) g(x) dx | M_{k,n} ) ‖_1 = 0.
Now, since both η and P are invariant by T , we infer that Theorem 7.5 is a straightforward consequence of Proposition 7.6 below:
Proposition 7.6. Let X_{i,n} and M_{i,n} be defined as in Theorem 7.5. If S2 holds, then, for any h in B₁³(R) and any t in [0, 1],

  lim_{n→∞} ‖ E( h(n^{−1/2}S_n(t)) − ∫ h(x√(tη)) g(x) dx | M_{0,n} ) ‖_1 = 0,
where g is the density of a standard normal.

Proof of Proposition 7.6. We prove the result for S_n(1), the proof of the general case being unchanged. Without loss of generality, suppose that there exists a sequence (ε_i)_{i∈Z} of N(0, 1)-distributed and independent random variables, independent of M_{∞,∞} = σ(∪_{k,n} M_{k,n}).

Definition 7.5. Let i, p and n be three integers such that 1 ≤ i ≤ p ≤ n. Set q = [n/p] and define

  U_{i,n} = X_{iq−q+1,n} + ··· + X_{iq,n},
  V_{i,n} = (1/√n)(U_{1,n} + U_{2,n} + ··· + U_{i,n}),
  Δ_i = ε_{iq−q+1} + ··· + ε_{iq},
  Γ_i = √(η/n) (Δ_i + Δ_{i+1} + ··· + Δ_p).
Definition 7.6. Let g be any function from R to R. For k and l in [1, p] and any positive integer n ≥ p, set g_{k,l;n} = g(V_{k,n} + Γ_l), with the conventions g_{k,p+1;n} = g(V_{k,n}) and g_{0,l;n} = g(Γ_l). Afterwards, we shall apply this notation to the successive derivatives of the function h. For brevity, we shall omit the index n.

Let s_n = √η (ε_1 + ··· + ε_n). Since (ε_i)_{i∈Z} is independent of M_{∞,∞}, we have, integrating with respect to (ε_i)_{i∈Z},

  E( h(n^{−1/2}S_n(1)) − ∫ h(x√η) g(x) dx | M_{0,n} )
    = E(h(n^{−1/2}S_n(1)) − h(V_{p,n}) | M_{0,n}) + E(h(V_{p,n}) − h(Γ_1) | M_{0,n}) + E(h(Γ_1) − h(n^{−1/2}s_n) | M_{0,n}).   (7.4.8)

Here, note that |n^{−1/2}S_n(1) − V_{p,n}| ≤ n^{−1/2}(|X_{n−p+2,n}| + ··· + |X_{n,n}|). Using the asymptotic negligibility of n^{−1/2}X_{0,n}, we infer that n^{−1/2}S_n(1) − V_{p,n} converges in probability to zero. Since furthermore h is 1-Lipschitz and bounded, we conclude that

  lim_{n→∞} ‖h(n^{−1/2}S_n(1)) − h(V_{p,n})‖_1 = 0,   (7.4.9)

and the same arguments yield

  lim_{n→∞} ‖h(Γ_1) − h(n^{−1/2}s_n)‖_1 = 0.   (7.4.10)
In view of (7.4.9) and (7.4.10), it remains to control the second term on the right hand side of (7.4.8). To this end, we use Lindeberg's decomposition:

  h(V_{p,n}) − h(Γ_1) = Σ_{i=1}^p (h_{i,i+1} − h_{i−1,i+1}) + Σ_{i=1}^p (h_{i−1,i+1} − h_{i−1,i}).   (7.4.11)
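The two sums in (7.4.11) telescope: pairing the terms gives Σ_{i=1}^p (h_{i,i+1} − h_{i−1,i}), whose consecutive terms cancel, leaving h_{p,p+1} − h_{0,1} = h(V_p) − h(Γ_1). A numerical check of the identity, with h and the values of V_k and Γ_l chosen arbitrarily:

```python
import math

p = 6
h = math.tanh                                  # any test function works
V = [0.3, -0.1, 0.7, 0.2, -0.5, 0.4]           # stands in for V_1..V_p
G = [0.6, -0.2, 0.1, 0.9, -0.3, 0.5]           # stands in for Gamma_1..Gamma_p

def H(k, l):
    # h_{k,l} = h(V_k + Gamma_l), with h_{k,p+1} = h(V_k), h_{0,l} = h(Gamma_l)
    v = V[k - 1] if k >= 1 else 0.0
    g = G[l - 1] if l <= p else 0.0
    return h(v + g)

lhs = H(p, p + 1) - H(0, 1)                    # h(V_p) - h(Gamma_1)
rhs = (sum(H(i, i + 1) - H(i - 1, i + 1) for i in range(1, p + 1))
       + sum(H(i - 1, i + 1) - H(i - 1, i) for i in range(1, p + 1)))
print(lhs - rhs)
```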
Now, applying Taylor's integral formula, we get that

  h_{i,i+1} − h_{i−1,i+1} = (1/√n) U_{i,n} h'_{i−1,i+1} + (1/(2n)) U_{i,n}² h''_{i−1,i+1} + R_i,
  h_{i−1,i+1} − h_{i−1,i} = −√(η/n) Δ_i h'_{i−1,i+1} − (η/(2n)) Δ_i² h''_{i−1,i+1} + r_i,

where

  |R_i| ≤ (U_{i,n}²/n) (1 ∧ |U_{i,n}|/√n)   and   |r_i| ≤ (ηΔ_i²/n) (1 ∧ √η|Δ_i|/√n).   (7.4.12)

Since Δ_i is centered and independent of σ(M_{∞,∞} ∪ σ(h'_{i−1,i+1})), we have E(√η Δ_i h'_{i−1,i+1} | M_{0,n}) = E(Δ_i) E(√η h'_{i−1,i+1} | M_{0,n}) = 0. It follows that

  E(h(V_p) − h(Γ_1) | M_{0,n}) = D_1 + D_2 + D_3,   (7.4.13)

where

  D_1 = Σ_{i=1}^p E(n^{−1/2} U_{i,n} h'_{i−1,i+1} | M_{0,n}),
  D_2 = (1/2) Σ_{i=1}^p E(n^{−1}(U_{i,n}² − ηΔ_i²) h''_{i−1,i+1} | M_{0,n}),
  D_3 = Σ_{i=1}^p E(R_i + r_i | M_{0,n}).
Control of D_3. From (7.4.12) and the fact that T preserves P, we get

  Σ_{i=1}^p ‖R_i‖_1 ≤ E[ (S_n²(1/p)/(n/p)) (1 ∧ |S_n(1/p)|/√n) ],

and S2(a) implies that

  lim_{p→∞} limsup_{n→∞} Σ_{i=1}^p ‖R_i‖_1 = 0.   (7.4.14)

Moreover, since for t ∈ [0, 1] the sequence √(η/n)(ε_1 + ··· + ε_{[nt]}) obviously satisfies S2(a), the same argument applies to Σ_{i=1}^p ‖r_i‖_1. Finally

  lim_{p→∞} limsup_{n→∞} ‖D_3‖_1 = 0.   (7.4.15)
Control of D_1. Denote by E_ε the integration with respect to the sequence (ε_i)_{i∈Z}. Set l(i, n) = (i − 1)[n/p]. Bearing in mind the definition of h'_{i−1,i+1} and integrating with respect to (ε_i)_{i∈Z}, we deduce that the random variable E_ε(h'_{i−1,i+1}) is M_{l(i,n),n}-measurable and bounded by one. Now, since the σ-algebra M_{0,n} is included in M_{l(i,n),n}, we obtain

  ‖E(n^{−1/2} U_{i,n} h'_{i−1,i+1} | M_{0,n})‖_1 ≤ ‖E(n^{−1/2} U_{i,n} | M_{l(i,n),n})‖_1.

Using that T preserves P, the latter equals ‖E(n^{−1/2} S_n(1/p) | M_{0,n})‖_1. Consequently, ‖D_1‖_1 ≤ p n^{−1/2} ‖E(S_n(1/p) | M_{0,n})‖_1 and S2(b) implies that

  lim_{p→∞} limsup_{n→∞} ‖D_1‖_1 = 0.   (7.4.16)
Control of D_2. Integrating with respect to (ε_i)_{i∈Z}, we have

  ‖E((U_{i,n}² − ηΔ_i²) h''_{i−1,i+1} | M_{0,n})‖_1 = ‖E((U_{i,n}² − η[np^{−1}]) E_ε(h''_{i−1,i+1}) | M_{0,n})‖_1.

Arguing as for the control of D_1, we have

  ‖E((U_{i,n}² − η[np^{−1}]) E_ε(h''_{i−1,i+1}) | M_{0,n})‖_1 ≤ ‖E(U_{i,n}² − η[np^{−1}] | M_{l(i,n),n})‖_1.

Since both η and P are invariant by the transformation T, the latter equals ‖E(S_n²(1/p) − η[np^{−1}] | M_{0,n})‖_1. Consequently

  ‖D_2‖_1 ≤ ‖ E( S_n²(1/p)/(n/p) − η [n/p]/(n/p) | M_{0,n} ) ‖_1,

and S2(c) implies that

  lim_{p→∞} limsup_{n→∞} ‖D_2‖_1 = 0.   (7.4.17)
End of the proof of Proposition 7.6. From (7.4.15), (7.4.16) and (7.4.17) we infer that, for any h in B₁³(R),

  lim_{p→∞} limsup_{n→∞} ‖D_1 + D_2 + D_3‖_1 = 0.

This fact, together with (7.4.8), (7.4.9), (7.4.10) and (7.4.13), implies Proposition 7.6.
7.5 Applications

7.5.1 Stable convergence

Let (Ω, A, P) be a probability space. A sequence (X_n)_{n>0} of integrable random variables is said to converge weakly in L¹ to X if, for any bounded random variable Z, we have

  lim_{n→∞} E(ZX_n) = E(ZX).

Let X be some Polish space. A random probability μ on X is a function from B(X) × Ω to [0, 1] such that μ(·, ω) is a probability measure for any ω in Ω and μ(B, ·) is A-measurable for any B in B(X). Let μ be a random probability on X. We say that a sequence (X_n)_{n>0} with values in a Polish space X converges stably with respect to μ if the sequence (ϕ(X_n))_{n>0} converges weakly in L¹ to μ(ϕ) for any continuous bounded function ϕ. The equivalence of this definition with that of Rényi (1963) [156] is proved in Aldous and Eagleson (1978) [1]. Note that, if (X_n)_{n>0} converges stably with respect to μ, then it converges in distribution to the probability ν(A) = ∫ μ(A, ω) P(dω). Here is one consequence of stability.

Lemma 7.6. Let (X_n)_{n>0} and (Y_n)_{n>0} be two sequences of random variables with values in two Polish spaces X and Y. If (X_n)_{n>0} converges stably with respect to μ and (Y_n)_{n>0} converges in probability to Y, then (X_n, Y_n) converges in distribution to the probability ν(A × B) = ∫ μ(A, ω) 1_{Y(ω)∈B} P(dω).

Proof of Lemma 7.6. Let f and g be two bounded Lipschitz functions. We have

  |E(f(X_n)g(Y_n)) − ν(f ⊗ g)| = |E(f(X_n)g(Y_n)) − E(μ(f)g(Y))|.

Consequently

  |E(f(X_n)g(Y_n)) − ν(f ⊗ g)| ≤ |E(f(X_n)g(Y)) − E(μ(f)g(Y))| + ‖f‖_∞ ‖g(Y_n) − g(Y)‖_1.

The first term on the right hand side tends to zero by the weak-L¹ convergence, and the second term tends to zero because Y_n converges in probability to Y and g is bounded Lipschitz.

The next corollary shows that if the CCLT holds, then (n^{−1/2}S_n(t))_{n>0} converges stably.

Corollary 7.2. Let X_{i,n}, M_{i,n}, S_n(t) be as in Theorem 7.5. Suppose that the sequence (M_{0,n})_{n≥1} is nondecreasing. If Condition S2 is satisfied, then the sequence (n^{−1/2}S_n(t))_{n>0} converges stably with respect to the random probability defined by μ(ϕ) = ∫ ϕ(x√(tη)) g(x) dx.
Proof of Corollary 7.2. We shall prove that if S2 holds, then, for any bounded random variable Z, any t in [0, 1] and any ϕ in H,

  lim_{n→∞} E( Z ϕ(n^{−1/2}S_n(t)) ) = E( Z ∫ ϕ(x√(tη)) g(x) dx ).   (7.5.1)

Since n^{−1}S_n²(t) is uniformly integrable, we only need to prove (7.5.1) for continuous bounded functions. Recall that M_{∞,∞} = σ(∪_{k,n} M_{k,n}). Since both S_n(t) and η are M_{∞,∞}-measurable, we can and do suppose that so is Z. Set Z_{k,n} = E(Z | M_{k,n}), and use the decomposition

  E( Z ϕ(n^{−1/2}S_n(t)) ) − E( Z ∫ ϕ(x√(tη)) g(x) dx ) = T_1 + T_2 + T_3,

where

  T_1 = E( (Z − Z_{k,n}) ϕ(n^{−1/2}S_n(t)) ),
  T_2 = E( Z_{k,n} ( ϕ(n^{−1/2}S_n(t)) − ∫ ϕ(x√(tη)) g(x) dx ) ),
  T_3 = E( (Z_{k,n} − Z) ∫ ϕ(x√(tη)) g(x) dx ).

By assumption, the array M_{k,n} is nondecreasing in k and n. Since the random variable Z is M_{∞,∞}-measurable, the martingale convergence theorem implies that lim_{k→∞} lim_{n→∞} ‖Z_{k,n} − Z‖_1 = 0. Consequently,

  lim_{k→∞} limsup_{n→∞} |T_1| = lim_{k→∞} limsup_{n→∞} |T_3| = 0.

On the other hand, Theorem 7.5 implies that T_2 tends to zero as n tends to infinity, which completes the proof of Corollary 7.2.

We end this section with an application of stable convergence to random normalization. The proof is straightforward, using Lemma 7.6 and Corollary 7.2.

Corollary 7.3. Let X_{i,n}, M_{i,n}, S_n(t) be as in Theorem 7.5. Suppose that the sequence (M_{0,n})_{n≥1} is nondecreasing and that P(η > 0) = 1. If Condition S2 is satisfied, and if (η_n)_{n>0} converges in probability to η, then, for any t in ]0, 1],

  n^{−1/2}S_n(t) / √(tη_n ∨ n^{−1})   converges in distribution to N(0, 1).

For more about stable convergence, we refer to the book by Castaing et al. (2004) [35].
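In the simplest i.i.d. setting (constant η = Var X_0, t = 1), Corollary 7.3 justifies Student-type normalization: replacing the unknown variance by a consistent estimator leaves the N(0, 1) limit unchanged. A minimal simulation sketch (all numerical choices are illustrative):

```python
import random
import statistics

random.seed(0)

def studentized(n):
    # i.i.d. uniform on [-1, 1]: mean 0, variance 1/3 (the "eta" here).
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    eta_n = statistics.pvariance(xs)      # consistent estimator of eta
    return sum(xs) / (n * eta_n) ** 0.5   # n^{-1/2} S_n / sqrt(eta_n)

sample = [studentized(200) for _ in range(2000)]
# If the random normalization is valid, these statistics are close to N(0, 1).
print(statistics.mean(sample), statistics.pstdev(sample))
```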
7.5.2 Sufficient conditions for stationary sequences

For strictly stationary sequences, Theorem 7.5 reads as follows.

Theorem 7.6. Let M_0 be a σ-algebra of A satisfying M_0 ⊆ T^{−1}(M_0) and define the nondecreasing filtration (M_i)_{i∈Z} by M_i = T^{−i}(M_0). Let X_0 be a M_0-measurable, square integrable and centered random variable. Define the sequence (X_i)_{i∈Z} by X_i = X_0 ∘ T^i, and S_n = X_1 + ··· + X_n. The following statements are equivalent:

S1 There exists a nonnegative M_0-measurable random variable η such that, for any ϕ in H and any positive integer k,

  lim_{n→∞} ‖ E( ϕ(n^{−1/2}S_n) − ∫ ϕ(x√η) g(x) dx | M_k ) ‖_1 = 0,

where g is the density of a standard normal.

S2 (a) the sequence (n^{−1}S_n²)_{n>0} is uniformly integrable;
   (b) ‖E(n^{−1/2}S_n | M_0)‖_1 tends to 0 as n tends to infinity;
   (c) there exists a nonnegative M_0-measurable random variable η such that ‖E(n^{−1}S_n² − η | M_0)‖_1 tends to 0 as n tends to infinity.

Moreover the random variable η satisfies η = η ∘ T almost surely.

In Propositions 7.7 and 7.8, we give two conditions implying S2. For the usual central limit theorem, Proposition 7.7 is due to Gordin (1969) [97], Corollary 7.4 is due to Heyde (1974) [103], and Proposition 7.8 is due to Dedecker and Rio (2000) [50].

Proposition 7.7. Let (M_i)_{i∈Z} and (X_i)_{i∈Z} be as in Theorem 7.6. Let H_i be the Hilbert space of M_i-measurable, centered and square integrable functions. For all integers j less than i, denote by H_i ⊖ H_j the orthogonal complement of H_j in H_i. Assume that there exists a random variable m in H_0 ⊖ H_{−1} such that

  lim_{n→∞} (1/√n) ‖ Σ_{i=1}^n (X_0 ∘ T^i − m ∘ T^i) ‖_2 = 0;   (7.5.2)
then S2 (hence S1) holds. Corollary 7.4. Let (Mi )i∈Z and (Xi )i∈Z be as in Theorem 7.6, and define Hi as in Proposition 7.7. Let Pi be the projection operator on Hi ! Hi−1 : for any function f in L2 , Pi (f ) = E(f |Mi ) − E(f |Mi−1 ). If n
P0 (Xi ) converges in L2 to m and lim n−1/2 Sn 2 = m2 ,
i=0
then (7.5.2) (hence S1) holds.
n→∞
(7.5.3)
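For a concrete instance of the martingale approximation (our illustration, not the book's), take the causal AR(1) process $X_i=\sum_{j\ge0}\rho^j\varepsilon_{i-j}$ with i.i.d. standard normal innovations. Then $P_0(X_i)=\rho^i\varepsilon_0$, so $m=\sum_{i\ge0}P_0(X_i)=\varepsilon_0/(1-\rho)$ and $\|m\|_2=1/(1-\rho)$, which is also the square root of the long-run variance $\sum_{k\in\mathbb{Z}}\mathrm{Cov}(X_0,X_k)$. The sketch below checks $n^{-1/2}\|S_n\|_2\to\|m\|_2$ by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.6, 4000, 400

def ar1_sum(n, rho):
    """Partial sum S_n of an AR(1) chain X_i = rho*X_{i-1} + eps_i."""
    x, s = 0.0, 0.0
    for e in rng.standard_normal(n):
        x = rho * x + e
        s += x
    return s

m_norm2 = 1.0 / (1.0 - rho)                 # ||m||_2 with sigma_eps = 1
sums = np.array([ar1_sum(n, rho) for _ in range(reps)])
est = np.sqrt(np.mean(sums ** 2) / n)       # empirical n^{-1/2} ||S_n||_2
print(est, m_norm2)                         # both close to 2.5
```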
7.5.
APPLICATIONS
185
Proof of Proposition 7.7. Let $W_n=m\circ T+\cdots+m\circ T^n$. Since $(m\circ T^i)_{i\in\mathbb{Z}}$ is a stationary sequence of martingale differences with respect to the filtration $(\mathcal{M}_i)_{i\in\mathbb{Z}}$, it satisfies S2. More precisely, $n^{-1}\mathbb{E}(W_n^2\mid\mathcal{M}_0)$ converges to $\eta=\mathbb{E}(m^2\mid\mathcal{I})$ in $L^1$. Now, we shall use (7.5.2) to see that the sequence $(X_i)_{i\in\mathbb{Z}}$ also satisfies S2.

Proof of S2(b). From (7.5.2) it is clear that $n^{-1/2}\|\mathbb{E}(S_n\mid\mathcal{M}_0)\|_2$ tends to zero as $n$ tends to infinity.

Proof of S2(c). To see that $n^{-1}\mathbb{E}(S_n^2\mid\mathcal{M}_0)$ converges to $\eta$ in $L^1$, write
$$\frac1n\big\|\mathbb{E}(S_n^2-W_n^2\mid\mathcal{M}_0)\big\|_1\le\frac1n\big\|S_n^2-W_n^2\big\|_1\le\frac{\|S_n+W_n\|_2}{\sqrt n}\cdot\frac{\|S_n-W_n\|_2}{\sqrt n}.\tag{7.5.4}$$
From (7.5.2) the latter tends to zero as $n$ tends to infinity, and therefore $(X_i)_{i\in\mathbb{Z}}$ satisfies S2(c) with $\eta=\mathbb{E}(m^2\mid\mathcal{I})$.

Proof of S2(a). Using both that $n^{-1}W_n^2$ is uniformly integrable and that the function $x\mapsto(1\wedge|x|)$ is 1-Lipschitz, we have, for any positive real $M$,
$$\lim_{n\to\infty}\mathbb{E}\Big(\frac{W_n^2}{n}\Big|\Big(1\wedge\frac{|S_n|}{M\sqrt n}\Big)-\Big(1\wedge\frac{|W_n|}{M\sqrt n}\Big)\Big|\Big)=0.\tag{7.5.5}$$
Since $|x^2(1\wedge|y|)-z^2(1\wedge|t|)|\le|x^2-z^2|+z^2|(1\wedge|y|)-(1\wedge|t|)|$, we infer from (7.5.4) and (7.5.5) that
$$\lim_{n\to\infty}\Big\|\frac{S_n^2}{n}\Big(1\wedge\frac{|S_n|}{M\sqrt n}\Big)-\frac{W_n^2}{n}\Big(1\wedge\frac{|W_n|}{M\sqrt n}\Big)\Big\|_1=0.$$
Now, the uniform integrability of $n^{-1}W_n^2$ yields
$$\lim_{M\to\infty}\limsup_{n\to\infty}\mathbb{E}\Big(\frac{S_n^2}{n}\Big(1\wedge\frac{|S_n|}{M\sqrt n}\Big)\Big)=0,$$
which means exactly that $n^{-1}S_n^2$ is uniformly integrable.

Proof of Corollary 7.4. By assumption, the random variable $m$ belongs to $H_0\ominus H_{-1}$. It remains to check (7.5.2). Let $m_i=m\circ T^i$ and $T_n=m_1+\cdots+m_n$. Clearly $\mathbb{E}((S_n-T_n)^2)=\mathbb{E}(S_n^2)+\mathbb{E}(T_n^2)-2\mathbb{E}(S_nT_n)$. By assumption $n^{-1}\mathbb{E}(S_n^2)$ converges to $\|m\|_2^2=n^{-1}\mathbb{E}(T_n^2)$. To prove (7.5.2), it suffices to prove that $n^{-1}\mathbb{E}(S_nT_n)$ converges to $\|m\|_2^2$. Now, by stationarity,
$$\frac1n\mathbb{E}(S_nT_n)=\frac1n\sum_{i=1}^n\sum_{j=1}^n\mathbb{E}(X_im_j)=\sum_{k=-n}^{n}\Big(1-\frac{|k|}{n}\Big)\mathbb{E}(mX_k).\tag{7.5.6}$$
By Kronecker's lemma, $n^{-1}\mathbb{E}(S_nT_n)$ converges to $\sum_{k\in\mathbb{Z}}\mathbb{E}(mX_k)$ as soon as the latter series converges. Now, since $m$ belongs to $H_0\ominus H_{-1}$, it follows that $\sum_{k\in\mathbb{Z}}\mathbb{E}(mX_k)=\sum_{k\ge0}\mathbb{E}(mP_0(X_k))$. Since $m=\sum_{k\ge0}P_0(X_k)$ in $L^2$, the result follows.

Proposition 7.8. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$, $(X_i)_{i\in\mathbb{Z}}$ and $S_n$ be as in Theorem 7.6. Consider the condition:
$$X_0\sum_{k=1}^n\mathbb{E}(X_k\mid\mathcal{M}_0)\ \text{converges in}\ L^1.\tag{7.5.7}$$
If (7.5.7) is satisfied, then S2 holds, and the sequence $\mathbb{E}(X_0^2\mid\mathcal{I})+2\mathbb{E}(X_0S_n\mid\mathcal{I})$ converges in $L^1$ to $\eta$.

Proof of Proposition 7.8. We first prove that $\mathbb{E}(X_0^2\mid\mathcal{I})+2\mathbb{E}(X_0S_n\mid\mathcal{I})$ converges in $L^1$. From assumption (7.5.7), the sequence $\mathbb{E}(X_0^2\mid\mathcal{M}_0)+2\mathbb{E}(X_0S_n\mid\mathcal{M}_0)$ converges in $L^1$. The result is then a consequence of part (b) of Lemma 7.7 below.

Lemma 7.7. We have: (a) both $\mathbb{E}(X_0X_k\mid\mathcal{I})$ and $\mathbb{E}(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I})$ are almost surely equal to $\mathcal{M}_0$-measurable random variables; (b) $\mathbb{E}(X_0X_k\mid\mathcal{I})=\mathbb{E}(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I})$ almost surely.

Lemma 7.7(b) is derived from Lemma 7.7(a) via the following elementary fact, whose proof is omitted.

Lemma 7.8. Let $Y$ be a random variable in $L^1(\mathbb{P})$ and $\mathcal{U},\mathcal{V}$ two $\sigma$-algebras of $(\Omega,\mathcal{A},\mathbb{P})$. Suppose that $\mathbb{E}(Y\mid\mathcal{U})$ and $\mathbb{E}(\mathbb{E}(Y\mid\mathcal{V})\mid\mathcal{U})$ are $\mathcal{V}$-measurable. Then $\mathbb{E}(Y\mid\mathcal{U})=\mathbb{E}(\mathbb{E}(Y\mid\mathcal{V})\mid\mathcal{U})$ almost surely.

Proof of Lemma 7.7(a). The fact that $\mathbb{E}(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I})$ is almost surely equal to some $\mathcal{M}_0$-measurable random variable follows from the $L^1$-ergodic theorem. Indeed, the variables $\mathbb{E}(X_iX_{k+i}\mid\mathcal{M}_0)$ are $\mathcal{M}_0$-measurable and
$$\mathbb{E}\big(\mathbb{E}(X_0X_k\mid\mathcal{M}_0)\mid\mathcal{I}\big)=\lim_{n\to\infty}\frac1n\sum_{i=1}^n\mathbb{E}(X_iX_{k+i}\mid\mathcal{M}_0)\quad\text{in}\ L^1.$$
Next, from the stationarity of the sequence $(X_i)_{i\in\mathbb{Z}}$, we have
$$\Big\|\mathbb{E}(X_0X_k\mid\mathcal{I})-\frac1n\sum_{i=1}^nX_iX_{i+k}\Big\|_1=\Big\|\mathbb{E}(X_0X_k\mid\mathcal{I})-\frac1n\sum_{i=1-(n+k)}^{-k}X_iX_{i+k}\Big\|_1.$$
Both this equality and the $L^1$-ergodic theorem imply that $\mathbb{E}(X_0X_k\mid\mathcal{I})$ is the limit in $L^1$ of a sequence of $\mathcal{M}_0$-measurable random variables.
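Condition (7.5.7) can be checked in closed form for the AR(1) example discussed after Corollary 7.4 (again our illustration): with $\mathcal{M}_0=\sigma(\varepsilon_i,\,i\le0)$ one has $\mathbb{E}(X_k\mid\mathcal{M}_0)=\rho^kX_0$, so $X_0\sum_{k=1}^{n}\mathbb{E}(X_k\mid\mathcal{M}_0)=X_0^2(\rho+\cdots+\rho^n)$, which converges in $L^1$ to $X_0^2\,\rho/(1-\rho)$. A numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.6

# Stationary AR(1) marginal: X_0 ~ N(0, 1/(1-rho^2))
x0 = rng.standard_normal(100_000) / np.sqrt(1.0 - rho ** 2)

def partial(n):
    """L^1 norm of X_0 * sum_{k=1}^n E(X_k | M_0) = X_0^2 * sum_k rho^k."""
    geom = rho * (1.0 - rho ** n) / (1.0 - rho)
    return float(np.mean(np.abs(x0 ** 2 * geom)))

limit = float(np.mean(x0 ** 2)) * rho / (1.0 - rho)   # L^1 norm of the limit
print(partial(50), limit)                             # nearly identical
```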
Proof of S2(a). Let $\bar S_n=\max\{|S_1|,\dots,|S_n|\}$. In Chapter 8, we shall prove that $(n^{-1}\bar S_n^2)_{n>0}$ is uniformly integrable as soon as (7.5.7) holds.

Proof of S2(c). For any positive integer $N$, we introduce
$$\Lambda_N=[1,n]^2\cap\{(i,j)\in\mathbb{Z}^2 : |i-j|<N\},\qquad \bar\Lambda_N=[1,n]^2\setminus\Lambda_N,$$
and $\eta_N=\mathbb{E}(X_0^2\mid\mathcal{I})+2\big(\mathbb{E}(X_0X_1\mid\mathcal{I})+\cdots+\mathbb{E}(X_0X_{N-1}\mid\mathcal{I})\big)$. Since $\eta_N$ converges in $L^1$ to $\eta$, it suffices to prove that
$$\lim_{N\to\infty}\limsup_{n\to\infty}\Big\|\eta_N-\frac1n\mathbb{E}\Big(\sum_{i=1}^n\sum_{j=1}^n X_iX_j\,\Big|\,\mathcal{M}_0\Big)\Big\|_1=0.\tag{7.5.8}$$
Using the triangle inequality and the fact that $\eta_N=\mathbb{E}(\eta_N\mid\mathcal{M}_0)$ almost surely, we obtain
$$\Big\|\eta_N-\mathbb{E}\Big(\frac1n\sum_{i=1}^n\sum_{j=1}^n X_iX_j\,\Big|\,\mathcal{M}_0\Big)\Big\|_1\le\Big\|\eta_N-\frac1n\sum_{(i,j)\in\Lambda_N}X_iX_j\Big\|_1+\frac1n\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_iX_j\mid\mathcal{M}_0)\Big\|_1.$$
Applying the $L^1$-ergodic theorem, the first term on the right-hand side tends to 0 as $n$ tends to infinity. To control the second term, write
$$\frac1n\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_iX_j\mid\mathcal{M}_0)\Big\|_1\le\frac2n\sum_{i=1}^n\Big\|X_i\sum_{j=i+N}^{n}\mathbb{E}(X_j\mid\mathcal{M}_i)\Big\|_1\le\frac2n\sum_{i=1}^n\Big\|X_0\sum_{j=N}^{n-i}\mathbb{E}(X_j\mid\mathcal{M}_0)\Big\|_1.$$
By assumption (7.5.7), $\lim_{N\to\infty}\sup_{m\ge N}\big\|X_0\sum_{j=N}^{m}\mathbb{E}(X_j\mid\mathcal{M}_0)\big\|_1=0$, and consequently
$$\lim_{N\to\infty}\limsup_{n\to\infty}\frac2n\sum_{i=1}^n\Big\|X_0\sum_{j=N}^{n-i}\mathbb{E}(X_j\mid\mathcal{M}_0)\Big\|_1=0.$$
This completes the proof of S2(c).

Proof of S2(b). Let $\mathcal{M}_{-\infty}=\bigcap_{i\in\mathbb{Z}}\mathcal{M}_i$ and recall that $P_i$ has been defined in Corollary 7.4. We have the orthogonal decomposition
$$X_k=\mathbb{E}(X_k\mid\mathcal{M}_{-\infty})+\sum_{i=0}^{\infty}P_{k-i}(X_k).\tag{7.5.9}$$
Using the stationarity of $(X_i)_{i\in\mathbb{Z}}$ we infer from (7.5.9) that
$$\sum_{i=0}^{\infty}\|P_0(X_i)\|_2^2=\sum_{i=0}^{\infty}\|P_{-i}(X_0)\|_2^2\le\|X_0\|_2^2.\tag{7.5.10}$$
Now, from the decomposition
$$\frac{X_{-1}}{\sqrt n}\,\mathbb{E}(S_n\mid\mathcal{M}_0)=\frac{X_{-1}}{\sqrt n}\,\mathbb{E}(S_n\mid\mathcal{M}_{-1})+\frac{X_{-1}}{\sqrt n}\sum_{i=1}^nP_0(X_i),$$
we infer that
$$\frac1{\sqrt n}\|X_{-1}\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1\le\frac1{\sqrt n}\|X_0\mathbb{E}(S_n\circ T\mid\mathcal{M}_0)\|_1+\frac{\|X_0\|_2}{\sqrt n}\Big\|\sum_{i=1}^nP_0(X_i)\Big\|_2.\tag{7.5.11}$$
By (7.5.7), the first term on the right-hand side tends to zero as $n$ tends to infinity. On the other hand, we infer from (7.5.10) and the Cauchy–Schwarz inequality that $n^{-1/2}\|\sum_{i=1}^nP_0(X_i)\|_2$ vanishes as $n$ goes to infinity, and so does the left-hand side of (7.5.11). By induction, we can prove that for any positive $k$,
$$\lim_{n\to\infty}\frac1{\sqrt n}\|X_{-k}\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1=0.\tag{7.5.12}$$
Now
$$\frac1{\sqrt n}\big\|\mathbb{E}(|X_0|\mid\mathcal{I})\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1\le\frac1{\sqrt n}\Big\|\mathbb{E}(S_n\mid\mathcal{M}_0)\Big(\mathbb{E}(|X_0|\mid\mathcal{I})-\frac1k\sum_{i=1}^k|X_{-i}|\Big)\Big\|_1+\frac1{\sqrt n}\Big\|\mathbb{E}(S_n\mid\mathcal{M}_0)\,\frac1k\sum_{i=1}^k|X_{-i}|\Big\|_1.\tag{7.5.13}$$
From (7.5.12), the second term on the right-hand side tends to zero as $n$ tends to infinity. Applying first the Cauchy–Schwarz inequality and next the $L^2$-ergodic theorem, we easily deduce that the first term on the right-hand side is as small as we wish by choosing $k$ large enough. Therefore
$$\lim_{n\to\infty}\frac1{\sqrt n}\big\|\mathbb{E}(|X_0|\mid\mathcal{I})\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1=0.\tag{7.5.14}$$
Set $A=\{\mathbb{E}(|X_0|\mid\mathcal{I})>0\}$ and $B=A^c=\{\mathbb{E}(|X_0|\mid\mathcal{I})=0\}$. For any positive real $m$, we have
$$\frac1{\sqrt n}\|\mathbf{1}_A\,\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1\le\frac1{m\sqrt n}\big\|\mathbb{E}(|X_0|\mid\mathcal{I})\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1+\frac1{\sqrt n}\big\|\mathbf{1}_{0<\mathbb{E}(|X_0|\mid\mathcal{I})<m}\,\mathbb{E}(S_n\mid\mathcal{M}_0)\big\|_1.\tag{7.5.15}$$
From (7.5.14), the first term on the right-hand side tends to zero as $n$ tends to infinity. Letting $m$ tend to zero, we infer that the second term on the right-hand side of (7.5.15) is as small as we wish. Consequently
$$\lim_{n\to\infty}\frac1{\sqrt n}\|\mathbf{1}_A\,\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1=0.\tag{7.5.16}$$
On the other hand, noting that $\mathbb{E}(|X_0|\mathbf{1}_B)=0$, we infer that $X_0$ is zero on the set $B$. Since $B$ is invariant by $T$, $X_k$ is zero on $B$ for any $k$ in $\mathbb{Z}$. Now, arguing as in Lemma 7.7(b), we obtain $\mathbb{E}(\mathbb{E}(|S_n|\mid\mathcal{M}_0)\mid\mathcal{I})=\mathbb{E}(|S_n|\mid\mathcal{I})$. These two facts lead to
$$\|\mathbf{1}_B\,\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1\le\mathbb{E}\big(\mathbf{1}_B\,\mathbb{E}(\mathbb{E}(|S_n|\mid\mathcal{M}_0)\mid\mathcal{I})\big)\le\mathbb{E}(|S_n|\mathbf{1}_B)=0.\tag{7.5.17}$$
Collecting (7.5.16) and (7.5.17), we conclude that $n^{-1/2}\|\mathbb{E}(S_n\mid\mathcal{M}_0)\|_1$ tends to zero as $n$ tends to infinity. This completes the proof.
7.5.3 $\gamma$-dependent sequences
Applying Corollary 7.4 and Proposition 7.8, we derive sufficient conditions for the CCLT expressed in terms of the coefficients $\gamma_2$ and $\gamma_1$. Here, the coefficients are defined by
$$\gamma_1(k)=\gamma_1(\mathcal{M}_0,X_k)\quad\text{and}\quad\gamma_2(k)=\gamma_2(\mathcal{M}_0,X_k).$$

Corollary 7.5. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6 and define $\mathcal{M}_{-\infty}=\bigcap_{i\in\mathbb{Z}}\mathcal{M}_i$. Define the operators $P_i$ as in Corollary 7.4.

1. If $\mathbb{E}(X_0\mid\mathcal{M}_{-\infty})=0$ and $\sum_{i\ge0}\|P_0(X_i)\|_2<\infty$, then (7.5.3) (hence S1) holds. Moreover, $\eta$ is the same as in Proposition 7.8.

2. Assume that
$$\sum_{k>0}\frac{\gamma_2(k)}{\sqrt{k}}<\infty.\tag{7.5.18}$$
Then $\mathbb{E}(X_0\mid\mathcal{M}_{-\infty})=0$ and $\sum_{i\ge0}\|P_0(X_i)\|_2<\infty$, and S1 holds.

Proof of Corollary 7.5. Proof of 1. Starting from (7.5.9) with $\mathbb{E}(X_k\mid\mathcal{M}_{-\infty})=0$, we obtain
$$\mathbb{E}(X_0X_k)=\sum_{i\ge0}\mathbb{E}\big(P_{-i}(X_0)P_{-i}(X_k)\big)=\sum_{i\ge0}\mathbb{E}\big(P_0(X_i)P_0(X_{i+k})\big).$$
Hence, using Hölder's inequality,
$$\sum_{k=1}^{\infty}|\mathbb{E}(X_0X_k)|\le\sum_{k=1}^{\infty}\sum_{i\ge0}\|P_0(X_i)\|_2\,\|P_0(X_{i+k})\|_2\le\Big(\sum_{i\ge0}\|P_0(X_i)\|_2\Big)^2,\tag{7.5.19}$$
so that $\sum_{k\in\mathbb{Z}}|\mathbb{E}(X_0X_k)|$ is finite. Consequently
$$\lim_{n\to\infty}\frac1n\mathbb{E}(S_n^2)=\sum_{k=-\infty}^{\infty}\mathbb{E}(X_0X_k)=\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\mathbb{E}\big(P_0(X_i)P_0(X_j)\big).\tag{7.5.20}$$
On the other hand,
$$\mathbb{E}(m^2)=\mathbb{E}\Big(\Big(\sum_{i=0}^{\infty}P_0(X_i)\Big)^2\Big)=\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\mathbb{E}\big(P_0(X_i)P_0(X_j)\big).\tag{7.5.21}$$
Combining (7.5.20) and (7.5.21), we see that (7.5.3) holds. To compute $\eta$, note that we can in fact prove a stronger result than (7.5.19), namely
$$\sum_{k=1}^{\infty}\|\mathbb{E}(X_0X_k\mid\mathcal{I})\|_1\le\Big(\sum_{i\ge0}\|P_0(X_i)\|_2\Big)^2.$$
It follows that $n^{-1}\mathbb{E}(S_n^2\mid\mathcal{I})$ converges in $L^1$ to $\eta=\sum_{k\in\mathbb{Z}}\mathbb{E}(X_0X_k\mid\mathcal{I})$.

Proof of 2. Note first that (7.5.18) implies that $\mathbb{E}(X_k\mid\mathcal{M}_{-\infty})=0$. Let $(L_k)_{k>0}$ be a sequence of positive numbers such that $\sum_{i>0}\big(\sum_{k=1}^iL_k\big)^{-1}<\infty$. Starting from (7.5.9) and using the stationarity of $(X_i)_{i\in\mathbb{Z}}$, we obtain that
$$\sum_{k>0}L_k\|\mathbb{E}(X_k\mid\mathcal{M}_0)\|_2^2=\sum_{k>0}L_k\sum_{i\le0}\|P_i(X_k)\|_2^2=\sum_{i>0}\Big(\sum_{k=1}^iL_k\Big)\|P_0(X_i)\|_2^2.$$
Let $a_i=L_1+\cdots+L_i$. Applying Hölder's inequality in $\ell^2$, we obtain that
$$\sum_{i>0}\|P_0(X_i)\|_2\le\Big(\sum_{i>0}\frac1{a_i}\Big)^{1/2}\Big(\sum_{i>0}a_i\|P_0(X_i)\|_2^2\Big)^{1/2}\le\Big(\sum_{i>0}\frac1{a_i}\Big)^{1/2}\Big(\sum_{k>0}L_k\|\mathbb{E}(X_k\mid\mathcal{M}_0)\|_2^2\Big)^{1/2}.$$
Since (7.5.18) holds, one can take $L_k^{-1}=\sqrt k\,\|\mathbb{E}(X_k\mid\mathcal{M}_0)\|_2$. The result follows.
Corollary 7.6. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Let $G_{X_0}$ and $Q_{X_0}$ be as in Definition 5.1. If
$$\sum_{k=0}^{\infty}\int_0^{\gamma_1(k)}Q_{X_0}\circ G_{X_0}(u)\,du<\infty,\tag{7.5.22}$$
then (7.5.7) (hence S1) holds. Moreover, (7.5.22) holds as soon as one of the following conditions is satisfied:

1. $\mathbb{P}(|X_0|>x)\le(c/x)^r$ for some $r>2$, and $\sum_{i\ge0}(\gamma_1(i))^{(r-2)/(r-1)}<\infty$;
2. $\|X_0\|_r<\infty$ for some $r>2$, and $\sum_{i\ge0}i^{1/(r-2)}\gamma_1(i)<\infty$;
3. $\mathbb{E}\big(|X_0|^2\log(1+|X_0|)\big)<\infty$ and $\gamma_1(i)=O(a^i)$ for some $a<1$.

Proof of Corollary 7.6. Applying Inequality (5.2.1), we obtain that
$$\sum_{k\ge0}\|X_0\mathbb{E}(X_k\mid\mathcal{M}_0)\|_1\le\sum_{k\ge0}\int_0^{\gamma_1(k)}Q_{|X_0|}\circ G_{|X_0|}(u)\,du,$$
so that (7.5.22) implies (7.5.7).

Proof of 1. Since $\mathbb{P}(|X|>x)\le(c/x)^r$, the quantile function satisfies $Q_X(u)\le cu^{-1/r}$, so that
$$\int_0^xQ_X(u)\,du\le\frac{cr}{r-1}\,x^{(r-1)/r}\quad\text{and then}\quad G_X(u)\ge\Big(\frac{(r-1)u}{cr}\Big)^{r/(r-1)}.$$
Hence $Q_{X_0}\circ G_{X_0}(u)\le K_{c,r}\,u^{-1/(r-1)}$, with $K_{c,r}=c\big(cr/(r-1)\big)^{1/(r-1)}$. We obtain the bound
$$\sum_{i\ge0}\int_0^{\gamma_1(i)}Q_{X_0}\circ G_{X_0}(u)\,du\le K_{c,r}\sum_{i\ge0}\int_0^{\gamma_1(i)}u^{-1/(r-1)}\,du=\frac{K_{c,r}(r-1)}{r-2}\sum_{i\ge0}(\gamma_1(i))^{(r-2)/(r-1)}.$$
Proof of 2. Note first that
$$\int_0^{\|X_0\|_1}Q_{X_0}^{r-1}\circ G_{X_0}(u)\,du=\int_0^1Q_{X_0}^r(u)\,du=\mathbb{E}(|X_0|^r).$$
Define next
$$\gamma_1^{-1}(u)=\sum_{i\ge0}\mathbf{1}_{u<\gamma_1(i)}=\inf\{k\in\mathbb{N} : \gamma_1(k)\le u\}.$$
Applying Hölder's inequality, we obtain that
$$\sum_{i\ge0}\int_0^{\gamma_1(i)}Q_{X_0}\circ G_{X_0}(u)\,du=\int_0^{\|X_0\|_1}\gamma_1^{-1}(u)\,Q_{X_0}\circ G_{X_0}(u)\,du\le\|X_0\|_r^{r/(r-1)}\Big(\int_0^{\|X_0\|_1}\big(\gamma_1^{-1}(u)\big)^{(r-1)/(r-2)}du\Big)^{(r-2)/(r-1)}.$$
Here note that
$$\big(\gamma_1^{-1}(u)\big)^q=\sum_{j=0}^{\infty}\big((j+1)^q-j^q\big)\mathbf{1}_{u<\gamma_1(j)}.\tag{7.5.23}$$
Now, apply (7.5.23) with $q=(r-1)/(r-2)$. Noting that $(i+1)^q-i^q\le q(i+1)^{q-1}$, we infer that
$$\int_0^{\|X_0\|_1}\big(\gamma_1^{-1}(u)\big)^{(r-1)/(r-2)}du\le q\sum_{i\ge0}(i+1)^{1/(r-2)}\gamma_1(i).$$

Proof of 3. Let $\rho(i)=\gamma_1(i)/\|X_0\|_1$ and let $U$ be a random variable uniformly distributed over $[0,1]$. We have
$$\int_0^{\|X_0\|_1}\gamma_1^{-1}(u)\,Q_{X_0}\circ G_{X_0}(u)\,du=\|X_0\|_1\int_0^1\rho^{-1}(u)\,Q_{X_0}\circ G_{X_0}(u\|X_0\|_1)\,du=\|X_0\|_1\,\mathbb{E}\big(\rho^{-1}(U)\,Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\big).$$
Let $\phi$ be the function defined on $\mathbb{R}^+$ by $\phi(x)=x(\log(1+x))^{p-1}$, and denote by $\phi^*$ its Young transform. Applying Young's inequality, we have that
$$\mathbb{E}\big(\rho^{-1}(U)\,Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\big)\le2\,\big\|\rho^{-1}(U)\big\|_{\phi^*}\,\big\|Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\big\|_{\phi}.$$
Here, note that $\|Q_{X_0}\circ G_{X_0}(U\|X_0\|_1)\|_{\phi}$ is finite as soon as
$$\int_0^{\|X_0\|_1}Q_{X_0}\circ G_{X_0}(u)\big(\log(1+Q_{X_0}\circ G_{X_0}(u))\big)^{p-1}du<\infty.$$
Setting $z=G_{X_0}(u)$ and taking $p=2$, we obtain the condition
$$\int_0^1Q_{X_0}^2(u)\log\big(1+Q_{X_0}(u)\big)\,du<\infty.\tag{7.5.24}$$
Since $Q_{X_0}(U)$ has the same distribution as $|X_0|$, we infer that (7.5.24) holds as soon as $\mathbb{E}\big(|X_0|^2\log(1+|X_0|)\big)$ is finite. It remains to control $\|\rho^{-1}(U)\|_{\phi^*}$. Arguing as in Rio (2000) [161], page 17, we see that $\|\rho^{-1}(U)\|_{\phi^*}$ is finite as soon as there exists $c>0$ such that
$$\sum_{i\ge0}\rho(i)\,\phi^*\big((i+1)/c\big)<\infty.\tag{7.5.25}$$
Since $\phi^*$ has the same behavior as $x\mapsto e^x$ as $x$ goes to infinity (for $p=2$), we can always find $c>0$ such that (7.5.25) holds provided that $\gamma_1(i)=O(a^i)$ for some $a<1$.
7.5.4 $\tilde\alpha$- and $\tilde\phi$-dependent sequences

In this section, the coefficients are defined by
$$\tilde\phi_1(k)=\tilde\phi(\mathcal{M}_0,X_k)\quad\text{and}\quad\tilde\alpha_1(k)=\tilde\alpha(\mathcal{M}_0,X_k).$$
Let $X_i=X_0\circ T^i$, and let $S_n(f)=f(X_1)+\cdots+f(X_n)$ and $S_{n,0}(f)=S_n(f)-n\mathbb{E}(f(X_0))$. Let $\mathcal{C}(p,M,P_X)$ be the closed convex hull of the class of functions $g$ which are monotonic on an open interval and 0 elsewhere, and such that $\mathbb{E}(|g(X_0)|^p)<M$.

Corollary 7.7. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Assume that, for some $p\ge2$,
$$f\in\mathcal{C}(p,M,P_X)\quad\text{and}\quad\sum_{k>0}\frac{(\tilde\phi_1(k))^{(p-1)/p}}{\sqrt{k}}<\infty.\tag{7.5.26}$$
Then $S_{n,0}(f)$ satisfies S1 with
$$\eta=\sum_{k\in\mathbb{Z}}\mathbb{E}\big(f(X_0)\big(f(X_k)-\mathbb{E}(f(X_k))\big)\,\big|\,\mathcal{I}\big).\tag{7.5.27}$$

Remark 7.5. We can apply this result to the case of uniformly expanding maps (see Section 3.3). If $T$ is a uniformly expanding map of $[0,1]$ preserving a probability $\mu$, and if $f$ belongs to $\mathcal{C}(2,M,\mu)$, then $n^{-1/2}(f\circ T+\cdots+f\circ T^n-n\mu(f))$ converges weakly in the space $([0,1],\mu)$ to a Gaussian distribution with mean 0 and variance
$$\sigma^2(f)=\mu\big((f-\mu(f))^2\big)+2\sum_{k>0}\mu\big((f-\mu(f))\,f\circ T^k\big).$$

Let $\mathcal{C}(Q)$ be the closed convex hull of the class of functions $g$ which are monotonic on an open interval and 0 elsewhere, and such that $Q_{|g(X_0)|}\le Q$, where $Q$ is a given quantile function.

Corollary 7.8. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Assume that, for some quantile function $Q$,
$$f\in\mathcal{C}(Q)\quad\text{and}\quad\sum_{k\ge0}\int_0^{\tilde\alpha_1(k)}Q^2(u)\,du<\infty.\tag{7.5.28}$$
Then $S_{n,0}(f)$ satisfies S1 with $\eta$ given by (7.5.27).

Proof of Corollaries 7.7 and 7.8. Without loss of generality, it suffices to prove the results for $f=\sum_{i=1}^k\alpha_ig_i$, where $\sum_{i=1}^k\alpha_i=1$ and each $g_i$ is monotonic on an interval and 0 elsewhere, with $\mathbb{E}(|g_i(X_0)|^p)<M$ (resp. $Q_{|g_i(X_0)|}\le Q$). To prove Corollary 7.7 it suffices to see that the sequence $(f(X_i))_{i\in\mathbb{Z}}$ satisfies (7.5.18), that is,
$$\sum_{k>0}\frac{\|\mathbb{E}(f(X_k)\mid\mathcal{M}_0)-\mathbb{E}(f(X_k))\|_2}{\sqrt{k}}<\infty.\tag{7.5.29}$$
Clearly, it suffices to check (7.5.29) for a single function $g_i$. Note that by monotonicity $\tilde\phi(\mathcal{M}_0,g_i(X_k))\le2\tilde\phi(\mathcal{M}_0,X_k)$. Let $S_1(\mathcal{M}_0)$ be the set of all $\mathcal{M}_0$-measurable random variables $Z$ such that $\mathbb{E}(Z^2)=1$. Clearly,
$$\|\mathbb{E}(g_i(X_k)\mid\mathcal{M}_0)-\mathbb{E}(g_i(X_k))\|_2=\sup_{Z\in S_1(\mathcal{M}_0)}|\mathrm{Cov}(Z,g_i(X_k))|.\tag{7.5.30}$$
Applying (5.2.7), we have, for any conjugate exponents $p,q$,
$$|\mathrm{Cov}(Z,g_i(X_k))|\le4\|g_i(X_0)\|_p\,\|Z\|_q\,(\tilde\phi_1(k))^{1/q},$$
and the result easily follows. In the same way, to prove Corollary 7.8, it suffices to prove that
$$\sum_{k\ge0}\big\|g_i(X_0)\big(\mathbb{E}(g_i(X_k)\mid\mathcal{M}_0)-\mathbb{E}(g_i(X_k))\big)\big\|_1<\infty,$$
which follows from (5.2.5).
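Remark 7.5 can be illustrated numerically; the following sketch is ours, not the book's. We use the logistic map $T(x)=4x(1-x)$, which preserves the arcsine law and is conjugate to the (uniformly expanding) tent map. For $f(x)=x$ we have $\mu(f)=1/2$, the correlations $\mu((f-\mu(f))\,f\circ T^k)$ vanish for $k\ge1$, and hence $\sigma^2(f)=\mu((f-1/2)^2)=1/8$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 2000

def normalized_sum(n):
    """n^{-1/2} * sum_{k<n} (f(T^k x) - mu(f)) for f(x) = x, mu(f) = 1/2."""
    # draw x from the invariant arcsine law: x = sin^2(pi*u), u uniform
    x = np.sin(np.pi * rng.random()) ** 2
    s = 0.0
    for _ in range(n):
        s += x - 0.5
        x = 4.0 * x * (1.0 - x)     # logistic map step
    return s / np.sqrt(n)

z = np.array([normalized_sum(n) for _ in range(reps)])
print(z.mean(), z.var())            # variance close to 1/8
```

Floating-point orbits of a chaotic map are not true orbits, but their statistics track the invariant measure closely, which is all this check needs.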
7.5.5 Sufficient conditions for triangular arrays
The next condition is the natural extension of Condition (7.5.7):
$$\lim_{N\to\infty}\limsup_{n\to\infty}\sup_{N\le m\le n}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1=0.\tag{7.5.31}$$
If (7.5.31) is satisfied, define $R(N,X)$ and $N(X)$ as follows:
$$R(N,X)=\limsup_{n\to\infty}\sup_{N\le m\le n}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1,$$
and $N(X)=\inf\{N>0 : R(N,X)=0\}$ ($N(X)$ may be infinite).

Proposition 7.9. Let $X_{i,n}$ and $\mathcal{M}_{i,n}$ be as in Theorem 7.5. Assume that (7.5.31) and S2(b) are satisfied. Assume furthermore that, for each $0\le k<N(X)$, there exists an $\mathcal{M}_{0,\inf}$-measurable random variable $\lambda_k$ such that, for any $t$ in $]0,1]$,
$$\frac1{nt}\sum_{i=1}^{[nt]}X_{i+k,n}X_{i,n}\ \text{converges in}\ L^1\ \text{to}\ \lambda_k.\tag{7.5.32}$$
Then Condition S2 holds with $\eta=\lambda_0+2\sum_{k=1}^{N(X)-1}\lambda_k$.
Remark 7.6. Let us look at a particular case, for which $N(X)=1$. Conditions (7.5.31) and (7.5.32) are satisfied if Conditions R1 and R2 below are fulfilled:

R1. $\displaystyle\lim_{n\to\infty}\sum_{k=1}^n\|X_{0,n}\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\|_1=0$.

R2. For any $t$ in $]0,1]$, $\displaystyle\frac1{nt}\sum_{i=1}^{[nt]}X_{i,n}^2$ converges in $L^1$ to $\lambda$.

In the stationary case, these results extend classical results for triangular arrays of martingale differences (see for instance Hall and Heyde (1980) [100], Theorem 3.2), for which Condition R1 is satisfied. This particular case is sufficient to improve on many results in the context of kernel estimators.

We conclude with a simple result for $\tilde\phi$-mixing arrays. Here, the coefficients are defined by
$$\tilde\phi_1(k)=\sup_{n>0}\tilde\phi(\mathcal{M}_{0,n},X_{k,n}).$$
Corollary 7.9. Let $X_{i,n}$ and $\mathcal{M}_{i,n}$ be as in Theorem 7.5. If there exist two conjugate exponents $p\le q$ such that
$$\sup_{n>0}\|X_{0,n}\|_p\cdot\|X_{0,n}\|_q<\infty\quad\text{and}\quad\sum_{k=0}^{\infty}(\tilde\phi_1(k))^{1/p}<\infty,\tag{7.5.33}$$
then (7.5.31) holds. If furthermore, for any $k>0$, the sequence $\|X_{0,n}X_{k,n}\|_1$ converges to 0 as $n$ tends to infinity, then $N(X)=1$. If furthermore Lindeberg's condition holds, that is, for any $\epsilon>0$,
$$\lim_{n\to\infty}\mathbb{E}\big(X_{0,n}^2\mathbf{1}_{|X_{0,n}|>\epsilon\sqrt n}\big)=0,\tag{7.5.34}$$
then S2 holds as soon as $\mathbb{E}(X_{0,n}^2)$ converges, and $\eta=\lim_{n\to\infty}\mathbb{E}(X_{0,n}^2)$.
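Lindeberg's condition (7.5.34) says that the second moment carried by values larger than $\epsilon\sqrt n$ must vanish; in the stationary square-integrable case $X_{k,n}=X_k$ it holds by dominated convergence. A quick numerical sketch (ours, with an illustrative Student-$t_3$ marginal, whose truncated second moment decays slowly but does tend to 0):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_t(df=3, size=1_000_000)   # E X^2 = df/(df-2) = 3
eps = 0.1

def lindeberg(n):
    """E( X^2 1_{|X| > eps*sqrt(n)} ), estimated on the fixed sample."""
    thr = eps * np.sqrt(n)
    return float(np.mean(np.where(np.abs(x) > thr, x ** 2, 0.0)))

vals = [lindeberg(n) for n in (10, 100, 10_000, 1_000_000)]
print(vals)   # nonincreasing, tending to 0
```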
Proof of Proposition 7.9. Proof of S2(a). Write first
$$\mathbb{E}\Big(\frac{S_n^2(t)}{nt}\Big(1\wedge\frac{|S_n(t)|}{\sqrt n}\Big)\Big)\le\frac1{nt}\mathbb{E}\big(S_n^2(t)\mathbf{1}_{|S_n(t)|>2\epsilon\sqrt n}\big)+\frac{2\epsilon}{nt}\mathbb{E}(S_n^2(t))\le\frac4{nt}\mathbb{E}\big((|S_n(t)|-\epsilon\sqrt n)_+^2\big)+\frac{2\epsilon}{nt}\mathbb{E}(S_n^2(t)).\tag{7.5.35}$$
Since (7.5.31) holds, $(nt)^{-1}\mathbb{E}(S_n^2(t))$ is bounded, so that the second term on the right-hand side is as small as we wish. Consequently, we infer from (7.5.35) that S2(a) holds as soon as, for any positive $\epsilon$,
$$\lim_{t\to0}\limsup_{n\to\infty}\frac1{nt}\mathbb{E}\big((|S_n(t)|-\epsilon\sqrt n)_+^2\big)=0.\tag{7.5.36}$$
In fact, we shall prove that (7.5.36) holds with $\bar S_n(t)=\sup_{s\in[0,t]}|S_n(s)|$ instead of $|S_n(t)|$. Define $S_n^*(t)=\sup_{s\in[0,t]}(S_n(s))_+$ and $G(t,\epsilon,n)=\{S_n^*(t)>\epsilon\sqrt n\}$. From Proposition 5.8, we have, for any positive integer $N$,
$$\frac1{nt}\mathbb{E}\big((S_n^*(t)-\epsilon\sqrt n)_+^2\big)\le\frac8{nt}\mathbb{E}\Big(\mathbf{1}_{G(t,\epsilon,n)}\sum_{i=0}^{N-1}\sum_{k=1}^{[nt]}|X_{k,n}X_{k+i,n}|\Big)+8\sup_{N\le m\le[nt]}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1.\tag{7.5.37}$$
Since (7.5.31) holds, the second term on the right-hand side is as small as we wish by choosing $N$ large enough. To control the first term, note that by Proposition 5.8 and Markov's inequality, $\lim_{t\to0}\limsup_{n\to\infty}\mathbb{P}(G(t,\epsilon,n))=0$. The result follows by noting that $2|X_{k,n}X_{l,n}|\le X_{k,n}^2+X_{l,n}^2$ and by using (7.5.32) for $k=0$.

Proof of S2(c). For any finite integer $0\le N\le N(X)$, define the variable $\eta_N=\lambda_0+2(\lambda_1+\cdots+\lambda_{N-1})$ and the two sets
$$\Lambda_N=[1,[nt]]^2\cap\{(i,j)\in\mathbb{Z}^2 : |i-j|<N\}\quad\text{and}\quad\bar\Lambda_N=[1,[nt]]^2\cap\{(i,j)\in\mathbb{Z}^2 : j-i\ge N\},$$
so that
$$\Big\|\mathbb{E}\Big(\frac{S_n^2(t)}{nt}-\eta_N\,\Big|\,\mathcal{M}_{0,n}\Big)\Big\|_1\le\Big\|\eta_N-\frac1{nt}\sum_{(i,j)\in\Lambda_N}X_{i,n}X_{j,n}\Big\|_1+\frac2{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_{i,n}X_{j,n}\mid\mathcal{M}_{0,n})\Big\|_1.\tag{7.5.38}$$
Hence, it remains to show that
$$\lim_{N\to N(X)}\limsup_{n\to\infty}\frac1{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_{i,n}X_{j,n}\mid\mathcal{M}_{0,n})\Big\|_1=0.\tag{7.5.39}$$
Using first the inclusion $\mathcal{M}_{0,n}\subseteq\mathcal{M}_{i,n}$ for any positive $i$, and second the stationarity of the sequence, we obtain successively
$$\frac1{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}\mathbb{E}(X_{i,n}X_{j,n}\mid\mathcal{M}_{0,n})\Big\|_1\le\frac1{nt}\Big\|\sum_{(i,j)\in\bar\Lambda_N}X_{i,n}\mathbb{E}(X_{j,n}\mid\mathcal{M}_{i,n})\Big\|_1\le\sup_{N\le m\le n}\Big\|X_{0,n}\sum_{k=N}^m\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\Big\|_1,$$
and (7.5.39) follows from (7.5.31). This completes the proof.
Proof of Corollary 7.9. Using the stationarity and inequality (5.2.7), we infer that
$$\|X_{0,n}\mathbb{E}(X_{k,n}\mid\mathcal{M}_{0,n})\|_1\le2\|X_{0,n}\|_p\|X_{0,n}\|_q(\tilde\phi_1(k))^{1/p},\tag{7.5.40}$$
so that (7.5.31) follows easily from (7.5.33). Moreover, if for each $k>0$ the sequence $\|X_{0,n}X_{k,n}\|_1$ converges to 0, the fact that $N(X)=1$ follows from (7.5.40) and the dominated convergence theorem. To prove the last point of this corollary, it suffices, by applying Proposition 7.9, to prove that for any $t$ in $]0,1]$,
$$\lim_{n\to\infty}\frac1{nt}\Big\|\sum_{k=1}^{[nt]}\big(X_{k,n}^2-\mathbb{E}(X_{0,n}^2)\big)\Big\|_1=0.\tag{7.5.41}$$
According to condition (7.5.34), it suffices to prove that
$$\lim_{\epsilon\to0}\limsup_{n\to\infty}\mathrm{Var}\Big(\frac1{nt}\sum_{k=1}^{[nt]}X_{k,n}^2\mathbf{1}_{|X_{k,n}|\le\epsilon\sqrt n}\Big)=0.\tag{7.5.42}$$
Setting $Y_{k,n}=X_{k,n}^2\mathbf{1}_{|X_{k,n}|\le\epsilon\sqrt n}$, we have the elementary inequality
$$\mathrm{Var}\Big(\frac1{nt}\sum_{k=1}^{[nt]}Y_{k,n}\Big)\le\frac2{nt}\sum_{k=0}^n|\mathrm{Cov}(Y_{0,n},Y_{k,n})|.\tag{7.5.43}$$
Now, bearing in mind the definition of $Y_{k,n}$ and applying (5.2.7), we have successively
$$|\mathrm{Cov}(Y_{0,n},Y_{k,n})|=|\mathrm{Cov}(Y_{0,n},X_{k,n}^2\mathbf{1}_{|X_{k,n}|\le\epsilon\sqrt n})|\le2(\tilde\phi_1(k))^{1/p}\big\|X_{0,n}^2\mathbf{1}_{|X_{0,n}|\le\epsilon\sqrt n}\big\|_p\big\|X_{0,n}^2\mathbf{1}_{|X_{0,n}|\le\epsilon\sqrt n}\big\|_q\le2\epsilon^2n(\tilde\phi_1(k))^{1/p}\|X_{0,n}\|_p\|X_{0,n}\|_q,$$
and (7.5.42) follows from (7.5.43) and (7.5.33).
Chapter 8

Donsker Principles

In this chapter we give sufficient conditions for the smoothed partial sum process of a sequence (resp. field) of real-valued random variables to converge to a Brownian motion (resp. Brownian sheet). In the non causal case, we show that the conditions on $\kappa(r)$ and $\lambda(r)$ (implied by $\eta$-dependence) given in Theorems 7.1 and 7.2 respectively are sufficient to obtain the weak invariance principle. In Section 8.2 we give a general result for $\eta$-dependent random fields having moments of order 4. For the causal case, we present the functional version of the conditional central limit theorem (CCLT) established in Section 7.4. We shall see in Section 8.4 that the sufficient conditions for the CCLT for $\gamma$-, $\tilde\alpha$- and $\tilde\phi$-dependent sequences are also sufficient for the conditional invariance principle.
8.1 Non causal stationary sequences

Let $(X_i)_{i\in\mathbb{Z}}$ be a stationary sequence of centered and square integrable random variables, and let $U_n$ be the Donsker line
$$U_n(t)=\frac1{\sqrt n}\sum_{i=1}^{[nt]}X_i+\frac{nt-[nt]}{\sqrt n}X_{[nt]+1}.$$
In this short section, we show that under the same assumptions as in Theorem 7.1 or Theorem 7.2, the weak invariance principle holds. This follows easily from the control of $\|S_n\|_{2+\delta}$ obtained at the end of Section 7.2.1.

Theorem 8.1. Assume that $(X_i)_{i\in\mathbb{Z}}$ satisfies the assumptions of Theorem 7.1 or the assumptions of Theorem 7.2. Then the process $U_n$ converges weakly in $(C([0,1]),\|\cdot\|_\infty)$ to a Wiener process with variance $\sigma^2=\sum_{k\in\mathbb{Z}}\mathrm{Cov}(X_0,X_k)$.
200
CHAPTER 8. DONSKER PRINCIPLES
Proof. The finite dimensional convergence of the process $U_n$ can be proved as in Section 7.2.1. It remains to prove the tightness. Recall that under the assumptions of Theorem 7.1 or of Theorem 7.2, there exist $\delta>0$ and $C>0$ such that
$$\mathbb{E}|S_p|^{2+\delta}\le Cp^{1+\delta/2}.$$
The tightness then follows from standard arguments. For instance, if $\lambda$ denotes the Lebesgue measure, we infer from the moment inequality above that $(U_n,\lambda)$ belongs to the class $\mathcal{C}(1+\delta/2,2+\delta)$ defined in Bickel and Wichura (1971) [19] (see inequality (3) of that paper). The tightness of the process $\{U_n(t),\,t\in[0,1]\}$ follows by applying Theorem 3 of the same paper.

Remark. The same result can be proved along the same lines for the Bernoulli shift models with dependent inputs of Theorem 7.3.
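A minimal construction of the Donsker line (our sketch, with an illustrative 1-dependent moving average): for $X_i=\varepsilon_i+\varepsilon_{i-1}$ with i.i.d. standard normal $\varepsilon_i$ one has $\sigma^2=\sum_k\mathrm{Cov}(X_0,X_k)=2+2\cdot1=4$, so $\mathrm{Var}\,U_n(1)$ should approach 4.

```python
import numpy as np

rng = np.random.default_rng(4)

def donsker_line(x, t):
    """U_n(t) = n^{-1/2} * (S_[nt] + (nt - [nt]) * X_{[nt]+1})."""
    n = len(x)
    k = int(np.floor(n * t))
    s = x[:k].sum()
    if k < n:
        s += (n * t - k) * x[k]   # linear interpolation between lattice points
    return s / np.sqrt(n)

n, reps = 2000, 500
vals = []
for _ in range(reps):
    eps = rng.standard_normal(n + 1)
    x = eps[1:] + eps[:-1]        # 1-dependent sequence
    vals.append(donsker_line(x, 1.0))
vals = np.array(vals)
print(vals.var())                 # close to sigma^2 = 4
```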
8.2 Non causal random fields

Here we establish the Donsker principle for non causal weakly dependent sequences and random fields. A block $B$ in $[0,1]^d$ is a subset of $[0,1]^d$ of the form $]s,t]=\prod_{p=1}^d\,]s_p,t_p]$. Let $(X_j)_{j\in\mathbb{Z}^d}$ be a random field and $B$ be a block in $[0,1]^d$. Let $nB=\{nx,\ x\in B\}$, and
$$S_n(B)=\frac1{n^{d/2}}\sum_{j\in nB\cap\mathbb{Z}^d}X_j.\tag{8.2.1}$$
For any $j\in\mathbb{Z}^d$, denote by $C_j$ the unit cube with upper corner $j$, and define the continuous process
$$U_n(B)=\frac1{n^{d/2}}\sum_{j\in\mathbb{Z}^d}\lambda(nB\cap C_j)X_j,$$
where $\lambda$ denotes the Lebesgue measure on $\mathbb{R}^d$. For $t\in[0,1]^d$, define $S_n(t)=S_n([0,t])$ and $U_n(t)=U_n([0,t])$.

Theorem 8.2. Let $(X_j)_{j\in\mathbb{Z}^d}$ be an $\eta$-weakly dependent stationary centered random field. Assume that $\mathbb{E}|X_0|^4=M^4<\infty$. If $\eta(r)\le cr^{-a}$ with $a>3d$, then the process $U_n$ converges weakly in $(C([0,1]^d),\|\cdot\|_\infty)$ to a Brownian sheet with variance $\sigma^2=\sum_{k\in\mathbb{Z}^d}\mathrm{Cov}(X_0,X_k)$.
8.2.1 Moment inequality
First we establish a bound for the fourth moment of a partial sum.

Lemma 8.1. Assume that the assumptions of Theorem 8.2 are satisfied. Then for any block $B$ in $[0,1]^d$, there exists $C>0$ such that
$$\mathbb{E}\big(S_n(B)^4\big)\le C\lambda(B)^2.$$

Proof. For a finite sequence $k=(k_1,\dots,k_q)$ of elements of $\mathbb{Z}^d$, define $\Pi_k=\prod_{i=1}^qX_{k_i}$. For any integer $q\ge1$, set
$$A_q(n)=\sum_{k\in(nB\cap\mathbb{Z}^d)^q}|\mathbb{E}(\Pi_k)|;\tag{8.2.2}$$
then
$$\mathbb{E}\big(S_n(B)^4\big)\le n^{-2d}A_4(n).\tag{8.2.3}$$
The gap of $k$ is defined as the largest integer $r$ such that the sequence may be split into two non-empty subsequences $k^1$ and $k^2\subset\mathbb{Z}^d$ whose mutual distance equals $r$ (that is, $d(k^1,k^2)=\min\{\|i-j\|_1 : i\in k^1,\,j\in k^2\}=r$). If the sequence is constant, its gap is 0. Define the set $G_r(q,n)=\{k\in(nB)^q : \text{the gap of }k\text{ is }r\}$. Sorting the sequences of indices by their gaps,
$$A_q(n)\le\sum_{k\in nB}\mathbb{E}|X_k|^q+\sum_{r=1}^{\infty}\sum_{k\in G_r(q,n)}|\mathrm{Cov}(\Pi_{k^1},\Pi_{k^2})|\tag{8.2.4}$$
$$\qquad\qquad+\sum_{r=1}^{\infty}\sum_{k\in G_r(q,n)}|\mathbb{E}(\Pi_{k^1})\,\mathbb{E}(\Pi_{k^2})|.\tag{8.2.5}$$
Define $V_q(n)$ as the sum of the right-hand side of (8.2.4). We get $A_4(n)\le V_4(n)+V_2(n)^2$. Denote by $N$ the cardinality of $nB\cap\mathbb{Z}^d$. To build a sequence $k$ belonging to $G_r(q,n)$, we first fix one of the $N$ points of $nB\cap\mathbb{Z}^d$. We choose a second point on the $\ell^1$-sphere of radius $r$ centered on the first point. The third point is in a ball of radius $r$ centered on one of the preceding points, and so on. We get
$$\#G_r(4,n)\le N\cdot2d(2r+1)^{d-1}\cdot2(2r+1)^d\cdot3(2r+1)^d\le12dN3^{3d}r^{3d-1},$$
$$\#G_r(2,n)\le N\cdot2d(2r+1)^{d-1}\le2dN3^{d-1}r^{d-1},$$
and
$$V_4(n)\le NM^4+12dN3^{3d}\sum_{r=1}^{\infty}r^{3d-1}\eta(r),\qquad V_2(n)\le NM^2+2dN3^{d-1}\sum_{r=1}^{\infty}r^{d-1}\eta(r),$$
so that $A_4(n)\le C(N+N^2)$. Because $N$ is an integer and $|N-n^d\lambda(B)|\le2dN/n$, Lemma 8.1 is proved.
8.2.2 Finite dimensional convergence
To prove the convergence of the finite dimensional distributions, we note that it is sufficient to prove it for the finite dimensional distributions of $S_n(B)$, because $U_n(B)-S_n(B)$ tends to zero in probability. Considering two disjoint blocks $B$ and $C$ of $[0,1]^d$, we check that the joint distribution of $(S_n(B),S_n(C))$ satisfies
$$\lim_{n\to\infty}\mathrm{Cov}\big(S_n(B),S_n(C)\big)=0.$$
Denote by $b^-$ and $b^+$ the lower and upper vertices of the block $B$. If the domains are non-intersecting, then for at least one coordinate (say the first) we have (say) $b_1^+\le c_1^-$. Splitting the sum over $j\in nC$ according to whether the first coordinate $j_1$ is within distance $n^\beta$ of the separating hyperplane or not, we get
$$|\mathrm{Cov}(S_n(B),S_n(C))|\le n^{-d}\sum_{i\in nB}\sum_{j\in nC}|\mathrm{Cov}(X_i,X_j)|\le n^{-d}\Big(n^{\beta+d-1}\sum_{r\in\mathbb{N}}r^{d-1}\eta(r)+n^d\sum_{r>n^\beta}r^{d-1}\eta(r)\Big)=O(n^{\beta-1})+O(n^{\beta(d-a)}).$$
Taking $0<\beta<1$ gives the result, since $a>d$.

Now, let $\nu$ and $\mu$ be two reals; we show that $\tilde S_n=\nu S_n(B)+\mu S_n(C)$ tends to a Gaussian distribution. Write
$$\tilde S_n=n^{-d/2}\sum_{i\in\{0,\dots,n\}^d}\alpha_{i,n}X_i,$$
where $\alpha_{i,n}=\nu+\mu$ if $i\in nB\cap nC$, $\alpha_{i,n}=\nu$ if $i\in nB\setminus nC$, $\alpha_{i,n}=\mu$ if $i\in nC\setminus nB$, and $\alpha_{i,n}=0$ elsewhere. We use the Bernstein (1939) [13] blocking technique. Let $p(n)$ and $q(n)$ be sequences of integers such that $p(n)=o(n)$ and $q(n)=o(p(n))$. Assume that the Euclidean division of $n$ by $p+q$ gives a quotient $k$ and a remainder $r$. Denote $\bar\jmath=(j,\dots,j)$. Define $K=\{1,\dots,k+1\}^d$ and order $K$ by the lexicographic order; for $i\in\{1,\dots,k\}^d$, define the blocks $P_i=[(p+q)(i-\bar1),\dots,(p+q)i-q\bar1]$. If $r>0$, for $i\in\{0,1\}^d\setminus\{\bar0\}$ define the blocks $P_{\bar k+i}=[(p+q)(\bar k+i),\dots,(p+q)(\bar k+i)+(r\vee p)\bar1]$. Denote by $Q$ the set of indices that are not in one of the $P_i$; note that the cardinality of $Q$ is less than $d(k+1)qp^{d-1}$. For each block $P_i$, and for $Q$, we define the partial sums
$$u_i=n^{-d/2}\sum_{j\in P_i}\alpha_{j,n}X_j,\qquad v=n^{-d/2}\sum_{j\in Q}\alpha_{j,n}X_j.$$
Recall Lemma 11 in Doukhan and Louhichi (1999) [67].

Lemma 8.2. Let $\tilde S_n=n^{-d/2}\sum_{j\in\{0,\dots,n\}^d}\alpha_{j,n}X_j$ be the sum of a centered triangular array, and set $\sigma_n^2=\mathrm{Var}\,\tilde S_n$. Assume that:
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\mathbb{E}v^2=0;\tag{8.2.6}$$
$$\lim_{n\to\infty}\sum_{j\in K}\Big|\mathrm{Cov}\Big(g\Big(\frac t{\sigma_n}\sum_{i\in K,\,i<j}u_i\Big),h\Big(\frac t{\sigma_n}u_j\Big)\Big)\Big|=0\quad\text{for all }t\in\mathbb{R},\tag{8.2.7}$$
where $h$ and $g$ are either the sine or the cosine function;
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\sum_{i\in K}\mathbb{E}\big(|u_i|^2\mathbf{1}_{\{|u_i|\ge\epsilon\sigma_n\}}\big)=0\quad\text{for all }\epsilon>0;\tag{8.2.8}$$
and
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\sum_{i\in K}\mathbb{E}|u_i|^2=1.\tag{8.2.9}$$
Then $\tilde S_n/\sigma_n$ converges in distribution to the $\mathcal{N}(0,1)$ distribution.

First note that
$$\sum_{j\in\mathbb{N}^d}|\mathrm{Cov}(\alpha_{0,n}X_0,\alpha_{j,n}X_j)|<\infty,\tag{8.2.10}$$
so that $\sigma_n^2$ tends to a constant. If this constant is zero, then the limit of $\tilde S_n$ is 0. If it is not, we check the conditions of the preceding lemma for the array $\alpha_{j,n}X_j$.

To check (8.2.6), note that
$$\mathbb{E}v^2\le(|\nu|+|\mu|)^2n^{-d}\sum_{i\in Q}\sum_{j\in Q}|\mathrm{Cov}(X_i,X_j)|\le2(|\nu|+|\mu|)^2\,\frac{d(k+1)qp^{d-1}}{n^d}\sum_{r=0}^{\infty}r^{d-1}\eta(r)=o(1).$$

Consider (8.2.7). Note that $g\big(\frac t{\sigma_n}\sum_{i\in K,\,i\ne j}u_i\big)$ is a function of at most $((k+1)p)^d$ variables $X_l$ and that its Lipschitz modulus is less than $t(|\nu|+|\mu|)/(n^{d/2}\sigma_n)$. Similarly, $h(tu_j/\sigma_n)$ is a function of at most $p^d$ variables and its Lipschitz modulus is less than $t(|\nu|+|\mu|)/(n^{d/2}\sigma_n)$. Using the weak dependence property, we get
$$\Big|\mathrm{Cov}\Big(g\Big(\frac t{\sigma_n}\sum_{i\in K,\,i\ne j}u_i\Big),h\Big(\frac t{\sigma_n}u_j\Big)\Big)\Big|\le p^d\big((k+1)^d+1\big)\,\frac{t(|\nu|+|\mu|)}{n^{d/2}\sigma_n}\,\eta(q),$$
and hence
$$\sum_{j\in K}\Big|\mathrm{Cov}\Big(g\Big(\frac t{\sigma_n}\sum_{i\in K,\,i<j}u_i\Big),h\Big(\frac t{\sigma_n}u_j\Big)\Big)\Big|\le p^d(k+1)^{d+1}\,\frac{t(|\nu|+|\mu|)}{n^{d/2}\sigma_n}\,\eta(q)=O\big(n^{3d/2}p^{-d}q^{-a}\big).$$
Taking $p=n^{5/6}$ and $q=n^{5d/6a}$ gives a bound tending to 0.

To prove (8.2.8), it is sufficient to show that $\mathbb{E}|u_i|^4=O(k^{-2d})$. But
$$\mathbb{E}\Big(\Big(\frac1{n^{d/2}}\sum_{j\in P_i}\alpha_{j,n}X_j\Big)^4\Big)\le\frac{(|\nu|+|\mu|)^4}{n^{2d}}\,\mathbb{E}\Big(\Big(\sum_{j\in P_i}X_j\Big)^4\Big)\le(|\nu|+|\mu|)^4\,\frac{p^{2d}}{n^{2d}}\,\mathbb{E}\big(S_p([0,\bar1])^4\big),$$
and we conclude with the moment inequality $\mathbb{E}\big(S_p([0,\bar1])^4\big)=O(1)$.

In order to prove (8.2.9), note that (8.2.6) implies that
$$\lim_{n\to\infty}\frac1{\sigma_n^2}\mathrm{Var}\Big(\sum_{i\in K}u_i\Big)=1.$$
But
$$\Big|\mathrm{Var}\Big(\sum_{i\in K}u_i\Big)-\sum_{i\in K}\mathbb{E}|u_i|^2\Big|\le2\sum_{i,j\in K,\,i\ne j}|\mathrm{Cov}(u_i,u_j)|\le2(k+1)^d\sum_{j=q}^{\infty}\eta(j)=O\big(np^{-d}q^{-a+1}\big).$$
Taking $p=n^{5/6}$ and $q=n^{5d/6a}$ gives a bound tending to 0.
8.2.3 Tightness

Recall that $\lambda$ is the Lebesgue measure on $\mathbb{R}^d$. Applying Lemma 8.1, we infer that $(S_n,\lambda)$ and $(U_n,\lambda)$ belong to the class $\mathcal{C}(2,4)$ defined in Bickel and Wichura (1971) [19] (see inequality (3) of that paper). The tightness (relative compactness) of the process $\{U_n(t),\,t\in[0,1]^d\}$ follows by applying Theorem 3 of [19].
8.3 Conditional (causal) invariance principle

In this section we give the functional version of Theorem 7.5. Denote by $\mathcal{H}^*$ the space of continuous functions $\varphi$ from $(C([0,1]),\|\cdot\|_\infty)$ to $\mathbb{R}$ such that $x\mapsto|(1+\|x\|_\infty^2)^{-1}\varphi(x)|$ is bounded.

Theorem 8.3. Let $X_{i,n}$, $\mathcal{M}_{i,n}$ and $S_n(t)$ be as in Theorem 7.5. For any $t$ in $[0,1]$, let $U_n(t)=S_n(t)+(nt-[nt])X_{[nt]+1,n}$, and define $\bar S_n(t)=\sup_{0\le s\le t}|S_n(s)|$. The following statements are equivalent:

S1* There exists a nonnegative $\mathcal{M}_{0,\inf}$-measurable random variable $\eta$ such that, for any $\varphi$ in $\mathcal{H}^*$ and any positive integer $k$,
$$\mathrm{S1}^*(\varphi):\quad\lim_{n\to\infty}\Big\|\mathbb{E}\Big(\varphi(n^{-1/2}U_n)-\int\varphi(x\sqrt\eta)\,W(dx)\,\Big|\,\mathcal{M}_{k,n}\Big)\Big\|_1=0,$$
where $W$ is the distribution of a standard Wiener process.

S2* Properties S2(b) and S2(c) of Theorem 7.5 hold, and S2(a) is replaced by: (a*) the sequence $(n^{-1}(\bar S_n(1))^2)_{n>0}$ is uniformly integrable, and
$$\lim_{t\to0}\limsup_{n\to\infty}\mathbb{E}\Big(\frac{(\bar S_n(t))^2}{nt}\Big(1\wedge\frac{\bar S_n(t)}{\sqrt n}\Big)\Big)=0.$$

Moreover the random variable $\eta$ satisfies $\eta=\eta\circ T$ almost surely.

Remark 8.1. Let $X_{i,n}$, $\mathcal{M}_{i,n}$ and $U_n$ be as in Theorem 8.3. Suppose that the sequence $(\mathcal{M}_{0,n})_{n\ge1}$ is nondecreasing. If Condition S1* is satisfied, then the sequence $(n^{-1/2}U_n)_{n>0}$ converges stably with respect to the random probability measure defined by $\mu(\varphi)=\int\varphi(x\sqrt\eta)\,W(dx)$. The proof of this result is the same as that of Corollary 7.2.

We now give the proof of this theorem. Once again, it suffices to prove that S2* implies S1*.
8.3.1 Preliminaries
Definition 8.1. Recall that $R(\mathcal{M}_{k,n})$ is the set of Rademacher $\mathcal{M}_{k,n}$-measurable random variables: $R(\mathcal{M}_{k,n})=\{2\mathbf{1}_A-1 : A\in\mathcal{M}_{k,n}\}$. For the random variable $\eta$ introduced in Theorem 8.3 and any bounded random variable $Z$, let

1. $\nu_n^*[Z]$ be the image measure of $Z\cdot\mathbb{P}$ by the process $n^{-1/2}U_n$;
2. $\nu^*[Z]$ be the image measure of $W\otimes Z\cdot\mathbb{P}$ by the map $\phi$ from $C([0,1])\times\Omega$ to $C([0,1])$ defined by $\phi(x,\omega)=x\sqrt{\eta(\omega)}$.

We need the functional analogue of Lemma 7.4 (the proof is unchanged).

Lemma 8.3. Let $\mu_n^*[Z_n]=\nu_n^*[Z_n]-\nu^*[Z_n]$. For any $\varphi$ in $\mathcal{H}^*$ the statement S1*($\varphi$) is equivalent to S3*($\varphi$): for any $Z_n$ in $R(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n](\varphi)$ tends to zero as $n$ tends to infinity.

Suppose that S1*($\varphi$) holds for any bounded function $\varphi$ of $\mathcal{H}^*$. Since the sequence $(n^{-1}(S_n^*(1))^2)_{n>0}$ is uniformly integrable, S1*($\varphi$) obviously extends to the whole space $\mathcal{H}^*$. Consequently, we can restrict ourselves to the space of continuous bounded functions from $C([0,1])$ to $\mathbb{R}$. According to Lemma 8.3, the proof of Theorem 8.3 will be complete if we show that, for any $Z_n$ in $R(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n]$ converges weakly to the null measure as $n$ tends to infinity.

Definition 8.2. For $0\le t_1<\cdots<t_d\le1$, define the functions $\pi_{t_1\ldots t_d}$ and $Q_{t_1\ldots t_d}$ from $C([0,1])$ to $\mathbb{R}^d$ by the equalities $\pi_{t_1\ldots t_d}(x)=(x(t_1),\dots,x(t_d))$ and $Q_{t_1\ldots t_d}(x)=(x(t_1),x(t_2)-x(t_1),\dots,x(t_d)-x(t_{d-1}))$. For any signed measure $\mu$ on $(C([0,1]),\mathcal{B}(C([0,1])))$ and any function $f$ from $C([0,1])$ to $\mathbb{R}^d$, denote by $\mu f^{-1}$ the image measure of $\mu$ by $f$.

Let $\mu$ and $\nu$ be two signed measures on $(C([0,1]),\mathcal{B}(C([0,1])))$. Recall that if $\mu\pi_{t_1\ldots t_d}^{-1}=\nu\pi_{t_1\ldots t_d}^{-1}$ for any positive integer $d$ and any $d$-tuple such that $0\le t_1<\cdots<t_d\le1$, then $\mu=\nu$. Consequently, Theorem 8.3 is a straightforward consequence of the two following items:

1. relative compactness: for any $Z_n$ in $R(\mathcal{M}_{k,n})$, the family $(\mu_n^*[Z_n])_{n>0}$ is relatively compact with respect to the topology of weak convergence;
2. finite dimensional convergence: for any positive integer $d$, any $d$-tuple $0\le t_1<\cdots<t_d\le1$ and any $Z_n$ in $R(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n]\pi_{t_1\ldots t_d}^{-1}$ converges weakly to the null measure as $n$ tends to infinity.
8.3.2 Finite dimensional convergence
Clearly it is equivalent to take $Q_{t_1\cdots t_d}$ instead of $\pi_{t_1\cdots t_d}$ in item 2. The following lemma shows that finite dimensional convergence is a consequence of Condition S2 of Theorem 7.5. The stronger condition $S2^*$ is only required for tightness.

Lemma 8.4. For any $a$ in $\mathbb{R}^d$ define $f_a$ from $\mathbb{R}^d$ to $\mathbb{R}$ by $f_a(x) = \langle a, x\rangle$. If S2 holds then, for any $a$ in $\mathbb{R}^d$, any $d$-tuple $0 \le t_1 < \cdots < t_d \le 1$ and any $Z_n$ in $\mathcal{R}(\mathcal{M}_{k,n})$, the sequence $\mu_n^*[Z_n](f_a\circ Q_{t_1\cdots t_d})^{-1}$ converges weakly to the null measure.

Write $\mu_n^*[Z_n](f_a\circ Q_{t_1\cdots t_d})^{-1}(\exp(i\,\cdot)) = \mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1}(\exp(i\langle a,\cdot\rangle))$. According to Lemma 8.4, the latter converges to zero as $n$ tends to infinity. Taking $Z_n = 1$, we infer that the probability measure $\nu_n^*[1]Q_{t_1\cdots t_d}^{-1}$ converges weakly to the probability measure $\nu^*[1]Q_{t_1\cdots t_d}^{-1}$ and hence is tight. Since $|\mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1}| \le \nu_n^*[1]Q_{t_1\cdots t_d}^{-1} + \nu^*[1]Q_{t_1\cdots t_d}^{-1}$, the sequence $(\mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1})_{n>0}$ is tight. Consequently we can apply Lemma 7.3 to conclude that $\mu_n^*[Z_n]Q_{t_1\cdots t_d}^{-1}$ converges weakly to the null measure.

Proof of Lemma 8.4. According to Lemma 8.3, we have to prove the property $S1^*(\varphi\circ f_a\circ Q_{t_1\cdots t_d})$ for any continuous bounded function $\varphi$. Arguing as in Section 7.4.3, we can restrict ourselves to the class of functions $B_3^1(\mathbb{R})$. Let $h$ be any element of $B_3^1(\mathbb{R})$ and write
$$h\circ f_a\circ Q_{t_1\cdots t_d}(n^{-1/2}U_n) - \int h\circ f_a\circ Q_{t_1\cdots t_d}(x\sqrt\eta)\,W(dx) = \sum_{\ell=1}^d \Big( h_\ell\Big(a_\ell\,\frac{U_n(t_\ell)-U_n(t_{\ell-1})}{\sqrt n}\Big) - \int h_\ell\big(a_\ell\, x\sqrt{(t_\ell-t_{\ell-1})\eta}\,\big)\, g(x)\,dx \Big),$$
where the random function $h_\ell$ is defined by
$$h_\ell(x) = \int h\Big( \sum_{i=1}^{\ell-1} a_i\,\frac{U_n(t_i)-U_n(t_{i-1})}{\sqrt n} + x + \sum_{i=\ell+1}^{d} a_i\, x_i\sqrt{(t_i-t_{i-1})\eta}\, \Big)\, \prod_{i=\ell+1}^{d} g(x_i)\,dx_i .$$
Note that for any $\omega$ in $\Omega$, the random function $h_\ell$ belongs to $B_3^1(\mathbb{R})$. To complete the proof of Lemma 8.4, it suffices to see that, for any positive integers $k$ and $\ell$, the sequence
$$\Big\| \mathbb{E}\Big( h_\ell\Big(a_\ell\,\frac{U_n(t_\ell)-U_n(t_{\ell-1})}{\sqrt n}\Big) - \int h_\ell\big(a_\ell\, x\sqrt{(t_\ell-t_{\ell-1})\eta}\,\big)\, g(x)\,dx \,\Big|\, \mathcal{M}_{k,n} \Big) \Big\|_1 \quad (8.3.1)$$
tends to zero as $n$ tends to infinity. Since $h_\ell$ is 1-Lipschitz and bounded, we infer from the asymptotic negligibility of $n^{-1/2}X_{0,n}$ that
$$\lim_{n\to\infty} \Big\| h_\ell\Big(a_\ell\,\frac{U_n(t_\ell)-U_n(t_{\ell-1})}{\sqrt n}\Big) - h_\ell\Big(a_\ell\,\frac{S_{[n(t_\ell-t_{\ell-1})]}\circ T^{[nt_{\ell-1}]+1}}{\sqrt n}\Big) \Big\|_1 = 0. \quad (8.3.2)$$
Denote by $g_\ell$ the random function $g_\ell = h_\ell\circ T^{-[nt_{\ell-1}]-1}$. Combining (8.3.1), (8.3.2) and the fact that $\mathcal{M}_{k-1-[nt_{\ell-1}],n} \subseteq \mathcal{M}_{k,n}$, we infer that it suffices to prove that, with $u = t_\ell - t_{\ell-1}$,
$$\lim_{n\to\infty} \Big\| \mathbb{E}\Big( g_\ell\big(a_\ell\, n^{-1/2}S_n(u)\big) - \int g_\ell\big(a_\ell\, x\sqrt{u\eta}\,\big)\, g(x)\,dx \,\Big|\, \mathcal{M}_{k,n} \Big) \Big\|_1 = 0. \quad (8.3.3)$$
Since the random function $g_\ell$ is $\mathcal{M}_{0,n}$-measurable, (8.3.3) can be proved exactly as property S1 of Theorem 7.5 (see Section 7.4.3). This completes the proof of Lemma 8.4.
8.3.3 Relative compactness
In this section, we shall prove that the sequence $(\mu_n^*[Z_n])_{n>0}$ is relatively compact with respect to the topology of weak convergence; that is, for any increasing function $f$ from $\mathbb{N}$ to $\mathbb{N}$, there exist an increasing function $g$ with values in $f(\mathbb{N})$ and a signed measure $\mu$ on $(C([0,1]),\mathcal{B}(C([0,1])))$ such that $(\mu_{g(n)}^*[Z_{g(n)}])_{n>0}$ converges weakly to $\mu$. Let $Z_n^+$ (resp. $Z_n^-$) be the positive (resp. negative) part of $Z_n$, and write
$$\mu_n^*[Z_n] = \mu_n^*[Z_n^+] - \mu_n^*[Z_n^-] = \nu_n^*[Z_n^+] - \nu_n^*[Z_n^-] - \nu^*[Z_n^+] + \nu^*[Z_n^-],$$
where $\nu_n^*[Z]$ and $\nu^*[Z]$ are defined in 1. and 2. of Definition 8.1. Obviously, it is enough to prove that each sequence of finite positive measures $(\nu_n^*[Z_n^+])_{n>0}$, $(\nu_n^*[Z_n^-])_{n>0}$, $(\nu^*[Z_n^+])_{n>0}$ and $(\nu^*[Z_n^-])_{n>0}$ is relatively compact. We prove the result for the sequence $(\nu_n^*[Z_n^+])_{n>0}$, the other cases being similar. Let $f$ be any increasing function from $\mathbb{N}$ to $\mathbb{N}$. Choose an increasing function $l$ with values in $f(\mathbb{N})$ such that
$$\lim_{n\to\infty} \mathbb{E}(Z_{l(n)}^+) = \liminf_{n\to\infty} \mathbb{E}(Z_{f(n)}^+).$$
We must distinguish two cases:

1. If $\mathbb{E}(Z_{l(n)}^+)$ converges to zero as $n$ tends to infinity, then, taking $g = l$, the sequence $(\nu_{g(n)}^*[Z_{g(n)}^+])_{n>0}$ converges weakly to the null measure.
2. If $\mathbb{E}(Z_{l(n)}^+)$ converges to a positive real number as $n$ tends to infinity, we introduce, for $n$ large enough, the probability measure $p_n$ defined by $p_n = (\mathbb{E}(Z_{l(n)}^+))^{-1}\,\nu_{l(n)}^*[Z_{l(n)}^+]$. Obviously, if $(p_n)_{n>0}$ is relatively compact with respect to the topology of weak convergence, then there exist an increasing function $g$ with values in $l(\mathbb{N})$ (and hence in $f(\mathbb{N})$) and a measure $\nu$ such that $(\nu_{g(n)}^*[Z_{g(n)}^+])_{n>0}$ converges weakly to $\nu$. Since $(p_n)_{n>0}$ is a family of probability measures, relative compactness is equivalent to tightness. Here we apply Theorem 8.2 in Billingsley (1968) [20]: to derive the tightness of the sequence $(p_n)_{n>0}$ it is enough to show that, for each positive $\varepsilon$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty} p_n\big( \{x : w(x,\delta) \ge \varepsilon\} \big) = 0, \quad (8.3.4)$$
where $w(x,\delta)$ is the modulus of continuity of the function $x$. According to the definition of $p_n$, we have
$$p_n\big( \{x : w(x,\delta) \ge \varepsilon\} \big) = \frac{1}{\mathbb{E}(Z_{l(n)}^+)}\, \mathbb{E}\Big( Z_{l(n)}^+\, \mathbf{1}_{w(U_{l(n)}/\sqrt{l(n)},\,\delta)\,\ge\,\varepsilon} \Big).$$
Since $\mathbb{E}(Z_{l(n)}^+)$ converges to a positive number and $Z_{l(n)}^+$ is bounded by one, we infer that (8.3.4) holds if
$$\lim_{\delta\to 0}\limsup_{n\to\infty} \mathbb{P}\Big( w\Big( \frac{U_{l(n)}}{\sqrt{l(n)}}, \delta \Big) \ge \varepsilon \Big) = 0. \quad (8.3.5)$$
From Theorem 8.3 and inequality (8.16) in Billingsley (1968) [20], it suffices to prove that, for any positive $\varepsilon$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty} \frac{1}{\delta}\, \mathbb{P}\Big( \frac{S^*_{[l(n)\delta]}}{\sqrt{l(n)\delta}} \ge \frac{\varepsilon}{\sqrt\delta} \Big) = 0. \quad (8.3.6)$$
We conclude by noting that (8.3.6) follows straightforwardly from S2(a$^*$) and Markov's inequality.

Conclusion. In both cases there exist an increasing function $g$ with values in $f(\mathbb{N})$ and a measure $\nu$ such that $(\nu_{g(n)}^*[Z_{g(n)}^+])_{n>0}$ converges weakly to $\nu$. Since this is true for any increasing function $f$ with values in $\mathbb{N}$, we conclude that the sequence $(\nu_n^*[Z_n^+])_{n>0}$ is relatively compact with respect to the topology of weak convergence. Of course, the same arguments apply to the sequences $(\nu_n^*[Z_n^-])_{n>0}$, $(\nu^*[Z_n^+])_{n>0}$ and $(\nu^*[Z_n^-])_{n>0}$, which implies the relative compactness of the sequence $(\mu_n^*[Z_n])_{n>0}$.
8.4 Applications

8.4.1 Sufficient conditions for stationary sequences

For strictly stationary sequences, Theorem 8.3 reads as follows.
Theorem 8.4. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6. Define $S_n = X_1 + \cdots + X_n$ and $U_n(t) = S_{[nt]} + (nt-[nt])X_{[nt]+1}$. The following statements are equivalent:

$S1^*$ There exists a nonnegative $\mathcal{M}_0$-measurable random variable $\eta$ such that, for any $\varphi$ in $\mathcal{H}^*$ and any positive integer $k$,
$$\lim_{n\to\infty} \Big\| \mathbb{E}\Big( \varphi(n^{-1/2}U_n) - \int \varphi(x\sqrt\eta)\,W(dx) \,\Big|\, \mathcal{M}_k \Big) \Big\|_1 = 0,$$
where $W$ is the distribution of a standard Wiener process.

$S2^*$ Properties S2(b) and (c) of Theorem 7.6 hold, and (a) is replaced by: (a$^*$) the sequence $(n^{-1}(\max_{1\le i\le n}|S_i|)^2)_{n>0}$ is uniformly integrable.

Under the conditions of Proposition 7.8, $S1^*$ holds:

Proposition 8.1. Let $(\mathcal{M}_i)_{i\in\mathbb{Z}}$ and $(X_i)_{i\in\mathbb{Z}}$ be as in Theorem 7.6 and define $\mathcal{M}_{-\infty} = \bigcap_{i\in\mathbb{Z}}\mathcal{M}_i$. Define the operators $P_i$ as in Corollary 7.4.
1. If $\mathbb{E}(X_0|\mathcal{M}_{-\infty}) = 0$ and $\sum_{i\ge 0}\|P_0(X_i)\|_2 < \infty$, then $S1^*$ holds. Moreover, $\eta$ is the same as in Proposition 7.8.
2. If (7.5.7) is satisfied, then $S1^*$ holds and $\eta$ is the same as in Proposition 7.8.

Remark 8.2. As in Chapter 7, we deduce from Proposition 8.1 sufficient conditions for the functional CCLT in terms of the coefficients $\gamma_1$, $\gamma_2$, $\tilde\alpha_1$ and $\tilde\phi_1$. More precisely, $S1^*$ holds if either (7.5.18), (7.5.22), (7.5.26) or (7.5.28) is satisfied.

Proof of Proposition 8.1. In view of Corollary 7.5 and Proposition 7.8, it is enough to prove that S2(a$^*$) holds.

Proof of item 1. According to Proposition 5.9, for any two sequences of nonnegative numbers $(a_m)_{m\ge 0}$ and $(b_m)_{m\ge 0}$ such that $K = \sum_{m\ge 0} a_m^{-1}$ is finite and $\sum_{m\ge 0} b_m = 1$, we have
$$\frac 1n\, \mathbb{E}\big( (S_n^* - M\sqrt n)_+^2 \big) \le 4K \sum_{m=0}^{\infty} a_m\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big), \quad (8.4.1)$$
where $\Gamma(m,n,\lambda) = \Big\{ 0 \vee \max_{1\le k\le n} \sum_{\ell=1}^{k} P_{-m}(X_\ell) > \lambda \Big\}$. Here, we take $b_m = 2^{-m-1}$ and $a_m = (\|P_0(X_m)\|_2 + (m+1)^{-2})^{-1}$. By assumption, $\sum_{m\ge 0} a_m^{-1}$ is finite.
Since for all $m \ge 0$
$$a_m\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big) \le \frac{\|P_0(X_m)\|_2^2}{\|P_0(X_m)\|_2 + (m+1)^{-2}} \le \|P_0(X_m)\|_2,$$
we infer from (8.4.1) that for any $\varepsilon > 0$ there exists $N(\varepsilon)$ such that
$$\frac 1n\, \mathbb{E}\big( (S_n^* - M\sqrt n)_+^2 \big) \le \varepsilon + 4K \sum_{m=0}^{N(\varepsilon)} a_m\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big). \quad (8.4.2)$$
Now by Doob's maximal inequality,
$$\mathbb{P}\big( \Gamma(m,n,b_m M\sqrt n) \big) \le \frac{4\sum_{k=1}^n \|P_{k-m}(X_k)\|_2^2}{b_m^2 M^2 n} = \frac{4\|P_0(X_m)\|_2^2}{b_m^2 M^2},$$
and consequently
$$\lim_{M\to\infty} \sup_{n>0}\, \mathbb{P}\big( \Gamma(m,n,b_m M\sqrt n) \big) = 0. \quad (8.4.3)$$
Since $n^{-1}\sum_{k=1}^n P_{k-m}(X_k)^2$ converges in $\mathbb{L}^1$ (apply the ergodic theorem), we infer from (8.4.3) that
$$\lim_{M\to\infty}\limsup_{n\to\infty}\, \mathbb{E}\Big( \frac 1n \sum_{k=1}^n P_{k-m}(X_k)^2\, \mathbf{1}_{\Gamma(m,n,b_m M\sqrt n)} \Big) = 0. \quad (8.4.4)$$
Combining (8.4.2) and (8.4.4), we conclude that
$$\lim_{M\to\infty}\limsup_{n\to\infty} \frac 1n\, \mathbb{E}\big( (S_n^* - M\sqrt n)_+^2 \big) = 0. \quad (8.4.5)$$
Of course, the same arguments apply to the sequence $(-X_k)_{k\in\mathbb{Z}}$, so that (8.4.5) holds with $\max_{1\le k\le n}|S_k|$ in place of $S_n^*$. This completes the proof.

Proof of item 2. Let $A_k(\lambda) = \{\max_{1\le i\le k}|S_i| > \lambda\}$. From Proposition 5.8 applied to the sequences $(X_i)_{i\in\mathbb{Z}}$ and $(-X_i)_{i\in\mathbb{Z}}$ we get that
$$\mathbb{E}\Big( \big( \max_{1\le i\le n}|S_i| - \lambda \big)_+^2 \Big) \le 8\sum_{k=1}^n \Big( \mathbb{E}\big( X_k^2\,\mathbf{1}_{A_k(\lambda)} \big) + 2\big\| \mathbf{1}_{A_k(\lambda)}\, X_k\, \mathbb{E}(S_n - S_k \,|\, \mathcal{F}_k) \big\|_1 \Big). \quad (8.4.6)$$
By assumption the sequence $(X_k^2)_{k>0}$ and the array $(X_k\,\mathbb{E}(S_n-S_k|\mathcal{F}_k))_{1\le k\le n}$ are uniformly integrable. It follows that the $\mathbb{L}^1$-norms of the above random variables are each bounded by some positive constant $K$. Hence, from (8.4.6) with $\lambda = 0$, we get $\mathbb{E}((\max_{1\le i\le n}|S_i|)^2) \le 24Kn$. It follows that
$$\mathbb{P}\big( A_k(M\sqrt n) \big) \le (nM^2)^{-1}\, \mathbb{E}\Big( \big( \max_{1\le i\le n}|S_i| \big)^2 \Big) \le 24KM^{-2}. \quad (8.4.7)$$
From the inequality (8.4.7) and the uniform integrability of both $(X_k^2)_{k>0}$ and $(X_k\,\mathbb{E}(S_n-S_k|\mathcal{F}_k))_{1\le k\le n}$, we infer that
$$\lim_{M\to\infty}\limsup_{n\to\infty} n^{-1}\, \mathbb{E}\Big( \big( \max_{1\le i\le n}|S_i| - M\sqrt n \big)_+^2 \Big) = 0.$$
This completes the proof.
8.4.2 Sufficient conditions for triangular arrays

Under the conditions of Proposition 7.9, $S1^*$ holds:

Proposition 8.2. Let $X_{i,n}$ and $\mathcal{M}_{i,n}$ be as in Proposition 7.9. If (7.5.31) and (7.5.32) hold, then $S1^*$ holds with the same $\eta$ as in Proposition 7.9.

Proof of Proposition 8.2. In view of Proposition 7.9, it is enough to prove that S2(a$^*$) holds. In fact, this follows from inequality (7.5.37).
Chapter 9

Law of the iterated logarithm (LIL)

In this chapter, we derive laws of the iterated logarithm. We first give a bounded law of the iterated logarithm in a non-causal setting. We then focus on $\tau$-dependent sequences, for which we derive a causal strong invariance principle. The main tool to prove it is the Fuk–Nagaev type inequality given in Theorem 5.3 of Chapter 5.
9.1 Bounded LIL under a non-causal condition

In this section, we derive a bounded law of the iterated logarithm under a non-causal condition detailed in the assumptions of Theorem 4.5 of Chapter 4. We get the following theorem:
Theorem 9.1. Suppose that $(X_n)_{n\in\mathbb{Z}}$ is a stationary process satisfying the assumptions of Theorem 4.5 of Chapter 4. Set $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^n X_i)$ and assume that $\sigma^2 = \lim_{n\to\infty}\sigma_n^2/n > 0$. Then we have
$$\limsup_{n\to\infty} \frac{1}{\sigma\sqrt{2n\log\log n}}\,|S_n| \le 1 \quad\text{a.s.} \quad (9.1.1)$$
Proof of Theorem 9.1. Let $a > 1$. We define the subsequence $(n_k)_{k\in\mathbb{N}}$ by $n_k = [a^k]$. We obtain from Theorem 4.5 that, for any $n_k \le n < n_{k+1}$ and any
fixed $c$,
$$\mathbb{P}\big( |S_n| > c\sigma\sqrt{2n\log\log n_k} \big) \le 2\exp\Big( -c^2\log\log n_k\, \frac{n\sigma^2}{\sigma_n^2}\,(1+o(1)) \Big) = 2\exp\big( -c^2\log\log n_k\,(1+o(1)) \big) = O\big( k^{-c^2(1+o(1))} \big).$$
This implies, by the maximal inequality given in Theorem 2.2 in Móricz et al. (1982) [133], that
$$\mathbb{P}\Big( \max_{n_k\le n< n_{k+1}} |S_n| > c\sigma\sqrt{2n_k\log\log n_k} \Big) \le C\,k^{-c^2}, \quad (9.1.2)$$
which is summable over $k$ as soon as $c > 1$. By the Borel–Cantelli lemma, and letting $a$ decrease to 1, we obtain that for any $c > 1$,
$$\limsup_{n\to\infty} \frac{1}{\sigma\sqrt{2n\log\log n}}\,|S_n| \le c \quad\text{a.s.}$$
This implies (9.1.1).
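The bounded LIL of Theorem 9.1 can be visualized numerically. The sketch below uses i.i.d. standard normal increments (a special case of the assumptions, with $\sigma = 1$) and checks that the normalized path $|S_n|/\sqrt{2n\log\log n}$ stays of order 1 instead of growing like $\sqrt n$; the function name and the choice of sample size are ours, purely for illustration.

```python
import numpy as np

def lil_ratio(increments):
    """Return |S_n| / sqrt(2 n log log n) along one path, for n >= 3.

    Assumes centered increments with unit variance (so sigma = 1);
    the index starts at n = 3 so that log log n > 0.
    """
    s = np.cumsum(increments)
    n = np.arange(1, len(increments) + 1)
    mask = n >= 3
    norm = np.sqrt(2.0 * n[mask] * np.log(np.log(n[mask])))
    return np.abs(s[mask]) / norm

rng = np.random.default_rng(0)
ratios = lil_ratio(rng.standard_normal(100_000))
# Theorem 9.1 predicts limsup of this ratio is at most 1 a.s.;
# on a finite path the running maximum stays bounded.
print(ratios.max())
```

The printed maximum is typically close to 1; pushing the path length further does not make it drift upward, which is the content of the bounded LIL.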
9.2 Causal strong invariance principle
In this section, we present a strong invariance principle for partial sums of $\tau_{1,\infty}$-dependent sequences. Let $(X_n)_{n\in\mathbb{Z}}$ be a stationary sequence of zero-mean square integrable real valued random variables. Let $\mathcal{M}_i = \sigma(X_j,\ j\le i)$. Define
$$S_n = X_1 + \cdots + X_n \quad\text{and}\quad S_n(t) = S_{[nt]} + (nt-[nt])X_{[nt]+1}.$$
We assume that $\sigma_n^2/n = \mathrm{Var}(S_n)/n$ converges to some constant $\sigma^2$ as $n$ tends to infinity (this will always be true under any of the conditions used hereafter). For $\sigma > 0$, we study the almost sure behavior of the partial sum process
$$\big\{ \sigma^{-1}(2n\log\log n)^{-1/2}\, S_n(t),\ t\in[0,1] \big\}. \quad (9.2.1)$$
Before stating the main result, let us recall existing results in the i.i.d. case and in other frameworks of dependence. Let $\mathcal{S}$ be the subset of $C([0,1])$ consisting of all functions $h$ that are absolutely continuous with respect to the Lebesgue measure and such that
$$h(0) = 0 \quad\text{and}\quad \int_0^1 (h'(t))^2\,dt \le 1.$$
In 1964, Strassen [180] proved that if the sequence $(X_n)_{n\in\mathbb{Z}}$ is i.i.d., then the process defined in (9.2.1) is relatively compact with a.s. limit set $\mathcal{S}$. This result is known as the functional law of the iterated logarithm (FLIL for short). Heyde and Scott (1973) [104] extended the FLIL to the case where $\mathbb{E}(X_1|\mathcal{M}_0) = 0$ and the sequence is ergodic. Starting from this result and from a coboundary decomposition due to Gordin (1969) [97], Heyde (1975) [105] proved that the FLIL holds if $\mathbb{E}(S_n|\mathcal{M}_0)$ converges in $\mathbb{L}^2$ and the sequence is ergodic. Heyde's condition holds as soon as
$$\sum_{k=1}^{\infty} k \int_0^{\gamma_1(k)/2} Q\circ G(u)\,du < \infty, \quad (9.2.2)$$
where the functions $Q = Q_{|X_0|}$ and $G = G_{|X_0|}$ have been defined in Chapter 5 and $\gamma_1(k) = \|\mathbb{E}(X_k|\mathcal{M}_0)\|_1$ is the coefficient introduced in Section 2.2.4 of Chapter 2. Other types of dependence have since been considered for the FLIL (see for instance the review paper by Philipp (1986) [150]). For $\rho$- and $\phi$-mixing sequences, a strong invariance principle is given in Shao (1993) [174]. The case of strongly ($\alpha$-)mixing sequences has been considered by Oodaira and Yoshihara (1971) [137], Dehling and Philipp (1982) [54], and Bradley (1983) [29], among others. In 1995, Rio [159] proved a FLIL (and even a strong invariance principle) for the process defined in (9.2.1) as soon as the DMR (Doukhan, Massart and Rio (1994) [70]) condition (9.2.3) is satisfied:
$$\sum_{k=1}^{\infty} \int_0^{2\alpha_X(k)} Q^2(u)\,du < \infty, \quad (9.2.3)$$
where $\alpha_X(k)$ has been defined in Section 1.2 of Chapter 1. Considering Corollary 7.6, which gives the central limit theorem for $\gamma$-dependent sequences, we think that a reasonable condition for the FLIL is condition (9.2.2) without the $k$ in front of the integral. Actually, we can only prove this conjecture with $\tau_{1,\infty}(k)$ instead of $\gamma_1(k)$; that is, the FLIL holds as soon as
$$\sum_{k=1}^{\infty} \int_0^{\tau_{1,\infty}(k)/2} Q\circ G(u)\,du < \infty. \quad (9.2.4)$$
Theorem 9.2. Let $(X_n)_{n\in\mathbb{Z}}$ be a strictly stationary sequence of centered and square integrable random variables satisfying (9.2.4). Then $\sigma_n^2/n$ converges to $\sigma^2$, and there exists a sequence $(Y_n)_{n\in\mathbb{N}}$ of independent $\mathcal{N}(0,\sigma^2)$-distributed random variables (possibly degenerate) such that
$$\sum_{i=1}^n (X_i - Y_i) = o\big( \sqrt{n\log\log n} \big) \quad\text{a.s.}$$
Such a result is known as a strong invariance principle. If $\sigma > 0$, Theorem 9.2 and Strassen's FLIL for the Brownian motion yield the FLIL for the process (9.2.1). As in Corollary 7.6, we obtain simple sufficient conditions for the FLIL to hold:

Corollary 9.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a strictly stationary sequence of centered and square integrable random variables. Any of the following conditions implies (9.2.4), and hence the FLIL:
1. $\mathbb{P}(|X_0| > x) \le (c/x)^r$ for some $r > 2$, and $\sum_{i\ge 0} (\tau_{1,\infty}(i))^{(r-2)/(r-1)} < \infty$.
2. $\|X_0\|_r < \infty$ for some $r > 2$, and $\sum_{i\ge 0} i^{1/(r-2)}\,\tau_{1,\infty}(i) < \infty$.
3. $\mathbb{E}(|X_0|^2\log(1+|X_0|)) < \infty$ and $\tau_{1,\infty}(i) = O(a^i)$ for some $a < 1$.

Condition (9.2.4) is essentially optimal, as shown in Corollary 9.2 below, derived from the examples given in Doukhan, Massart and Rio (1994) [70]:

Corollary 9.2. For any $r > 2$, there is a stationary Markov chain $(X_n)_{n\in\mathbb{Z}}$ such that
1. $\mathbb{E}(X_0) = 0$ and, for any nonnegative real $x$, $\mathbb{P}(|X_0| > x) = \min(1, x^{-r})$.
2. The sequence $(\tau_{1,\infty}(i))_{i\ge 0}$ satisfies $\sup_{i\ge 0} i^{(r-1)/(r-2)}\,\tau_{1,\infty}(i) < \infty$.
3. $\limsup_{n\to\infty} (n\log\log n)^{-1/2}|S_n| = +\infty$ almost surely.
This corollary follows easily from Proposition 3 in Doukhan, Massart and Rio (1994) [69]. Let us now write the proof of Theorem 9.2.

Proof of Theorem 9.2. We first need to introduce some precise notations.

Notations 9.1. Define the set
$$\Psi = \Big\{ \psi : \mathbb{N}\to\mathbb{N},\ \psi \text{ increasing},\ \frac{\psi(n)}{n} \xrightarrow[n\to\infty]{} \infty,\ \psi(n) = o\big( n\sqrt{LLn} \big) \Big\}.$$
If $\psi$ is some function of $\Psi$, let $M_1 = 0$ and
$$M_n = \sum_{k=1}^{n-1} (\psi(k)+k), \quad\text{for } n \ge 2.$$
For $n \ge 1$, define the random variables
$$U_n = \sum_{i=M_n+1}^{M_n+\psi(n)} X_i, \qquad V_n = \sum_{i=M_{n+1}+1-n}^{M_{n+1}} X_i, \qquad\text{and}\qquad \tilde U_n = \sum_{i=M_n+1}^{M_{n+1}} |X_i|.$$
If $Lx = \max(1,\log x)$, define the truncated random variables
$$\bar U_n = \max\Big( \min\Big( U_n,\ \frac{n}{\sqrt{LLn}} \Big),\ \frac{-n}{\sqrt{LLn}} \Big).$$
Theorem 9.2 is a consequence of the following proposition.

Proposition 9.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a strictly stationary sequence of centered and square integrable random variables satisfying condition (9.2.4). Then $\sigma_n^2/n$ converges to $\sigma^2$, and there exist a function $\psi\in\Psi$ and a sequence $(W_n)_{n\in\mathbb{N}}$ of independent $\mathcal{N}(0,\psi(n)\sigma^2)$-distributed random variables (possibly degenerate) such that

(a) $\sum_{i=1}^n (W_i - \bar U_i) = o\big( \sqrt{M_n\,LLn} \big)$ a.s.

(b) $\sum_{n=1}^{\infty} \dfrac{\mathbb{E}(|U_n - \bar U_n|)}{n\sqrt{LLn}} < \infty$

(c) $\tilde U_n = o\big( n\sqrt{LLn} \big)$ a.s.
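The block construction of Notations 9.1 can be sketched numerically: long blocks of length $\psi(n)$ carry the sums $U_n$ and short blocks of length $n$ carry the sums $V_n$, and together they partition the integers up to $M_{N+1}$. The concrete choice of $\psi$ below is ours (any element of $\Psi$ would do) and is purely illustrative.

```python
import math

def psi(n):
    """An illustrative member of the class Psi: psi is increasing,
    psi(n)/n tends to infinity (very slowly, via (LL n)^{1/4}),
    while psi(n) = o(n * sqrt(LL n))."""
    ll = max(1.0, math.log(max(1.0, math.log(n + 2))))
    return n * math.ceil(ll ** 0.25 + 1)

def blocks(N):
    """Return the long blocks (indices of U_n) and short blocks
    (indices of V_n) for n = 1..N, as in Notations 9.1:
    block n covers M_n+1 .. M_{n+1} with M_{n+1}-M_n = psi(n)+n."""
    M = 0  # M_1 = 0
    long_blocks, short_blocks = [], []
    for n in range(1, N + 1):
        long_blocks.append(range(M + 1, M + psi(n) + 1))
        short_blocks.append(range(M + psi(n) + 1, M + psi(n) + n + 1))
        M += psi(n) + n  # M_{n+1}
    return long_blocks, short_blocks

long_b, short_b = blocks(5)
for n, (lb, sb) in enumerate(zip(long_b, short_b), start=1):
    print(n, len(lb), len(sb))  # lengths psi(n) and n
```

The point of the construction is that the short blocks separate the long ones by a gap growing like $n$, which is what makes the coupling argument in the proof of (a) work.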
Proof of Proposition 9.1. It is adapted from the proof of Proposition 2 in Rio (1995) [159].

Proof of (b). Note first that $\mathbb{E}|U_n - \bar U_n| = \mathbb{E}\big( (|U_n| - n/\sqrt{LLn})_+ \big)$, so that
$$\mathbb{E}|U_n - \bar U_n| = \int_{n/\sqrt{LLn}}^{+\infty} \mathbb{P}(|U_n| > t)\,dt. \quad (9.2.5)$$
In the following we write $Q$ instead of $Q_{|X_0|}$. Since $U_n$ is distributed as $S_{\psi(n)}$, we infer from Theorem 5.3 that
$$\mathbb{P}(|U_n| > t) \le 4\Big( 1 + \frac{t^2}{25\,r\,s_{\psi(n)}^2} \Big)^{-r/2} + \frac{20\,\psi(n)}{t} \int_0^{S(t/5r)} Q(u)\,du. \quad (9.2.6)$$
Consider the two terms
$$A_{1,n} = \frac{4}{n\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} \Big( 1 + \frac{t^2}{25\,r\,s_{\psi(n)}^2} \Big)^{-r/2} dt \quad\text{and}\quad A_{2,n} = \frac{20\,\psi(n)}{n\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^{S(t/5r)} Q(u)\,du\,dt.$$
From (9.2.5) and (9.2.6), we infer that
$$\frac{\mathbb{E}|U_n - \bar U_n|}{n\sqrt{LLn}} \le A_{1,n} + A_{2,n}. \quad (9.2.7)$$
Study of $A_{1,n}$. Since the sequence $(X_n)_{n\in\mathbb{N}}$ satisfies (9.2.4), $s_{\psi(n)}^2/\psi(n)$ converges to some positive constant. Let $C_r$ denote some constant depending only on $r$, which may vary from line to line. We have that
$$A_{1,n} \le \frac{4}{n\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} C_r\, s_{\psi(n)}^r\, t^{-r}\,dt \le C_r\, s_{\psi(n)}^r\, n^{-r}\, (LLn)^{(r-2)/2}.$$
We infer that $A_{1,n} = O\big( \psi(n)^{r/2}\, n^{-r}\, (LLn)^{(r-2)/2} \big)$ as $n$ tends to infinity. Since $\psi\in\Psi$ and $r > 2$, we infer that $\sum_{n\ge 1} A_{1,n}$ is finite.
Study of $A_{2,n}$. We use the following elementary result: if $(a_i)_{i\ge 1}$ is a sequence of positive numbers, then there exists a sequence of positive numbers $(b_i)_{i\ge 1}$ such that $b_i \to \infty$ and $\sum_{i\ge 1} a_i b_i < \infty$ if and only if $\sum_{i\ge 1} a_i < \infty$ (note that $b_n^2 = (\sum_{i=n}^{\infty} a_i)^{-1}$ works). Consequently $\sum_{n\ge 1} A_{2,n}$ is finite for some $\psi\in\Psi$ if and only if
$$\sum_{n\ge 1} \frac{1}{\sqrt{LLn}} \int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^{S(t/5r)} Q(u)\,du\,dt < +\infty. \quad (9.2.8)$$
Recall that $S = R^{-1}$, with the notations of Theorem 5.3. To prove (9.2.8), write
$$\int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^{S(t/5r)} Q(u)\,du\,dt = \int_{n/\sqrt{LLn}}^{+\infty} \frac 1t \int_0^1 \mathbf{1}_{R(u)\ge t/5r}\, Q(u)\,du\,dt = \int_0^1 Q(u) \int_{n/\sqrt{LLn}}^{5rR(u)} \frac{dt}{t}\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du = \int_0^1 Q(u)\, \log\Big( \frac{5rR(u)\sqrt{LLn}}{n} \Big)\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du.$$
Consequently (9.2.8) holds if and only if
$$\int_0^1 Q(u) \sum_{n\ge 1} \frac{1}{\sqrt{LLn}}\, \log\Big( \frac{5rR(u)\sqrt{LLn}}{n} \Big)\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du < +\infty. \quad (9.2.9)$$
To see that (9.2.9) holds, we shall prove the following result: if $f$ is any increasing function such that $f(0) = 0$ and $f(1) = 1$, then for any positive $R$ we have that
$$\sum_{n\ge 1} \log\Big( \frac{R}{f(n)} \Big)\, (f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \le (R-1)\vee 0 \le R. \quad (9.2.10)$$
Applying this result to $f(x) = x(LLx)^{-1/2}$ and $R = 5rR(u)$, and noting that $(LLn)^{-1/2} \le C(f(n)-f(n-1))$ for some constant $C > 1$, we infer that
$$\int_0^1 Q(u) \sum_{n\ge 1} \frac{1}{\sqrt{LLn}}\, \log\Big( \frac{5rR(u)\sqrt{LLn}}{n} \Big)\, \mathbf{1}_{R(u)\ge \frac{n}{5r\sqrt{LLn}}}\, du \le 5Cr \int_0^1 Q(u)R(u)\,du,$$
which is finite as soon as (9.2.4) holds. It remains to prove (9.2.10). If $R \le 1$ the result is clear. Now, for $R > 1$, let $x_R$ be the largest integer such that $f(x_R) \le R$, and write $R^* = f(x_R)$. Note first that
$$\sum_{n\ge 1} (\log R)\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \le R^*\log R. \quad (9.2.11)$$
On the other hand, we have that
$$\sum_{n\ge 1} \log(f(n))\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} = \sum_{n=1}^{x_R} \log(f(n))\,(f(n)-f(n-1)).$$
It follows that
$$\sum_{n\ge 1} \log(f(n))\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \ge \int_1^{R^*} \log x\,dx = R^*\log R^* - R^* + 1. \quad (9.2.12)$$
Using (9.2.11) and (9.2.12) we get that
$$\sum_{n\ge 1} \log\Big( \frac{R}{f(n)} \Big)\,(f(n)-f(n-1))\, \mathbf{1}_{f(n)\le R} \le R^* - 1 + R^*(\log R - \log R^*). \quad (9.2.13)$$
Using Taylor's inequality, we have that $R^*(\log R - \log R^*) \le R - R^*$, and (9.2.10) follows. The proof of (b) is complete.
Proof of (c). Let $T_n = \sum_{i=M_n+1}^{M_{n+1}} (|X_i| - \mathbb{E}|X_i|)$. We easily see that
$$\tilde U_n = (\psi(n)+n)\,\mathbb{E}|X_1| + T_n. \quad (9.2.14)$$
By definition of $\Psi$, we have $\psi(n) = o(n\sqrt{LLn})$. Now note that
$$T_n \le \frac{n}{\sqrt{LLn}} + 0\vee\Big( T_n - \frac{n}{\sqrt{LLn}} \Big). \quad (9.2.15)$$
Using the same arguments as for the proof of (b), we obtain that
$$\sum_{n\ge 1} \frac{\mathbb{E}\big( 0\vee(T_n - n/\sqrt{LLn}) \big)}{n\sqrt{LLn}} < +\infty, \quad\text{so that}\quad \sum_{n\ge 1} \frac{0\vee(T_n - n/\sqrt{LLn})}{n\sqrt{LLn}} < +\infty \ \text{a.s.}$$
Consequently $\max(0,\, T_n - n(LLn)^{-1/2}) = o(n\sqrt{LLn})$ almost surely, and the result follows from (9.2.14) and (9.2.15).

Proof of (a). In the following, $(\delta_n)_{n\ge 1}$ and $(\eta_n)_{n\ge 1}$ denote independent sequences of independent random variables with uniform distribution over $[0,1]$, independent of $(X_n)_{n\ge 1}$. Since $\bar U_n$ is a 1-Lipschitz function of $U_n$, $\tau(\sigma(U_i,\ i\le n-1),\, \bar U_n) \le \psi(n)\tau(n)$. Using Lemma 5.2 and arguing as in the proof of Theorem 5.2, we get the existence of a sequence $(\bar U_n^*)_{n\ge 1}$ of independent random variables with the same distribution as the random variables $\bar U_n$, such that $\bar U_n^*$ is a measurable function of $(\bar U_l, \delta_l)_{l\le n}$ and
$$\mathbb{E}|\bar U_n - \bar U_n^*| \le \psi(n)\,\tau(n).$$
Since (9.2.4) holds, we have that
$$\sum_{n\ge 1} \frac{\mathbb{E}|\bar U_n - \bar U_n^*|}{\sqrt{M_n\,LLn}} < \infty, \quad\text{so that}\quad \sum_{n\ge 1} \frac{|\bar U_n - \bar U_n^*|}{\sqrt{M_n\,LLn}} < +\infty \ \text{a.s.}$$
Applying Kronecker's lemma, we obtain that
$$\sum_{i=1}^n (\bar U_i - \bar U_i^*) = o\big( \sqrt{M_n\,LLn} \big) \ \text{a.s.} \quad (9.2.16)$$
We infer from (9.2.4) and from Dedecker and Doukhan (2003) [43] that
$$\psi(n)^{-1}\,\mathrm{Var}(U_n) \xrightarrow[n\to\infty]{} \sigma^2 \quad\text{and}\quad \psi(n)^{-1/2}\,U_n \xrightarrow[n\to\infty]{\ \mathcal{D}\ } \mathcal{N}(0,\sigma^2).$$
Hence the sequence $(U_n^2/\psi(n))_{n\ge 1}$ is uniformly integrable (Theorem 5.4 in Billingsley (1968) [20]). Consequently, since the random variables $\bar U_n^*$ have the same distribution as the random variables $\bar U_n$, we deduce from the above limit results, from Strassen's representation theorem (see Dudley (1968) [79]), and from Skorohod's lemma (1976) [179] that one can construct some sequence
$(W_n)_{n\ge 1}$ of $\sigma(\bar U_n^*, \eta_n)$-measurable random variables with respective distributions $\mathcal{N}(0,\psi(n)\sigma^2)$ such that
$$\mathbb{E}\big( (\bar U_n^* - W_n)^2 \big) = o(\psi(n)) \quad\text{as } n\to+\infty, \quad (9.2.17)$$
which is exactly equation (5.17) of the proof of Proposition 2(c) in Rio (1995) [159]. The end of the proof is the same as that of Rio.

Proof of Theorem 9.2. By Skorohod's lemma (1976) [179], there exists a sequence $(Y_i)_{i\ge 1}$ of independent $\mathcal{N}(0,\sigma^2)$-distributed random variables satisfying $W_n = \sum_{i=M_n+1}^{M_n+\psi(n)} Y_i$ for all positive $n$. Define the random variable
$$\tilde V_n = \sum_{i=M_{n+1}+1-n}^{M_{n+1}} Y_i.$$
Let $n(k) := \sup\{n\ge 0 : M_n \le k\}$, and note that by definition of $M_n$ we have $n(k) = o(\sqrt k)$. Applying Proposition 9.1(c), we see that
$$\Big| \sum_{i=1}^k X_i - \sum_{i=1}^{n(k)} (U_i + V_i) \Big| \le \tilde U_{n(k)} = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.18)$$
From (5.26) in Rio (1995) [159], we infer that
$$\sum_{i=1}^{n(k)} V_i = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad\text{and}\quad \sum_{i=1}^{n(k)} \tilde V_i = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.19)$$
Gathering (9.2.18), (9.2.19) and Proposition 9.1(a) and (b), we obtain that
$$\sum_{i=1}^k X_i - \sum_{i=1}^{n(k)} (W_i + \tilde V_i) = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.20)$$
Clearly $\sum_{i=1}^k Y_i - \sum_{i=1}^{n(k)} (W_i + \tilde V_i)$ is normally distributed with variance smaller than $\psi(n(k)) + n(k)$. Since $n(k) = o(\sqrt k)$, we have $\psi(n(k)) + n(k) = o(\sqrt{k\,LLk})$ by definition of $\psi$. An elementary calculation on Gaussian random variables shows that
$$\sum_{i=1}^k Y_i - \sum_{i=1}^{n(k)} (W_i + \tilde V_i) = o\big( \sqrt{k\,LLk} \big) \ \text{a.s.} \quad (9.2.21)$$
Theorem 9.2 follows from (9.2.20) and (9.2.21).
Chapter 10
The empirical process

In this chapter, we prove central limit theorems for the empirical distribution function of weakly dependent stationary sequences (or fields). Except in the last section (Section 10.6), where the oscillations of the empirical distribution of weakly dependent random fields are studied, all the results are mainly based on the tightness criterion given in Proposition 4.2. In Section 10.1, we give a sufficient condition for tightness, based on the control of the covariances between indicators of half-lines. In Section 10.2 we prove an empirical central limit theorem for $\eta$-dependent sequences by assuming an exponential decay of the coefficients. In Sections 10.3 and 10.4, we give sufficient conditions in terms of the coefficients $\tilde\alpha$, $\tilde\beta$, $\tilde\phi$, $\theta$ and $\tau$. In Section 10.5 we present applications of such results to the empirical copula process.

Definition 10.1 (Empirical process). We recall the definition of the empirical process in the different cases considered:

• Stationary real valued or multivariate sequence: Let $(X_n)_{n\in\mathbb{Z}}$ be a sequence of $\mathbb{R}^d$-valued random variables. Define on $\mathbb{R}^d$ the partial order $s \le t$ if and only if $s_i \le t_i$ for $i = 1,\ldots,d$. The empirical process of $X$ is defined by
$$F_n(t) = \frac 1n \sum_{i=1}^n \mathbf{1}_{X_i\le t}. \quad (10.0.1)$$

• Stationary real valued random field: For $N$ in $\mathbb{N}$, let $B_N$ be the closed ball of radius $N$ for the $\ell^\infty$-norm on $T = \mathbb{Z}^d$ and $n = \#B_N = (2N+1)^d$. Let $(X_t)_{t\in T}$ be a real valued random field. We define the empirical process
$$F_n(t) = \frac 1n \sum_{k\in B_N} \mathbf{1}_{X_k\le t}. \quad (10.0.2)$$
In either case, if the variables are identically distributed with common distribution function $F$, we define the normalized empirical process by
$$U_n(t) = \sqrt n\,(F_n(t) - F(t)).$$
We also need to define the limit process appearing in the following central limit theorems: $(B(t))_{t\in[0,1]^d}$ is a zero-mean Gaussian process with covariance function
$$\Gamma(t,s) = \sum_{k\in T} \mathrm{Cov}\big( \mathbf{1}_{X_0\le t},\, \mathbf{1}_{X_k\le s} \big), \quad (10.0.3)$$
where $T = \mathbb{Z}$ in the case of random sequences and $T = \mathbb{Z}^d$ in the case of random fields.
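In the real valued sequence case ($d = 1$) with uniform marginals, so that $F(t) = t$, the normalized empirical process of a sample is straightforward to compute on a grid. The sketch below is illustrative; the function and variable names are ours.

```python
import numpy as np

def empirical_process(x, t_grid):
    """U_n(t) = sqrt(n) * (F_n(t) - F(t)) for a sample x with uniform
    marginal distribution on [0, 1] (so F(t) = t), on a grid of t values."""
    x = np.asarray(x)
    n = len(x)
    f_n = np.mean(x[None, :] <= t_grid[:, None], axis=1)  # F_n on the grid
    return np.sqrt(n) * (f_n - t_grid)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
u = empirical_process(rng.uniform(size=1000), t)
# u fluctuates on the scale of a Brownian bridge and vanishes at t = 1.
```

For a weakly dependent sample, the same computation applies; what changes is the limit covariance $\Gamma(t,s)$ in (10.0.3), which contains the cross terms $k \ne 0$ absent in the i.i.d. case.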
10.1 A simple condition for tightness

We consider a stationary real valued sequence $(X_n)_{n\in\mathbb{Z}}$ with continuous common distribution function $F$. We assume without loss of generality that the marginal distribution of this sequence is the uniform law on $[0,1]$. Assume that the sequence $(X_n)_{n\in\mathbb{Z}}$ satisfies the following weak dependence condition. Let $\mathcal{F} = \{x\mapsto \mathbf{1}_{s<x\le t} : s,t\in[0,1]\}$. We assume that for any $m\in\{1,2,3\}$ and any $0\le t_1\le t_2\le t_3\le t_4$,
$$\sup_{f\in\mathcal{F}} \Big| \mathrm{Cov}\Big( \prod_{i=1}^m f(X_{t_i}),\ \prod_{i=m+1}^4 f(X_{t_i}) \Big) \Big| \le \varepsilon(r), \quad (10.1.1)$$
where $r = t_{m+1}-t_m$ and $\varepsilon(r)$ depends only on $r$ (in this case a weak dependence condition holds for a class of functions $\mathbb{R}^u\to\mathbb{R}$ involving only the values $u = 1$, 2 or 3).

Proposition 10.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a real valued stationary sequence fulfilling (10.1.1) with
$$\varepsilon(r) = O(r^{-5/2-\nu}), \quad\text{for some } \nu > 0. \quad (10.1.2)$$
Then the process $U_n$ is tight in $D([0,1])$.

Proof of Proposition 10.1. The moment inequality (4.3.17), together with conditions (10.1.1) and (10.1.2), yields the existence of a positive constant $C$ such
that for any $s, t$ in $[0,1]$,
$$\|U_n(t) - U_n(s)\|_4 \le C\Big( \sum_{r=0}^{n-1} \big( r^{-a}\wedge|t-s| \big) \Big)^{1/2} + C\Big( \frac 1n \sum_{r=0}^{n-1} (r+1)^2\,\varepsilon(r) \Big)^{1/4} \le C\Big( \sum_{r\ge |t-s|^{-1/a}} r^{-a} + \sum_{r<|t-s|^{-1/a}} |t-s| \Big)^{1/2} + C\,n^{\frac{2-a}{4}} \le C\big\{ |t-s|^{\frac{a-1}{2a}} + n^{\frac{2-a}{4}} \big\}.$$
The last bound, together with the tightness criterion given in Proposition 4.2, proves that the sequence $\{U_n(t),\ t\in[0,1]\}$ is tight.

Remark 10.1. Stationary associated sequences (see Section 1.4) satisfy the requirement of Proposition 10.1 if
$$\sup_{|k|\ge r}\ \sup_{x,y\in\mathbb{R}} \mathrm{Cov}\big( \mathbf{1}_{X_0>x},\, \mathbf{1}_{X_k>y} \big) = O(r^{-5/2-\nu}).$$
Using the inequality
$$\sup_{|k|\ge r}\ \sup_{x,y\in\mathbb{R}} \mathrm{Cov}\big( \mathbf{1}_{X_0>x},\, \mathbf{1}_{X_k>y} \big) \le C \sup_{|k|\ge r} \mathrm{Cov}^{1/3}(X_0, X_k),$$
for a universal constant $C$, Yu (1993) [195] proves tightness under the condition $\mathrm{Cov}(X_0, X_r) = O(r^{-a})$ for $a > 15/2$. However, in this case, the paper by Louhichi (2000) [124] proves tightness under the condition $\mathrm{Cov}(X_0, X_r) = O(r^{-a})$ for $a > 4$.
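As a quick numerical companion to Remark 10.1: an AR(1) chain with nonnegative coefficient is a standard example of an associated sequence, and its covariances $\mathrm{Cov}(X_0, X_r) = \rho^r/(1-\rho^2)$ decay geometrically, so the polynomial condition $\mathrm{Cov}(X_0, X_r) = O(r^{-a})$ holds for every $a$. The sketch below (our construction, for illustration only) compares empirical covariances with these theoretical values.

```python
import numpy as np

def ar1_path(rho, n, seed=0):
    """Simulate a stationary AR(1) path X_t = rho*X_{t-1} + eps_t with
    standard normal innovations; for rho >= 0 the sequence is associated."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - rho ** 2)  # stationary initial draw
    for t in range(1, n):
        x[t] = rho * x[t - 1] + eps[t]
    return x

def empirical_cov(x, r):
    """Empirical estimate of Cov(X_0, X_r) from one path (r >= 1)."""
    a, b = x[:-r], x[r:]
    return float(np.mean((a - a.mean()) * (b - b.mean())))

x = ar1_path(0.6, 50_000)
covs = [empirical_cov(x, r) for r in (1, 2, 4)]
# theoretical values rho**r / (1 - rho**2): 0.9375, 0.5625, 0.2025
```

The empirical values decrease geometrically in $r$, matching the theory up to sampling noise of order $n^{-1/2}$.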
10.2 η-dependent sequences

In this section, $Y$ is a stationary $\eta$-dependent sequence in $[0,1]^d$ with uniform marginal distributions.

Theorem 10.1. Assume that $(Y_i)_{i\in\mathbb{Z}}$ is a stationary $\eta$-dependent zero-mean process in $[0,1]^d$ with uniform marginal distributions. Assume that there exist constants $C > 0$ and $a > 4d+2$ such that $\eta(r) \le Cr^{-a}$. Then the process $U_n$ converges in distribution in $D([0,1]^d)$ to the Gaussian process $B$.

The following lemma is used to prove Theorem 10.1.

Lemma 10.1. Assume that $(Y_i)_{i\in\mathbb{Z}^p}$ is a stationary multivariate random field with values in $[0,1]^d$ and that the densities of the marginal distributions of the vectors
$Y_i$ are bounded by $C_Y$. For $s \le t$ in $[0,1]^d$, denote $g_{t,s}(x) = \mathbf{1}_{x\le t} - \mathbf{1}_{x\le s}$. Let $\mathbf{i} = (i_1,\ldots,i_u)$ in $(\mathbb{Z}^p)^u$ and $\mathbf{j} = (j_1,\ldots,j_v)$ in $(\mathbb{Z}^p)^v$ be two sets of indices that are $r$-distant in $L^1$-distance. Let $G$ and $H$ be two bounded Lipschitz functions on $\mathbb{R}^u$ and $\mathbb{R}^v$ respectively. Denote $Y_{\mathbf{i}} = (Y_{i_1},\ldots,Y_{i_u})$. Then
$$\big| \mathrm{Cov}\big( G(g_{t,s}(Y_{\mathbf{i}})),\, H(g_{t,s}(Y_{\mathbf{j}})) \big) \big| \le \psi(G,H)\,\epsilon_r, \quad (10.2.1)$$
where, setting $\phi(G,H) = d_G\,\|H\|_\infty\,\mathrm{Lip}(G)$ (with $d_G$ the number of arguments of $G$), we define:

• if $Y$ is $\eta$-dependent, $\epsilon_r = \eta_r^{1/2}$ and
$$\psi(G,H) = 4(C_Y d)^{1/2}\big( \phi(G,H) + \phi(H,G) \big); \quad (10.2.2)$$

• if $Y$ is $\kappa$-dependent, $\epsilon_r = \kappa_r^{1/3}$ and
$$\psi(G,H) = 2(4C_Y d)^{2/3}\big( \phi(G,H)+\phi(H,G) \big)^{2/3}\big( \phi(G,H)\phi(H,G) \big)^{1/3}. \quad (10.2.3)$$

Proof of Lemma 10.1. For $\delta \ge 0$, define the $\delta$-approximations of $\mathbf{1}_{x\ge t}$ by
$$h_{\delta,t}(x) = \prod_{p=1}^d \Big( \frac{x^{(p)}-t^{(p)}+\delta}{\delta}\, \mathbf{1}_{t^{(p)}-\delta < x^{(p)}\le t^{(p)}} + \mathbf{1}_{x^{(p)} > t^{(p)}} \Big).$$
Define $g_{\delta,t,s} = h_{\delta,t} - h_{\delta,s}$. Then its Lipschitz modulus is equal to $\delta^{-1}$, where the distance on $\mathbb{R}^d$ is $d_1(x,y) = \sum_{p=1}^d |x^{(p)}-y^{(p)}|$, and
$$\mathbb{E}\big| g_{t,s}(Y_0) - g_{\delta,t,s}(Y_0) \big| \le 2dC_Y\delta,$$
because the density of the variable $Y_i$ is bounded by $C_Y$ and the two functions are equal except on $2d$ slabs of width $\delta$. Define $G_0(Y_{\mathbf{i}}) = G(g_{t,s}(Y_{\mathbf{i}}))$ and $G_\delta(Y_{\mathbf{i}}) = G(g_{\delta,t,s}(Y_{\mathbf{i}}))$. Then
$$\big| \mathrm{Cov}(G_0(Y_{\mathbf{i}}), H_0(Y_{\mathbf{j}})) - \mathrm{Cov}(G_\delta(Y_{\mathbf{i}}), H_\delta(Y_{\mathbf{j}})) \big| \le \big| \mathbb{E}(G_0 H_0) - \mathbb{E}(G_\delta H_\delta) \big| + \big| \mathbb{E}(G_0)\mathbb{E}(H_0) - \mathbb{E}(G_\delta)\mathbb{E}(H_\delta) \big|.$$
After substituting the variables one by one, the first term of the right hand side is bounded by
$$\big( u\|H\|_\infty\,\mathrm{Lip}(G) + v\|G\|_\infty\,\mathrm{Lip}(H) \big)\; \mathbb{E}\big| g_{t,s}(Y_0) - g_{\delta,t,s}(Y_0) \big|,$$
so that $|\mathbb{E}(G_0H_0) - \mathbb{E}(G_\delta H_\delta)| \le 2C_Y d\delta\,(\phi(G,H)+\phi(H,G))$. The bound for the second term is the same. Now, if $Y$ is $\eta$-dependent,
$$\big| \mathrm{Cov}(G_\delta(Y_{\mathbf{i}}), H_\delta(Y_{\mathbf{j}})) \big| \le \big( \phi(G,H)+\phi(H,G) \big)\, \frac{\eta_r}{\delta}.$$
Hence
$$\big| \mathrm{Cov}(G_0(Y_{\mathbf{i}}), H_0(Y_{\mathbf{j}})) \big| \le \big( \phi(G,H)+\phi(H,G) \big)\Big( 4C_Y d\delta + \frac{\eta_r}{\delta} \Big).$$
If $Y$ is $\kappa$-dependent,
$$\big| \mathrm{Cov}(G_\delta(Y_{\mathbf{i}}), H_\delta(Y_{\mathbf{j}})) \big| \le \phi(G,H)\phi(H,G)\, \frac{\kappa_r}{\delta^2},$$
hence
$$\big| \mathrm{Cov}(G_0(Y_{\mathbf{i}}), H_0(Y_{\mathbf{j}})) \big| \le 4C_Y d\big( \phi(G,H)+\phi(H,G) \big)\delta + \phi(G,H)\phi(H,G)\, \frac{\kappa_r}{\delta^2}.$$
Choosing the optimal $\delta$, relation (10.2.1) is proved.

We apply this lemma to products of indicator functions: for $s \le t$ in $\mathbb{R}^d$ and any multi-index $\mathbf{k}$ of $\mathbb{Z}^u$, we define
$$\Pi_{\mathbf{k}} = \prod_{j=1}^u \big( g_{t,s}(Y_{k_j}) - F(t) + F(s) \big).$$
We note that $\Pi_{\mathbf{k}} = G(g_{t,s}(Y_{\mathbf{k}}))$ where the function $G$ is defined by $G(x_1,\ldots,x_u) = \prod_{j=1}^u (x_j - c)$ with $c = F(t) - F(s)$. Here $\|G\|_\infty = \mathrm{Lip}(G) = 1$.

Corollary 10.1. Assume that $(Y_i)_{i\in\mathbb{Z}}$ is a stationary $\eta$-dependent zero-mean process in $[0,1]^d$ with uniform marginal distributions. Then for any sequences $\mathbf{i} = (i_1,\ldots,i_u)$ and $\mathbf{j} = (j_1,\ldots,j_v)$ such that $r \le j_1 - i_u$,
$$\big| \mathrm{Cov}(\Pi_{\mathbf{i}}, \Pi_{\mathbf{j}}) \big| \le (u+v)\,\epsilon(r), \quad (10.2.4)$$
with $\epsilon(r) = 4\sqrt{d\,\eta(r)}$.

Next, we prove a Rosenthal-type inequality.

Proposition 10.2. Assume that $Y$ is a stationary $\eta$-dependent zero-mean process in $[0,1]^d$ with uniform marginal distributions. Assume moreover that condition (10.2.4) is satisfied with $\epsilon(r) = Cr^{-a}$. Then, for $l < (a+1)/2$ and $(s,t)$ such that $\mathbb{E}|x_0(s,t)| < C$, we have
$$\mathbb{E}\big( U_n(t)-U_n(s) \big)^{2l} \le \frac{(4l-2)!\,3^{2l}}{(2l-1)!} \Big( \frac{k_l}{3}\Big( \frac{\mathbb{E}|x_0(s,t)|}{C} \Big)^{1-1/a} \Big)^l + \big( (2l)! \big)^2\, 3^{2l}\, \frac{n^{1-l}\,k_l}{3} \Big( \frac{\mathbb{E}|x_0(s,t)|}{C} \Big)^{1+(1-2l)/a}, \quad (10.2.5)$$
where $k_l = C + \dfrac{C\,2^a}{a-2l+1}$.
Proof of Proposition 10.2. Let $s \le t$ be in $\mathbb{R}^d$. As $|x_i(s,t)| \le 1$, we get for any sequences $\mathbf{i}\in\mathbb{Z}^u$, $\mathbf{j}\in\mathbb{Z}^v$,
$$\big| \mathrm{Cov}(\Pi_{\mathbf{i}}, \Pi_{\mathbf{j}}) \big| \le 2\,\mathbb{E}|x_0(s,t)|. \quad (10.2.6)$$
For any integer $q \ge 1$, set
$$A_q(n) = \sum_{\mathbf{k}\in\{1,\ldots,n\}^q} \big| \mathbb{E}(\Pi_{\mathbf{k}}) \big|; \quad (10.2.7)$$
then
$$\mathbb{E}\big( U_n(s) - U_n(t) \big)^{2l} \le (2l)!\, n^{-l}\, A_{2l}(n). \quad (10.2.8)$$
Let $q \ge 2$. For a finite sequence $\mathbf{k} = (k_1,\ldots,k_q)$ of elements of $\mathbb{Z}$, let $(k_{(1)},\ldots,k_{(q)})$ be the same sequence ordered from the smallest to the largest. The gap $r(\mathbf{k})$ of the sequence is defined as the maximum of the integers $k_{(i+1)} - k_{(i)}$, $i = 1,\ldots,q-1$. Choose any index $j < q$ such that $k_{(j+1)} - k_{(j)} = r$, and define the two nonempty subsequences $\mathbf{k}^1 = (k_{(1)},\ldots,k_{(j)})$ and $\mathbf{k}^2 = (k_{(j+1)},\ldots,k_{(q)})$. Define $G_r(q,n) = \{\mathbf{k}\in\{1,\ldots,n\}^q : r(\mathbf{k}) = r\}$. Sorting the sequences of indices by their gaps, we get
$$A_q(n) \le \sum_{k=1}^n \mathbb{E}|x_0(s,t)|^q + \sum_{r=1}^{n-1} \sum_{\mathbf{k}\in G_r(q,n)} \big| \mathrm{Cov}(\Pi_{\mathbf{k}^1}, \Pi_{\mathbf{k}^2}) \big| \quad (10.2.9)$$
$$\qquad + \sum_{r=1}^{n-1} \sum_{\mathbf{k}\in G_r(q,n)} \big| \mathbb{E}(\Pi_{\mathbf{k}^1}) \big|\, \big| \mathbb{E}(\Pi_{\mathbf{k}^2}) \big|. \quad (10.2.10)$$
Define
$$B_q(n) = \sum_{k=1}^n \mathbb{E}|x_0(s,t)|^q + \sum_{r=1}^{n-1} \sum_{\mathbf{k}\in G_r(q,n)} \big| \mathrm{Cov}(\Pi_{\mathbf{k}^1}, \Pi_{\mathbf{k}^2}) \big|.$$
In order to prove that the expression (10.2.10) is bounded by $\sum_{m=1}^{q-1} A_m(n)A_{q-m}(n)$, we make a first summation over the $\mathbf{k}^1$'s with $\#\mathbf{k}^1 = m$. Hence
$$A_q(n) \le B_q(n) + \sum_{m=1}^{q-1} A_m(n)\,A_{q-m}(n). \quad (10.2.11)$$
Now we give a bound for $B_q(n)$. To build a sequence $k$ belonging to $G_r(q,n)$, we first fix one of the $n$ points of $\{1,\dots,n\}$. We choose a second point among the two points that are at distance $r$ from the first point. The $i$-th point lies in an interval of radius $r$ centered at one of the $i-1$ preceding points. Thus for $r\in\mathbb N^*$, we have
$$\#G_r(q,n) \le n\cdot 2\cdot 2(2r+1)\cdots(q-1)(2r+1) \le 2n(q-1)!\,(3r)^{q-2}.$$
We use conditions (10.2.4) and (10.2.6) to deduce:
$$B_q(n) \le n\,\mathbb E|x_0(s,t)| + 2n(q-1)!\sum_{r=1}^{n-1}(3r)^{q-2}\min\bigl(q\,\varepsilon(r),\,2\,\mathbb E|x_0(s,t)|\bigr) \le n\,\mathbb E|x_0(s,t)| + n\,3^{q-1}q!\sum_{r=1}^{n-1}r^{q-2}\min\bigl(\varepsilon(r),\,\mathbb E|x_0(s,t)|\bigr).$$
Denote by $R$ the integer such that $R < (\mathbb E|x_0(s,t)|/C)^{-1/a} \le R+1$. For any $2\le q\le 2l$:
$$B_q(n) \le n\,\mathbb E|x_0(s,t)| + 3^{q-1}nq!\left(\mathbb E|x_0(s,t)|\sum_{r=1}^{R-1}r^{q-2} + C\sum_{r=R}^{\infty}r^{q-2-a}\right) \le 3^{q-1}nq!\left(\frac{\mathbb E|x_0(s,t)|}{q-1}\,R^{q-1} + \frac{C}{a-q+1}\,R^{q-1-a}\right) \le 3^{q-1}nq!\,\bigl(\mathbb E|x_0(s,t)|/C\bigr)^{-(q-1)/a}\left(\frac{\mathbb E|x_0(s,t)|}{q-1} + \frac{C\,R^{-a}}{a-q+1}\right).$$
But $R\ge 1$, so that $(\mathbb E|x_0(s,t)|/C)^{-1/a} \le 2R$, hence $R^{-a}\le 2^a\,\mathbb E|x_0(s,t)|/C$ and
$$B_q(n) \le 3^{q-1}nq!\,\bigl(\mathbb E|x_0(s,t)|/C\bigr)^{1-(q-1)/a}\left(C + \frac{C\,2^a}{a-2l+1}\right).$$
We find that
$$B_q(n) \le 3^q q!\left(\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{-1/a}\right)^{q}\,\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+1/a}, \qquad(10.2.12)$$
so $B_q(n)$ is bounded by a function $M^qV_q$ that satisfies condition (4.3.24) and gives
$$A_{2l}(n) \le \frac{(4l-2)!}{(2l)!(2l-1)!}\left(3\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{-1/a}\right)^{2l}\left(\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+1/a}\right)^{l} + (2l)!\left(3\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{-1/a}\right)^{2l}\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+1/a}$$
$$\le \frac{(4l-2)!\,3^{2l}}{(2l)!(2l-1)!}\left(\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1-1/a}\right)^{l} + (2l)!\,3^{2l}\,\frac{n\,k_l}{3}\left(\frac{\mathbb E|x_0(s,t)|}{C}\right)^{1+(1-2l)/a},$$
and (10.2.5) is proved.

Proof of Theorem 10.1. CLT for the finite dimensional distributions of $U_n$. Let $(s_1,\dots,s_m)$ be a fixed sequence of elements of $[0,1]^d$. Denote by $B_n$ the vector-valued process $B_n=(U_n(s_1),\dots,U_n(s_m))$. Proving a CLT for the vector $B_n$ is equivalent to proving the Gaussian convergence of any linear combination of its coordinates. Let $(\alpha_1,\dots,\alpha_m)$ be a real vector such that $\sum_{j=1}^m \alpha_j \neq 0$. Define $Z_i=\sum_{j=1}^m \alpha_j\bigl(\mathbf 1\{Y_i\le s_j\}-\mathbb P(Y_i\le s_j)\bigr)$. Define also
$$S_n = \frac{1}{\sqrt n}\sum_{1\le i\le n} Z_i = \sum_{1\le j\le m}\alpha_j\,U_n(s_j).$$
We use the Bernstein blocking technique, as in Chapter 8. Let $p(n)$ and $q(n)$ be sequences of integers such that $p(n)=o(n)$ and $q(n)=o(p(n))$. Assume that the Euclidean division of $n$ by $(p+q)$ gives a quotient $k$ and a remainder $r$. For $i=1,\dots,k$, we define the interval $P_i=\{(p+q)(i-1)+1,\dots,(p+q)i-q\}$ and, if $r\neq 0$, $P_{k+1}=\{(p+q)k+1,\dots,(p+q)k+r\wedge p\}$. Denote by $Q$ the set of indices that are not in one of the $P_i$; the cardinality of $Q$ is less than $(k+1)q$. For each block $P_i$ ($1\le i\le k+1$) and for $Q$, we define the partial sums
$$u_i=\frac{1}{\sqrt n}\sum_{j\in P_i}Z_j, \qquad v=\frac{1}{\sqrt n}\sum_{j\in Q}Z_j.$$
We use Lemma 8.2 and check its conditions for the sequence $(Z_j)$. To check (8.2.6), note that $\sigma_n^2 \le \sum_{r=0}^{n-1}\varepsilon(r)$ and that $\mathbb E v^2 \le \frac{(k+1)q}{n}\sum_{r=0}^{n-1}\varepsilon(r)$. Let us check (8.2.7). Using (10.2.4) with $\mathrm{Lip}\,G = \mathrm{Lip}\,H = t\max_j|\alpha_j|/(\sqrt n\,\sigma_n)$, $d_G=mpk$ and $d_H=mp$, we get
$$\left|\mathrm{Cov}\left(g\Bigl(\frac{t}{\sigma_n}\sum_{i=1}^{j-1}u_i\Bigr),\,h\Bigl(\frac{t}{\sigma_n}u_j\Bigr)\right)\right| \le mp\,(k+1)\,\varepsilon(q)$$
and
$$\sum_{j=2}^{k+1}\left|\mathrm{Cov}\left(g\Bigl(\frac{t}{\sigma_n}\sum_{i=1}^{j-1}u_i\Bigr),\,h\Bigl(\frac{t}{\sigma_n}u_j\Bigr)\right)\right| \le mp\,(k+1)^2\,\varepsilon(q) = O\bigl(n^{3/2}p^{-1}q^{-a}\bigr).$$
Taking $p=n^{5/6}$ and $q=n^{5/(6a)}$ gives a bound tending to 0. To prove (8.2.8), it is sufficient to show that $\mathbb E|u_i|^4=O(k^{-2})$. But
$$\mathbb E\Bigl(\sum_{j\in P_i}Z_j\Bigr)^4 = \frac{p^2}{n^2}\,\mathbb E\Bigl(\sum_{i=1}^m \alpha_i B_p(s_i)\Bigr)^4 \le \frac{p^2}{n^2}\,m^3\sum_{i=1}^m \alpha_i^4\,\mathbb E\bigl(B_p(s_i)-B_p(0)\bigr)^4,$$
and we conclude by applying Proposition 10.2 with $l=2$ to the couples $(0,s_i)$. In order to prove (8.2.9), note that (8.2.6) implies that
$$\lim_{n\to\infty}\frac{1}{\sigma_n^2}\,\mathrm{Var}\Bigl(\sum_{i=1}^{k+1}u_i\Bigr)=1.$$
But
$$\left|\mathrm{Var}\Bigl(\sum_{i=1}^{k+1}u_i\Bigr)-\sum_{i=1}^{k+1}\mathbb E|u_i|^2\right| \le 2\sum_{1\le i\neq j\le k}\bigl|\mathrm{Cov}(u_i,u_j)\bigr| \le \frac{4(k+1)p}{n}\sum_{j=q}^{\infty}\varepsilon(j) = O(q^{-a+1}) = o(1).$$
Taking $p=n^{5/6}$ and $q=n^{5/(6a)}$ gives a bound tending to 0.

Tightness of $U_n$. We use the criterion of Proposition 4.2. Define
$$\mathcal F = \bigl\{g_{s,t} \,/\, g_{s,t}(x)=\mathbf 1\{x\le t\}-\mathbf 1\{x\le s\}-F(t)+F(s);\ s,t\in[0,1]^d\bigr\}.$$
By definition $U_n(t)-U_n(s)=Z_n(g_{s,t})$ and $\|g_{s,t}\|_{P_Y,1}=\mathbb E|x_0(s,t)|$. Recalling that $a>2d+1$, from the Rosenthal inequality (10.2.5) for $l=d+1$ we get
$$\|Z_n(g_{s,t})\|_p \le C\bigl(\|g_{s,t}\|_{P_Y,1}^{1/r} + n^{1/q-1/2}\bigr),$$
with $p=q=2l=2d+2$ and $r=2a/(a-1)$. For the class of functions considered, the covering number $N_{P_Y,1}(x,\mathcal F)$ is $O(x^{-d})$, so that
$$\int_0^1 x^{(1-r)/r}\bigl(N_{P_Y,1}(x,\mathcal F)\bigr)^{1/p}\,dx \le C\int_0^1 x^{-\frac{1}{2}\left(\frac{a+1}{a}+\frac{d}{d+1}\right)}\,dx.$$
Because $a>d+1$, the exponent is greater than $-1$, and the integral is finite. As $p=q=2d+2$, the last condition holds directly.
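The Bernstein blocking scheme used in the proof above is easy to visualize numerically. The following Python sketch (an illustration only; `bernstein_blocks` is a hypothetical helper, and the parameter choices $p=n^{5/6}$, $q=n^{5/(6a)}$ mirror those made in the proof) partitions $\{1,\dots,n\}$ into big blocks $P_i$ of length $p$ separated by small gaps of length $q$ whose union $Q$ is negligible:

```python
# Sketch of the Bernstein blocking of {1,...,n}: big blocks of length p
# separated by gaps of length q; the indices in the gaps form the set Q.

def bernstein_blocks(n, p, q):
    """Return (blocks, gaps): big blocks of length p and the gap indices Q."""
    blocks, gaps = [], []
    i = 1
    while i + p - 1 <= n:
        blocks.append(list(range(i, i + p)))              # a big block P_i
        gaps.extend(range(i + p, min(i + p + q, n + 1)))  # the separating band
        i += p + q
    gaps.extend(range(i, n + 1))                          # leftover indices join Q
    return blocks, gaps

n, a = 1000, 3.0
p = int(n ** (5 / 6))                # p = n^{5/6}, as in the proof
q = max(1, int(n ** (5 / (6 * a))))  # q = n^{5/(6a)}
blocks, gaps = bernstein_blocks(n, p, q)
covered = sorted([i for b in blocks for i in b] + gaps)
assert covered == list(range(1, n + 1))  # each index is used exactly once
```

The sums over the big blocks give the asymptotically independent variables $u_i$, while the sum over `gaps` is the remainder $v$, whose variance is of order $q/p$.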
10.3 $\tilde\alpha$, $\tilde\beta$ and $\tilde\phi$-dependent sequences

Consider the three following conditions:

(C$_1$) There exists $\varepsilon>0$ such that $\tilde\alpha_2(k)=O\bigl(k^{-7/3-\varepsilon}\bigr)$ if $d=1$, and $\tilde\alpha_2(k)=O\bigl(k^{-2d-\varepsilon}\bigr)$ if $d>1$.

(C$_2$) There exists $\varepsilon>0$ such that $\tilde\beta_2(k)=O\bigl(k^{-2d-\varepsilon}\bigr)$.

(C$_3$) There exists $\varepsilon>0$ such that $\tilde\phi_2(k)=O\bigl(k^{-1-\varepsilon}\bigr)$.
Theorem 10.2. If one of the conditions (C$_i$), $i=1,\dots,3$ holds, then the process $U_n$ converges in distribution in $D(\mathbb R^d)$ to the Gaussian process $B$.

Remark 10.2. For $d=1$, the condition (C$_1$) is better than the condition $\alpha_X(k)=O(k^{-1-\sqrt 2-\varepsilon})$ given in Shao and Yu (1996) [175] for strongly mixing sequences (recall that $\alpha_X(k)$ has been defined in Section 1.2 of Chapter 1). In fact, in Theorem 7.3 of his book, Rio (2000) [161] has shown that the rate $\alpha_X(k)=O(k^{-1-\varepsilon})$ is sufficient for the weak convergence of the $d$-dimensional distribution function.

Proof of Theorem 10.2. We keep the notations of Chapter 4. The finite dimensional convergence of $U_n$ can be proved as before. Let us prove the tightness of $U_n$. Let $\mathcal F=\{x\mapsto \mathbf 1_{x\le t},\ t\in\mathbb R^d\}$, and let $\mathcal G=\{f-h,\ f,h\in\mathcal F\}$. We have to prove that the process $\{Z_n(f),\,f\in\mathcal F\}$ is asymptotically tight, that is, there exists a semi-metric $\rho$ on $\mathcal F$ such that $(\mathcal F,\rho)$ is totally bounded and, for every $\varepsilon>0$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty}\ \mathbb P\Bigl(\sup_{\rho(f,g)\le\delta,\ f,g\in\mathcal F}|Z_n(f)-Z_n(g)|>\varepsilon\Bigr)=0. \qquad(10.3.1)$$
Since $N_{Q,1}(x,\mathcal F)=O(x^{-d})$ for any finite measure $Q$ on $\mathbb R^d$, the set $(\mathcal F,\|\cdot\|_{Q,1})$ is totally bounded. Consequently, the property (10.3.1) follows from (4.5.2) by applying Markov's inequality.

Let us prove that condition (C$_1$) implies (4.5.2). For any $s,t$ in $\mathbb R^d$, let $f_{s,t}(x)=\mathbf 1_{x\le t}-\mathbf 1_{x\le s}$ and $\tilde f_{s,t}(x)=f_{s,t}(x)-\int f_{s,t}(x)\,P(dx)$. With Proposition 5.6 applied to $(\tilde f_{s,t}(X_i))_{i\in\mathbb Z}$, for any $p\ge 1$ the quantity $\|Z_n(\tilde f_{s,t})\|_p$ is bounded by
$$\sqrt{pV_\infty} + n^{1/3-1/2}\Bigl(3p^2\bigl(\|\tilde f_{s,t}(X_0)^3\|_{p/3} + M_1(p)+M_2(p)+M_3(p)\bigr)\Bigr)^{1/3}, \qquad(10.3.2)$$
where $V_\infty$, $M_1(p)$, $M_2(p)$ and $M_3(p)$ are defined in Proposition 5.6. Let $P$ be the law of $X_0$. Using inequality (5.2.5), for any $r>2$ and $g\in\mathcal G$,
$$\|Z_n(g)\|_p \le 4\bigl(p\,\|g\|_{P,1}\bigr)^{1/r}\Bigl(\sum_{k\ge 0}\tilde\alpha_1(k)\Bigr)^{(r-2)/r} + n^{1/3-1/2}\Bigl(3p^2\Bigl(1+10\sum_{k=1}^{+\infty}k\,\tilde\alpha_2(k)^{3/p}\Bigr)\Bigr)^{1/3}. \qquad(10.3.3)$$
We then apply Proposition 4.2 with $q=3$. Recall that $N_{P,1}(x,\mathcal F)=O(x^{-d})$. If $d=1$, we can take $r=7/2$ and $p>7/2$ such that (4.5.2) holds under (C$_1$). If
$d>1$, we can take $r=3$ and $p>3d$ such that (4.5.2) holds under (C$_1$).

Let us prove that condition (C$_2$) implies (4.5.2). Define the measure $Q$ on $\mathbb R^d$ by
$$Q(dx)=B(x)\,P(dx)=\Bigl(1+4\sum_{k=1}^{+\infty}b_k(x)\Bigr)P(dx), \qquad(10.3.4)$$
where $b_k(x)$ is the function from $\mathbb R^d$ to $[0,1]$ such that $b(\sigma(X_0),X_k)=b_k(X_0)$, and $P$ is the law of $X_0$. Note that $Q$ is finite as soon as $\sum_{k=1}^{+\infty}\tilde\beta_1(k)$ is finite. Applying the inequality (5.2.6), we obtain that for any $g$ in $\mathcal G$,
$$\|Z_n(g)\|_p \le \bigl(p\,\|g\|_{Q,1}\bigr)^{1/2} + n^{1/3-1/2}\Bigl(3p^2\Bigl(1+10\sum_{k=1}^{+\infty}k\,\tilde\beta_2(k)^{3/p}\Bigr)\Bigr)^{1/3}.$$
We then apply Proposition 4.2 with $r=2$ and $q=3$. Since $N_{Q,1}(x,\mathcal F)=O(x^{-d})$, we can take $p>3d$ such that (4.5.2) holds under (C$_2$).

Let us prove that condition (C$_3$) implies (4.5.2). Applying Proposition 5.7 to the sequence $(\tilde f_{s,t}(X_i))_{i\in\mathbb Z}$, we obtain, for any $p\ge 1$,
$$\|Z_n(\tilde f_{s,t})\|_p \le \bigl(p(V_\infty+2M_0(p))\bigr)^{1/2} + n^{1/3-1/2}\Bigl(3p^2\bigl(\|\tilde f_{s,t}(X_0)^3\|_{p/3}+\tilde M_1(p)+\tilde M_2(p)+M_3(p)\bigr)\Bigr)^{1/3},$$
where $V_\infty$, $M_0(p)$, $\tilde M_1(p)$, $\tilde M_2(p)$ and $M_3(p)$ are defined in Proposition 5.7. Applying the inequality (5.2.7), we get that for any $g$ in $\mathcal G$,
$$\|Z_n(g)\|_p \le \bigl(p\,\|g\|_{Q,1}\bigr)^{1/2} + \Bigl(2p\sum_{k=N}^{+\infty}\tilde\phi_2(k)\Bigr)^{1/2} + n^{1/3-1/2}\Bigl(3p^2\Bigl(1+2\sum_{k=1}^{N}k\,\tilde\phi_2(k)+4\sum_{k=1}^{\infty}\tilde\phi_2(k)(k\wedge N)+4\sum_{k=1}^{+\infty}\tilde\phi_2(k)\Bigr)\Bigr)^{1/3}. \qquad(10.3.5)$$
We now take $N=n^\alpha$ with $\alpha=1/(2+\varepsilon)$. If (C$_3$) holds, we infer from (10.3.5) that there exists some positive constant $C$ such that, for any $g$ in $\mathcal G$,
$$\|Z_n(g)\|_p \le C\,\|g\|_{Q,1}^{1/2} + C\,n^{-\varepsilon/(4+2\varepsilon)}.$$
To conclude, we apply Proposition 4.2 with $r=2$ and $q=2+\varepsilon$. Since $N_{Q,1}(x,\mathcal F)=O(x^{-d})$, (4.5.2) holds under (C$_3$).

10.4 $\theta$ and $\tau$-dependent sequences

Consider the three following conditions:
(C$_4$) Each component of $X_1$ has a bounded density and there exists $\varepsilon>0$ such that $\theta_{1,2}(k)=O\bigl(k^{-14/3-\varepsilon}\bigr)$ if $d=1$, and $\theta_{1,2}(k)=O\bigl(k^{-4d-\varepsilon}\bigr)$ if $d>1$.

(C$_5$) Each component of $X_1$ has a bounded density and there exists $\varepsilon>0$ such that $\tau_{1,2}(k)=O\bigl(k^{-4d-\varepsilon}\bigr)$.

(C$_6$) Each component of $X_1$ has a bounded density and there exists $\varepsilon>0$ such that $\tau_{\infty,2}(k)=O\bigl(k^{-2-\varepsilon}\bigr)$.

Theorem 10.3. If one of the conditions (C$_i$), $i=4,\dots,6$ holds, then the process $U_n$ converges in distribution in $D(\mathbb R^d)$ to the Gaussian process $B$.

Proof of Theorem 10.3. The result is immediate from Theorem 10.2 and Lemma 5.1.
10.5 Empirical copula processes

Copulas describe the dependence structure of random vectors. They were introduced long ago by Sklar (1959) [178] and have recently been rediscovered, especially for their applications in finance and biostatistics. Briefly, a $d$-dimensional copula is a distribution function on $[0,1]^d$ whose marginal distributions are uniform and which summarizes the dependence "structure" independently of the specification of the marginal distributions. To be specific, consider a random vector $X=(X_1,\dots,X_d)$ in $\mathbb R^d$ whose joint distribution function is $F$ and whose marginal distribution functions are denoted by $F_j$, $j=1,\dots,d$. Then there exists a unique copula $C$, defined on the product of the sets of values taken by the r.v.'s $F_j(X_j)$, such that
$$C(F_1(x_1),\dots,F_d(x_d)) = F(x_1,\dots,x_d),$$
for any $x=(x_1,\dots,x_d)\in\mathbb R^d$. $C$ is called the copula associated with $X$. When $F$ is continuous, it is defined on $[0,1]^d$, with an obvious extension to $\overline{\mathbb R}^d$. When $F$ is discontinuous, there are several ways to extend $C$ to the whole of $[0,1]^d$ (see Nelsen (1999) [134] for a complete theory). The natural empirical counterpart of $C$ is the so-called empirical copula, defined by
$$C_n(u) = F_n\bigl(F_{n,1}^{-1}(u_1),\dots,F_{n,d}^{-1}(u_d)\bigr),$$
for every $u_1,\dots,u_d$ in $[0,1]$, where $F_n$ denotes the empirical process as in Definition 10.0.1 and $F_{n,i}$ the empirical process of the $i$-th marginal distribution. We use the usual "generalized inverse" notation: for every $j=1,\dots,d$, $F_j^{-1}(u)=\inf\{t \,/\, F_j(t)\ge u\}$. Empirical copulas were introduced by Deheuvels (1979, 1981a, 1981b) [51], [52], [53] in an i.i.d. framework. This author studied the consistency of $C_n$
and the limiting behavior of $n^{1/2}(C_n-C)$ under the strong assumption of independence between margins. Fermanian et al. (2002) [86] proved a functional CLT for this empirical copula process in a more general framework and provided some extensions. Note that the results of [86] are available under the sup-norm and outer expectation assumptions, as in van der Vaart and Wellner (1996) [183].

Assume that the process $(Y_i)_{i\in\mathbb Z}$, $Y=(F_1(X_1),\dots,F_d(X_d))$, is weakly dependent. Note that the covariance structure of the limit process $B$ depends not only on the copula $C$ (via the term associated with $i=0$, e.g.), but also on the joint law of $X_0$ and $X_i$, for every $i$. This differs from the i.i.d. case, where $B$ becomes a Brownian bridge whose covariance structure is a function of $C$ only. Actually, the covariances of $B$ depend here on all the successive copulas of the random vectors $(X_0,X_i)$. We can state:

Theorem 10.4. If $(Y_i)_{i\in\mathbb Z}$ is weakly dependent, if the empirical process of $(Y_i)_{i\in\mathbb Z}$ converges in distribution in $D([0,1]^d)$ to a Gaussian process $B$, and if $C$ has continuous first partial derivatives, then the process $n^{1/2}(C_n-C)$ tends weakly to a Gaussian process $G$ in $D([0,1]^d)$. Moreover, this process has continuous sample paths and can be written as
$$G(u) = B(u) - \sum_{j=1}^{d}\partial_j C(u)\,B(1,\dots,1,u_j,1,\dots,1), \qquad(10.5.1)$$
for every $u\in[0,1]^d$, where the $j$-th summand involves $B$ evaluated at the point whose $j$-th coordinate is $u_j$ and whose other coordinates equal 1.

Note that the covariance structure of $n^{1/2}(C_n-C)$ is involved, because of both (10.0.3) and (10.5.1).

Proof of Theorem 10.4. The proof is directly adapted from Fermanian et al. (2002) [86]. Briefly, we can assume that the law of $X$ is compactly supported on $[0,1]^d$, if necessary by working with $Y=(F_1(X_1),\dots,F_d(X_d))$. Indeed, it can be proved that the empirical copulas associated with $Y$ and $X$ are equal at all the points $(i_1/n,\dots,i_d/n)$, $i_1,\dots,i_d$ in $\{0,\dots,n\}$ (Lemma 3 in [86]), and thus on the whole of $[0,1]^d$. Consider the usual norm on $l^\infty([0,1])$ and the Skorohod metric $\delta$ on $D([0,1]^d)$. Define the mappings
$$\phi_1:\ (D([0,1]^d),\delta)\ \to\ (D([0,1]^d),\delta)\times(D([0,1]),\delta)^{\otimes d},\qquad F\mapsto (F,F_1,\dots,F_d),$$
$$\phi_2:\ (D([0,1]^d),\delta)\times(D([0,1]),\delta)^{\otimes d}\ \to\ l^\infty([0,1]^d)\times(l^\infty([0,1]))^{\otimes d},\qquad (F,F_1,\dots,F_d)\mapsto (F,F_1^{-1},\dots,F_d^{-1}),$$
$$\phi_3:\ l^\infty([0,1]^d)\times(l^\infty([0,1]))^{\otimes d}\ \to\ l^\infty([0,1]^d),\qquad (F,G_1,\dots,G_d)\mapsto F(G_1,\dots,G_d).$$
Clearly, $\phi_1$ is Hadamard-differentiable because it is linear. Moreover, $\phi_2$ is Hadamard-differentiable tangentially to the corresponding product of spaces of continuous functions, by applying Theorem 3.9.23 in van der Vaart and Wellner (1996) [183]. Note that, for any function $h\in C([0,1])$, the convergence of a sequence $h_n$ towards $h$ in $(D([0,1]),\delta)$ is equivalent to the convergence in $(D([0,1]),\|\cdot\|_\infty)$; thus, working with the Skorohod metric is not a hurdle here. Finally, $\phi_3$ is Hadamard-differentiable by applying Theorem 3.9.27 in [183]. Thus the chain rule applies: $\phi=\phi_3\circ\phi_2\circ\phi_1$ is Hadamard-differentiable tangentially to $C([0,1]^d)$. The result follows by applying the functional $\Delta$-method to the empirical process of $Y$ and to the function $\phi$ (see Theorem 3.9.4 in [183]).
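To fix ideas, the empirical copula $C_n$ can be evaluated directly from order statistics. The following Python sketch (a minimal illustration; the function name and the toy data are ours, not taken from [86]) computes $C_n(u)=F_n(F_{n,1}^{-1}(u_1),\dots,F_{n,d}^{-1}(u_d))$ for a $d$-dimensional sample:

```python
import math

def empirical_copula(sample, u):
    """Empirical copula C_n(u) = F_n(F_{n,1}^{-1}(u_1), ..., F_{n,d}^{-1}(u_d)).

    sample: list of d-tuples of observations; u: point of [0,1]^d.
    """
    n, d = len(sample), len(u)
    # F_{n,j}^{-1}(u_j) = inf{t : F_{n,j}(t) >= u_j} is the ceil(n*u_j)-th
    # order statistic of the j-th coordinate (for u_j > 0).
    thr = []
    for j in range(d):
        xs = sorted(row[j] for row in sample)
        k = max(1, math.ceil(n * u[j]))
        thr.append(xs[k - 1])
    # joint empirical distribution function evaluated at the marginal quantiles
    count = sum(all(row[j] <= thr[j] for j in range(d)) for row in sample)
    return count / n

# toy bivariate sample with comonotone coordinates: C_n(u) equals min(u1, u2) here
data = [(i, 2 * i) for i in range(1, 11)]
print(empirical_copula(data, (0.5, 0.7)))  # → 0.5 for this sample
```

Since $C_n$ depends on the data only through the ranks, the same computation applies whether one works with $X$ or with $Y=(F_1(X_1),\dots,F_d(X_d))$, in line with Lemma 3 of [86].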
10.6 Random fields

In this section, we give rates of convergence in the central limit theorem for an $\eta$- or $\kappa$-weakly dependent field $\xi$. Assume that the density of the variable $\xi_i$ is bounded by $C_\xi$. Following Lemma 4.1, we get $(\mathcal I,c)$-weak dependence with

• for $\eta$-weak dependence: $\varepsilon(r)=\sqrt{\eta(r)}$ and $c(d_f,d_g)=2\sqrt{8C_\xi}\,(d_f+d_g)$;

• for $\kappa$-weak dependence: $\varepsilon(r)=(\kappa(r))^{1/3}$ and $c(d_f,d_g)=2(8C_\xi)^{2/3}(d_f+d_g)^{4/3}$.

For the sake of simplicity, we shall assume that the process takes its values in $[0,1]$; this may be achieved by using the quantile transform. Let $(S,|\cdot|_S)$ be the space of càdlàg functions $D([0,1])$ with the Skorohod metric. Let $\pi$ denote the Prohorov distance between distribution functions on $(S,|\cdot|_S)$. If $X$ and $Y$ are two processes on $S$, we also denote by $\pi(X,Y)$ the Prohorov distance between their distributions.

Central limit theorem for the empirical process. Let $U_n$ be the normalized version of the empirical process defined by (10.0.2) with respect to the closed ball $B_N$ of radius $N$ and cardinality $n=\#B_N=(2N+1)^d$. Denote $g_{s,t}(u)=\mathbf 1_{s<u\le t}-(t-s)$.

Theorem 10.5. If there exist constants $C>0$ and $b>0$ such that $\varepsilon(r)\le Ce^{-br}$, then
$$\pi(U_n,X) = O\bigl(n^{-\alpha_0}\log(n)^{\beta_0}\bigr), \quad\text{where}\quad \alpha_0=\frac{1}{8d+24}, \qquad \beta_0=\frac{10d^2+39d+28}{8d+24}.$$
The proof is based on the following well-known result on the Prohorov distance.

Lemma 10.2. Let $\delta$ be a positive real and $D$ a finite subset $\{x_1,\dots,x_m\}\subset[0,1]$ such that every $x$ in $[0,1]$ is in a $\delta$-neighborhood of some $x_i$. Let $X$ and $Y$ be two distributions on $S$. Define the $\delta$-oscillation of $X$,
$$w_X(\delta)=\sup_{|x-y|<\delta}\bigl(X(x)-X(y)\bigr),$$
and $\varepsilon_X(\delta)=\inf\{\varepsilon\in\mathbb R \,/\, \mathbb P(w_X(\delta)>\varepsilon)\le\varepsilon\}$. The Prohorov distance between $X$ and $Y$ is bounded by
$$\pi(X,Y) \le \varepsilon_X(\delta)+\varepsilon_Y(\delta)+\pi(X_D,Y_D),$$
where $X_D$ is the finite dimensional distribution of $X$ on the subset $D$.

We need to compute bounds for the $\delta$-oscillations and for the distance between the finite dimensional laws. These are based on moment inequalities for the process $U_n$.

Proposition 10.3. Assume that $\varepsilon(r)\le Ce^{-br}$ with $b>0$. For $(s,t)$ such that $|t-s| < C_\xi e^{-4b}/C$ and $|t-s| < Ce^{3b}/C_\xi$:
$$\mathbb E\bigl(U_n(t)-U_n(s)\bigr)^{2l} \le \frac{(4l-2)!}{(2l-1)!}\bigl(6(1\vee b^{-2})\log(1/|t-s|)\bigr)^{2dl}\Bigl(\bigl(2(2d)!\,C_\xi|t-s|\bigr)^{l} + (2l)!(2ld)!\,n^{1-l}\,C_\xi|t-s|\Bigr). \qquad(10.6.1)$$
Note that $U_n(t)-U_n(s)=\frac{1}{\sqrt n}\sum_{k\in B_N}g_{s,t}(\xi_k)$. The proposition is a consequence of the bound on the covariances of products of the functions $g_{s,t}(\xi_k)$.

Proof of Proposition 10.3. We adapt the proof of Proposition 10.2 to the family $(g_{s,t}(\xi_k))_{k\in B_N}$. For a sequence $k=(k_1,\dots,k_q)$ of elements of $T$, define $\xi_k=(\xi_{k_1},\dots,\xi_{k_q})$ and, when $s$ and $t$ are fixed, $\Pi_k=\prod_{i=1}^{q}g_{s,t}(\xi_{k_i})$. For any integer $q\ge 1$, set
$$A_q(N)=\sum_{k\in B_N^q}\bigl|\mathbb E\,\Pi_k\bigr|, \qquad(10.6.2)$$
then
$$\bigl|\mathbb E\bigl(U_n(s)-U_n(t)\bigr)^{2l}\bigr| \le (2l)!\,n^{-l}A_{2l}(N). \qquad(10.6.3)$$
Let $q\ge 2$. For a finite sequence $k=(k_1,\dots,k_q)$ of elements of $T$, the gap is defined as the largest integer $r$ such that the sequence may be split into two non-empty subsequences $k^1$ and $k^2\subset\mathbb Z^d$ whose mutual distance equals $r$ ($d(k^1,k^2)=\min\{\|i-j\|_1 \,/\, i\in k^1,\ j\in k^2\}=r$). If the sequence is constant, its gap is 0. Define the set $G_r(q,N)=\{k\in B_N^q \,/\, \text{the gap of }k\text{ is }r\}$. Sorting the sequences of indices by their gap:
$$A_q(N) \le \sum_{k_1\in B_N}\mathbb E|g_{s,t}(\xi_{k_1})|^q + \sum_{r=1}^{2N}\sum_{k\in G_r(q,N)}\bigl|\mathrm{Cov}(\Pi_{k^1},\Pi_{k^2})\bigr| + \sum_{r=1}^{2N}\sum_{k\in G_r(q,N)}\bigl|\mathbb E\,\Pi_{k^1}\bigr|\,\bigl|\mathbb E\,\Pi_{k^2}\bigr|. \qquad(10.6.4)$$
Define
$$B_q(N) = \sum_{k_1\in B_N}\mathbb E|g_{s,t}(\xi_{k_1})|^q + \sum_{r=1}^{2N}\sum_{k\in G_r(q,N)}\bigl|\mathrm{Cov}(\Pi_{k^1},\Pi_{k^2})\bigr|.$$
We get
$$A_q(N) \le B_q(N) + \sum_{m=1}^{q-1}A_m(N)A_{q-m}(N).$$
To build a sequence $k$ belonging to $G_r(q,N)$, we first fix one of the $n$ points of $B_N$. We choose a second point on the $\ell^1$-sphere of radius $r$ centered at the first point. The third point lies in a ball of radius $r$ centered at one of the preceding points, and so on. Thus
$$\#G_r(q,N) \le n\cdot 2d(2r+1)^{d-1}\cdot 2(2r+1)^d\cdots(q-1)(2r+1)^d \le n\,d\,q!\,(3r)^{d(q-1)-1}.$$
We use Lemma 4.1 to deduce:
$$B_q(N) \le n\Bigl(C_\xi|t-s| + d\,q!\sum_{r=1}^{2N}(3r)^{d(q-1)-1}\min\bigl(\varepsilon(r),\,C_\xi|t-s|\bigr)\Bigr).$$
Let $R$ be an integer to be chosen later. Then
$$B_q(N) \le n\,d\,3^{d(q-1)}q!\Bigl(C_\xi|t-s|\sum_{r=0}^{R-1}r^{d(q-1)-1} + C\sum_{r=R}^{\infty}r^{d(q-1)-1}e^{-br}\Bigr).$$
Comparing the sums with integrals, we get
$$B_q(N) \le n\bigl(3(1\vee b^{-1})\bigr)^{d(q-1)}q!\,(d(q-1))!\;R^{d(q-1)}\,C_\xi|t-s|\Bigl(1+\frac{Ce^{4b}e^{-b(R+1)}}{C_\xi|t-s|}\Bigr).$$
Choose $R$ as the integer part of $\frac{1}{b}\log\bigl(Ce^{4b}/(C_\xi|t-s|)\bigr)$ and assume that $(s,t)\in T$ are such that $|t-s|\le e^{-4b}C_\xi/C$ and $|t-s|\le e^{3b}C/C_\xi$. Then $R\ge 1$ and
$$B_q(N) \le \bigl(6(1\vee b^{-2})\log(1/|t-s|)\bigr)^{dq}\,n\,C_\xi|t-s|\;q!\,(dq)!, \qquad(10.6.5)$$
so that $B_q(N)$ is bounded by a function $M^qV_q$ that satisfies condition (4.3.24). Then
$$A_{2l}(N) \le \frac{(4l-2)!}{(2l)!(2l-1)!}\bigl(6(1\vee b^{-2})\log(1/|t-s|)\bigr)^{2dl}\Bigl(\bigl(2(2d)!\,nC_\xi|t-s|\bigr)^{l} + (2l)!(2ld)!\,nC_\xi|t-s|\Bigr),$$
and (10.6.1) is proved.

Oscillations of the empirical process. Using Proposition 10.3, we give a bound for the modulus of continuity of $U_n$.

Proposition 10.4. If $\varepsilon(r)\le Ce^{-br}$ with $b>0$, then for $\delta\ge 1/n$:
$$\varepsilon_{U_n}(\delta) \le K_1(C_\xi,b,d)\,\delta^{1/2}\log^{d+1}(1/\delta), \qquad(10.6.6)$$
where $K_1(C_\xi,b,d)=8\bigl(6(1\vee b^{-2})\bigr)^{d}\bigl(2(2d)!\,C_\xi\bigr)^{1/2}$.

Proof of Proposition 10.4. We show that exponential moments of $U_n(t)-U_n(s)$ are finite and use Stroock's method to bound the oscillation.

Lemma 10.3. Let $f(u)=|u|^{1/2}\log^d(1/u)$. Assume that $\varepsilon(r)\le Ce^{-br}$ with $b>0$, and that $\delta\ge 1/n$. Then there exists a constant $c_0$ such that for every $c<c_0$ and every $(s,t)$ such that $|t-s|\le\delta$:
$$\mathbb E\exp\Bigl(c\,\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr) \le B(c) < +\infty.$$
Proof of Lemma 10.3. Using the moment inequality (10.6.1) and Stirling's formula:
$$\mathbb E\Bigl(\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr)^{2p} \le p^{2p}\Bigl(8e^2C_\xi(2d)!\bigl(6(1\vee b^{-2})\bigr)^{2d}\Bigr)^{p}.$$
Hence
$$\mathbb E\exp\Bigl(c\,\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr) \le \sum_{k=0}^{\infty}\frac{c^k}{k!}\Bigl(\mathbb E\Bigl(\frac{|U_n(t)-U_n(s)|}{f(t-s)}\Bigr)^{2k}\Bigr)^{1/2} \le \sum_{k=0}^{\infty}\frac{c^k}{k!}\,k^k\Bigl(8e^2C_\xi(2d)!\bigl(6(1\vee b^{-2})\bigr)^{2d}\Bigr)^{k/2}.$$
For $c_0=\bigl(6(1\vee b^{-2})\bigr)^{-d}\bigl(8e^2C_\xi(2d)!\bigr)^{-1/2}$, the series converges and the lemma is true.

We apply a lemma of Garsia (1965) [90]. Let $c=c_0/2$, $B(c)$ as in Lemma 10.3, $\psi(u)=e^{cu}-1$ and $0<\delta<e^{-2d}$. The lemma says that if
$$Y(\omega)=\int_0^\delta\!\!\int_0^\delta \psi\Bigl(\frac{|U_n(t,\omega)-U_n(s,\omega)|}{f(|t-s|)}\Bigr)\,ds\,dt < \infty,$$
then $|U_n(t,\omega)-U_n(s,\omega)| \le 8\,\psi^{-1}\bigl(4Y(\omega)/\delta^2\bigr)\,f(\delta)$. Using the Markov inequality and the fact that $\mathbb E(Y)\le B(c)\delta^2$:
$$\mathbb P\bigl(|U_n(t)-U_n(s)|\ge\lambda\bigr) \le \mathbb P\Bigl(Y\ge\frac{\delta^2}{4}\,\psi\Bigl(\frac{\lambda}{8f(\delta)}\Bigr)\Bigr) \le \frac{4B(c)}{\psi\bigl(\lambda/(8f(\delta))\bigr)}.$$
For $\lambda=(4/c)\,f(\delta)\log(1/\delta)$,
$$\mathbb P\bigl(|U_n(t)-U_n(s)|\ge\lambda\bigr) \le \frac{4B(c)\,\delta^{1/2}}{1-\delta^{1/2}}.$$
For $\delta$ sufficiently small, this term is less than $\lambda$, so that $\varepsilon_{U_n}(\delta)\le K_1\delta^{1/2}\log^{d+1}(1/\delta)$ with $K_1=4/c$.

Oscillations of the limit process.

Proposition 10.5. If $\varepsilon(r)\le Ce^{-br}$ with $b>0$, then, setting $K_3(C_\xi,b,d)=2\bigl(C_\xi\,d!\,(2d+7)\bigr)^{1/2}\bigl(1\vee b^{-1}\bigr)^{d/2}$,
$$\varepsilon_X(\delta) \le K_3(C_\xi,b,d)\,\delta^{1/2}\log^{(d+1)/2}(1/\delta). \qquad(10.6.7)$$
Proof of Proposition 10.5. We use a chaining argument to bound the $\delta$-oscillations of the non-stationary Gaussian limit process $X$. Define a semi-metric $\bar\rho$ on $[0,1]$ by $\bar\rho(s,t)^2=\mathrm{Var}(X(t)-X(s))$. As a Gaussian process, $X$ satisfies the exponential inequality
$$\mathbb P\bigl(|X(t)-X(s)|\ge\lambda\,\bar\rho(s,t)\bigr) \le \exp\{-\lambda^2/2\}.$$
By definition, $\bar\rho(s,t)^2=\sum_{j\in T}\mathrm{Cov}\bigl(x_0(t)-x_0(s),\,x_j(t)-x_j(s)\bigr)$. Using the Cauchy-Schwarz inequality,
$$\bigl|\mathrm{Cov}\bigl(x_0(t)-x_0(s),\,x_r(t)-x_r(s)\bigr)\bigr| \le \mathrm{Var}\bigl(x_0(t)-x_0(s)\bigr) \le C_\xi|t-s|,$$
and, by definition of $\varepsilon(r)$, $\bigl|\mathrm{Cov}\bigl(x_0(t)-x_0(s),\,x_r(t)-x_r(s)\bigr)\bigr| \le \varepsilon(r)$. For a metric $\nu$ over $[0,1]$, we denote by $N_\nu(\varepsilon)$ the covering number (see Pollard (1984) [151], p. 143): it is the cardinality of the smallest set $S$ of points of $[0,1]$ such that, for any $t$, $\nu(t,S)\le\varepsilon$. The corresponding covering integral is
$$J_\nu(\varepsilon) = \int_0^\varepsilon\bigl(2\log\bigl(N_\nu(u)^2/u\bigr)\bigr)^{1/2}\,du.$$
Computing as in (10.6.5), it is easy to show that
$$\bar\rho(s,t)^2 = \sum_{r=0}^{\infty}(2r+1)^{d-1}\bigl(\varepsilon(r)\wedge C_\xi|t-s|\bigr) \le K^2\,|t-s|\log^d(1/|t-s|),$$
where $K=2\sqrt{C_\xi\,d!\,(1\vee b^{-1})^d}$. Let $\varepsilon>0$. The inclusion of the balls of the two metrics implies that $N_{\bar\rho}\bigl(K\varepsilon^{1/2}\log^{d/2}(1/\varepsilon)\bigr)\le N_{|\cdot|}(\varepsilon)$. Thus
$$N_{\bar\rho}(u) \le 2^{d+1}(K/u)^2\log^{d+1}(K/u) \le 2^d(K/u)^{d+2}.$$
The corresponding covering integral satisfies
$$J_{\bar\rho}(\delta) \le \int_0^\delta\Bigl(4\log\bigl(2^{d+1}(K/u)^{d+3}\bigr)+2\log(1/u)\Bigr)^{1/2}\,du \le 2\delta\log^{1/2}\bigl(2^{d+1}K^{d+3}\bigr) + (4d+14)^{1/2}\int_0^\delta \log^{1/2}(1/u)\,du \le 2\delta\log^{1/2}\bigl(2^{d+1}K^{d+3}\bigr) + (4d+14)^{1/2}\,\delta\log^{1/2}(1/\delta).$$
Taking $\varepsilon=K\delta^{1/2}\log^{d/2}(1/\delta)$:
$$\mathbb P\Bigl(\max_{|t-s|\le\delta}|X(t)-X(s)| \ge 26\,J_{\bar\rho}\bigl(K\delta^{1/2}\log^{d/2}(1/\delta)\bigr)\Bigr) \le K\delta^{1/2}\log^{d/2}(1/\delta).$$
For a sufficiently small $\delta$,
$$\varepsilon_X(\delta) \le J_{\bar\rho}\bigl(K\delta^{1/2}\log^{d/2}(1/\delta)\bigr) \le K(2d+7)^{1/2}\,\delta^{1/2}\log^{(d+1)/2}(1/\delta).$$

Distance between the finite dimensional laws. Let $m\in\mathbb N$ and let $D=\{t_1,\dots,t_m\}\subset[0,1]$. We denote by $z^i$ the $m$-vector $(x_i(t_1),\dots,x_i(t_m))$. Define the partial sum $s_n$:
$$s_n=\frac{1}{\sqrt n}\sum_{i\in B_N}z^i.$$
Let $Y$ be a centered Gaussian $m$-vector whose covariance matrix is $\Sigma_D=(\Sigma_{s,t})_{s,t\in D}$. We bound the Prohorov distance between $s_n$ and $Y$. For a real $\varepsilon$ such that $0\le\varepsilon\le 1$, we define the class of functions $\mathcal F_\varepsilon$ by
$$\mathcal F_\varepsilon = \bigl\{f\in C_b^3(\mathbb R^m) \,/\, 0\le f\le 1,\ \|f^{(i)}\|_\infty\le 2m^{-1/2}\varepsilon^{-i}\ \text{for}\ i=1,2,3\bigr\}, \qquad(10.6.8)$$
where $f^{(i)}$ is the differential of $i$-th order of $f$ and the norm $\|\cdot\|_\infty$ is the operator sup-norm w.r.t. the norm $\|\cdot\|_1$. A bound for the Prohorov distance between $s_n$ and $Y$ is given by
$$\pi(s_n,Y) \le 4m^{1/2}\varepsilon\bigl(1+\log^{1/2}(1/\varepsilon)\bigr) + 2\sup_{\mathcal F_\varepsilon}\bigl|\mathbb E\bigl(f(s_n)-f(Y)\bigr)\bigr|. \qquad(10.6.9)$$
Proposition 10.6. Define $m$, $p$ and $q$ as functions of $n$ which tend to infinity with $n$ and are negligible with respect to $n$, with $q$ negligible with respect to $p$, and define $\varepsilon$ as a function of $n$ which tends to zero as $n$ tends to infinity. For sufficiently large $n$, in the case of $\eta$-dependence:
$$\begin{aligned} \pi(s_n,Y) \le\ & 4m^{1/2}\varepsilon\bigl(1+\log^{1/2}(1/\varepsilon)\bigr) &(10.6.10)\\ &+ K_1\,\varepsilon^{-1}m^{1/2}q^{1/2}p^{-1/2} &(10.6.11)\\ &+ 2K_5\,m^{1/2}\varepsilon^{-2}n^{3/2}\,\varepsilon(q) &(10.6.12)\\ &+ 2K_6\,m^{3/2}\varepsilon^{-3}n\,\varepsilon(q) &(10.6.13)\\ &+ 2K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2} &(10.6.14)\\ &+ 2K_8\,m^{3/2}\varepsilon^{-2}p^{-1}q. &(10.6.15) \end{aligned}$$
In the case of $\kappa$-dependence, the terms involving $K_5$ and $K_6$ are replaced by
$$+\,2K_5\,m^{1/3}p^{d/3}\varepsilon^{-7/3}n^{4/3}\,\varepsilon(q) + 2K_6\,m^{4/3}p^{d}\varepsilon^{-11/3}n\,\varepsilon(q).$$
The values of the constants $K_i$ are given in the proof.

Proof of Proposition 10.6. We use the Bernstein blocking technique (Bernstein, 1939, [13]). Assume that the Euclidean division of $n$ by $(p+q)$ gives a quotient $a$ and a remainder $r$. Denote $\mathbf j=(j,\dots,j)$. Define $\mathcal K=\{-a-1,\dots,a+1\}^d$; for $i\in\{-a,\dots,a\}^d$, we define the blocks $P_i=[(p+q)(i-\mathbf 1),\dots,(p+q)i-q\mathbf 1]$. These blocks are separated by bands of width $q$. We complete the construction with at most $4a+4$ incomplete blocks on the boundary, also separated by bands of width $q$, and associate each of them with a corresponding index on the boundary of $\mathcal K$. Denote by $Q$ the set of indices that lie in the separating bands; the cardinality of $Q$ is less than $d(2a+1)qp^{d-1}$. We order the set of blocks $P$ by the lexicographic order of their index in $\mathcal K$. We define the variables $(u^i)_{i=1,\dots,(2a+1)^d}$ and $v$ exactly as in Section 8.2.2. Consider an independent sequence of centered Gaussian vectors $(y^i)_{i=1,\dots,k}$ such that each $y^i$ has the same covariance matrix as $u^i$. Let $\varepsilon>0$ and $f\in\mathcal F_\varepsilon$ defined by (10.6.8). We decompose
$$f(s_n)-f(Y) = \Bigl(f(s_n)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr) \qquad(10.6.16)$$
$$+\ \Bigl(f\Bigl(\sum_{i=1}^k u^i\Bigr)-f\Bigl(\sum_{i=1}^k y^i\Bigr)\Bigr) \qquad(10.6.17)$$
$$+\ \Bigl(f\Bigl(\sum_{i=1}^k y^i\Bigr)-f(Y)\Bigr). \qquad(10.6.18)$$
Consider the term (10.6.16). We have
$$\Bigl|\mathbb E\Bigl(f(s_n)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr)\Bigr| = \Bigl|\mathbb E\Bigl(f\Bigl(\sum_{i=1}^k u^i+v\Bigr)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr)\Bigr| \le 2m^{-1/2}\varepsilon^{-1}\sum_{s=1}^m \mathbb E(|v_s|) \le 2m^{1/2}\varepsilon^{-1}\,\mathbb E(|v_1|^2)^{1/2}.$$
Because $\varepsilon(r)$ is decreasing,
$$\mathbb E(|v_1|^2) = \frac{1}{n}\sum_{i,j\in Q}\bigl|\mathrm{Cov}\bigl(x_i(t_1),x_j(t_1)\bigr)\bigr| \le \frac{1}{n}\sum_{i,j\in Q}\varepsilon\bigl(\|i-j\|\bigr) \le \frac{\#Q}{n}\sum_{r=0}^{N}(2r+1)^{d-1}\varepsilon(r) \le K_1\,\frac{\#Q}{n} \le K_1\,\frac{q}{p},$$
where $K_1$ is a constant depending on the sequence $\varepsilon(r)$. We get
$$\Bigl|\mathbb E\Bigl(f(s_n)-f\Bigl(\sum_{i=1}^k u^i\Bigr)\Bigr)\Bigr| \le K_1\,(mq/p)^{1/2}\,\varepsilon^{-1}. \qquad(10.6.19)$$
Now we apply Lemma 7.1 to the difference (10.6.17). We first give a bound for $T_1$. Assume that $\xi$ is $\eta$-dependent. Define $G=\partial f(u^1+\dots+u^{i-1})/\partial x_s$ and $H=u^i_s$. We apply Lemma 10.1 with $d_G\le p^d i$, $d_H=p^d$, $\mathrm{Lip}(G)\le 2m^{-1/2}\varepsilon^{-2}$, $\mathrm{Lip}(H)\le n^{-1/2}$, $\|G\|_\infty\le 2m^{-1/2}\varepsilon^{-1}$ and $\|H\|_\infty\le p^d n^{-1/2}$. Because of the respective orders of the parameters, we simplify (10.2.2):
$$\Bigl|\mathrm{Cov}\Bigl(\frac{\partial f(u^1+\dots+u^{j-1})}{\partial x_s},\,u^j_s\Bigr)\Bigr| \le 2\phi(G,H)\,\varepsilon(q) \le 4m^{-1/2}\varepsilon^{-2}p^{2d}j\,n^{-1/2}\,\varepsilon(q).$$
Summing over $j$ and $s$:
$$|T_1| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(1)}(u^1+\dots+u^{j-1})\cdot u^j\bigr)\bigr| \le 4m^{1/2}\varepsilon^{-2}\,\varepsilon(q)\,n^{3/2}.$$
Now we give a bound for $T_2$. Define $G=\partial^2 f(u^1+\dots+u^{i-1})/(\partial x_s\partial x_t)$ and $H=u^i_su^i_t$. We apply Lemma 10.1 with $d_G\le p^d i$, $d_H=2p^d$, $\mathrm{Lip}(G)\le 2m^{-1/2}\varepsilon^{-3}$, $\mathrm{Lip}(H)\le p^d n^{-1}$, $\|G\|_\infty\le 2m^{-1/2}\varepsilon^{-2}$ and $\|H\|_\infty\le p^{2d}n^{-1}$. Keeping the larger exponents in each term:
$$\Bigl|\mathrm{Cov}\Bigl(\frac{\partial^2 f(u^1+\dots+u^{j-1})}{\partial x_s\partial x_t},\,u^j_su^j_t\Bigr)\Bigr| \le 4m^{-1/2}\varepsilon^{-3}p^{2d}j\,n^{-1}\,\varepsilon(q).$$
Summing over $j$, $s$ and $t$:
$$|T_2| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(2)}(u^1+\dots+u^{j-1})\cdot(u^j,u^j)\bigr)\bigr| \le 2m^{3/2}\varepsilon^{-3}\,\varepsilon(q)\,n.$$
Assume now that $\xi$ is $\kappa$-dependent. Using the same bounds for the functions $G$ and $H$ and relation (10.2.3):
$$|T_1| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(1)}(u^1+\dots+u^{j-1})\cdot u^j\bigr)\bigr| \le 4m^{1/3}p^{d/3}\varepsilon^{-7/3}n^{4/3}\,\varepsilon(q),$$
$$|T_2| \le \sum_{j=1}^{i}\bigl|\mathbb E\bigl(f^{(2)}(u^1+\dots+u^{j-1})\cdot(u^j,u^j)\bigr)\bigr| \le 4m^{4/3}p^{d}\varepsilon^{-11/3}n\,\varepsilon(q).$$
The third-order term is bounded using the third-order moment. Using the Jensen inequality:
$$\mathbb E|u^1|_2^3 \le m^{1/2}\sum_{s=1}^m\bigl(\mathbb E|u^1_s|^4\bigr)^{3/4}.$$
Substituting 1 for the bound $C_\xi|t-s|$ and choosing $R=1\vee(4+b^{-1}\log C)$, Equation (10.6.5) becomes $V_4(N)\le p^d\bigl(3R(1\vee b^{-2})\bigr)^{4d}\,4!\,(4d)!$. Then $\bigl(\mathbb E|u^1_s|^4\bigr)^{3/4}\le K_7\,p^{3d/2}n^{-3/2}$, so that the last term is less than
$$\|f^{(3)}\|_\infty\,\mathbb E|u^1|_2^3 \le K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2}. \qquad(10.6.20)$$
We conclude that, if $\xi$ is $\eta$-dependent:
$$\Bigl|\mathbb E\Bigl(f\Bigl(\sum_{j=1}^k u^j\Bigr)-f\Bigl(\sum_{j=1}^k y^j\Bigr)\Bigr)\Bigr| \le K_5\,m^{1/2}\varepsilon^{-2}n^{3/2}\,\varepsilon(q) + K_6\,m^{3/2}\varepsilon^{-3}n\,\varepsilon(q) + K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2}, \qquad(10.6.21)$$
where $K_5=4$, $K_6=2$ and $K_7$ is the constant in (10.6.20). If $\xi$ is $\kappa$-dependent:
$$\Bigl|\mathbb E\Bigl(f\Bigl(\sum_{j=1}^k u^j\Bigr)-f\Bigl(\sum_{j=1}^k y^j\Bigr)\Bigr)\Bigr| \le K_5\,m^{1/3}p^{d/3}\varepsilon^{-7/3}n^{4/3}\,\varepsilon(q) + K_6\,m^{4/3}p^{d}\varepsilon^{-11/3}n\,\varepsilon(q) + K_7\,m\,\varepsilon^{-3}p^{d/2}n^{-1/2}. \qquad(10.6.22)$$
Now we have to bound the difference (10.6.18). We use the following lemma.
Lemma 10.4. Let $X$ and $Y$ be two centered Gaussian vectors of length $m$, with respective covariance matrices $M$ and $N$. Let $0<\varepsilon<1$ and $f\in\mathcal F_\varepsilon$. Then
$$\bigl|\mathbb E\bigl(f(X)-f(Y)\bigr)\bigr| \le m^{-1/2}\varepsilon^{-2}\,\|M-N\|_1, \quad\text{where}\quad \|A\|_1=\sum_{i,j}|A_{i,j}|.$$
Proof of Lemma 10.4. Because $X$ and $Y$ are Gaussian,
$$\mathbb E\bigl(f(X)-f(Y)\bigr) = \sum_{k=1}^{n}\mathbb E\bigl(f(Z_k+X_{k,n}/\sqrt n)-f(Z_k+Y_{k,n}/\sqrt n)\bigr),$$
where the $X_{k,n}$ and $Y_{k,n}$ are independent copies of $X$ and $Y$, and $Z_k=(X_{1,n}+\dots+X_{k-1,n}+Y_{k+1,n}+\dots+Y_{n,n})/\sqrt n$. Using the Taylor expansion:
$$\mathbb E\bigl(f(X)-f(Y)\bigr) = \frac{1}{2n}\sum_{k=1}^{n}\mathbb E\bigl(f^{(2)}(0)\cdot(X_{k,n},X_{k,n})-f^{(2)}(0)\cdot(Y_{k,n},Y_{k,n})\bigr) + \frac{1}{6n^{3/2}}\,\mathbb E\bigl(f^{(3)}(V_{k,n})\cdot(X_{k,n},X_{k,n},X_{k,n})-f^{(3)}(W_{k,n})\cdot(Y_{k,n},Y_{k,n},Y_{k,n})\bigr),$$
$V_{k,n}$, $W_{k,n}$ being random vectors. The term involving $f^{(3)}$ tends to 0, and
$$\mathbb E\bigl(f^{(2)}(0)\cdot(X_{k,n},X_{k,n})-f^{(2)}(0)\cdot(Y_{k,n},Y_{k,n})\bigr) = \sum_{i,j=1}^{m}\frac{\partial^2 f(0)}{\partial x_i\partial x_j}\bigl(M_{i,j}-N_{i,j}\bigr),$$
so that $\bigl|\mathbb E\bigl(f(X)-f(Y)\bigr)\bigr| \le m^{-1/2}\varepsilon^{-2}\,\|M-N\|_1$.

Apply this lemma to the Gaussian vectors $Y$ and $\sum_i y^i$. The covariance matrices of $\sum_i y^i$ and $Y$ are respectively $\frac{kp^d}{n}\Sigma_{D,p}$ and $\Sigma_D$, where
$$\Sigma_{D,p}(s,t) = \sum_{|j|<p} h(j)\,\mathrm{Cov}\bigl(x_0(s),x_j(t)\bigr) \quad\text{and}\quad h(j)=\prod_{l=1}^{d}\bigl(1-j_l/p\bigr).$$
Define the matrix $M$ by $M(s,t)=\sum_{|j|<p}\mathrm{Cov}\bigl(x_0(s),x_j(t)\bigr)$. We have that
$$\Bigl\|\frac{kp^d}{n}\Sigma_{D,p}-\Sigma_D\Bigr\|_1 \le m^2\,\frac{kqp^{d-1}}{n}\,\|\Sigma_{D,p}\|_\infty + m^2\,\|\Sigma_{D,p}-M\|_\infty + m^2\,\|M-\Sigma_D\|_\infty.$$
Because of the convergence of the series defining $\Sigma$, $\|\Sigma_{D,p}\|_\infty$ is uniformly bounded in $p$, and the first term gives the main contribution. Because $1-h(j)\le d|j|/p$ and $|j|<p$, the term $\|\Sigma_{D,p}-M\|_\infty$ is controlled similarly, while $\|M-\Sigma_D\|_\infty$ is the remainder of a geometric series and is smaller. We conclude that the difference (10.6.18) satisfies
$$\Bigl|\mathbb E\Bigl(f\Bigl(\sum_{i=1}^k y^i\Bigr)-f(Y)\Bigr)\Bigr| \le K_8\,m^{3/2}\varepsilon^{-2}qp^{-1}. \qquad(10.6.23)$$
Thus, collecting the bounds (10.6.19), (10.6.21) or (10.6.22), and (10.6.23), we infer that the distance between the finite dimensional distributions of size $m$ satisfies the inequalities of Proposition 10.6.

Proof of Theorem 10.5. Let $D$ be the set of reals $(i/m)_{i=1,\dots,m}$; the corresponding $\delta$ is $m^{-1}$. We collect the results of Proposition 10.4 (oscillation of the empirical process), Proposition 10.5 (oscillation of the Gaussian limit process) and Proposition 10.6 (distance between the finite dimensional distributions), and use Lemma 10.2 to conclude. The $(1/m)$-oscillation of $U_n$ is less than $K_{10}\,m^{-1/2}\log^{2d+1}(m)$. The $(1/m)$-oscillation of $X$ is negligible with respect to $m^{-1/2}\log^{2d+1}(m)$. We choose $q=\log(n)$; the terms (10.6.11), (10.6.12) and (10.6.13) are then negligible. The minimum rate is obtained for parameters such that $m^{-1/2}\log^{2d+1}(n)$, $m^{1/2}\varepsilon\log^{1/2}(n)$, $m^{5/2}\varepsilon^{-3}p^{d/2}n^{-1/2}$ and $m^{3/2}\varepsilon^{-2}p^{-1}\log(n)$ are of the same order with respect to $n$. The solution is
$$m = n^{\frac{1}{4d+12}}\,(\log n)^{\frac{6d^2+17d+5}{4d+12}}, \qquad p = n^{\frac{1}{d+3}}\,(\log n)^{\frac{-2d+2}{d+3}}, \qquad \varepsilon = n^{\frac{-1}{4d+12}}\,(\log n)^{-\frac{2d^2+9d+1}{4d+12}}.$$
For this choice, the rate of convergence is $n^{-\frac{1}{8d+24}}\,(\log n)^{\frac{10d^2+39d+29}{8d+24}}$.
Chapter 11

Functional estimation

In this chapter we consider some methods of functional estimation. We study the estimation of the marginal density of a weakly dependent sequence $(X_t)_{t\in\mathbb Z}$, as well as the estimation of the regression function in a two-dimensional model $Z_t=(X_t,Y_t)_{t\in\mathbb Z}$. We show that the CLT and uniform a.s. convergence results hold under general non-causal weak dependence. We also establish sharp results on the MISE of these estimators under more restrictive causal dependence conditions. Our principal goal is to extend to less restrictive notions of weak dependence results already known for mixing sequences. We end the chapter with an overview of the different methods of nonparametric estimation, in order to extend the field of application of the previous results.
11.1 Some non-parametric problems

For a stationary two-dimensional process $(Z_t)_{t\in\mathbb Z}$ with $Z_t=(X_t,Y_t)$, an important quantity is the regression function $r(x)=\mathbb E(Y_0|X_0=x)$. Various methods to fit such a function have been developed. Nadaraya-Watson kernel estimates are very popular; see Rosenblatt (1991) [169], Prakasa-Rao (1983) [153], or Robinson (1983) [163], for instance.

Volatility. Among other problems, one may wish to estimate the volatility of financial time series, $v(x)=\mathrm{Var}(X_t|X_{t-1}=x)$. The question enters into the regression framework with both $Y_i=X_{i-1}$ and $Y_i=X_{i-1}^2$, since $v(x)=v_2(x)-v_1^2(x)$, where $v_j(x)=\mathbb E(X_1^j|X_0=x)$.

Density. Another important problem of econometric interest is to estimate the marginal density $f$ of a stationary sample. Density kernel estimators built from a kernel function $K$ are usually defined; derivatives of the density and regression functions can also be estimated by analogous procedures. Here we simply set $Y_i\equiv 1$.

Quantiles. Conditional quantiles are linked to the conditional distribution by the relation $F(y\,|\,x)=\mathbb P(X_1\le y|X_0=x)$. More precisely, we denote by $q(t|x)=\inf\{y \,/\, F(y\,|\,x)>t\}$ the generalized (right-continuous, with left limits) inverse of the monotone function $y\mapsto F(y\,|\,x)$. Set $Y_t(y)=\mathbf 1_{\{X_{t+1}\le y\}}$. Consistent estimators of the conditional regression $\mathbb E(Y_t(y)|X_t=x)$ provide information on the above conditional quantiles.

Derivatives of a density. One may also estimate the derivative $f^{(\nu)}$ of $f$, where $\nu=(\nu_1,\dots,\nu_d)\in\mathbb N^d$ and $f^{(\nu)}=\dfrac{\partial^{\nu_1+\dots+\nu_d}f}{\partial x_1^{\nu_1}\cdots\partial x_d^{\nu_d}}$. The kernel estimator $\hat f$ of $f$ (see below for a precise statement) yields an estimator $\hat f^{(\nu)}$ of $f^{(\nu)}$:
$$\frac{\partial^{\nu_1+\dots+\nu_d}\hat f}{\partial x_1^{\nu_1}\cdots\partial x_d^{\nu_d}}(x) = \frac{m^{1+\nu_1+\dots+\nu_d}}{n}\sum_{i=1}^{n}\frac{\partial^{\nu_1+\dots+\nu_d}K}{\partial x_1^{\nu_1}\cdots\partial x_d^{\nu_d}}\bigl(m(x-X_i)\bigr).$$
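The regression representation of conditional quantiles described above can be turned into a naive estimator: estimate $F(y|x)$ by a kernel regression of the indicators $Y_t(y)=\mathbf 1\{X_{t+1}\le y\}$ on $X_t$, then invert. The Python sketch below is only an illustration; the triangular kernel, the bandwidth `h` and the grid search are hypothetical choices, not prescribed by the text:

```python
# Sketch: conditional quantile q(t|x) via the indicator-regression device above.

def cond_cdf(xs, x, y, h):
    """Kernel estimate of F(y|x) = P(X_{t+1} <= y | X_t = x) along a path xs."""
    num = den = 0.0
    for x_t, x_next in zip(xs, xs[1:]):
        w = max(0.0, 1.0 - abs((x - x_t) / h))    # triangular kernel weight at X_t
        num += w * (1.0 if x_next <= y else 0.0)  # indicator Y_t(y) = 1{X_{t+1} <= y}
        den += w
    return num / den if den > 0 else 0.0

def cond_quantile(xs, x, t, h, grid):
    """q(t|x) = inf{y : F(y|x) > t}, searched over an increasing grid of y values."""
    for y in grid:
        if cond_cdf(xs, x, y, h) > t:
            return y
    return grid[-1]
```

The same two-regression device gives the volatility of the preceding paragraph: estimate $v_1$ and $v_2$ by kernel regressions of $X_{t+1}$ and $X_{t+1}^2$ on $X_t$ and take $v = v_2 - v_1^2$.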
11.2 Kernel regression estimates
We now consider a stationary process $(Z_t)_{t\in\mathbb Z}$ with $Z_t=(X_t,Y_t)$, where $X_t,Y_t\in\mathbb R$. The quantity of interest is the regression function $r(x)=\mathbb E(Y_0|X_0=x)$. Let $K$ be a kernel function integrating to 1, Lipschitz and compactly supported. The kernel estimators are defined by
$$\hat f(x)=\hat f_{n,h}(x)=\frac1{nh}\sum_{t=1}^nK\Big(\frac{x-X_t}{h}\Big),\qquad \hat g(x)=\hat g_{n,h}(x)=\frac1{nh}\sum_{t=1}^nY_t\,K\Big(\frac{x-X_t}{h}\Big),$$
$$\hat r(x)=\hat r_{n,h}(x)=\frac{\hat g_{n,h}(x)}{\hat f_{n,h}(x)}\ \text{ if }\hat f_{n,h}(x)\neq0;\qquad \hat r(x)=0\ \text{ otherwise.}$$
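These estimators can be transcribed directly; a minimal sketch follows (the triangular kernel $K(u)=(1-|u|)_+$, which integrates to one, is Lipschitz and compactly supported, is an illustrative choice):

```python
import numpy as np

def kernel_estimates(x, X, Y, h):
    """Return f_{n,h}(x), g_{n,h}(x) and the ratio estimate r_{n,h}(x)."""
    n = len(X)
    K = np.maximum(1.0 - np.abs((x - X) / h), 0.0)  # triangular kernel
    f_hat = float(K.sum()) / (n * h)
    g_hat = float((Y * K).sum()) / (n * h)
    r_hat = g_hat / f_hat if f_hat != 0 else 0.0
    return f_hat, g_hat, r_hat

# On an evenly spaced design over [-1, 1] (density 1/2), f_hat(0) is close
# to 0.5 and, with Y = X^2, r_hat(0) is close to E(Y | X = 0) = 0.
X = np.linspace(-1.0, 1.0, 2001)
f_hat0, g_hat0, r_hat0 = kernel_estimates(0.0, X, X ** 2, 0.1)
```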
Here $h=(h_n)_{n\in\mathbb N}$ is a sequence of positive real numbers. We always assume that $h_n\to0$ and $nh_n\to\infty$ as $n\to\infty$.

Definition 11.1. Let $\rho=a+b$ with $(a,b)\in\mathbb N\times]0,1]$. Denote by $\mathcal C^a$ the set of $a$-times continuously differentiable functions. The set of $\rho$-regular functions $\mathcal C_\rho$ is defined as
$$\mathcal C_\rho=\Big\{u:\mathbb R\to\mathbb R\ \Big|\ u\in\mathcal C^a\ \text{and}\ \forall K,\ \exists A\ge0:\ |x|,|y|\le K\Rightarrow|u^{(a)}(x)-u^{(a)}(y)|\le A|x-y|^b\Big\}.$$
Assuming $g\in\mathcal C_\rho$, one can choose a kernel function $K$ of order $\rho$ (where $\rho$ is not necessarily a nonnegative integer) such that the bias $b_h$ satisfies $b_h(x)=\mathbb E(\hat g(x))-g(x)=O(h^\rho)$, uniformly on any compact subset of $\mathbb R$; see e.g. Rosenblatt (1991) [169]∗. If, moreover, $\rho$ is an integer ($b=1$, $a=\rho-1$), then with an appropriately chosen kernel $K$ of order $\rho$,
$$b_h(x)\sim h^\rho\,g^{(\rho)}(x)\int s^\rho K(s)\,ds\Big/\rho!,$$
uniformly on any compact interval. In view of the asymptotic analysis we assume that the marginal density $f(\cdot)$ of $X_0$ exists and is continuous, that $f(x)>0$ at any point $x$ of interest, and that the regression function $r(\cdot)=\mathbb E(Y_0|X_0=\cdot)$ exists and is continuous. Finally, for some $p\ge1$, $x\mapsto g_p(x)=f(x)\mathbb E(|Y_0|^p\,|\,X_0=x)$ exists and is continuous. We set $g=fr$ with obvious shorthand notation. Moreover, we impose one of the following moment conditions:
$$\mathbb E|Y_0|^S<\infty,\quad\text{for some }S\ge p,\tag{11.2.1}$$
$$\mathbb E e^{a|Y_0|}<\infty,\quad\text{for some }a>0.\tag{11.2.2}$$

11.2.1 Second order and CLT results
We first consider the properties of $\hat g(x)$. The following conditionally centered analogue of $g_2$ appears in the asymptotic variance of the estimator $\hat r$:
$$G_2(x)=f(x)\,\mathrm{Var}(Y_0|X_0=x)=g_2(x)-f(x)r^2(x).$$
Assume that the densities $f_{(k)}$ of the pairs $(X_0,X_k)$, $k\in\mathbb Z^+$, exist and are uniformly bounded: $\sup_{k>0}\|f_{(k)}\|_\infty<\infty$. Moreover, uniformly over $k\in\mathbb Z^+$, the functions
$$r_{(k)}(x,x')=\mathbb E\big(|Y_0Y_k|\,\big|\,X_0=x,\ X_k=x'\big)\tag{11.2.3}$$
are continuous. Under these assumptions, the functions $g_{(k)}=f_{(k)}r_{(k)}$ are locally bounded.

∗ Such kernels can be constructed as $K=pd$ for some known compactly supported (continuous) density $d(\cdot)$ and a polynomial $p$ with $\deg p<\rho$. If $d(x)\neq0$ on some infinite set, then the system of equations $\int t^jK(t)\,dt=a_j$ ($0\le j\le p$) admits a unique solution for all choices of the $a_j$'s, because the quadratic form $p\mapsto\int p^2(x)d(x)\,dx$ is positive definite.
Theorem 11.1. Suppose that the stationary sequence $(Z_t)_{t\in\mathbb Z}$ satisfies conditions (11.2.1) with $p=2$ and (11.2.3). Suppose that $n^\delta h\to\infty$ for some $\delta\in]0,1[$. Then
$$\sqrt{nh}\,\big(\hat g(x)-\mathbb E\hat g(x)\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N\Big(0,\ g_2(x)\int K^2(u)\,du\Big)$$
and, setting $\tilde g(x)=\hat g(x)-r(x)\hat f(x)$,
$$\sqrt{nh}\,\big(\tilde g(x)-\mathbb E\tilde g(x)\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N\Big(0,\ G_2(x)\int K^2(u)\,du\Big)$$
under any of the weak dependence conditions formulated below. For the sake of clarity we do not make the assumptions precise here, but the result holds if the decay of the sequence is faster than any Riemannian decay, i.e. $\theta(n)=O(n^{-a})$ as $n\to\infty$ for each $a>0$. The same holds for $\eta(n)$, $\kappa(n)$ or $\lambda(n)$.

In order to derive asymptotics for the ratio estimator $\hat r$ we use a method, already used by Collomb (1984) [38], which consists in studying higher order asymptotics. This is the topic of the next subsection.

Theorem 11.2. Suppose that the stationary sequence $(Z_t)_{t\in\mathbb Z}$ satisfies conditions (11.2.1) with $p=2$ and (11.2.3). Consider a positive kernel $K$. Let $f,g\in\mathcal C_\rho$ for some $\rho\in]0,2]$, and $nh^{1+2\rho}\to0$. Then, if $f(x)\neq0$,
$$\sqrt{nh}\,\big(\hat r(x)-r(x)\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N\Big(0,\ \frac{G_2(x)}{f^2(x)}\int K^2(u)\,du\Big)$$
under any of the weak dependence conditions formulated below.

Assuming that the sequence $(Z_t)_{t\in\mathbb Z}$ is $\theta$-weakly dependent with rate $O(r^{-a})$ and $a>3$, Ango Nze, Bühlmann and Doukhan (2002) [6] prove that, uniformly in $x$ belonging to any compact subset of $\mathbb R$,
$$\mathrm{Var}(\hat g(x))=\frac1{nh}\,g_2(x)\int K^2(u)\,du+o\Big(\frac1{nh}\Big),$$
and
$$\mathrm{Var}\big(\hat g(x)-r(x)\hat f(x)\big)=\frac1{nh}\,G_2(x)\int K^2(u)\,du+o\Big(\frac1{nh}\Big).$$
The exponential moment assumption can be relaxed. Suppose that the stationary sequence $(Z_t)_{t\in\mathbb Z}$ satisfies conditions (11.2.1) and (11.2.3) with $p=2$, $S>2$. The former results then hold if the sequence $(Z_t)_{t\in\mathbb Z}$ is weakly dependent with $\eta(r)=O(r^{-a})$ and $a>(3S-4)/(S-2)+2\delta/(S-2)$, or $\lambda(r)=O(r^{-a})$ and $a>(4S-4)/(S-2)+2\delta/(S-2)$.
The CLT of Theorem 11.1 holds, under conditions (11.2.1) with $p=2$ and (11.2.3), if the stationary sequence $(Z_t)_{t\in\mathbb Z}$ is $\eta$- or $\kappa$-weakly dependent with rate $O(r^{-a})$ and
$$a>\alpha_j(\delta)=\min\Big\{\max\big(2+j,\ 3(2+j)\delta\big),\ \max\Big(2+j+\frac1\delta,\ \frac{2+2(2+j)\delta}{1+\delta}\Big)\Big\},$$
where $j=1$ or $j=2$ according to the $\eta$- or $\kappa$-dependence assumption, respectively. These results extend Doukhan and Louhichi (2001) [68], valid for the case of the density function $\hat f$, to the estimator $\hat g$ under weak dependence. The first right hand side term is obtained by Bernstein's blocking technique; the second right hand side term results from the application of the Lindeberg method (see Rio (2000) [161]).

The CLT of Theorem 11.2 relies on the expansion
$$u^{-1}=\sum_{i=0}^p(-1)^i\frac{(u-u_0)^i}{u_0^{i+1}}+(-1)^{p+1}\frac{(u-u_0)^{p+1}}{u\,u_0^{p+1}},\tag{11.2.4}$$
where $p=2$, $u=b_n$, $u_0=\mathbb Eb_n=1$, and $\hat r(x)=a_n/b_n$ (if $b_n\neq0$) with
$$a_n=\sum_{i=1}^nY_iK\Big(\frac{x-X_i}{h_n}\Big)\Big/\,n\,\mathbb EK\Big(\frac{x-X_0}{h_n}\Big),\qquad b_n=\sum_{i=1}^nK\Big(\frac{x-X_i}{h_n}\Big)\Big/\,n\,\mathbb EK\Big(\frac{x-X_0}{h_n}\Big).$$
Using the Rosenthal inequalities described in § 4.3 and the aforementioned CLT, we obtain the CLT of Theorem 11.2 for the regression function, under conditions (11.2.1) with $p=2$ and (11.2.3), if the stationary sequence $(Z_t)_{t\in\mathbb Z}$ is either $\eta$- or $\kappa$-weakly dependent with rate $O(r^{-a})$, where $a>\alpha_j(\delta)$ and $a>3\vee\frac{9(2+j)}{7-4\delta}$ ($j=1$ under $\eta$- and $j=2$ under $\kappa$-dependence).

The results stated in Theorems 11.1 and 11.2 also hold for finite dimensional convergence. The components are asymptotically jointly independent, much in the same way as for i.i.d. sequences.

A brief sketch of the proof of Theorem 11.1. We proceed as in Rio (2000) [161], and more specifically as in Coulon-Prieur and Doukhan (2000) [40] for density estimation in the causal case. The case of non-causal coefficients is handled with Bernstein blocks as in Doukhan and Louhichi (2001) [68]. Here, for $0\le t\le n$, we consider Lipschitz truncations at level $M=M(n)$:
$$T_t(x)=\big(Y_t\mathbf 1_{|Y_t|\le M}+M\mathbf 1_{Y_t>M}-M\mathbf 1_{Y_t<-M}\big)\,K\Big(\frac{x-X_t}{h}\Big).$$
Then, for a suitable constant $\sigma_n$, $\sigma_n\sum_{t=1}^nT_t(x)$ approximates the expression $\sqrt{nh}\,\big(\hat g(x)-r(x)\hat f(x)\big)$ (after centering).
Now a CLT may be proved for $\xi_t=\sigma_n\sqrt{nh}\,(T_t(x)-\mathbb ET_t(x))$. The second assertion is a consequence of the first one, with $Y_t$ replaced by $Y_t-r(x)$. See Ango Nze et al. (2002) [6] and Ango Nze and Doukhan (2004) [7].

Remarks
• An easier and more efficient way to obtain such results is through Lemma 7.1; see Bardet et al. (2006) [10] for density estimation, while a paper by Nicolas Ragache will clarify the regression estimation case.
• Finite dimensional distributions. Let $\hat H$ be an estimator of a function $H$ (among the previously cited ones). Then a central limit result for
$$Z_n(x)=\sqrt{nh}\,\big(\hat H(x)-\mathbb E\hat H(x)\big)\to\mathcal N(0,s^2(x))$$
extends to a multivariate central limit theorem $(Z_n(x_1),\dots,Z_n(x_k))\to\mathcal N_k(0,\Sigma)$, where $\Sigma$ denotes the diagonal matrix with entries $(s^2(x_1),\dots,s^2(x_k))$. The previous process is not tight in $C[0,1]$ at points for which $s^2(x)=0$.
11.2.2 Almost sure convergence properties
Theorem 11.3. Let $(Z_t)_{t\in\mathbb Z}$ be a stationary sequence satisfying conditions (11.2.1) with $p=2$ and (11.2.3). Then, under the forthcoming conditions:

(i) There exists a sequence $(\ell_n)_{n\in\mathbb N}$ with $nh/(\ell_n\log n)\to\infty$ as $n\to\infty$ such that, for any $M>0$,
$$\sup_{|x|\le M}|\hat g(x)-\mathbb E\hat g(x)|=O\bigg(\sqrt{\frac{\ell_n\log n}{nh}}\bigg),\quad\text{almost surely.}$$

(ii) Assume now that $\inf_{|x|\le M}f(x)>0$. If $f,g\in\mathcal C_\rho$ for some $\rho\in]0,\infty[$ and $h\sim(\ell_n\log n/n)^{1/(1+2\rho)}$, then
$$\sup_{|x|\le M}|\hat r(x)-r(x)|=O\bigg(\Big(\frac{\ell_n\log n}{n}\Big)^{\rho/(1+2\rho)}\bigg),\quad\text{almost surely.}$$

Remark. Under the conditions of (ii) of Theorem 11.3, but assuming only the weaker bandwidth condition $n^\delta h\to\infty$ for some $\delta\in]0,1[$, we obtain
$$\sup_{|x|\le M}|\hat r(x)-r(x)|=o(1),\quad\text{almost surely.}$$
For the sake of simplicity, we shall only consider the geometrically dependent case; we refer the reader to Ango Nze, Bühlmann and Doukhan (2002) [6] for Riemannian decays.
Theorem 11.4. Let $(Z_t)_{t\in\mathbb Z}$ be a stationary sequence satisfying conditions (11.2.1) with $p=2$ and (11.2.3), and either $\eta$- or $\kappa$-weakly dependent with a geometric decay rate.

(i) If $nh/\log^4 n\to\infty$, then for any $M>0$, almost surely,
$$\sup_{|x|\le M}|\hat g(x)-\mathbb E\hat g(x)|=O\bigg(\frac{\log^2n}{\sqrt{nh}}\bigg).$$

(ii) For any $M>0$, if $f,g\in\mathcal C_\rho$ for some $\rho\in]0,\infty[$, $h\sim(\log^4n/n)^{1/(1+2\rho)}$ and $\inf_{|x|\le M}f(x)>0$, then, almost surely,
$$\sup_{|x|\le M}|\hat r(x)-r(x)|=O\bigg(\Big(\frac{\log^4n}{n}\Big)^{\rho/(1+2\rho)}\bigg).$$
Proof of Theorem 11.3. We keep the usual notations and denote by $C$ a universal constant (whose value can change from one place to another). Assume that $\mathbb E(\exp(a|Y_0|))<\infty$. Then
$$\mathbb P\Big(\sup_{|x|\le M}|\hat g(x)-\tilde g(x)|>0\Big)\le n\,\mathbb P(|Y_0|\ge M_0\log n)\le Cn^{1-M_0},$$
and, by the Cauchy-Schwarz inequality,
$$\sup_{|x|\le M}\mathbb E|\hat g(x)-\tilde g(x)|\le\frac1h\,\mathbb E\Big[|Y_0|\mathbf 1_{\{|Y_0|\ge M_0\log n\}}\Big|K\Big(\frac{x-X_0}{h}\Big)\Big|\Big]\le\frac{h^{1/3}}{h}\,n^{-M_0}.$$
We can now reduce the computations to the case of a density estimator, as in Doukhan and Louhichi (2001) [68]. Assume that the interval $[-M,M]$ is covered by $L\nu$ intervals with diameter $1/\nu$ (here $\nu=\nu(n)$ depends on $n$; we denote by $I_j$ the $j$-th interval and by $x_j$ its center). Assume that $h\nu\to\infty$ as $n\to\infty$, and that the compactly supported kernel $K$ vanishes if $|t|>R_0$. Liebscher (1996) [121] exhibits another kernel-type density estimate $\tilde g$ based on an even, continuous kernel, decreasing on $[0,\infty[$, constant on $[0,2R_0]$ and taking the value 0 at $t=3R_0$ (hence with compact support). Then he proves that
$$\sup_{x\in I_j}|\tilde g(x)-\mathbb E\tilde g(x)|\le|\tilde g(x_j)-\mathbb E\tilde g(x_j)|+\frac{C}{h\nu}\big(|\tilde g(x_j)-\mathbb E\tilde g(x_j)|+2|\mathbb E\tilde g(x_j)|\big).$$
Therefore, for any $\lambda>0$,
$$\mathbb P\Big(\sup_{x\in[-M,M]}|\hat g(x)-\mathbb E\hat g(x)|\ge\frac{2\lambda\log n}{\sqrt{nh}}+h^{1/3}n^{-M_0}+\frac{C}{h\nu}\Big)\le Cn^{1-M_0}+L\nu\cdot\sup_{x_1}\mathbb P\Big(|\tilde g(x_1)-\mathbb E\tilde g(x_1)|\ge\frac{\lambda\log n}{\sqrt{nh}}\Big)+L\nu\cdot\sup_{x_1}\mathbb P\Big(|\tilde g(x_1)-\mathbb E\tilde g(x_1)|\ge\frac{\lambda}{\sqrt{nh}}\Big).$$
The exponential inequality (4.3.31) completes the proof of assertion (i). The proof of assertion (ii) is standard; we refer to [6] for a proof.
11.3 MISE for β̃-dependent sequences

We now consider the problem of estimating the unknown marginal density $f$ from the observations $(X_1,\dots,X_n)$ of a stationary sequence $(X_i)_{i\ge0}$. In this context, Viennet (1997) [185] obtained optimal results for the MISE under the condition $\sum_{k>0}\beta(\sigma(X_0),\sigma(X_k))<\infty$ for a $\beta$-mixing sequence $(X_n)$. We wish to extend Viennet's results to sequences satisfying only
$$\sum_{k>0}\tilde\beta(\sigma(X_0),X_k)<\infty.\tag{11.3.1}$$
For kernel density estimators, this can be done by assuming only that the kernel $K$ is BV (of bounded variation) and Lebesgue integrable. For projection estimators, it works only if the basis is well localized, because our variance inequality is less precise than that of Viennet. Note that Condition (11.3.1) is much less restrictive than Viennet's, for it covers many non-mixing examples. In particular, since $f$ is supposed to be square integrable with respect to the Lebesgue measure, the distribution function $F$ of $X_0$ is $1/2$-Hölder. Hence we infer from Lemma 5.1, point 3.i), that (11.3.1) holds as soon as $\sum_{k>0}(\tau(\sigma(X_0),X_k))^{1/3}<\infty$. If $f$ is bounded, (11.3.1) holds as soon as $\sum_{k>0}(\tau(\sigma(X_0),X_k))^{1/2}<\infty$.
Variance inequalities

We set $\tilde\beta(i)=\tilde\beta(\sigma(X_0),X_i)$. The main results of this section are the following upper bounds (compare with Theorems 1.2 and 1.3(a) in Rio (2000) [161] for the mixing coefficients $\alpha(\sigma(X_0),\sigma(X_i))$).

Proposition 11.1. Let $K$ be any BV function such that $\int|K(x)|\,dx$ is finite. Let $(X_i)_{i\ge0}$ be a stationary sequence, and define
$$Y_{k,n}=h^{-1}K\big(h^{-1}(x-X_k)\big)\quad\text{and}\quad f_n(x)=\frac1n\sum_{k=1}^nY_{k,n}.\tag{11.3.2}$$
The following inequality holds:
$$nh\int\mathrm{Var}(f_n(x))\,dx\le\int K^2(x)\,dx+2\bigg(\sum_{k=1}^{n-1}\tilde\beta(k)\bigg)\|dK\|\int|K(x)|\,dx.$$
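The bound involves only three constants of the kernel — $\int K^2$, the total variation $\|dK\|$, and $\int|K|$ — plus the sum of the $\tilde\beta$ coefficients. A quick numerical sketch (the geometric coefficients are a toy assumption; for the naive kernel $K=\tfrac12\mathbf 1_{]-1,1]}$ one has $\int K^2=1/2$, $\|dK\|=1$ and $\int|K|=1$):

```python
import numpy as np

def prop_11_1_bound(beta_tilde, K_sq_int, K_tv, K_abs_int):
    """Upper bound of Proposition 11.1 on nh * integral of Var(f_n):
    int K^2 + 2 * (sum of beta~(k)) * ||dK|| * int |K|."""
    return K_sq_int + 2.0 * K_tv * K_abs_int * float(np.sum(beta_tilde))

# toy geometrically beta~-dependent coefficients beta~(k) = 2^{-k}
beta_tilde = 0.5 ** np.arange(1, 100)
bound = prop_11_1_bound(beta_tilde, K_sq_int=0.5, K_tv=1.0, K_abs_int=1.0)
```

Since the geometric series sums (essentially) to one, the bound here is close to $1/2+2=5/2$.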
We now come to the case of projection estimators, defined in more detail in the next section.

Proposition 11.2. Let $(\varphi_i)_{1\le i\le n}$ be an orthonormal system of $L^2(\mathbb R,\lambda)$ ($\lambda$ is the Lebesgue measure) and assume that each $\varphi_i$ is BV. Let $(X_i)_{i\ge0}$ be a stationary sequence, and define
$$Y_{j,n}=\frac1n\sum_{k=1}^n\varphi_j(X_k)\quad\text{and}\quad f_n=\sum_{j=1}^mY_{j,n}\varphi_j.\tag{11.3.3}$$
The following inequality holds:
$$n\int\mathrm{Var}(f_n(x))\,dx\le\sup_{x\in\mathbb R}\sum_{j=1}^m\varphi_j^2(x)+2\bigg(\sum_{k=1}^{n-1}\tilde\beta(k)\bigg)\sup_{x\in\mathbb R}\sum_{j=1}^m\|d\varphi_j\|\,|\varphi_j(x)|.$$
Remark 4. Since $\tilde\beta(\mathcal M,X)\le\tilde\phi(\mathcal M,X)$, Propositions 11.1 and 11.2 apply to dynamical systems satisfying (3.3.1), with $2\sum_{k=1}^{n-1}a_k$ instead of $\sum_{k=1}^{n-1}\tilde\beta(k)$. For kernel estimators this can also be deduced from a variance estimate given in Prieur (2001) [154].

Proof of Proposition 11.1. We start from the elementary inequality
$$\mathrm{Var}(f_n(x))\le\frac1n\|Y_{0,n}\|_2^2+\frac2n\sum_{i=1}^{n-1}|\mathrm{Cov}(Y_{0,n},Y_{i,n})|.$$
Now $h\int\|Y_{0,n}\|_2^2(x)\,dx=\int(K(x))^2\,dx$. To complete the proof, we apply Proposition 5.3:
$$h\int|\mathrm{Cov}(Y_{0,n},Y_{i,n})|(x)\,dx\le\|dK\|\,\mathbb E\Big(b(\sigma(X_0),X_i)\int|Y_{0,n}(x)|\,dx\Big)\le\tilde\beta(i)\,\|dK\|\int|K(x)|\,dx.$$

Proof of Proposition 11.2. Since $(\varphi_i)_{1\le i\le n}$ is an orthonormal system of $L^2(\mathbb R,\lambda)$, we have
$$\int\mathrm{Var}(f_n(x))\,dx=\sum_{j=1}^m\mathrm{Var}(Y_{j,n}).$$
Applying Proposition 5.3, we obtain
$$\mathrm{Var}(Y_{j,n})\le\frac1n\|\varphi_j(X_0)\|_2^2+\frac2n\sum_{k=1}^{n-1}|\mathrm{Cov}(\varphi_j(X_0),\varphi_j(X_k))|\le\frac1n\|\varphi_j(X_0)\|_2^2+\frac2n\sum_{k=1}^{n-1}\|d\varphi_j\|\,\mathbb E\big(|\varphi_j(X_0)|\,b(\sigma(X_0),X_k)\big).$$
To complete the proof we sum over $j$:
$$n\int\mathrm{Var}(f_n(x))\,dx\le\sum_{j=1}^m\mathbb E\varphi_j^2(X_0)+2\sum_{k=1}^{n-1}\mathbb E\Big(b(\sigma(X_0),X_k)\sum_{j=1}^m\|d\varphi_j\|\,|\varphi_j(X_0)|\Big).$$
Some function spaces

In this section we recall the definition of the spaces $\mathrm{Lip}^*(s,2,I)$, where $I$ is either $\mathbb R$ or some compact interval $[a,b]$ (see DeVore and Lorentz (1993) [59], Chapter 2). Let $I_{rh}=\mathbb R$ if $I=\mathbb R$ and $I_{rh}=[a,b-rh]$ otherwise. For any $h\ge0$, let $T_h$ be the translation operator $T_h(f,x)=f(x+h)$ and $\Delta_h=T_h-T_0$ the difference operator. By induction, define the operators $\Delta_h^r=\Delta_h\circ\Delta_h^{r-1}$. Let $\lambda$ be the Lebesgue measure on $I$ and $\|\cdot\|_{2,\lambda}$ the usual norm on $L^2(I,\lambda)$. The modulus of smoothness of order $r$ of a function $f$ in $L^2(I,\lambda)$ is defined by
$$\omega_r(f,t)_2=\sup_{0\le h\le t}\|\Delta_h^r(f,\cdot)\mathbf 1_{I_{rh}}\|_{2,\lambda}.$$
For $s>0$, $\mathrm{Lip}^*(s,2,I)$ is the space of functions $f$ in $L^2(I,\lambda)$ such that
$$\|f\|_{s,2,I}=\|f\|_{2,\lambda}+\sup_{t>0}\frac{\omega_{[s]+1}(f,t)_2}{t^s}<\infty.$$
These spaces are Banach spaces for the norm $\|\cdot\|_{s,2,I}$. Recall that $\mathrm{Lip}^*(s,2,I)$ is a particular case of a Besov space (precisely $\mathrm{Lip}^*(s,2,I)=B_{s,2,\infty}(I)$) and that it contains the Sobolev space $W_s(I)=B_{s,2,2}(I)$.
Application to kernel estimators

If $f_n$ is defined by (11.3.2), set $f_h=\mathbb E(f_n)$. Let $r$ be some positive integer, and assume that the kernel $K$ is such that, for any $f$ belonging to the Sobolev space $W_r(\mathbb R)$,
$$\int(f(x)-f_h(x))^2\,dx\le M_1h^{2r}\|f^{(r)}\|_2^2,\tag{11.3.4}$$
for some constant $M_1$ depending only on $r$. From (11.3.4) and Theorem 5.2, page 217, in DeVore and Lorentz (1993) [59], we infer that, for any $f$ in $L^2(\mathbb R,\lambda)$,
$$\int(f(x)-f_h(x))^2\,dx\le M_2(\omega_r(f,h)_2)^2,$$
for some constant $M_2$ depending only on $r$. This last inequality implies that, if $f$ belongs to $\mathrm{Lip}^*(s,2,\mathbb R)$ for $r-1\le s<r$, then
$$\int(f(x)-f_h(x))^2\,dx\le M_2h^{2s}\|f\|_{s,2,\mathbb R}^2.$$
This evaluation of the bias, together with Proposition 11.1, leads to the following corollary.

Corollary 11.1. Let $r$ be some positive integer. Let $(X_i)_{i\ge1}$ be a stationary sequence with common marginal density $f$ belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ with $r-1\le s<r$, or to $W_s(\mathbb R)$ with $s=r$. Let $K$ be a BV function satisfying (11.3.4) and such that $\int|K(x)|\,dx$ is finite. Let $f_n$ be defined by (11.3.2) with $h=n^{-1/(2s+1)}$. If (11.3.1) holds, then there exists a constant $C$ such that
$$\mathbb E\int(f_n(x)-f(x))^2\,dx\le Cn^{-2s/(2s+1)}.$$

Here are two well-known classes of kernels satisfying (11.3.4).

Example 1. One says that $K$ is a kernel of order $k$ if

1. $\int K(x)\,dx=1$, $\int K^2(x)\,dx<\infty$ and $\int|x|^{k+1}|K(x)|\,dx<\infty$;

2. $\int x^jK(x)\,dx=0$ for $1\le j\le k$.

If $K$ is a kernel of order $k$, then it satisfies (11.3.4) for any $r\le k+1$. For instance, the naive kernel $K=(1/2)\mathbf 1_{]-1,1]}$ is BV and of order 1. Consequently, Corollary 11.1 applies to functions belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ for $s<2$, or to $W_2(\mathbb R)$. The footnote on page 249 proves the existence of such kernels.

Example 2. Assume that the Fourier transform $K^*$ of $K$ satisfies $|1-K^*(x)|\le M|x|^r$ for some positive constant $M$. Then $K$ satisfies (11.3.4) for this $r$. For instance, $K(x)=\sin(x)/(\pi x)$ satisfies (11.3.4) for any positive integer $r$; unfortunately, it is neither BV nor integrable. Another function satisfying (11.3.4) for any positive integer $r$ is the analogue of the de la Vallée-Poussin kernel, $V(x)=(\cos(x)-\cos(2x))/(\pi x^2)$. This function is BV and integrable, so that Corollary 11.1 applies to any function belonging to $\mathrm{Lip}^*(s,2,I)$ for $s>0$.
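The moment conditions defining the order of a kernel can be checked numerically. A midpoint-rule sketch for the naive kernel (the grid sizes and tolerances are illustrative):

```python
import numpy as np

def kernel_moment(K, j, a=-1.5, b=1.5, n=30000):
    """Midpoint-rule approximation of int_a^b x^j K(x) dx."""
    dx = (b - a) / n
    x = a + dx * (np.arange(n) + 0.5)   # midpoints of the n subintervals
    return float(np.sum(x ** j * K(x)) * dx)

naive = lambda x: 0.5 * ((x > -1) & (x <= 1))

m0 = kernel_moment(naive, 0)   # close to 1: K integrates to one
m1 = kernel_moment(naive, 1)   # close to 0: the first moment vanishes
m2 = kernel_moment(naive, 2)   # close to 1/3: nonzero, so the order is exactly 1
```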
Application to unconditional systems. Proposition 11.2 is of special interest for orthonormal systems $(\varphi_i)_{i\ge1}$ satisfying the two conditions:

P1. There exists $C_1$ independent of $m$ such that $\max_{1\le i\le m}\|d\varphi_i\|\le C_1\sqrt m$.

P2. There exists $C_2$ independent of $m$ such that $\sup_{x\in\mathbb R}\sum_{j=1}^m|\varphi_j(x)|\le C_2\sqrt m$.

An orthonormal system satisfying P2 is called unconditional. For such systems, we obtain from Proposition 11.2 that
$$n\int\mathrm{Var}(f_n(x))\,dx\le m\Big(C_2^2+2C_1C_2\sum_{k=1}^{n-1}\tilde\beta(k)\Big).\tag{11.3.5}$$
Example 1 (piecewise polynomials). Let $(Q_i)_{1\le i\le r+1}$ be an orthonormal basis of the space of polynomials of order $r$ on $[0,1]$, and define the function $R_i$ on $\mathbb R$ by $R_i(x)=Q_i(x)$ if $x$ belongs to $]0,1]$ and $0$ otherwise. We consider the regular partition of $]0,1]$ into $k$ intervals $\big(](j-1)/k,j/k]\big)_{1\le j\le k}$, and define the functions $R_{i,j}(x)=\sqrt k\,R_i(kx-(j-1))$. Clearly the family $(R_{i,j})_{1\le i\le r+1}$ is an orthonormal basis of the space of polynomials of order $r$ on the interval $[(j-1)/k,j/k]$. Let $m=k(r+1)$ and let $(\varphi_i)_{i\ge1}$ be any family such that
$$\big\{\varphi_i\,\big|\,1\le i\le m\big\}=\big\{R_{i,j}\,\big|\,1\le j\le k,\ 1\le i\le r+1\big\}.\tag{11.3.6}$$
The orthonormal system $(\varphi_i)_{i\ge1}$ satisfies P1 and P2 with
$$C_1=(r+1)^{-1/2}\max_{1\le i\le r+1}\|dR_i\|\quad\text{and}\quad C_2=(r+1)^{-1/2}\sup_{x\in[0,1]}\sum_{i=1}^{r+1}|R_i(x)|.$$
The case of histograms corresponds to $r=0$. In that case $\varphi_j=\sqrt k\,\mathbf 1_{](j-1)/k,j/k]}$; clearly $C_2=1$ and $\|d\varphi_j\|=2\sqrt k$, so that $C_1=2$.

Assume now that $X_0$ has a density $f$ such that $f\mathbf 1_{[0,1]}$ belongs to $\mathrm{Lip}^*(s,2,[0,1])$. Suppose that $r>s-1$, and denote by $\bar f$ the orthogonal projection of $f$ on the subspace generated by $(\varphi_i)_{1\le i\le m}$. From Lemma 12 in Barron et al. (1999) [12] we know that there exists a constant $K$ depending only on $s$ such that
$$\int_0^1(f(x)-\bar f(x))^2\,dx\le Km^{-2s}.\tag{11.3.7}$$
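In the histogram case $r=0$ the projection estimate $f_n=\sum_jY_{j,n}\varphi_j$ has a closed form; the following minimal sketch (function names are illustrative) computes its constant value on each of the $k$ cells:

```python
import numpy as np

def histogram_projection(X, k):
    """Histogram projection estimate on ]0,1] with k regular cells:
    f_n = sum_j Y_{j,n} phi_j, with phi_j = sqrt(k) 1_{](j-1)/k, j/k]}.
    Returns the (constant) value of f_n on each of the k cells."""
    inside = (X > 0) & (X <= 1)
    idx = np.ceil(X[inside] * k).astype(int) - 1   # cell index in 0..k-1
    counts = np.bincount(idx, minlength=k)
    Y = counts / len(X) * np.sqrt(k)               # Y_{j,n} = (1/n) sum_i phi_j(X_i)
    return Y * np.sqrt(k)                          # f_n restricted to each cell

# For uniform data the estimate is close to the true density 1 on each cell,
# and it integrates to one when all observations lie in ]0,1].
X = np.random.RandomState(0).uniform(size=100000)
f_cells = histogram_projection(X, k=16)
```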
Since f¯ = E(fn ), we obtain from (11.3.5) and (11.3.7) the following corollary.
Corollary 11.2. Let $(X_i)_{i\ge1}$ be a stationary sequence with common marginal density $f$ such that $f\mathbf 1_{[0,1]}$ belongs to $\mathrm{Lip}^*(s,2,[0,1])$. Let $r$ be any nonnegative integer such that $r>s-1$, and $k=[n^{1/(2s+1)}]$. Let $(\varphi_i)_{1\le i\le m}$ be defined by (11.3.6) and $f_n$ be defined by (11.3.3). If (11.3.1) holds, then there exists a constant $C$ such that
$$\mathbb E\int_0^1(f_n(x)-f(x))^2\,dx\le Cn^{-2s/(2s+1)}.$$

Example 2 (wavelet basis). Let $\{e_{j,k},\ j\ge0,\ k\in\mathbb Z\}$ be an orthonormal wavelet basis with the following convention: the $e_{0,k}$ are translates of the father wavelet and, for $j\ge1$, $e_{j,k}(x)=2^{j/2}\psi(2^jx-k)$, where $\psi$ is the mother wavelet. Assume that these wavelets are compactly supported with continuous derivatives up to order $r$ (if $r=0$, the wavelets are supposed to be BV). Let $g$ be some function with support in $[-A,A]$. Changing the indexation of the basis if necessary, we can write $g=\sum_{j\ge0}\sum_{k=1}^{2^jM}a_{j,k}e_{j,k}$, where $M\ge1$ is some finite integer depending on $A$ and on the size of the wavelet supports. Let $m=\sum_{j=0}^J2^jM$ and let $(\varphi_i)_{i\ge1}$ be any family such that
$$\big\{\varphi_i\,\big|\,1\le i\le m\big\}=\big\{e_{j,k}\,\big|\,0\le j\le J,\ 1\le k\le2^jM\big\}.\tag{11.3.8}$$
The orthonormal system $(\varphi_i)_{i\ge1}$ satisfies P1 and P2. Assume now that $X_0$ has a density $f$ belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ with compact support in $[-A,A]$. Denote by $\bar f$ the orthogonal projection of $f$ on the subspace generated by $(\varphi_i)_{1\le i\le m}$. From Lemma 12 in Barron et al. (1999) [12] we know that there exists a constant $K$ depending only on $s$ such that
$$\int(f(x)-\bar f(x))^2\,dx\le K2^{-2Js}.\tag{11.3.9}$$
Since $\bar f=\mathbb E(f_n)$, we obtain the following corollary from (11.3.5) and (11.3.9).

Corollary 11.3. Let $(X_i)_{i\ge1}$ be a stationary sequence with common marginal density $f$ belonging to $\mathrm{Lip}^*(s,2,\mathbb R)$ and with compact support in $[-A,A]$. Let $r$ be any nonnegative integer such that $r>s-1$, and let $J=[\log_2(n^{1/(2s+1)})]$. Let $(\varphi_i)_{1\le i\le m}$ be defined by (11.3.8) and $f_n$ be defined by (11.3.3). If (11.3.1) holds, then there exists a constant $C$ such that
$$\mathbb E\int(f_n(x)-f(x))^2\,dx\le Cn^{-2s/(2s+1)}.$$
Remark 5. More generally, if $\sum_{i=1}^n\tilde\beta(\sigma(X_0),X_i)=O(n^a)$ for some $a$ in $[0,1[$, we obtain the rate $n^{-2s(1-a)/(2s+1)}$ for the MISE in Corollaries 11.1, 11.2 and 11.3. Note that if (11.3.1) holds, the rate $n^{-2s/(2s+1)}$ is known to be optimal for i.i.d. observations.
11.4 General Kernels
We now aim to describe other (linear) estimation procedures, obtained by replacing the kernel $K_n(x,y)=h_n^{-1}K((y-x)/h_n)$ by other kernels. As a meta-theorem, we claim that Theorems 11.1 and 11.3 remain valid if the convolution kernels are replaced by any of the forthcoming ones; moreover, those results extend to the estimation questions formulated in § 11.1. The needed assumptions are still a fast enough Riemannian decay of the weak dependence coefficient sequence; the more precise Theorem 11.4 requires geometric decay rates. Even if we do not write definitive convergence results, we provide below all the bounds needed to extend the results of § 11.2 to other types of estimators.

Projections. Our discussion is made of three steps: we consider successively finite order polynomial functions (Tchebichev), infinite order kernels (Dirichlet) and summation methods (Cesàro, developed in the case of de la Vallée-Poussin and Fejér on the one hand, and Abel-Poisson together with Mehler on the other hand), and finally generating functions. For the sake of completeness several classical results are recalled here, and we mostly work on $\mathbb R$ rather than $\mathbb R^d$, the generalization being straightforward.

Orthogonal polynomials. Consider an interval $I$ of $\mathbb R$ and a real Hilbert space $L^2(I,\mu)$ with $\mu$ dominated by the Lebesgue measure. The Schmidt orthogonalization of the family of monomial functions $\{1,x,x^2,\dots\}$ with respect to the scalar product of $L^2(I,\mu)$ leads to the Tchebichev polynomials $P_j$. Such polynomials satisfy $\deg(P_j)=j$ for all $j$. Denoting by $\kappa_n$ the leading coefficient of $P_n(x)$, the orthogonality of the family implies, for suitable constants $B_{n+2}$, a three-term recurrence relation
$$P_{n+2}(x)=\Big(\frac{\kappa_{n+2}}{\kappa_{n+1}}\,x+B_{n+2}\Big)P_{n+1}(x)-\frac{\kappa_{n+2}\kappa_n}{\kappa_{n+1}^2}\,P_n(x).$$
Examples. If $I=(-1,1)$ and $\mu(dt)=dt$ this gives the Legendre polynomials; if $I=(-\infty,\infty)$ and $\mu(dt)=e^{-t^2}dt$ we obtain the Hermite polynomials; and if $I=(0,\infty)$ and $\mu(dt)=e^{-t}dt$ this leads to the Laguerre polynomials.
If $f$ is in $L^2(I,\mu)$, a natural way to estimate $f$ is by projection on an orthonormal polynomial basis, through the estimator that naturally arises:
$$\hat f_n(x)=\frac1n\sum_{i=1}^n\sum_{j=1}^{m(n)}P_j(X_i)P_j(x).$$
The kernel we have to consider is thus given by
$$K_{m(n)}(X_i,x)=\sum_{j=1}^{m(n)}P_j(X_i)P_j(x).$$
The application to functional results uses $u_m(X_i,x)=\sqrt m\,K_m(X_i,x)$ as usual. Indeed, the Christoffel-Darboux formula (a three-term recurrence relation; see e.g. Szegő [182], p. 43) implies that the polynomial functions yield the kernel
$$K_m(x,y)=\frac{\kappa_m}{\kappa_{m+1}}\,\frac{P_{m+1}(x)P_m(y)-P_m(x)P_{m+1}(y)}{x-y},$$
where $\kappa_m$ is the leading coefficient of $P_m(x)$. One remarks that the division by $x-y$ does not change the polynomial character of $K_m$, because $x=y$ is a root of the numerator. Thereby one can check that $\|\sqrt m\,K_m(X_i,\cdot)\|_\infty\lesssim m^{1/2}$ and $\int\sqrt m\,|K_m(X_i,t)|\,dt\lesssim m^{-1/2}$ on every compact subdomain of $I$. An example on $L^2([-1,1],dt)$ is given by the Legendre polynomials, also given by the formula
$$L_n(x)=\frac{(-1)^n}{2^n\,n!}\,\frac{d^n}{dx^n}(1-x^2)^n.$$

Dirichlet kernel. Consider now the family of trigonometric polynomial functions $\{\cos(nx),\sin(nx)\}_{n\in\mathbb N}$. By the Weierstrass density theorem, this family spans a dense subset of the set of $2\pi$-periodic continuous functions, denoted $\mathcal C([0,2\pi])$. If we consider the projection $S_n$ on the trigonometric subspace $T_n=\mathrm{span}\{\cos kx,\sin kx\,|\,k\le n\}$, then the natural associated kernel is the Dirichlet kernel
$$D_m(x,y)=\sum_{k=-m}^me^{ik(x-y)}=1+2\sum_{k=1}^m\cos k(x-y)=\frac{\sin\frac{(2m+1)}2(x-y)}{\sin\frac12(x-y)}.$$
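The closed form of the Dirichlet kernel can be checked against the cosine sum directly; a small numerical sketch:

```python
import numpy as np

def dirichlet_sum(m, t):
    """D_m(t) = 1 + 2 * sum_{k=1}^m cos(k t)."""
    return 1.0 + 2.0 * float(np.sum(np.cos(np.arange(1, m + 1) * t)))

def dirichlet_closed(m, t):
    """Closed form sin((2m+1)t/2) / sin(t/2), for t not a multiple of 2 pi."""
    return float(np.sin((2 * m + 1) * t / 2.0) / np.sin(t / 2.0))
```

At $t=0$ both expressions equal $2m+1$, the dimension of the trigonometric subspace.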
The Dirichlet kernel is also of infinite order, but it is not positive; thereby one cannot use the previous monotone convergence argument. Consider a function $f$ in $\mathcal C^\alpha([0,2\pi])$, the space of $\alpha$-times differentiable, $2\pi$-periodic functions on $[0,2\pi]$. A theorem of Jackson (cf. Doukhan and Sifre [74], p. 217) asserts that
$$R_m(f)=\sup_x\Big|\int D_m(y-x)\big(f(y)-f(x)\big)\,dy\Big|=O(m^{-\alpha}\log m).$$
This implies that the optimal bandwidth condition for Fourier approximation is in this case not defined as a power of $n$. It furthermore implies that whenever $f$ is in $\mathcal C^\alpha([0,2\pi],\mathbb R)$ we cannot use an optimal window to control both the bias and the covariance of the estimate. In that case we need to erase the bias by taking a suboptimal window, i.e. a window whose order is such that $R_m(f)=o\big(n^{-\frac1\alpha}\big)$.

Summation methods. The purpose of summation methods is to transform
the natural kernels into nonnegative, more regular kernels, thereby circumventing the above problems. We first define summation methods by means of a weight sequence $\{a_{m,j}\,/\,m\in\mathbb N,\ 0\le j\le m\}$ such that $a_{m,j}\to0$ as $m\to\infty$ for each fixed $j\le m$, and $\sum_{j=0}^ma_{m,j}=1$. The weighted kernel is defined as a generalized Cesàro mean
$$K_m^a(x,y)=\sum_{j=0}^ma_{m,j}K_j(x,y)=\sum_{j=0}^ma_{m,j}\sum_{i=0}^jP_i(x)P_i(y).$$
In the sequel we omit the superscript $a$. Well-chosen summation methods lead to improved results on the bias. Consider now the main classical examples. The de la Vallée-Poussin kernel is obtained by considering means of the truncated Fourier series:
$$S_{m,n}=\frac1{m-n}\,(S_{n+1}+\cdots+S_m).$$
The kernel associated with this summation of projections is given by
$$D_{m,n}(t)=\frac1{m-n}\,\frac{\sin\frac{(m+n)t}2\,\sin\frac{(m-n)t}2}{\sin^2\frac t2}.$$
When $n=0$, this kernel specializes into the Fejér kernel $F_m$ of order $m$, which is also the Cesàro mean of the Dirichlet kernels, obtained by setting $a_{m,j}=m^{-1}\mathbf 1_{\{1\le j\le m\}}$; up to normalization, $F_m(t)\propto m^{-1}\big(\sin\frac{mt}2\big/\sin\frac t2\big)^2$. The Fejér kernel is nonnegative, so for all $f\in\mathcal C^1([0,2\pi])$ an equivalent of the bias is easily found:
$$R_{n,m}(f)(x)=\mathbb E\Big(\frac1n\sum_{i=1}^nm^{1/2}F_m(X_i-x)\Big)-f(x)=m^{1/2}\int_{[0,2\pi]}F_m(y-x)f(y)\,dy-f(x)=m^{1/2}\int_{[0,2\pi]}F_m(u)\big(f(x+u)-f(x)\big)\,du,$$
using the normalization $m^{1/2}\int_{[0,2\pi]}F_m(u)\,du=1$. Now let $t=mu$ and expand $f$ to first order according to Taylor's formula; the positivity of the kernel implies $R_{n,m}(f)(x)=O(m^{-3/2})$, so that the optimal bandwidth condition is $m=O(n^{1/6})$ and we may again adapt the CLT. For $f\in\mathcal C^\beta([0,2\pi])$ with $\beta>1$, use the Jackson kernel $J_{p,m}=q_{p,m}^{-1}D_{m-1}^{2p}$, where $q_{p,m}$ normalizes $J_{p,m}$.
Generating function and Abel summation. Another way of adding regularity to kernels is through analytic extension of the generating functions; note that this is close to the principle of Abel summation. For instance, the example of the Hermite polynomial functions leads to a kernel known as the Mehler kernel; see [182]. The Hermite polynomial functions $H_k(x)$ are defined by
$$H_k(x)=\frac{(-1)^k}{\sqrt{2^k\,k!\,\pi^{1/2}}}\,\frac{d^k}{d\nu^k}\big(e^{-\nu^2}\big)(x).$$
The double generating function of this family is of the form
$$K_t(X_i,x)=\sum_{k=0}^\infty t^kH_k(X_i)H_k(x)=\frac1{\sqrt{\pi(1-t^2)}}\,\exp\Big(-\frac{(X_i^2+x^2)(t^2+1)-4tX_ix}{2(1-t^2)}\Big).$$
It admits an analytic continuation at $t=1$. The Mehler kernel $M_m(x)$ is defined by setting $t=1-1/m(n)$ and studying the behavior as $m(n)\to\infty$. Then we once more obtain a result.

Wavelet bases. Another important class of projection estimates is given by wavelets. Consider a wavelet basis derived from a scaling function $\phi(x)$. Then the projection $\tilde f$ of order $m$ of a density $f$ over $\mathrm{span}\{\phi_{m,k}=m^{d/2}\phi(m^dx-k)\,/\,k\in\mathbb Z\}$ is
$$\tilde f(x)=\sum_{k\in\mathbb Z}\alpha_{m,k}\phi_{m,k}(x),\quad\text{where}\quad\alpha_{m,k}=\int f(t)\phi_{m,k}(t)\,dt.$$
The empirical coefficients are $\hat\alpha_{n,m,k}=n^{-1}\sum_{i=1}^n\phi_{m,k}(X_i)$. Thus the projection estimate $\hat f$ writes as
$$\hat f_{n,m}(x)=\sum_{k=-\infty}^\infty\hat\alpha_{n,m,k}\phi_{m,k}(x)=n^{-1}\sum_{k=-\infty}^\infty\sum_{i=1}^n\phi_{m,k}(X_i)\phi_{m,k}(x).$$
We define the wavelet kernel $K_M$ as
$$K_M(X_i,x)=\sum_{k=-\infty}^\infty\phi_{M,k}(X_i)\phi_{M,k}(x).$$
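For the Haar scaling function $\phi=\mathbf 1_{[0,1[}$ (compactly supported, BV) and $d=1$, the wavelet kernel has an explicit form: $K_m(x,y)=m$ when $x$ and $y$ fall in the same cell $[k/m,(k+1)/m[$ and $0$ otherwise, so the projection estimate is a histogram at resolution $1/m$. A minimal sketch (illustrative names, not the book's notation):

```python
import numpy as np

def haar_kernel(m, x, y):
    """K_m(x, y) = sum_k phi_{m,k}(x) phi_{m,k}(y) for phi = 1_{[0,1[}:
    equals m iff x and y share the cell [k/m, (k+1)/m[, else 0."""
    return float(m) if np.floor(m * x) == np.floor(m * y) else 0.0

def haar_density_estimate(x, X, m):
    """Projection estimate f_hat_{n,m}(x) = n^{-1} sum_i K_m(X_i, x)."""
    return float(m) * float(np.mean(np.floor(m * X) == np.floor(m * x)))
```

For uniform data on $[0,1[$ the estimate is close to the true density 1 at interior points.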
The number of terms in the summation is finite for finite $M$ whenever the function $\phi$ is assumed to have compact support. This hypothesis is not necessary, and we only assume here that the wavelets are regular enough (cf. Daubechies (1988) [41]): there exist $C>0$ and $\alpha>1$ such that $|\phi(x)|\le C/(1+|x|)^\alpha$ for all $x\in\mathbb R$. This yields a fast
decay of the kernel:
$$\int|K_M(X_i,x)|\,dx\le C\int\frac{|\phi(M^dx)|}{(1+M^d|x|)^{\alpha-1}}\,dx=O(M^{-d}M^{d(\alpha-1)})=O(M^{d(\alpha-2)}),$$
and, almost surely, if $x\neq0$ and $\mathbb P(X_i=0)=0$,
$$\int|K_M(X_i,x)|\,dx=O(M^dM^{-2d\alpha})=O(M^{d(1-2\alpha)}).$$
We now set $m=M^{2d(2\alpha-2)}$. It is remarkable that for $x=0$ (and only for this point, if we assume regularity conditions) one must set $m=M^{2d(\alpha-2)}$, hence the speed of convergence is lower; $\delta=1/(2d(1-2\alpha))$ holds in our theorem. We now write $K_m$ instead of $K_M$. Then $K_m(x,y)=m^dK(mx,my)$ for $K(x,y)=\sum_{k=-\infty}^\infty\phi(y-k)\phi(x-k)$, and the bias writes
$$E_m(f)(x)=\mathbb E(\hat f_{n,m}(x))-f(x)\le\int K_m(y,x)f(y)\,dy-f(x)=\int m^dK(m^dy,m^dx)\big(f(y)-f(x)\big)\,dy=\int K(m^dx+t,m^dx)\Big(f\Big(x+\frac t{m^d}\Big)-f(x)\Big)\,dt.$$
Let now the kernel $K$ have regularity $r$, i.e. $\int K(u,u-t)t^v\,dt=0$ if $0<v<r$ and $\int K(u,u-t)t^r\,dt\neq0$. If now $f\in\mathcal C^r([a,b],\mathbb R)$, then, by the Taylor formula with integral remainder,
$$f\Big(x+\frac tm\Big)=\sum_{j=0}^{r-1}\frac{t^j}{m^jj!}f^{(j)}(x)+\int_x^{x+t/m}\frac{(x+t/m-u)^{r-1}}{(r-1)!}f^{(r)}(u)\,du,$$
so that
$$E_m(f)(x)=\int K\Big(x,x+\frac tm\Big)\int_x^{x+t/m}\frac{(x+t/m-u)^{r-1}}{(r-1)!}f^{(r)}(u)\,du\,dt.$$
If $f^{(r)}$ is bounded, the inner integral is $O((t/m)^r)$, and so is $E_m(f)(x)$.

Remarks.
• $E_m(f)(x)\sim h(mx)\,m^{-r}$, where $h$ is a bounded function as $m$ goes to infinity. The precise characteristics of $h$ depend on the wavelet considered; in general $h$ is a pseudo-periodic function.
• Now consider the quadratic deviation. By positivity of the summed functions, the second moment $V_m(f)(x)$ satisfies
$$V_m(f)(x)=m^{2d}\int K^2(m^dx+t,m^dx)\Big(f\Big(x+\frac tm\Big)-f(x)\Big)\,dt\sim m\int K^2(u,u+t)\,dt.$$
Chapter 12

Spectral estimation

Parametric estimation from a sample of a stationary time series is an important statistical problem, both for theoretical research and for practical applications to real data. Whittle's approximate likelihood estimate is particularly attractive for numerous models (ARMA, linear processes, etc.), mainly for two reasons: first, Whittle's contrast does not depend on the marginal law of the time series but only on its spectral density; second, its computation time is smaller than that of other parametric estimation methods such as exact likelihood. Numerous papers have been written on this estimation method after Whittle's seminal paper; in particular, Hannan (1973) [102], Rosenblatt (1985) [167] and Giraitis and Robinson (2001) [94] established asymptotic normality for Gaussian, causal linear and strong mixing processes, respectively, as well as for ARCH(∞) processes.

This chapter is organized as follows. A first section details the expression of the spectral densities of some standard models. A second section addresses the asymptotic behavior of the empirical periodogram: this is not a consistent estimator of the spectral density, but it yields a consistent parametric procedure called Whittle estimation. A last section derives the properties of a bandwidth estimate of the spectral density, obtained by convolving the periodogram with an approximate Dirac measure. Its second order properties are easily derived through a simple diagram formula, but the higher order asymptotics needed to obtain a.s. convergence properties require a dependence tool introduced in § 12.3.2.
12.1 Spectral densities
The following models were already described, and their weak dependence properties checked, in chapter 3. Nevertheless, it is important to
provide precise expressions of their spectral properties.

Non-causal (two-sided) linear processes. Let $X$ be a zero mean stationary non-causal (two-sided) linear time series satisfying
$$X_k = \sum_{j=-\infty}^{\infty} a_j\,\xi_{k-j}\qquad\text{for } k\in\mathbb Z,\tag{12.1.1}$$
with $(a_k)_{k\in\mathbb Z}\in\mathbb R^{\mathbb Z}$ and $(\xi_k)_{k\in\mathbb Z}$ a sequence of zero mean i.i.d. random variables such that $E(\xi_0^2)=\sigma^2<\infty$. We set $\tilde a_k = a_{-k}$ and $\tilde a = (\tilde a_k)_{k\in\mathbb Z}$. Then the spectral density of $X$ exists and satisfies
$$f(\lambda) = \frac{\sigma^2}{2\pi}\Big|\sum_{k=-\infty}^{\infty} a_k e^{-ik\lambda}\Big|^2 = \frac{\sigma^2}{2\pi}\sum_{k=-\infty}^{\infty}\Big(\sum_{j=-\infty}^{\infty} a_{j+k}a_j\Big)e^{-ik\lambda} = \frac{\sigma^2}{2\pi}\sum_{k=-\infty}^{\infty}(a*\tilde a)_k\,e^{-ik\lambda}.\tag{12.1.2}$$
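Formula (12.1.2) lends itself to a quick numerical check. The sketch below is only an illustration (the coefficient values, lag range and evaluation points are arbitrary choices, not from the text): it computes the spectral density once through the transfer function $|\sum_k a_ke^{-ik\lambda}|^2$ and once through the coefficient autocorrelation $(a*\tilde a)_k$, and the two expressions agree.

```python
import numpy as np

# Two-sided moving-average coefficients a_k for k = -2..2 (arbitrary values).
lags = np.arange(-2, 3)
a = np.array([0.3, -0.5, 1.0, 0.4, 0.2])
sigma2 = 1.0  # innovation variance sigma^2

def f_transfer(lam):
    """f(lambda) = sigma^2/(2 pi) |sum_k a_k e^{-i k lambda}|^2."""
    A = np.sum(a * np.exp(-1j * lags * lam))
    return sigma2 / (2 * np.pi) * np.abs(A) ** 2

def f_autocov(lam):
    """f(lambda) = sigma^2/(2 pi) sum_k (a * a~)_k e^{-i k lambda},
    where (a * a~)_k = sum_j a_{j+k} a_j is the coefficient autocorrelation."""
    conv = np.correlate(a, a, mode="full")   # (a * a~)_k, k = -4..4
    klags = np.arange(-(len(a) - 1), len(a))
    return sigma2 / (2 * np.pi) * np.sum(conv * np.exp(-1j * klags * lam)).real

for lam in (0.0, 0.7, 2.0):
    print(lam, f_transfer(lam), f_autocov(lam))  # the two columns coincide
```

Note that the autocorrelation of the coefficient sequence is invariant under index translation, so the indexing convention used by `np.correlate` is immaterial here.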
There exist very few explicit results in the case of two-sided linear processes.

Causal GARCH and ARCH(∞) processes. The famous, by now classical, GARCH($q'$, $q$) model, introduced by Engle (1982) [84] and Bollerslev (1986) [23], is given by the relations
$$X_k = \rho_k\xi_k\qquad\text{with}\qquad \rho_k^2 = a_0 + \sum_{j=1}^{q'} a_jX_{k-j}^2 + \sum_{j=1}^{q} c_j\rho_{k-j}^2,\tag{12.1.3}$$
where $(q',q)\in\mathbb N^2$, $a_0>0$, $a_j\ge 0$ and $c_j\ge 0$ for $j\in\mathbb N$, and $(\xi_k)_{k\in\mathbb Z}$ are i.i.d. random variables with zero mean (for an excellent survey of ARCH modelling, see Giraitis et al. (2005) [93]). Under some additional conditions, the GARCH model can be written as a particular case of the ARCH(∞) model, introduced in Robinson (1991) [165], which satisfies
$$X_k = \rho_k\xi_k\qquad\text{with}\qquad \rho_k^2 = b_0 + \sum_{j=1}^{\infty} b_jX_{k-j}^2,\tag{12.1.4}$$
with a sequence $(b_j)_j$ depending on the families $(a_j)$ and $(c_j)$. Different sufficient conditions can be given for obtaining an $m$-order stationary solution to (12.1.3)
or (12.1.4). Notice that for both models (12.1.3) and (12.1.4) the spectral density of $X$ is constant. As a consequence, one rather studies the periodogram of the squared process; in the GARCH case (see Bollerslev (1986) [23]) it is based on the ARMA representation satisfied by $(X_k^2)_{k\in\mathbb Z}$. Indeed, if $(X_k)$ is a solution of (12.1.3) or (12.1.4), then $(X_k^2)$ can be written as a solution of a particular case of the bilinear equation (see § 3.4.2):
$$X_k^2 = \varepsilon_k\Big(\gamma b_0 + \gamma\sum_{j=1}^{\infty}b_jX_{k-j}^2\Big) + \lambda_1 b_0 + \lambda_1\sum_{j=1}^{\infty}b_jX_{k-j}^2\qquad\text{for } k\in\mathbb Z,$$
with $\varepsilon_k = (\xi_k^2-\lambda_1)/\gamma$ for $k\in\mathbb Z$, $\lambda_1 = E\xi_0^2$ and $\gamma^2 = \mathrm{Var}(\xi_0^2)$. Moreover, the time series $(Y_k)_{k\in\mathbb Z}$ defined by $Y_k = X_k^2 - \lambda_1 b_0\big(1-\lambda_1\sum_{j=1}^\infty b_j\big)^{-1}$ for $k\in\mathbb Z$ satisfies the bilinear equation with parameter $c_0 = 0$. Hence, a sufficient condition for the stationarity of $(X_k^2)_{k\in\mathbb Z}$ with $\|X_0^2\|_m < \infty$ is
$$\big(\|\varepsilon_0\|_m + 1\big)\cdot\sum_{j=1}^{\infty}|b_j| < 1\quad\Longleftrightarrow\quad\Big(\Big\|\frac{\xi_0^2-\lambda_1}{\gamma}\Big\|_m + 1\Big)\cdot\sum_{j=1}^{\infty}|b_j| < 1.$$
Setting $\sigma^2 = E(X_0^2-\rho_0^2)^2$, the spectral density of $(X_k^2)_{k\in\mathbb Z}$ is
$$f(\lambda) = \frac{\sigma^2}{2\pi}\Big|1-\sum_{j=1}^{\infty}b_j\,e^{ij\lambda}\Big|^{-2}.$$
The method developed in Giraitis and Robinson (2001) [94] for establishing the central limit theorem satisfied by the periodogram is essentially ad hoc and cannot be used for non-causal or nonlinear time series. The recent book of Straumann (2005) [181] also provides an up-to-date and complete overview of this question. Chapter 8 of that book is devoted to the results of Mikosch and Straumann (2002) [131], who studied the case of intermediate moment conditions of order $> 4$ and $< 8$ for the special case of GARCH(1,1) processes.

Causal bilinear processes. Assume now that $X = (X_k)_{k\in\mathbb Z}$ is a bilinear process (see the seminal paper of Giraitis and Surgailis (2002) [95]) satisfying the equation
$$X_k = \xi_k\Big(a_0 + \sum_{j=1}^{\infty}a_jX_{k-j}\Big) + c_0 + \sum_{j=1}^{\infty}c_jX_{k-j}\qquad\text{for } k\in\mathbb Z,\tag{12.1.5}$$
where $(\xi_k)_{k\in\mathbb Z}$ are i.i.d. random variables with zero mean and such that $\|\xi_0\|_p < \infty$ with $p\ge 1$, and $a_j, c_j$, $j\in\mathbb N$, are real coefficients. Assume $c_0 = 0$ and define
the generating functions
$$A(z) = \sum_{j=1}^{\infty}a_jz^j,\qquad C(z) = \sum_{j=1}^{\infty}c_jz^j,\qquad G(z) = \big(1-C(z)\big)^{-1} = \sum_{j=0}^{\infty}g_jz^j,\qquad H(z) = A(z)G(z) = \sum_{j=1}^{\infty}h_jz^j.$$
If $\|\xi_0\|_p\sum_{j=1}^{\infty}|h_j| < \infty$ — for instance when $\|\xi_0\|_p\cdot\big(\sum_{j=1}^{\infty}|a_j| + \sum_{j=1}^{\infty}|c_j|\big) < 1$ — there exists a unique zero mean stationary and ergodic solution $X$ of equation (12.1.5) in $\mathbb L^p(\Omega,\mathcal A,\mathbb P)$. For $p\ge 2$, the covariogram of $X$ is
$$R(k) = a_0^2\|\xi_0\|_2^2\Big(1-\sum_{j=1}^{\infty}h_j^2\Big)^{-1}\sum_{j=0}^{\infty}g_jg_{j+k}$$
and $\sum_k|R(k)| < \infty$. The spectral density of $X$ exists and satisfies
$$f(\lambda) = \frac{a_0^2\sigma^2}{2\pi\big(1-\sum_{j=1}^{\infty}h_j^2\big)}\sum_{k=-\infty}^{\infty}\sum_{j=0}^{\infty}g_jg_{j+k}\,e^{-ik\lambda},$$
with $\sigma^2 = \|\xi_0\|_2^2$.

Non-causal (two-sided) bilinear and LARCH(∞) processes. The bilinear process $X = (X_k)_{k\in\mathbb Z}$ satisfies the equation
$$X_k = \xi_k\Big(a_0 + \sum_{j\in\mathbb Z^*}a_jX_{k-j}\Big)\qquad\text{for } k\in\mathbb Z,\tag{12.1.6}$$
where $(\xi_k)_{k\in\mathbb Z}$ are i.i.d. bounded random variables and $(a_k)_{k\in\mathbb Z}$ is a sequence of real numbers such that $\lambda = \|\xi_0\|_\infty\cdot\sum_{j\neq 0}|a_j| < 1$. Then the spectral density of $X$ exists and is defined by
$$f(\lambda) = \frac{\sigma^2}{2\pi}\Big|1-\sum_{j=1}^{\infty}b_j\,e^{ij\lambda}\Big|^{-2}.$$
In the previous expression the coefficients $b_j$ are not written explicitly as functions of the initial parameters $(a_i)_{i\in\mathbb Z}$ and $a_0$. In the same way as in the causal case, assume now that $Y = (Y_k)_{k\in\mathbb Z}$ satisfies the relation
$$Y_k = \xi_k\Big(a_0 + \sum_{j\neq 0}a_jY_{k-j}\Big)\qquad\text{for } k\in\mathbb Z,\tag{12.1.7}$$
with the same assumptions on $(\xi_k)_{k\in\mathbb Z}$ and $(a_k)_{k\in\mathbb Z}$. Then the time series $(Y_k^2)_{k\in\mathbb Z}$ satisfies a relation of the type (12.1.6) and is a stationary process; $Y$ is then a stationary process, the so-called two-sided LARCH(∞) process. The condition that $(\xi_k)_{k\in\mathbb Z}$ be i.i.d. and bounded is restrictive. However, while it is only a sufficient condition for the existence of a non-causal LARCH(∞) process, it seems to be very close to being necessary as well; see Doukhan and Wintenberger (2005) [77].
Non-causal linear processes with dependent innovations. Let $X$ be a zero mean stationary non-causal (two-sided) linear time series satisfying eqn. (12.1.1), now with a dependent and centered stationary innovation process:
$$E\xi_0 = 0,\qquad E\xi_0^2 = \sigma^2.$$
Then, denoting by $f_\xi$ the spectral density of the process $(\xi_t)$, we have $f_\xi(0) = \sigma^2/(2\pi)$; moreover the spectral density of $X$ exists and satisfies
$$f(\lambda) = f_\xi(\lambda)\Big|\sum_{k=-\infty}^{\infty}a_ke^{-ik\lambda}\Big|^2 = f_\xi(\lambda)\sum_{k=-\infty}^{\infty}(a*\tilde a)_k\,e^{-ik\lambda}.\tag{12.1.8}$$
Examples of interest are orthogonal series, like mean zero LARCH(∞) processes; in this case no additional information about $\xi$ may be obtained from the expression of this spectral density. Another important case is thus that of a process $\xi$ with a non-constant spectral density; we recalled above the precise expression of this spectral density for the bilinear processes introduced by Giraitis and Surgailis (2002) [95].
12.2 Periodogram
Let $X = (X_k)_{k\in\mathbb Z}$ be a zero mean, fourth-order stationary, real valued time series. Denote by $(R(i))_i$ the covariogram of $X$, and by $(\kappa_4(i,j,k))_{i,j,k}$ the fourth cumulants of $X$:
$$R(i) = \mathrm{Cov}(X_0,X_i) = E(X_0X_i),$$
$$\kappa_4(i,j,k) = EX_0X_iX_jX_k - EX_0X_i\,EX_jX_k - EX_0X_j\,EX_iX_k - EX_0X_k\,EX_iX_j,$$
for $(i,j,k)\in\mathbb Z^3$. We will use the following assumption on $X$:

Assumption M. $X$ is such that:
$$\gamma = \sum_{\ell\in\mathbb Z}R(\ell)^2 < \infty\qquad\text{and}\qquad \kappa_4 = \sum_{i,j,k\in\mathbb Z}|\kappa_4(i,j,k)| < \infty.\tag{12.2.1}$$
This assumption is closely linked to weak dependence: see the comments following Definition 4.1, and use the notation (4.4.9) together with Lemma 4.11 and Propositions 2.1 and 2.2. Assumption M ensures the existence of $X$'s spectral density $f\in\mathbb L^2([-\pi,\pi[)$:
$$f(\lambda) = \frac{1}{2\pi}\sum_{k\in\mathbb Z}R(k)\,e^{ik\lambda}\qquad\text{for }\lambda\in[-\pi,\pi[.$$
The periodogram of $X$ is defined as
$$I_n(\lambda) = \frac{1}{2\pi n}\Big|\sum_{k=1}^{n}X_ke^{-ik\lambda}\Big|^2,\qquad\text{for }\lambda\in[-\pi,\pi[.$$
We now rewrite
$$I_n(\lambda) = \frac1{2\pi}\sum_{|k|<n}\widehat R_n(k)\,e^{-ik\lambda},\qquad \widehat R_n(k) = \frac1n\sum_{j=1\vee(1-k)}^{(n-k)\wedge n}X_jX_{j+k}.$$
Here $\widehat R_n(k)$ is a biased estimate of $R(k)$. The periodogram $I_n(\lambda)$ is thus a natural estimator of the spectral density; unfortunately it is not a consistent one, as its variance does not tend to zero as $n$ tends to infinity. However, once integrated against an $\mathbb L^2$ function, its behavior becomes much smoother and can provide an estimation of the spectral density. Now let $g:[-\pi,\pi[\to\mathbb R$ be a $2\pi$-periodic function such that $g\in\mathbb L^2([-\pi,\pi[)$, and define
$$J_n(g) = \int_{-\pi}^{\pi}g(\lambda)I_n(\lambda)\,d\lambda,\ \text{the integrated periodogram of }X,\qquad\text{and}\qquad J(g) = \int_{-\pi}^{\pi}g(\lambda)f(\lambda)\,d\lambda.$$
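As a numerical aside (not from the text; the sample size and the white-noise input are arbitrary choices), the simplest instance of the integrated periodogram, $J_n(g)$ with $g\equiv 1$, can be checked exactly: evaluating $I_n$ at the Fourier frequencies, its Riemann sum recovers $\widehat R_n(0) = \frac1n\sum_kX_k^2$, a discrete Parseval identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
X = rng.standard_normal(n)   # i.i.d. input, spectral density sigma^2/(2 pi)

def periodogram(X, lam):
    """I_n(lambda) = |sum_{k=1}^n X_k e^{-i k lambda}|^2 / (2 pi n)."""
    k = np.arange(1, len(X) + 1)
    return np.abs(np.sum(X * np.exp(-1j * k * lam))) ** 2 / (2 * np.pi * len(X))

# J_n(1): Riemann sum of I_n over the Fourier frequencies 2 pi j / n.
freqs = 2 * np.pi * np.arange(n) / n
Jn = (2 * np.pi / n) * sum(periodogram(X, lam) for lam in freqs)

print(Jn, np.mean(X ** 2))   # equal up to rounding (discrete Parseval)
```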
Theorem 12.1 (SLLN). Let $c > 0$. Assume that the function $g(\lambda) = \sum_{\ell\in\mathbb Z}g_\ell e^{i\ell\lambda}$ satisfies $\sum_{\ell\in\mathbb Z}(1+|\ell|)^sg_\ell^2\le c$ for some $s > 1$. If $X$ satisfies Assumption M then, uniformly with respect to such functions $g$,
$$J_n(g)\to J(g)\qquad\text{a.s.}$$
This theorem will be proved after two lemmas of independent interest.

Lemma 12.1. If $X$ satisfies Assumption M, then:
$$n\max_{\ell\ge 0}\mathrm{Var}\,\widehat R_n(\ell)\le\kappa_4+2\gamma.$$
Proof of Lemma 12.1. Denote $Y_{j,\ell} = X_jX_{j+\ell} - R(\ell)$, use the identity
$$\mathrm{Cov}(Y_{0,\ell},Y_{j,\ell}) = \kappa_4(\ell,j,j+\ell) + R(j)^2 + R(j+\ell)R(j-\ell),$$
and deduce, from the stationarity of $(Y_{j,\ell})_{j\in\mathbb Z}$ for a fixed integer $\ell$,
$$n\,\mathrm{Var}\,\widehat R_n(\ell)\le\frac1n\sum_{j=1\vee(1-\ell)}^{(n-\ell)\wedge n}\ \sum_{j'=1\vee(1-\ell)}^{(n-\ell)\wedge n}\big|\mathrm{Cov}(Y_{j,\ell},Y_{j',\ell})\big|\le\sum_{j\in\mathbb Z}\big|\mathrm{Cov}(Y_{0,\ell},Y_{j,\ell})\big|\le\sum_{j\in\mathbb Z}\big(|\kappa_4(\ell,j,j+\ell)|+2R(j)^2\big)\le\kappa_4+2\gamma,$$
by using the Cauchy–Schwarz inequality for $\ell^2$-sequences. □
Lemma 12.2. Denote $c_s = \sum_{\ell\in\mathbb Z}(1+|\ell|)^{-s}$. If $X$ satisfies the assumptions of Theorem 12.1, then:
$$E|J_n(g)-J(g)|^2\le\frac{3c\big(\gamma + c_s(\kappa_4+2\gamma)\big)}{n}.$$
Proof of Lemma 12.2. As in Doukhan and León (1989) [66], we use the decomposition $J_n(g)-J(g) = -T_1(g)-T_2(g)+T_3(g)$, with
$$T_1(g) = \sum_{|\ell|\ge n}R(\ell)g_\ell,\qquad T_2(g) = \frac1n\sum_{|\ell|<n}|\ell|R(\ell)g_\ell,\qquad T_3(g) = \sum_{|\ell|<n}\big(\widehat R_n(\ell)-E\widehat R_n(\ell)\big)g_\ell.\tag{12.2.2}$$
Remark that $T_3(g) = J_n(g)-EJ_n(g)$. Thus we obtain the inequality
$$E|J_n(g)-J(g)|^2\le 3\big(|T_1(g)|^2+|T_2(g)|^2+E|T_3(g)|^2\big).$$
The Cauchy–Schwarz inequality yields
$$|T_1(g)|^2\le c\sum_{|\ell|\ge n}(1+|\ell|)^{-s}R(\ell)^2\le\frac cn\sum_{|\ell|\ge n}R(\ell)^2,\qquad |T_2(g)|^2\le\frac{c}{n^2}\sum_{|\ell|<n}|\ell|^2(1+|\ell|)^{-s}R(\ell)^2\le\frac cn\sum_{|\ell|<n}R(\ell)^2.$$
Hence $|T_1(g)|^2+|T_2(g)|^2\le c\gamma/n$. Lemma 12.1 entails
$$E|T_3(g)|^2\le c\sum_{|\ell|<n}(1+|\ell|)^{-s}\,\mathrm{Var}\big(\widehat R_n(\ell)\big)\le\frac cn\sum_{|\ell|<n}(1+|\ell|)^{-s}(\kappa_4+2\gamma)\le c\,c_s\cdot\frac{\kappa_4+2\gamma}{n}.$$
We combine these bounds to deduce Lemma 12.2. □

Proof of Theorem 12.1. We derive the strong law of large numbers from a weak $\mathbb L^2$-LLN and from Lemma 12.2. The proof follows the scheme of the proof of the standard strong LLN. Fix $t>0$. First, for all random variables $X$ and $Y$ we have $P(|X+Y|\ge 2t)\le P(|X|\ge t)+P(|Y|\ge t)$. Thus
$$P\Big(\max_{k^2\le n<(k+1)^2}|J_n(g)-J(g)|\ge 2t\Big)\le P\big(|J_{k^2}(g)-J(g)|\ge t\big) + P\Big(\max_{k^2\le n<(k+1)^2}|J_n(g)-J_{k^2}(g)|\ge t\Big)$$
and
$$P\Big(\max_{n\ge N}|J_n(g)-J(g)|\ge 2t\Big)\le \sum_{k=[\sqrt N]}^{\infty}P\big(|J_{k^2}(g)-J(g)|\ge t\big) + \sum_{k=[\sqrt N]}^{\infty}P\Big(\max_{k^2\le n<(k+1)^2}|J_n(g)-J_{k^2}(g)|\ge t\Big) =: A_N + B_N.\tag{12.2.3}$$
From the Bienaymé–Tchebychev (or Markov) inequality, Lemma 12.2 implies
$$A_N\le \frac{C_1}{t^2}\sum_{k\ge\sqrt N}\frac1{k^2},\tag{12.2.4}$$
with $C_1\in\mathbb R^+$.
Now set $\widetilde R_n(\ell) = \widehat R_n(\ell) - E\widehat R_n(\ell)$. The fluctuation term $B_N$ is more involved, and its bound is based on the same type of decomposition as (12.2.2), because for $k^2 < n$:
$$J_n(g) - J_{k^2}(g) = -T_1(g) + T_2(g) - T_3(g),$$
with now
$$T_1(g) = \sum_{k^2\le|\ell|<n}R(\ell)g_\ell,\qquad T_2(g) = \frac1n\sum_{k^2\le|\ell|<n}|\ell|R(\ell)g_\ell,\qquad T_3(g) = \sum_{|\ell|<n}\widetilde R_n(\ell)g_\ell - \sum_{|\ell|<k^2}\widetilde R_{k^2}(\ell)g_\ell.$$
As previously,
$$|T_1(g)|^2 + |T_2(g)|^2 \le c\cdot\frac{\gamma}{k^2}.$$
Set $L_k = \max_{k^2\le n<(k+1)^2}|J_n(g)-J_{k^2}(g)|$ and $T_k^*(g) = \max_{k^2\le n<(k+1)^2}|T_3(g)|$; then
$$B_N \le \sum_{k\ge\sqrt N}b_k,\qquad\text{with}\qquad b_k = P(L_k\ge t)\le\frac{E(L_k^2)}{t^2}.$$
Now
$$E(L_k^2)\le 3\big(|T_1(g)|^2+|T_2(g)|^2+E|T_k^*(g)|^2\big)\le\frac{3c\gamma}{k^2} + 3E|T_k^*(g)|^2.$$
Then, for $k^2\le n<(k+1)^2$ and $\ell\in\mathbb Z$,
$$\widetilde R_n(\ell) = \frac{k^2}{n}\widetilde R_{k^2}(\ell) + \Delta_{\ell,n,k},\qquad \Delta_{\ell,n,k} = \frac1n\sum_{h=(k^2\wedge(k^2-\ell))+1}^{n\wedge(n-\ell)}Y_{h,\ell}.$$
Remark that $\widetilde R_{k^2}(\ell) = 0$ if $k^2\le|\ell|\le n$, and thus $\widetilde R_n(\ell) = \Delta_{\ell,n,k}$ in such a case. Also note that
$$\Delta^*_{\ell,k} = \max_{k^2\le n<(k+1)^2}|\Delta_{\ell,n,k}|\le\frac1{k^2}\sum_{h=(k^2\wedge(k^2-\ell))+1}^{(k^2+2k)\wedge((k^2+2k)-\ell)}|Y_{h,\ell}|,$$
and thus
$$E(\Delta^*_{\ell,k})^2\le\frac{1}{k^4}(2k)^2\max_{(h,\ell)\in\mathbb Z^2}E(|Y_{h,\ell}|^2)\le\frac{4}{k^2}E(|X_0|^4).$$
Write
$$T_3(g) = \sum_{|\ell|<k^2}\widetilde R_{k^2}(\ell)\Big(\frac{k^2}{n}-1\Big)g_\ell + \sum_{|\ell|<n}\Delta_{\ell,n,k}\,g_\ell,\qquad\text{so that}\qquad |T_k^*(g)|\le\frac2k\sum_{|\ell|<k^2}\big|\widetilde R_{k^2}(\ell)g_\ell\big| + \sum_{|\ell|<(k+1)^2}\Delta^*_{\ell,k}|g_\ell|,$$
and we thus deduce, for a constant $C_1>0$,
$$E|T_k^*(g)|^2\le 2C_1\Big(\sup_{\ell\in\mathbb Z}\mathrm{Var}\big(\widehat R_{k^2}(\ell)\big) + \sup_{\ell\in\mathbb Z}E(\Delta^*_{\ell,k})^2\Big)\le\frac{C_1cA}{k^2}$$
for a constant $A>0$ depending on $E|X_0|^4$, $\kappa_4$ and $\gamma$ only. Hence $b_k\le 3c(\gamma+AC_1)/(k^2t^2)$ is the general term of a convergent series and, with $C_2>0$,
$$B_N\le\frac{C_2}{t^2}\sum_{k\ge\sqrt N}\frac1{k^2}.\tag{12.2.5}$$
Then (12.2.3), (12.2.4) and (12.2.5) imply $\sup_{n\ge N}|J_n(g)-J(g)|\to 0$ in probability, so that $J_n(g)\to J(g)$ a.s. □

Two frames of weak dependence are considered here: the θ-weak dependence property and the non-causal η-weak dependence.
12.2.1 Whittle estimation
The previous examples give essentially explicit representations of the spectral density of some commonly used time series. When the coefficients are functions of an unknown finite dimensional parameter $\beta$, a way to estimate this parameter is to use the contrast $J(g_\beta^{-1})$, where $f(\lambda) = \sigma^2g_\beta(\lambda)$ denotes the spectral density of the model for the value $\beta$ of the parameter, with moreover $g_\beta(0) = 1$. We thus exhibit two parameters, $\sigma^2$ and $\beta$. Let $(X_1,\dots,X_n)$ be a sample from $X$, and define the Whittle maximum likelihood estimators of $\beta^*$ and $\sigma^{*2}$:
$$\widehat\beta_n = \mathrm{Argmin}_{\beta\in K}J_n(g_\beta^{-1}) = \mathrm{Argmin}_{\beta\in K}\int_{-\pi}^{\pi}\frac{I_n(\lambda)}{g_\beta(\lambda)}\,d\lambda\qquad\text{and}\qquad \widehat\sigma_n^2 = \frac1{2\pi}J_n\big(g_{\widehat\beta_n}^{-1}\big).$$
In Bardet, Doukhan and León (2005) [11] it is shown that strong consistency of the estimators $\widehat\beta_n$ and $\widehat\sigma_n^2$ may be proved by using extensions of the previous tools. First, a SLLN can be deduced from the one of the previous section, and
secondly the CLT is derived by using the previous representation of the periodogram and the results in § 7.1. To our knowledge, the known results on the asymptotic behavior of Whittle parametric estimation for non-Gaussian linear processes essentially concern one-sided (causal) linear processes (see for instance Hannan (1973) [102], Hall and Heyde (1980) [100], Rosenblatt (1985) [168], Brockwell and Davis (1988) [28]); there exist very few results in the case of two-sided linear processes. In Rosenblatt ((1985), p. 52 [168]) a condition for the strong mixing property of two-sided linear processes was given, but some restrictive conditions on the process were also required for obtaining a central limit theorem for Whittle estimators: the distribution of the random variables $\xi_k$ has to be absolutely continuous with respect to the Lebesgue measure with a bounded variation density, moments of order $m > 4+2\delta$ with $\delta > 0$ are required, and the central limit theorem is obtained for a tapered periodogram, under the additional assumption $\sum_{m=1}^{\infty}\alpha_{4,\infty}(m)^{\delta/(2+\delta)} < \infty$, where $\alpha_{4,\infty}(m)\ge\alpha_m$ denotes a strong mixing coefficient defined with four points in the future instead of two for $\alpha_m$. The case of strongly dependent two-sided linear processes was also treated by Giraitis and Surgailis (1990) [95] or Horváth and Shao (1999) [108]. In the case of causal linear processes, it is well known that
$$\sqrt n\big(\widehat\beta_n-\beta^*\big)\to\mathcal N_p\big(0,2\pi(W^*)^{-1}\big),$$
that $\widehat\sigma_n^2$ is a consistent estimate of $\sigma^{*2}$ with $\sqrt n\big(\widehat\sigma_n^2-\sigma^{*2}\big)\to\mathcal N\big(0,\sigma^{*4}\gamma_4\big)$, where $\gamma_4$ is the fourth cumulant of the $(\xi_k)_{k\in\mathbb Z}$, and that $\sqrt n(\widehat\beta_n-\beta^*)$ and $\sqrt n(\widehat\sigma_n^2-\sigma^{*2})$ are asymptotically normal and independent.
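A minimal numerical sketch of Whittle estimation (not from the text: the AR(1) model, its parameter $0.6$, the grid and the sample size are arbitrary illustrative choices, and the normalization $g_\beta(0)=1$ is ignored since it only affects the $\sigma^2$ factor). The contrast $J_n(g_\beta^{-1})$ is evaluated at the Fourier frequencies and minimized over a grid:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_true = 2000, 0.6

# Simulate a causal AR(1): X_k = beta X_{k-1} + xi_k (with a burn-in).
xi = rng.standard_normal(n + 200)
X = np.zeros(n + 200)
for k in range(1, n + 200):
    X[k] = beta_true * X[k - 1] + xi[k]
X = X[200:]

# Periodogram at the Fourier frequencies, computed with the FFT (lambda = 0 omitted).
lam = 2 * np.pi * np.arange(1, n) / n
I = np.abs(np.fft.fft(X)[1:]) ** 2 / (2 * np.pi * n)

def whittle_contrast(beta):
    """J_n(g_beta^{-1}) with g_beta(lambda) = |1 - beta e^{i lambda}|^{-2} (AR(1) shape)."""
    g_inv = np.abs(1.0 - beta * np.exp(1j * lam)) ** 2
    return (2 * np.pi / n) * np.sum(I * g_inv)

betas = np.linspace(-0.95, 0.95, 381)
beta_hat = betas[np.argmin([whittle_contrast(b) for b in betas])]
print(beta_hat)   # should land near beta_true
```

For the AR(1) shape the minimizer essentially reproduces the lag-one sample autocorrelation (the Yule–Walker estimate), which is a classical sanity check for a Whittle implementation.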
12.3 Spectral density estimation
In this section, $X = (X_n)_{n\in\mathbb Z}$ denotes a vector valued stationary sequence (with values in $\mathbb R^D$). The spectral matrix density of $X$ writes
$$f_X(\lambda) = \sum_{t\in\mathbb Z}\mathrm{Cov}(X_0,X_t)\,e^{it\lambda};$$
here $R_t = \mathrm{Cov}(X_0,X_t) = EX_0^TX_t - EX_0^T\,EX_t$ is a $D\times D$ matrix. This matrix valued function is estimated by
$$\widehat f_X(\lambda) = F_m*I_n(\lambda) \equiv \int_{-\pi}^{\pi}F_m(\mu)I_n(\lambda-\mu)\,\frac{d\mu}{2\pi}\tag{12.3.1}$$
$$= \frac1n\sum_{k,l=1}^{n}\Big(1-\frac{|k-l|}{m}\Big)^+X_k^TX_l\,e^{i(k-l)\lambda}\tag{12.3.2}$$
$$= \sum_{|s|\le m}\Big(1-\frac{|s|}{m}\Big)\widehat R_n(s)\,e^{-is\lambda},\tag{12.3.3}$$
$$\widehat R_n(s) = \frac1n\sum_{j=1\vee(1-s)}^{(n-s)\wedge n}X_j^TX_{j+s},\tag{12.3.4}$$
where $F_m(\lambda) = \sum_{|s|<m}\big(1-\frac{|s|}{m}\big)e^{is\lambda} = \frac1m\Big(\frac{\sin(m\lambda/2)}{\sin(\lambda/2)}\Big)^2$ is the Fejér kernel, and the matrix periodogram is defined as
$$I_n(\lambda) = \frac1n\sum_{k,l=1}^{n}X_k^TX_l\,e^{i(l-k)\lambda}$$
if the sequence $X_t$ is centered at expectation. Note that in equation (12.3.4) the summation contains $n-|s|$ terms, hence this estimate of $R_s$ is biased. The following relation links the spectral density to the limit variance $\Sigma$ in the CLT for $X = (X_n)_{n\in\mathbb Z}$:
$$f_X(0) = \sum_{s\in\mathbb Z}\mathrm{Cov}(X_0,X_s).$$
Assume $\sum_s|s|^\rho\,\|R_s\| < \infty$, where $\|\cdot\|$ is any matrix norm; then
$$\mathrm{bias}(\lambda) = E\widehat f_X(\lambda) - f_X(\lambda) = \sum_{|s|<m}\Big(\Big(1-\frac{|s|}{m}\Big)\Big(1-\frac{|s|}{n}\Big)-1\Big)R_s\,e^{is\lambda} - \sum_{|s|\ge m}R_s\,e^{is\lambda},$$
so that
$$\|\mathrm{bias}(\lambda)\|\le\frac1n\sum_s|s|\,\|R_s\| + \frac1{m^\rho}\sum_s|s|^\rho\,\|R_s\| = O\Big(\frac1n+\frac1{m^\rho}\Big).$$
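The lag-window estimate (12.3.3)–(12.3.4) and its bias–variance trade-off can be illustrated numerically. The sketch below (the AR(1) model, its parameter and the sample size are arbitrary illustrative choices, not from the text) implements the scalar case of (12.3.3) with Bartlett weights and the bandwidth $m = n^{1/(2\rho+1)}$ for $\rho = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 4000, 0.5

# AR(1) sample; in this section's convention (no 1/2pi factor) its spectral
# density is f_X(lambda) = 1/|1 - beta e^{i lambda}|^2, so f_X(0) = 4.
xi = rng.standard_normal(n + 100)
x = np.zeros(n + 100)
for k in range(1, n + 100):
    x[k] = beta * x[k - 1] + xi[k]
x = x[100:]

def lag_window_estimate(x, lam, m):
    """hat f_X(lambda) = sum_{|s|<=m} (1 - |s|/m) hat R_n(s) e^{-i s lambda}."""
    n = len(x)
    est = 0.0
    for s in range(-m, m + 1):
        Rs = np.sum(x[:n - abs(s)] * x[abs(s):]) / n   # biased estimate (12.3.4)
        est += (1 - abs(s) / m) * Rs * np.exp(-1j * s * lam)
    return est.real

m = int(n ** (1 / 3))   # bandwidth m = n^{1/(2 rho + 1)} with rho = 1
print(lag_window_estimate(x, 0.0, m))   # estimates the target value f_X(0) = 4
```

Larger $m$ reduces the $O(m^{-\rho})$ smoothing bias but inflates the $O(m/n)$ variance, exactly the trade-off quantified above.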
12.3.1 Second order estimate
First recall that the 4th order cumulant of a centered random vector $(x,y,z,t)\in\mathbb R^{4D}$ writes, for $1\le a,b,c,d\le D$,
$$\kappa_4^{(a,b,c,d)}(x,y,z,t) = Ex^{(a)}y^{(b)}z^{(c)}t^{(d)} - Ex^{(a)}y^{(b)}\,Ez^{(c)}t^{(d)} - Ex^{(a)}z^{(c)}\,Ey^{(b)}t^{(d)} - Ex^{(a)}t^{(d)}\,Ey^{(b)}z^{(c)}.$$
Set now
$$\kappa^{(a,b,c,d)}(i,j,k) = \kappa_4\big(X_0^{(a)},X_i^{(b)},X_j^{(c)},X_k^{(d)}\big),\qquad r^{(a,b)}(i) = EX_0^{(a)}X_i^{(b)}.\tag{12.3.5}$$
Assume that, for $1\le a,b,c,d\le D$,
$$\sum_{j\in\mathbb Z}|j|^\rho\,\big|r^{(a,b)}(j)\big| = r_\rho^{(a,b)} < \infty,\tag{12.3.6}$$
$$\sum_{i,j,k\in\mathbb Z}\big|\kappa^{(a,b,c,d)}(i,j,k)\big| = \kappa^{(a,b,c,d)} < \infty.\tag{12.3.7}$$
Then we may rewrite
$$\widehat f(\lambda) - E\widehat f(\lambda) = \sum_{k=1}^{n}Z_{k,n}(\lambda),\tag{12.3.8}$$
$$Z_{k,n}(\lambda) = \sum_{|s|<m}\delta_{n,k,s}(\lambda)\,Y_{k,s},\qquad\text{with}\tag{12.3.9}$$
$$Y_{k,s} = X_k^TX_{k+s} - R(s),\tag{12.3.10}$$
$$\delta_{n,k,s}(\lambda) = \frac1n\Big(1-\frac{|s|}{m}\Big)\mathbf 1_{\{1\le k+s\le n\}}\,e^{-is\lambda},\tag{12.3.11}$$
hence, setting $\widehat f(\lambda) = \big(\widehat f^{(a,b)}(\lambda)\big)_{1\le a,b\le D}$, each coordinate writes, for $1\le a,b,c,d\le D$,
$$\mathrm{Var}\,\widehat f(\lambda)^{(a,b,c,d)} = \mathrm{Cov}\big(\widehat f^{(a,b)}(\lambda),\widehat f^{(c,d)}(\lambda)\big) = \frac1n\sum_{|j|<n}\Big(1-\frac{|j|}{n}\Big)C_{j,n}^{(a,b,c,d)}(\lambda).$$
For a $D\times D$ matrix $u$ and a 4th order tensor $\alpha$ we use the norms
$$\|u\| = \max_{1\le a,b\le D}|u_{a,b}|,\tag{12.3.12}$$
$$\|\alpha\| = \max_{1\le a,b\le D}\sum_{1\le c,d\le D}|\alpha_{a,b,c,d}|\le D^2\max_{1\le a,b,c,d\le D}|\alpha_{a,b,c,d}|.\tag{12.3.13}$$
The variance of this estimate is a 4th order tensor which writes (using notation (12.3.13))
$$\mathrm{Var}\,\widehat f(\lambda)^{(a,b,c,d)}\le\frac1n\sum_{|j|<n}\sum_{|s|,|t|<m}\big|\kappa^{(a,b,c,d)}(s,j,j+t)\big| + \frac1n\sum_{|j|<n}\sum_{|s|,|t|<m}\big|r^{(a,c)}(j)\big|\big|r^{(b,d)}(j+t-s)\big| + \frac1n\sum_{|j|<n}\sum_{|s|,|t|<m}\big|r^{(a,d)}(j+t)\big|\big|r^{(b,c)}(j-s)\big|.$$
Assumption (12.3.7) implies that $\frac{2m+1}{n}\kappa^{(a,b,c,d)}$ bounds the first sum; after an interchange of the summations w.r.t. $t$ and $s$, assumption (12.3.6) shows that the two other terms are bounded above by $\frac{2m+1}{n}r_0^{(a,c)}r_1^{(b,d)}$ and $\frac{2m+1}{n}r_0^{(a,d)}r_1^{(b,c)}$, respectively. This entails
$$\big\|\mathrm{Var}\,\widehat f(\lambda)\big\|\le\frac{2m+1}{n}\Big(\max_{1\le a,b\le D}\sum_{c,d=1}^{D}\kappa^{(a,b,c,d)} + 2\max_{1\le a,b\le D}\sum_{c=1}^{D}r_0^{(a,c)}\sum_{d=1}^{D}r_1^{(b,d)}\Big);$$
we thus obtain

Lemma 12.3. Assume that the vector valued stationary sequence $(X_n)_{n\in\mathbb Z}$ satisfies conditions (12.3.7) and (12.3.6) for some $\rho\ge 1$; then (using notation (12.3.12))
$$E\big\|\widehat f(\lambda)-f(\lambda)\big\|^2\le C\Big(\frac1{n^2}+\frac1{m^{2\rho}}+\frac mn\Big).$$
This expression is optimized as $O\big(n^{-2\rho/(2\rho+1)}\big)$ by setting $m = n^{1/(2\rho+1)}$.
12.3.2 Dependence coefficients
This section aims to prove that the bounds on higher order moments needed for deriving a.s. convergence of spectral density estimates are analogous to those computed for density estimates in chapter 11; see § 11.2.2. Spectral estimates are written as second order polynomials of the initial process; we thus need a translation table that relates the properties of the initial process to the required asymptotics. In connection with § 4.3 and § 4.4.1 we now make use of the decorrelation coefficients (2.2.1). The following lemma is a first rough bound relating those coefficients to the ones built upon the sequence $Z = (Z_k)_{k\in\mathbb Z}$, defined in an analogous way to (12.3.9) for some fixed complex valued sequence $\delta_{k,s}\in\mathbb C$ such that $\sup_{k,s}|\delta_{k,s}|\le 1$ (here the dependence with respect to $\lambda$ is omitted). In order to obtain a suitable bound of this coefficient, it seems unavoidable to make use of the previous diagram formula. Set
$$c_{Z,q}(r) = \max\Big|\mathrm{Cov}\Big(Z_{k_1}^{(a_1,b_1)}\cdots Z_{k_l}^{(a_l,b_l)},\ Z_{k_{l+1}}^{(a_{l+1},b_{l+1})}\cdots Z_{k_q}^{(a_q,b_q)}\Big)\Big|,$$
the maximum being taken over $k_1\le\cdots\le k_q$ with $k_{l+1}-k_l = r$ and $1\le a_1,\dots,a_q\le D$, $1\le b_1,\dots,b_q\le D$. Those coefficients were already defined for vector valued sequences in (4.4.7). Set $C = \mathrm{Cov}\big(Z_{k_1}^{(a_1,b_1)}\cdots Z_{k_l}^{(a_l,b_l)},\ Z_{k_{l+1}}^{(a_{l+1},b_{l+1})}\cdots Z_{k_q}^{(a_q,b_q)}\big)$ with $k_{l+1}-k_l = r$; then (in a condensed notation)
$$C = \sum_{|s_1|,\dots,|s_q|\le m}\mathrm{Cov}\Big(Y_{k_1,s_1}^{(a_1,b_1)}\cdots Y_{k_l,s_l}^{(a_l,b_l)},\ Y_{k_{l+1},s_{l+1}}^{(a_{l+1},b_{l+1})}\cdots Y_{k_q,s_q}^{(a_q,b_q)}\Big)\le\sum_{u=1}^{q}\sum_{\mu_1,\dots,\mu_u}\ \sum_{|s_1|,\dots,|s_q|\le m}\ \prod_{i=1}^{u}\big|\kappa_{\mu_i(k,s)}(X^{(a,b)})\big|,$$
where the sum extends over the indecomposable diagrams such that some $\mu_i$ is contained neither entirely in the past nor entirely in the future of the table
$$\text{Past}\ \begin{cases}(1,1),&(1,2)\\ \ \vdots&\ \vdots\\ (l,1),&(l,2)\end{cases}\qquad\qquad \text{Future}\ \begin{cases}(l+1,1),&(l+1,2)\\ \ \vdots&\ \vdots\\ (q,1),&(q,2)\end{cases}$$
The number of such regular diagrams is thus $q!$ (each one is defined by one term in the past and one term in the future). In the general case, set $\lambda_i = \#\mu_i$ for $1\le i\le u$; if $\mu_i$ contains $v_i$ terms $(j_{i,1},2),\dots,(j_{i,v_i},2)$ from the second column of the table, set
$$C_{\mu_1,\dots,\mu_u} = \sum_{|s_1|,\dots,|s_q|\le m}\ \prod_{i=1}^{u}\kappa_{\mu_i(k,s)}(X^{(a,b)}),\qquad\text{so that}\qquad |C_{\mu_1,\dots,\mu_u}|\le\prod_{i=1}^{u}\ \sum_{|s_{j_{i,1}}|,\dots,|s_{j_{i,v_i}}|\le m}\big|\kappa_{\mu_i(k,s)}(X^{(a,b)})\big|.$$
For the general case we need to consider sums of cumulants indexed by $\nu(s) = (h_1+s_1,\dots,h_i+s_i,h'_1,\dots,h'_j)$:
$$\sum_{|s_1|,\dots,|s_i|\le m}\kappa_{\mu(s)}(X).$$
Using stationarity, it is easy to check that this sum is bounded by $(2m+1)^w\kappa_{i+j}$ (with $i+j = \#\mu$), where $w = 0$ or $1$ according as $j\neq 0$ or $j = 0$. Hence $|C_{\mu_1,\dots,\mu_u}|\le(2m+1)^w\prod_{i=1}^{u}\kappa_{\#\mu_i}$, where $w$ is now the number of those $\mu_i$ entirely contained in the second column of the table, hence $w\le q/2$. In fact, the non-Gaussian partitions (those with $\#\mu_i > 2$ for some $i$) have a contribution of lower order, $O(m^{[q/2]-1})$. Taking into account that in each partition one term has factors both in the Past and in the Future, together with Lemma 4.11, we thus obtain

Theorem 12.2. If the sums of cumulants of $X$ satisfy $\kappa_p < \infty$ for each $p\le 4q$ (which holds if $\sum_r(r+1)^{p-2}c^*_{X,p}(r) < \infty$, from Lemma 4.12), then there is a constant $K_q > 0$ such that
$$c_{Z,q}(r)\le K_q(2m+1)^{[q/2]}\,c^*_{X,2q}(r-2m).$$
Using inequality (4.4.12) we thus derive the main result of this section.

Corollary 12.1. Let $2p\ge 2$ be an even integer such that $\kappa_q < \infty$ for $q\le 4p$; then
$$E\big\|\widehat f(\lambda)-E\widehat f(\lambda)\big\|^{2p} = O\Big(\Big(\frac mn\Big)^{p}\Big).$$
• Note that Lemma 4.12 proves that the previous conditions may also be expressed in terms of the weak dependence coefficients (4.4.8).
• Moreover, the condition $\sum_n\big(m(n)/n\big)^q < \infty$ implies a.s. convergence of such spectral density estimates, as was done in § 11.2.2 for regression estimates.
Chapter 13
Econometric applications and resampling

Few rigorous results are stated in this chapter; our aim here is to show how weak dependence may be applied in standard applications. This last chapter is mainly intended to provide reasonable directions for further investigations of time series. The chapter is organized as follows. In order to provide deep econometric motivations, Section 13.1 includes several situations where various weak dependence conditions arise. After some generic examples including bootstrapping, we consider specific problems, including unit root problems and parametric or semiparametric problems, in § 13.1.1, 13.1.2 and 13.1.3. The following Section 13.2 reviews the question of the bootstrap; some model based bootstraps are considered first (see also § 13.2.4). We consider the block bootstrap in § 13.2.1, and § 13.2.2 addresses GMM estimation, for which weak dependence provides a complete proof of the results in Hall and Horowitz (1996) [101]. We also mention the conditional bootstrap in § 13.2.3 and the sieve bootstrap in § 13.2.4. Finally, in Section 13.3 we study more completely the problem of estimating the limiting variance (in the central limit theorem) under η-weak dependence.
13.1 Econometrics
Time series analysis is a major part of econometrics. Here we provide several examples of interest in which it is essential to consider dependent structures instead of simple independence. In some situations, such as the bootstrap, mixing notions seem useless (see § 13.2.2). One application concerns linearity tests in time series analysis. Rios (1996) [162] considers a stationary functional autoregressive model (13.2.1), where $r = L + C$ is the decomposition of the autoregression
function into a sum of linear ($L$) and nonlinear ($C$) components. Local linearity of $r$ is then tested via the null hypothesis
$$H_0:\ \int\big(r''(x)\big)^2w(x)\,dx = 0,$$
where the weight function $w$ has compact support. Rios (1996) [162] proves that a local linearity test can be handled in the strong mixing case. The function $r$ is assumed to be $\rho$-regular. Then the plug-in estimator $\widehat T = \int\widehat r''^2(x)w(x)\,dx$ converges to $T = \int r''^2(x)w(x)\,dx$ if $\alpha_n = O(n^{-a})$ with $a > 2+3/\rho$, under the bandwidth condition $h_n\in[n^{-\frac1{10}}, n^{-\frac1{2\rho-4}}]$. This result may be extended to weak dependence as in Ango Nze et al. (2002) [6]. Still another problem of interest is to test the independence of the innovations $(\xi_n)_{n\in\mathbb Z}$ in a regression model $X_n = aY_n + \xi_n$. This can be performed using the Durbin–Watson statistic, which is a non-correlation test; the latter can be written as a continuous functional of the Donsker line of the sequence $(\xi_n)_{n\in\mathbb Z}$. Some other applications are detailed in the forthcoming subsections.
13.1.1 Unit root tests
Consider a stationary autoregressive sequence $(X_n)_{n\in\mathbb Z}$ generated by an i.i.d. sequence $(\xi_n)_{n\in\mathbb Z}$: $X_n = aX_{n-1} + \xi_n$. A classical problem is to test whether there is a unit root (that is, $a = 1$). In the specific context of aggregated time series, the assumption of white noise innovations seems rather strong. Phillips (1987) [145] develops unit root tests for mixing and heterogeneously distributed innovations. The ordinary least squares estimate $\widehat a$ is shown to be a continuous functional of the Donsker line of the sequence $(\xi_n)_{n\in\mathbb Z}$. As an application of the functional central limit theorem, Phillips shows that a unit root test can be based on the fact that, under the null hypothesis $H_0: a = 1$,
$$n(\widehat a - 1)\xrightarrow[n\to\infty]{\mathcal D}\frac{\frac12\big(W_1^2 - \sigma_\xi^2/\sigma^2\big)}{\int_0^1W_t^2\,dt},$$
where $W$ denotes the standard Brownian motion and $\sigma^2 = \sum_{k=-\infty}^{\infty}\mathrm{Cov}(\xi_0,\xi_k)$; in the initial i.i.d. frame $\sigma_\xi^2/\sigma^2 = 1$, but this is no longer the case for dependent innovations. The author works with stationary strong mixing sequences, and conditions under which the functional CLT holds were given before. This example, as the author suggests, can be generalized to error sequences $(\xi_n)_{n\in\mathbb Z}$ that allow for heteroskedasticity.
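The behaviour of the OLS estimate under the null $a = 1$ is easy to reproduce by simulation. The sketch below uses arbitrary choices of seed and sample size (not from the text); with i.i.d. innovations, $\sigma_\xi^2/\sigma^2 = 1$ in the limit law above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Under H0 the series is a random walk: X_k = X_{k-1} + xi_k, i.i.d. innovations.
X = np.cumsum(rng.standard_normal(n + 1))

# OLS estimate of a in X_k = a X_{k-1} + xi_k.
a_hat = np.sum(X[1:] * X[:-1]) / np.sum(X[:-1] ** 2)
stat = n * (a_hat - 1.0)   # the non-normalized unit root statistic; it stays O(1)
print(a_hat, stat)
```

Repeating this over many replications reproduces the (non-Gaussian, left-skewed) limit distribution against which the unit root test is calibrated.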
13.1.2 Parametric problems
Generalized method of moments (GMM) estimation procedures involve an estimate $\widehat\theta_n$ which solves the arg-min problem $J_n(\widehat\theta_n) = \min_{\theta\in\Theta}J_n(\theta)$, where
$$J_n(\theta) = \Big(\frac1n\sum_{i=1}^{n}g(X_i,\theta)\Big)'\,\Omega\,\Big(\frac1n\sum_{i=1}^{n}g(X_i,\theta)\Big).\tag{13.1.1}$$
Here $\Theta\subset\mathbb R^d$ is a finite dimensional parameter set, and $g(\cdot,\cdot)$ is a given function such that $E_{\theta_0}g(X_1,\theta_0) = 0$, where $\theta_0$ is the true parameter point. In the time series context, the positive semi-definite matrix $\Omega$ is often replaced (see Hall and Horowitz, 1996, equation (3.2) [101]) by an asymptotically optimal weight matrix estimate
$$\Omega_n(\theta) = \frac1n\sum_{i=1}^{n}\Big(g(X_i,\theta)g(X_i,\theta)' + \sum_{j=1}^{\kappa}H(X_i,X_{i+j},\theta)\Big),\qquad H(x,y,\theta) = g(x,\theta)g(y,\theta)' + g(y,\theta)g(x,\theta)',$$
where $\kappa$ is such that $Eg(X_i,\theta)g(X_j,\theta)' = 0$ if $|i-j| > \kappa$. The statistic to test $H_0:\ \theta = \theta_0$ is $J_n(\widehat\theta_n) = K_n(\widehat\theta_n)'K_n(\widehat\theta_n)$, where $K_n(\theta) = \frac1{\sqrt n}\Omega_n(\theta)^{-\frac12}\sum_{i=1}^{n}g(X_i,\theta)$ (the square root of a symmetric positive matrix is uniquely defined). The following CLT holds under standard mixing assumptions:
$$T_n(\theta) = \sqrt n\,\Sigma_n^{-1}\big(\widehat\theta_n-\theta\big)\xrightarrow[n\to\infty]{\mathcal D}\mathcal N_d(0,I_d),\tag{13.1.2}$$
where the diagonal matrix $\Sigma_n$ has $d$ entries. GMM techniques naturally involve an unknown covariance matrix; in order to estimate such limiting distributions it will be natural to use the bootstrap techniques described in § 13.2.1.
13.1.3 A semi-parametric estimation problem
We follow the presentation in Robinson (1989) [164]. He considers an economic variable observable at time $n$, which is an $R\times 1$ vector of r.v.'s $(W_n)_{n\in\mathbb Z}$. We observe $W_n$ at times $n = 1-P, 2-P,\dots,T$, where $P$ is nonnegative and $T$ is large. Hypotheses of economic interest often involve a subset $X_n = B(W_n',\dots,W_{n-P}')'$ of the array $W_n,\dots,W_{n-P}$; for this, $B$ is a $J\times(PR)$ matrix formed from the $PR$-rowed identity matrix $I_{PR}$ by omitting $PR-J$ rows (which means that in $B$, $PR-J$ elements of $W_n,\dots,W_{n-P}$ are deleted). Thus $X_n$ can have elements in common with $X_{n+P-1},\dots,X_{n+1},X_{n-1},\dots,X_{n-P}$. Let $X_n = (Y_n',Z_n')'$, where $Y_n$ and $Z_n$ are $K\times 1$ and $L\times 1$ vectors ($K+L = J$). The problem of interest is to test the hypothesis $E(Y_n|Z_n) = 0$ against the
alternative $E(Y_n|Z_n)\neq 0$. This null hypothesis is written in the form
$$\tau = \int_{\mathbb R^L}H(z,z)f^2(z)\,dz = 0\quad\text{for } M=0,\qquad\text{and}\qquad \tau = \int_{\mathbb R^L}H(z,z)\big(f(z),f^{(1)}(z)',\dots,f^{(M)}(z)'\big)'f(z)\,dz = 0$$
for some $M>0$, where the function $H(z_1,z_2)$ is defined as
$$H(z_1,z_2) = \int_{\mathbb R^K\times\mathbb R^K}G(x_1,x_2)\,dF(y_1|z_1)\,dF(y_2|z_2)$$
for some convenient function $G$, with $F(A|z) = P(Y_n\in A\,|\,Z_n = z)$ for any Borel set $A$ of $\mathbb R^K$ and $z\in\mathbb R^L$, and $x_1 = (y_1',z_1')'$, $x_2 = (y_2',z_2')'$. Here $f^{(j)}(z)$ denotes the vector of $j$-partial derivatives of $f$. An example of this framework is given by $X_n = (Y_n',Z_n')'$, where $Y_n = (t_n,s_n')'$ and $Z_n = v_n$. The regression model
$$t_n = \beta'(s_n - E_ns_n) + \gamma'E_ns_n + u_n\tag{13.1.3}$$
is of common use in econometrics. Here $t_n$ is scalar, while $s_n$ and $v_n$ are respectively $p\times 1$ and $q\times 1$; they are observable random sequences, while the innovation process $(u_n)$ is centered and unobservable, so that $E(u_n|s_n,t_n) = 0$; we denote $E_n(\cdot) = E(\cdot|v_n)$. In the case of a weakly dependent and stationary innovation process, Robinson (1989) [164] considers the hypothesis $H_0:\ \beta = 0$. In this case the hypothesis can be written as before, and Robinson calculates $\beta = \tau$, where $K = p+1$, $L = 1$, $M = 0$ and $G(x_1,x_2) = (t_1-t_2)s_1\varphi(v_1)$ for some function $\varphi:\mathbb R^q\to\mathbb R$ (usually $\varphi\equiv 1$). Robinson considers the statistic $\widehat\lambda = n\,\widehat\tau\,'\widehat\Omega^{-1}\widehat\tau$ constructed from the $n$-sample $(X_1,\dots,X_n)$. Here
$$\widehat\tau = \frac{1}{n^2h^L}\sum_{i,j=1}^{n}G(X_i,X_j)\,k\big((Z_i-Z_j)/h\big)$$
is a $U$-statistic, and $\widehat\Omega$ is the natural estimator of the covariance matrix of $\widehat\tau$. One such estimate is $\widehat\Omega = \frac1n\sum_{i=1}^{n}c_ic_i'$; tapered versions might be preferred (see formula (2.21) of Robinson, 1989 [164]). Here $c_i = \sum_{j=1}^{n}\frac{d_{i,j}+d_{j,i}}{n}$, with $d_{i,j} = G(X_i,X_j)\,\bar k\big((Z_i-Z_j)/h\big)$ and $\bar k(z) = \big(h^{-L}k(z), h^{-1}k^{(1)}(z)',\dots,h^{-M}k^{(M)}(z)'\big)'$. Under $\beta$-mixing assumptions, Robinson proves that the above estimates are $\sqrt n$-consistent and satisfy a CLT. Under a natural $\beta$-mixing condition, he proves in fact that the statistic $\widehat\lambda$ asymptotically has a $\chi^2$-distribution if $\beta_j = O(j^{-b})$ with $b > \mu/(\mu-2)$, under the moment assumption $\sup_{i,j}E|G(X_i,X_j)|^\mu < \infty$. The $\beta$-mixing assumption allows one to compare the joint distribution of the initial sequence with that of a sequence of r.v.'s with independent blocks. This
reconstruction is due to Berbee's coupling lemma, no matter how big the size of the blocks may be. Yoshihara (1976) [193] derives a covariance inequality that fits $U$-statistics. A way to get rid of $\beta$-mixing conditions is to consider an independent realization $\widetilde X_1,\dots,\widetilde X_n$ of the trajectory $X_1,\dots,X_n$. A simpler estimator of $\tau$ is then given by
$$\widetilde\tau = \frac{1}{n^2h^L}\sum_{i,j=1}^{n}G\big(X_i,\widetilde X_j\big)\,k\Big(\frac{Z_i-\widetilde Z_j}{h}\Big).$$
The asymptotic behavior of this expression is easy to derive, under alternative weak dependence conditions, by using our results: indeed $\widetilde\tau$ is the numerator of a Nadaraya–Watson kernel estimator for the regression problem $E\big(s_1(t_1-t)\,\big|\,v_1 = z\big)$ in the special case of the previous example. In fact this trick avoids the corresponding coupling construction for $U$-statistics.
13.2 Bootstrap
Consider the following example concerning the bootstrap: let a stationary autoregressive sequence be generated by an independent and identically distributed (i.i.d.) centered sequence $(\xi_n)_{n\in\mathbb Z}$:
$$X_n = r(X_{n-1}) + \xi_n.\tag{13.2.1}$$
Standard nonparametric estimation techniques provide an estimate of the autoregression function $r$; let $\widehat r$ be a convenient estimator of $r$. Given data $(X_1,\dots,X_n)$ from the sequence (13.2.1), another autoregressive process can be defined by
$$\widetilde X_n = \widehat r\big(\widetilde X_{n-1}\big) + \xi_n^*.\tag{13.2.2}$$
The innovations $(\xi_n^*)$ are i.i.d., drawn according to the centered empirical measure of the estimated residuals
$$\widehat\xi_i = \xi_i - \frac1n\sum_{j=1}^{n}\xi_j,\qquad \xi_i = X_i - \widehat r(X_{i-1}),\qquad 1\le i\le n.$$
CHAPTER 13. ECONOMETRICS AND RESAMPLING
that paper. The above-mentioned weak dependence conditions yield standard results concerning convergence in distribution with a $\sqrt n$-normalization. If now the process $X_k = H(\xi_k, \xi_{k-1}, \dots)$ is defined as a Bernoulli shift, a suitable form of the resampled version $X_k^*$ of $X_k$ is $\widehat H(\xi_k^*, \xi_{k-1}^*, \dots, \xi_{k-l+1}^*, 0, \dots)$. Here $\widehat H$ is an estimate of $H$, and the $\xi_k^*$ are i.i.d. random variables drawn with the distribution $F_n$, the centered version of the residuals' distribution obtained through filtering. In the simple case of a linear process, $H(z_0, z_1, \dots) = \sum_k a_k z_k$ (see Section 13.2.2); in the general setting, one needs to develop additional estimation procedures. In order to describe the asymptotic properties of such processes, one needs to know the limiting behaviour of Bernoulli shifts. Unfortunately such representations are not always known, and additional bootstrap procedures have been considered.
13.2.1 Block bootstrap
We describe here the block-bootstrap procedure, which is adapted to the time series $(X_i)_{i\in\mathbb N}$. Let $b = b(n)$ and $l = l(n)$ denote the number and the length of the blocks. Assume $b \cdot l = n$ and consider the $b$ blocks $(X_{(j-1)l+1}, \dots, X_{jl})$ for $1 \le j \le b$. Blocks $(\tilde X_{(j-1)l+1}, \dots, \tilde X_{jl})$ are now randomly drawn (uniformly) among those $b$ blocks. A trajectory of the resampled process is obtained by concatenation of those blocks; however, a problem clearly appears in connecting them (see Künsch, 1989 [113]).
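A minimal sketch of this non-overlapping block resampling (function names are assumptions; the naive concatenation below is precisely the join problem mentioned above, which Künsch's moving-block scheme addresses):

```python
import numpy as np

def block_bootstrap(x, l, rng=None):
    """Resample a trajectory: draw b = n // l non-overlapping blocks of
    length l uniformly with replacement, then concatenate them."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    b = len(x) // l                   # number of blocks (assumes b * l = n)
    blocks = x[: b * l].reshape(b, l)
    idx = rng.integers(0, b, size=b)
    return blocks[idx].reshape(-1)

x = np.arange(12.0)
x_star = block_bootstrap(x, l=3, rng=0)   # concatenation of 4 resampled blocks
```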
13.2.2 Bootstrapping GMM estimators
Using the notations of § 13.1.2, let $(\tilde X_i^*)_{1\le i\le n}$ denote a block-bootstrap sample and let $g^*(x, \theta) = g(x, \theta) - E^* g(x, \hat\theta_n)$, where the expectation is taken with respect to the bootstrap distribution. The GMM estimate $\hat\theta^*$ solves the arg-min problem
$$J_n^*(\theta) = \Big( \frac{1}{n} \sum_{i=1}^{n} g^*(X_i^*, \theta) \Big)'\, \Omega\, \Big( \frac{1}{n} \sum_{i=1}^{n} g^*(X_i^*, \theta) \Big) \qquad (13.2.3)$$
when the matrix $\Omega$ is known. In order to prove the consistency of this bootstrap procedure, Hall and Horowitz (1996) [101] give an incomplete proof; weak dependence, however, allows us to prove this consistency rigorously. More precisely, if $X_n = h(\varepsilon_n, \varepsilon_{n-1}, \dots)$ for some i.i.d. sequence $(\varepsilon_i)_{i\in\mathbb Z}$, their Assumption 1 is*
$$E\big\| h(\varepsilon_n, \varepsilon_{n-1}, \dots) - h(\varepsilon_n, \varepsilon_{n-1}, \dots, \varepsilon_{n-m}, 0, 0, \dots) \big\| \le \frac{e^{-dm}}{d}.$$

* The function $h : \mathbb R^{\mathbb N} \to B$, with $B$ a Banach space with norm $\|\cdot\|$.
This condition holds for linear processes, and it is claimed in [101] to imply geometric strong mixing. Andrews's simple example (1984) [2] proves that this does not hold in general. It implies, however, θ-weak dependence with geometric decay, yielding the following useful tail inequality for sums of functions of the sequence $\xi_n = f(X_n, \theta)$†. This is the main tool to prove the validity of the bootstrap in this dependent setting. A rigorous version of Lemma 1 in Hall and Horowitz (1996) [101] thus follows.

Lemma 13.1 (Ango Nze & Doukhan, 2004 [7]). Let $(\xi_n)$ be a stationary η-weakly dependent sequence with $E\xi_n = 0$, such that $\eta(r) = O(e^{-ar})$ as $r \uparrow \infty$ for some $a > 0$, and $P(|\xi_1| \ge z) = O(|z|^{-33})$ as $|z| \to \infty$. Then $R_n = n^{-1} \sum_{i=1}^{n} \xi_i$ satisfies, for $\varepsilon > 0$ small enough,
$$\lim_{n\to\infty} n\, P\big( |R_n| > n^{-\frac{2+\varepsilon}{5}} \big) = 0.$$

Proof. Set $\xi_{i,n} = \xi_i 1_{\{|\xi_i| \le n^{1/16}\}} - E\xi_i 1_{\{|\xi_i| \le n^{1/16}\}}$ and let $\tilde R_n = \frac{1}{n} \sum_{i=1}^{n} \xi_{i,n}$. Then
$$P\big( |R_n| > 2 n^{-\frac{2+\varepsilon}{5}} \big) \le P\big( |\tilde R_n| > n^{-\frac{2+\varepsilon}{5}} \big) + 2 n^{\frac{2+\varepsilon}{5}}\, E\big( |\xi_1| 1_{\{|\xi_1| > n^{1/16}\}} \big)$$
$$\le n^{\frac{32(2+\varepsilon)}{5}}\, E|\tilde R_n|^{32} + 2 n^{\frac{2+\varepsilon}{5}}\, \|\xi_1\|_4\, P^{3/4}\big( |\xi_1| > n^{1/16} \big) = n^{-1}\, O\Big( n^{\frac{-11+32\varepsilon}{5}} + n^{\frac{\varepsilon}{5} - \frac{47}{320}} \Big) = o(n^{-1}).$$

Following precisely the same steps as in Hall and Horowitz (1996) [101], we thus prove, by only replacing their Lemma 1 by our Lemma 13.1, that bootstrapping critical values for GMM estimators is an asymptotically valid procedure.

Remark 13.1. Theorems 1, 2 and 3 in Hall and Horowitz (1996) [101] now seem to be rigorously proved. Analogous comments apply to a more recent paper by Andrews (2002) [4]. The exponent 33 in the previous lemma is unnatural, and it will be improved in a forthcoming paper.

The above procedure can be used for testing the null hypothesis $H_0: \theta = \theta_0$ against the bilateral alternative. Under $H_0$ the studentized statistic $T_n(\theta)$ described in (13.1.2) satisfies, with the critical value $Q_\alpha$,
$$P(|T_n(\theta)| > Q_\alpha) = \alpha + O(1/n).$$

† In this equation, θ is really the parameter to be estimated; in order to avoid confusion with the θ-dependence coefficients, we prove below a result under the weaker η-weak dependence.
As in Götze & Hipp (1978) [99], Hall and Horowitz seem to prove that $T_n(\theta)$ and the bootstrap studentized statistic
$$T_n^*(\theta) = \sqrt n\, \Sigma_n^{*-1} \big( \hat\theta_n^* - \hat\theta_n \big)$$
have close distributions, in the sense that
$$P\Big( \sup_{z\in\mathbb R} \big| P^*(T_n^*(\theta) \le z) - P(T_n(\theta) \le z) \big| > n^{-a} \Big) = o(n^{-a}), \qquad (13.2.4)$$
for a relevant integer $2a$ with $a \ge 1 + \xi$, where the range of $\xi \in [0, 1]$ is determined according to the dependence assumptions prescribed. This relation comes from an Edgeworth expansion. It yields an improved acceptance rule for the test of $H_0$:
$$P(|T_n^*(\theta)| > Q_\alpha^*) = \alpha + O\big( n^{-1-\xi} \big).$$
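In practice, the bootstrap critical value $Q_\alpha^*$ is taken as the empirical $(1-\alpha)$-quantile of the absolute resampled studentized statistics. A generic sketch (the replicates `t_star` would come from the resampling scheme above; here they are simulated as a placeholder):

```python
import numpy as np

def bootstrap_critical_value(t_star, alpha):
    """Empirical (1 - alpha)-quantile of |T*| over the bootstrap replicates,
    used as the cutoff of the bilateral test of H0: theta = theta0."""
    return np.quantile(np.abs(np.asarray(t_star)), 1.0 - alpha)

# sanity check: with standard normal replicates the 5% cutoff is close to 1.96
rng = np.random.default_rng(0)
q = bootstrap_critical_value(rng.standard_normal(200_000), alpha=0.05)
```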
13.2.3 Conditional bootstrap
A simple local conditional bootstrap is investigated by Ango Nze et al. (2002) [6]. Consider a stationary process $(X_t, Y_t)_{t\in\mathbb Z}$. The local bootstrap for nonparametric regression is defined as follows: the empirical distribution of $Y_t$ given $X_t = x$ writes
$$\widehat F(y|x) = \frac{1}{nb} \sum_{t=1}^{n} 1_{\{Y_t \le y\}}\, K\Big( \frac{x - X_t}{b} \Big) \Big/ \widehat f_{n,b}(x),$$
for a kernel density estimator $\widehat f_{n,b}$ with bandwidth $b = b(n)$. The bootstrap sample is now defined as $(X_t, Y_t^*)_{1\le t\le n}$, where $Y_t^* \sim \widehat F(\,\cdot\,|X_t)$ is independent of $(Y_s^*)_{s\ne t}$ conditionally on the data, for $1 \le t \le n$. In this article, it is shown that the asymptotic properties of the local regression bootstrap estimator $\widehat r^*_{n,h}$ with bandwidth $h = h(n)$, constructed from this sample, are analogous to those of the regression estimator $\widehat r_{n,h}$ constructed from the data $(X_t, Y_t)_{1\le t\le n}$:
$$\sup_{u\in\mathbb R} \Big| P^*\Big( \sqrt{nh}\big( \widehat r^*_{n,h}(x) - E^* \widehat r^*_{n,h}(x) \big) \le u \Big) - P\Big( \sqrt{nh}\big( \widehat r_{n,h}(x) - E \widehat r_{n,h}(x) \big) \le u \Big) \Big| \to_P 0 \quad (n\to\infty),$$
under suitable weak dependence assumptions (see Theorem 4 in [6]).
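The local bootstrap can be sketched as follows: for each $t$, $Y_t^*$ is drawn from the $Y_s$ with probabilities proportional to the kernel weights $K((X_t - X_s)/b)$. The Gaussian kernel and all names are illustrative assumptions:

```python
import numpy as np

def local_bootstrap(x, y, b, rng=None):
    """Draw Y*_t ~ F_hat(. | X_t): resample Y_s with probability proportional
    to K((X_t - X_s)/b), independently across t given the data."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_star = np.empty(len(x))
    for t in range(len(x)):
        w = np.exp(-0.5 * ((x[t] - x) / b) ** 2)   # Gaussian kernel weights
        y_star[t] = rng.choice(y, p=w / w.sum())
    return y_star

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 300)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(300)
y_star = local_bootstrap(x, y, b=0.05, rng=1)
```

Each $Y_t^*$ is by construction one of the observed responses, picked among those whose covariate lies near $X_t$.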
13.2.4 Sieve bootstrap
Bickel and Bühlmann (1999) [18] tackle the problem of the 'sieve bootstrap' for a one-sided linear process
$$X_n - \mu = \xi_n + \sum_{t=1}^{\infty} a_t \xi_{n-t}, \qquad (13.2.5)$$
where $(\xi_n)$ is a sequence of i.i.d. random variables (r.v.'s) with $E\xi_0 = 0$ and density function $f_\xi$, and where $\sum_{t=1}^{\infty} |a_t| < \infty$ and $\mu = EX_n$.

Under the assumption that the function $\Psi(z) = 1 + \sum_{t=1}^{\infty} a_t z^t$ has no root in the closed unit disk, the process (13.2.5) admits an AR(∞) representation
$$(X_n - \mu) + \sum_{t=1}^{\infty} b_t (X_{n-t} - \mu) = \xi_n, \qquad \text{with } \sum_{t=1}^{\infty} |b_t| < \infty. \qquad (13.2.6)$$
The latter process (13.2.6) is fitted with an autoregressive process of finite order p(n) (p(n)/n → 0, p(n) → ∞). Using estimated residuals, the resampling (i.i.d.) innovation process (ξn∗ )n∈Z is constructed by smoothing the empirical process based on those residuals by a kernel density estimate of the density fξ . Finally, the smoothed sieve bootstrap sample (Xn∗ )n∈Z is defined by resampling the AR(p(n)) process from innovations (ξn∗ )n∈Z :
p(n)
(Xn∗
¯ + − X)
Dbt (X ∗ − X) ¯ = ξn∗ n−t
(13.2.7)
t=1
The purpose of [18] was to carry over a weak dependence property (here strong mixing) of the initial sequence (Xn )n∈Z to the sieve processes (Xn∗ )n∈Z (a classic and a smoothed version were examined in the paper). The goal is unrealistic for the classic bootstrap sample, since the distribution of the bootstrapped innovations is discrete. Proving a mixing property for the smoothed sieve bootstrap sample eludes the efforts of the authors. In the latter case, it nevertheless appears that limit theorems can be proven by another method. It consists in using the following property: |Cov (g1 (X−d1 +1 , . . . , X0 ) , g2 (Xk , . . . , Xk+d2 −1 )) | ≤ 4 g1 ∞ g2 ∞ ν k; C d1 , C d2 ,
(13.2.8)
with d1 , d2 ∈ N and for smooth functions g1 , g2 belonging to the classes C d1 and C d2 (see equation (3.1) for the definition of the class C d and some examples). The new dependence coefficient ν is less than the strongly mixing coefficient. Bickel and B¨ uhlmann (1999) [18] cannot prove that the sieve sequence (Xn∗ ) is strongly mixing. A weak dependence condition is now defined by the ν coefficient. The authors prove that it is satisfied by both this sequence and a smooth version of the resampled innovations. For instance, Bickel and B¨ uhlmann prove that if the sequence (Xn )n∈Z satisfies some regularity conditions ensuring that αk = O (k −γ ) (recall that νk ≤ αk ), then the sieve bootstrap process (Xn∗ )n∈Z satisfies a ν mixing condition with a polynomial rate ν ∗ (k; C d1 , Dd2 ) = O k −Lγ for relevant classes C d1 , Dd2 and a positive constant L. See Theorem 3.2 on page 422 in Bickel and B¨ uhlmann (1999) [18] for more details.
292
13.3
CHAPTER 13. ECONOMETRICS AND RESAMPLING
Limit variance estimates
The end of this chapter is aimed at providing a very important resampling tool for weakly dependent time series. In this monograph (and elsewhere) extensions of the central limit theorem have been proved for times series. The limiting variance takes a complicated form; the results write as 1 d √ Xk →n→∞ N (0, σ 2 ), n n
where
σ2 =
k=1
∞
EX0 Xk .
k=−∞
Functional vector valued versions of such results also arise, 1 √ Xj → Z(t) − Z(s), n
(13.3.1)
ns<j≤nt
with Z(t) ∈ RD , the D-dimensional centered Gaussian random process such that ∞ Cov(X0 , Xk ). Cov(Z(s), Z(t)) = (t ∧ s) · Σ, Σ= k=−∞
For statistical use, one needs to provide self-normalized versions of such results. The expression reduces to σ 2 = EX02 for iid sequences and it may be directly n 1 estimated by n k=1 Xk2 using the method of moments. The first and natural way to achieve this for dependent sequences is to replace σ 2 by some convergent estimator. We shall assume η−weak dependence; several ways of estimating this quantity are reasonable. • Recalling that σ 2 is only the value at origin of Xt ’s spectral density gives a first approach; spectral density estimates from chapter 12 yield as in § 12.3 estimators of σ 2 , we defer a reader to Bardet et al. (2005) [11] for this approach. • Another way to estimate this quantity is considered here: we mimick an argument by Carlstein (1986) [34], see also Peligrad and Shao (1994) [141]. This estimation is based on the Donsker invariance principle and a subsampling argument described in section 13.3.2. We provide below a.s. convergence properties of those estimates as well as a CLT; modifications of the previous CLT will make it suitable for applications. To this aim, subsection 13.3.1 relates moments of sums with the cumulants of stationary sequences; this is a tool of an independent interest for several applications, like extensions to multispectra of the results in chapter 12. We are involved here in a vector valued version of the estimation of σ 2 . The main motivation for this is to derive a dependent version of Kolmogorov-Smirnov test. Empirical CLT (for the empirical cdf) are derived in chapter 10: n d 1 √ 1(Xk ≤x) − F (x) →n→∞ B(x), n k=1
13.3. LIMIT VARIANCE ESTIMATES
293
where B(x) denotes the centered Gaussian process such that EB(x)B(y) =
∞
Cov 1(X0 ≤x) , 1(Xk ≤y) ,
k=−∞
A direct extension of the Kolmogorov-Smirnov test is not possible for such dependent sequences because limits are not distribution free. After a convenient discretization, Doukhan & Lang (2002) [63] prove that confidence bounds for such statistics, which are not anymore distribution-free, may however be estimated with the present sharp estimates of the multivariate extension of σ 2 . The following subsection is aimed to derive useful tools in order to precise asymptotics for the previous estimation procedure.
13.3.1
Moments, cumulants and weak dependence
We thus consider a stationary vector valued sequence (Xn )n∈Z with values in RD ; we equip RD with the norm defined as x = |x(1) | + · · · + |x(D) | for x = (x(1) , . . . , x(D) ) ∈ RD . We assume below that there exists some b > 2 such 1 u that X0 b = EX0 b b < ∞. For each u ≥ 1 we identify the sets RD and RD·u . We use the coefficients cX,q (r) from (4.4.7). We define nonincreasing coefficients, for further convenience, (d) t cX,q (r) = max cX,l (r)μq−l , with μt = max E X0 . (13.3.2) 1≤l≤q
1≤d≤D
Those coefficients (4.4.7) are now linked to the weak dependence coefficients: Proposition 13.1. Assume that the stationary and vector valued sequence (a) (Xn )n∈Z is η-dependent and satisfies μ = max1≤a≤D X0 b < ∞ for some b > q then the coefficients defined in eqn. (4.4.7) satisfy b
cX,q (r) ≤ q4 b−1 μ
(q−1)b b−1
b−q
η(r) b−1 .
Proof of proposition 13.1. Consider integers 1 ≤ < q, 1 ≤ a1 , . . . , aq ≤ D and t1 ≤ · · · ≤ tq such that t+1 − t ≥ r, we need to bound (uniformly wrt , 1 ≤ a1 , . . . , aq ≤ D and t1 ≤ · · · ≤ tq ) the expression (a ) (a+1 ) (a ) (a ) c = Cov Xt1 1 · · · Xt , Xt+1 · · · Xtq q = |Cov(A, B)| (13.3.3) In order to bound (13.3.3), for some M > 0 depending on r [to be defined later] (1)
(D)
we now set X j = (X j , . . . , X j
(a)
) with X j
(a)
= Xj
∨ (−M ) ∧ M for each
294
CHAPTER 13. ECONOMETRICS AND RESAMPLING (1)
(D)
1 ≤ a ≤ D if Xj = (Xj , . . . , Xj c
). Then we also have,
≤ |Cov(A, B)| + |Cov(A − A, B)| + |Cov(A, B − B)|, (a1 )
A = X t1
(a )
(a+1 )
· · · X t ,
(aq )
· · · X tq .
B = X t+1
Now we note that |(A − A)B| ≤ |A(B − B)| ≤
(ai )
(a )
Yi |Xti i − X ti |,
i=1 q
(ai )
(a )
Yi |Xti i − X ti |
i=+1 (aj )
(a )
where Yi writes as the product of q − 1 factors Zi,j = |Xtj j | or |X tj | for 1 1 ≤ j ≤ q and j = i. It is thus clear that for q−1 b + p = 1 an analogue representation of the centering terms yields |Cov(A−A, B)|+|Cov(A, B−B)| ≤ 2
q i=1
(a)
(a)
(a)
max X0 q−1 max X0 −X 0 p . b
1≤a≤D
1≤a≤D
Set h(x) = x ∨ (−M ) ∧ M thenLip h = 1 and h ∞ = M thus f (x1 , . . . , x ) = (a+1 ) (a ) (a1 ) (a ) h(x1 )···h(x ) is such that A =f Xt1 , . . . , Xt , . . . , Xtq q ,B =fq− Xt+1 and f ∞ ≤ M , Lip f ≤ M −1 ; from η-dependence we derive, |Cov(A, B)| ≤ (Lip f fq− ∞ + (q − )Lip fq− f ∞ ) η(r) ≤ qM q−1 η(r). (a)
Now for each real valued random variable X1 (a)
E|X0
(a)
− X 0 |p
=
(a)
E|X0
and 1 ≤ a ≤ D, (a)
− X 0 |p 1|X (a) |≥M 0
(a) E|X0 |p
≤
2
≤
(a) 2p E|X0 |b M p−b .
p
1|X (a) |≥M 0
Hence, (a)
(a)
(a)
b
X0 − X 0 p ≤ 2X0 bp M 1− p , b
(a)
thus setting μ = max1≤a≤D X0 b , we obtain with the previous inequalities, |Cov(A − A, B)| |Cov(A, B)|
+ |Cov(A, B − B)| ≤ 4qμq−1+ p M 1− p , b b ≤ q 4μq−1+ p M 1− p + M q−1 η(r) ≤ q 4μb M q−b + M q−1 η(r) . b
b
13.3. LIMIT VARIANCE ESTIMATES
295
The previous expression has an order optimized with 4μb M 1−b = η(r) yielding b
c = |Cov(A, B)| ≤ q4 b−1 μ
(q−1)b b−1
b−q
η(r) b−1 .
As a first application of this relation, it seems useful to state the following moment inequality, they also entail laws of large numbers. Corollary 13.1 (D = 1). Let (Xt )t∈Z be a real valued and stationary η-weakly dependent times series. Assume that E|X0 |b < ∞ for some b > q, EX0 = 0, and ∞ b−q (r + 1)q−2 η(r) b−1 < ∞, r=0
then there exists a constant C > 0 only depending on q and on the previous series such that ⎞q ⎛ n E⎝ Xj ⎠ ≤ Cn[q/2] . j=1
Proof. We note, that
(r + 1)q−2 cX,q (r) < ∞ and we use theorem 4.2 together
r≥0
with proposition 13.1 to conclude.
13.3.2
Estimation of the limit variance
Now let N be some fixed integer and n = N m, ˜ we rewrite the previous limit ˜ ˜ i,m (13.3.1) as Δ ˜ → Δi (which have the same distribution) where we set i+m ˜ √ i i 1 D ˜ +1 −Z Δi,m Xj , Δi = N Z = Δ ∼ ND (0, Σ) . ˜ = √ m m m ˜ j=i+1 The heuristic in Peligrad and Shao (1995) [142]’s variance estimator consists to ˜ i,m substitute Δi by Δ ˜ in this last relation. For different values of i the random ˜ On another variables Δi are independent only for values with a difference ≥ m. hand if i = i(n), j = j(n) are such that limn→∞ |i(n) − j(n)|/m(n) ˜ = α then the couple (Δi(n) , Δj(n) ) is Gaussian with Cov(Δi(n) , Δj(n) ) = 1 − α, if 0 ≤ α ≤ 1. This implies ⎛ lim Var ⎝ #
n→∞
1 n − m(n) ˜
⎞
n−m(n) ˜
i=1
˜ i,m(n) F (Δ )⎠ ˜
= 0
1
Cov F Z(1 , F Z(1 + α) − Z(α) dα.
296
CHAPTER 13. ECONOMETRICS AND RESAMPLING
This integral may be explicitly computed with the help of an Hermite expansion but it is however somewhat complicates. We thus subsample this sums setting ˜ ˜m Δi,m ˜ = Δim, ˜ , then, N 1 F (Δi ) →N →∞ N i=1 N 1 √ (F (Δi ) − EF (Δi )) N i=1
D
→N →∞
EF (Δ)
(13.3.4)
ND (0, Var F (Δ)) .
(13.3.5)
Examples • F (x) = x x yields the estimation of the covariance matrix Σ. • Let F (x) = |x a| for a fixed vector a ∈ RD , then EF (Δ) = E|N (0, 1)|.
√ a Σa ·
• Setting more generally F (x1 , . . . , xD ) = (|xi + xj |2 )1≤i≤j≤D ∈ RD(D+1)/2 provides estimates for all the coefficients of the matrix Σ: for this, use the polarity argument 1 1 1 (i) (j) (i) (j) σi,j = . Var (Δ + Δ ) − Var 2Δ − Var 2Δ 2 4 4 In order to make the heuristic (13.3.5) work, we better consider sequences = (n), m = m(n), m ˜ = m + and N = N (n) converging to infinity and such that n ≥ N (m + ) − . Now we set 1 Δi,m = √ m
(i−1)(m+)+m
Xj ,
(13.3.6)
j=(i−1)(m+)+1
and EF (Δ) is estimated by FDn =
N (n) 1 F (Δi,m(n) ). N (n) i=1
(13.3.7)
Peligrad and Shao (1995) [142] consider instead that EF (Δ) is estimated by Fn =
n− m ˜ 1 ˜ i,m(n) F (Δ ). ˜ n−m ˜ i=1
We need to quote that in this case the sum runs on 1 ≤ i ≤ n − m ˜ which is a number of much larger order than N . In this case, the Gaussian limiting
13.3. LIMIT VARIANCE ESTIMATES
297
variables are not independent. This makes the limiting distribution a bit more complicated and we thus avoid this estimation by working analogously to Carlstein (1986) [34]. More precisely, this author proves that if i = i(n), j = j(n) ˜ ≥ 1 then vary in such a way that lim inf n→∞ |i(n) − j(n)|/m(n) ˜ j(n),m(n) ˜ i(n),m(n) →n→∞ 0 ,Δ Cov Δ ˜ ˜ This is exactly our situation and the variables Δi,m are here asymptotically independent as n ↑ ∞ for any choice of . The setting in Carlstein is that of a strong mixing sequence and other type of increment are also investigated. The forthcoming section is aimed to derive the asymptotic behaviour for this estimation.
13.3.3
Law of the large numbers
Lemma 13.2. Let now F : RD → R be a Lipschitz function. Assume that the sequence (Xn )n∈Z is stationary and η−weakly dependent then N 1 L2 F (Δi,m ) →N →∞ EF (Δ) , N i=1
(13.3.8)
∞ b−4 (a) 1 (r + 1)2 η(r) b−1 < ∞ for some b > 4. if μ = max1≤a≤D E|X0 |b b < ∞, r=0
Proof of lemma The proof 13.2. followsNfrom two steps. N 1 1 Step 1. Var F (Δi,m ) ≤ |Cov(F (Δ1,m ), F (Δi,m ))| →N →∞ 0. N i=1 N i=1 For this we first write Fi,m = F (Δi,m ) f (x1 , . . . , xm )
= f (X(i−1)(m+)+1 , . . . , X(i−1)(m+)+m ), x1 + · · · + xm √ = F , xi ∈ RD , m
with
√ thus Lip f ≤ Lip F/ m; this means that covariances will not be directly controlled with weak dependence. We thus have to control |Cov(F (Δ1,m ), F (Δi,m ))| ,
1≤i≤N
(13.3.9)
We first consider the variance term obtained with i = 1. If we replace F by F − F (0) the expression (13.3.9) is unchanged so that we assume that F (0) = 0 m
*√
thus |F1,m | ≤ Lip F
Xj
m and j=1 m D m D m
2 2 2
(k) (k) E
Xj ≤ E Xj ≤ D E Xj . j=1
k=1
j=1
k=1
j=1
298
CHAPTER 13. ECONOMETRICS AND RESAMPLING
2 b 1 (k) (k) With this relation and Cov(X0 , Xr ) ≤ 23+ b−1 μ b−1 η(r)1− b−1 obtained from proposition 13.1, we derive: E|F1,m |2
≤
2D(Lip F )2 2
D m−1 (k) Cov(X0 , Xr(k) ) k=1 r=0 m−1 2
≤
24+ b−1 μ b−1 D
<
∞,
b
1
η(r)1− b−1
r=0
from our assumption.
Now consider F (x) = F (x) ∨ (−M ) ∧ M for some M > 0 to be defined later, we set F i,m = F (Δi,m ), then F i,m = f (X(i−1)(m+)+1 , . . . , X(i−1)(m+)+m ), √ √ where f (x1 , . . . , xm ) = F (x1 + · · · + xm / m) ; thus Lip f ≤ Lip F/ m and f ∞ ≤ M and we derive, |Cov (F1,m , Fi,m )| ≤ Cov F 1,m , F i,m + Cov F1,m , Fi,m − F i,m + Cov F1,m − F 1,m , F i,m ≤ Cov F 1,m , F i,m + (F1,m 2 + F i,m 2 )Fi,m − F i,m 2 ≤ Cov F 1,m , F i,m + 2F1,m 2 Fi,m − F i,m 2 . Then
FM Cov F 1,m , F i,m ≤ 2Lip √ η((i − 2)( + m) + ). m
On the other hand we already obtained F1,m 2 ≤ CD for some constant C > 0 and the following lemma 13.3 with p = 2 and q = 4 implies with corollary 13.1, (k) Δi,m 44 ≤ cμ4 for some constant if ∞
b−4
(r + 1)2 η(r) b−1 < ∞, and Fi,m − F i,m 2 ≤ CD2 M −1 .
r=0 (1)
(D)
Lemma 13.3. Let q > p ≥ 1 and 1 ≤ i ≤ N then if Δi,m = (Δi,m , . . . , Δi,m ), (k)
Fi,m − F i,m p ≤ 2Dq/p Lip q/p F · M 1−q/p max Δi,m q/p q . 1≤k≤D
If now q is an even integer, then if
∞
b−q
(r + 1)q−2 η(r) b−1 < ∞ the last moment
r=0
is bounded uniformly wrt i and m. Useful cases are given for q = 2 and q = 4.
13.3. LIMIT VARIANCE ESTIMATES
299
Proof of lemma 13.3. We first quote that Fi,m − F i,m p ≤ 2Fi,m 1|Fi,m |≥M p , then E|Fi,m |p 1|Fi,m |≥M
Fi,m − F i,m p
≤
M p−q E|Fi,m |q
≤
Lip q F · M p−q E|Δi,m |q
≤
Dq Lip q F · M p−q max Δi,m qq
≤
2Dq/p Lip q/p F · M 1−q/p max Δi,m q/p q .
(k)
1≤k≤D
(k)
1≤k≤D
(k)
Now corollary 13.1 implies that max1≤k≤D Δi,m qq is uniformly bounded. End of the proof of lemma 13.2. We thus have proved that there is some constant C > 0 such that N D2 1 M 3 −1 Var F (Δi,m ) ≤ C D M + √ η() + , N i=1 m N D2
3/2 −1/4 √ η() + ≤ C D m . N The last inequality holds for some C > 0 with the choice M = D3/2 m1/4 η()−1/2 . Step 2. We now need to prove that EF (Δ1,m ) →m→∞ EF (Δ). We first note that the condition η(r) = O(r−α ) for α > 2 + 1/(b − 2) from Bardet, Le´ on and Doukhan (2005) [11] for CLT to hold is implied by the assumptions in our result. Consider again the truncated function F but for some M which may vary with m, |EF (Δ1,m ) − EF (Δ)| ≤ A + B + C, where A = E|F (Δ1,m ) − F (Δ1,m )|, B = E|F (Δ1,m ) − F (Δ)|, C = E|F (Δ) − F (Δ)|. Now as M → ∞, A ≤ F (Δ1,m ) − F (Δ1,m )2 = O(D2 /M ) converges to 0 (uniformly wrt m), and C →M→∞ 0, because E|F (Δ)| < ∞. Now, from the central limit theorem, B → 0 as m → ∞. Thus we may choose a sequence M = M (m) ↑ ∞ such that EF (Δ1,m ) → EF (Δ).
13.3.4
Central limit theorem
√1 (F (Δi,m ) − EF (Δi,m )). Theorem 13.1. Set Zn = N i=1 xi,n with xi,n = N Assume that the centered and stationary sequence (Xt ) satisfies E|X0 |b < ∞ for some b > 4 and this is η−weakly dependent with η(r) = O (r−α ) for some α > 2 + 1/(b − 2). Then, D
Zn →n→∞ N (0, Var F (Δ)), where N = N (n) is the largest number such that n ≥ N (m + ) − , and where m = m(n) = [mγ ], = (n) = [mδ ] satisfy α > 12 /δ + 43 (1 − γ)/δ.
300
CHAPTER 13. ECONOMETRICS AND RESAMPLING
Remark. For α > 2 + 1/(b − 2) one may choose such rates with γ > δ > 0. Proof. Consider Z ∼ N (0, Var F (Δ)). For f ∈ C 3 (R, R), we want to control |Ef (Zn ) − f (Z)| ≤ |Ef (Z (n) ) − f (Z)| +
N
|Ui |,
i=1
i−1 with Ui = E (fi (wi,n + xi,n ) − fi (wi,n + yi,n )), wi,n = j=1 xi,n and fi (t) =
n f t + j=i+1 yi,n , where, as usual, empty sums are set equal to 0 and where Z (n) =
N
yi,n , yi,n ∼ N (0, Var xi,n ), and the Gaussian random variables yi,n
i=1
are set as jointly independent and independent of the process (Xt )t∈Z . (j)
Step 1. Lindeberg technique. Notice that fi ∞ ≤ f (j) ∞ for j = 0, 1, 2, 3 and for each i ≤ N . From a Taylor expansion and from the bound E|yi,n |3 ≤ 3/2 ≤ E|N (0, 1)|3 E|xi,n |3 , there exists a constant c > 0 such E|N (0, 1)|3 Ex2i,n that N
N c 1 |Ui | ≤ √ E|Δ1,m |3 + |Cov(fi (wi,n , xi,n )| + Cov(fi
(wi,n , x2i,n ) . 2 N i=1 i=1
√ As mentioned in the previous section the first term is bounded as O(1/ N ) if the moment of order 4 of a sum is suitably controlled. In order to make use of the weak dependence condition, we have to truncate the random variables level M > 0 precisely xi,n . Set F (t) = F (t) ∨ (−M ) ∧ M for a truncation set later, then xi,n = √1N F (Δi,m ) − EF (Δi,m ) writes as a function xi,n = g(X(i−1)(m+)+1 , . . . , X(i−1)(m+)+n ) where t1 + · · · + tm 1 √ F g(t1 , . . . , tm ) = √ − EF (Δi,m ) , m N √ √ thus g∞ ≤ 2M/ N and Lip g ≤ Lip F/ N m. By another hand, we may (j) write fi (wi,n ) = G (Xs )s≤(i−1)(m+)− where G : (RD )(i−1)(m+) → R satis√ fies G∞ ≤ f (j) ∞ , Lip G ≤ f (j+1) ∞ Lip F/ mN and is a function of less than n variables in RD . |Cov(fi (wi,n ), xi,n )| |Cov(fi (wi,n ), xi,n )| Fi,m − F i,m 1
2f ∞ F (Δ1,m ) − F (Δ1,m )1 ≤ |Cov(fi (wi,n ), xi,n )| + √ N √ √ ≤ c mM η(i−1)(m+)+ ≤ c mM η , with η dependence, ≤ cD2 M −3 ,
from lemma 13.3.
13.3. LIMIT VARIANCE ESTIMATES
301
With M = η()−1/4 m−1/8 N −1/8 D1/4 we thus arrive with some constant c > 0 to |Cov(fi (wi,n ), xi,n )| ≤ cD1/4 m3/8 N 3/8 η()3/4 ≤ cD1/4 n3/8 η()3/4 . Analogously Cov(f
(wi,n ), x2 ) ≤ Cov(f
(wi,n ), x2 ) + 2f
∞ E|x2 − x2 | i i,n i i,n i,n i,n 2 1 M M Cov(fi
(wi,n ), x2i,n ) ≤ c n √ √ + m√ η(), mN N mN N √ M2 n η(), with η dependence. ≤ c N 2 E|x2i,n − x2i,n | ≤ F (Δi,m )2 F (Δi,m ) − F (Δi,m )2 N D3 , from lemma 13.3. ≤ c MN Here we choose M = η()−1/3 n−1/6 D to obtain, for some constant c > 0, Cov(fi
(wi,n ), x2i,n )
≤
c
D2 n1/6 η()1/3 . N
We thus have proved that N 1 |Ui | ≤ c N D1/4 n3/8 η()3/4 + D2 n1/6 η()1/3 + √ . N i=1 Now consider m(n) = [nγ ] and (n) = [nδ ]. Then N (n) ∼ cnβ where β = 1 − γ. In order that the previous expression converges to 0, we only need β > 0,
α>
1 , 2δ
α>
1 4(1 − γ) + , 2δ 3δ
δ < γ.
To conclude, this is enough to choose δ < γ, such that those relations hold. If δ and γ are both close enough to 1, α > 2 + 1/(b − 2) implies those inequalities. Step 2. Gaussian approximation. To conclude we still need to bound the expression |E(f (Z (n) ) − f ((Z))|. For this set Z = σN and Z (n) = σm N for some 2 standard Gaussian random variable N and σm = Var F (Δ1,m ), σ 2 = Var F (Δ). Then |E(f (Z (n) ) − f ((Z))| ≤ f
∞ |σ − σm |2 ≤
f
∞ 2 2 |σ − σm |. σ
We need to prove that EF 2 (Δ1,m ) →m→∞ EF 2 (Δ) and we use step 2 in the proof of lemma 13.2. We first note that the condition η(r) = O(r−α ) with α > 2 + 1/(b − 2) from Bardet, Le´ on and Doukhan (2005) [11] implies the CLT. We consider the truncated function F for some M which may vary with m, |EF 2 (Δ1,m ) − EF 2 (Δ)| ≤
A + B + C,
302
CHAPTER 13. ECONOMETRICS AND RESAMPLING 2
2
2
where we set A = E|F 2 (Δ1,m )−F (Δ1,m )|, B = E|F (Δ1,m )−F (Δ)|, and C = 2 E|F (Δ)−F 2 (Δ)|. Now A ≤ 2F 2 (Δ1,m )2 F (Δ1,m )−F (Δ1,m )2 = O(D2 /M ) (as M → ∞) converges to 0 (uniformly wrt m), and C →M→∞ 0 because E|F 2 (Δ)| < ∞. From the central limit theorem, B → 0 as m → ∞. Thus we may choose a sequence M = M (m) ↑ ∞ such that EF 2 (Δ1,m ) → EF 2 (Δ).
13.3.5
A non centered variant
An alternative more attractive result involves Tn =
N
ti,n ,
where
i=1
1 ti,n = √ (F (Δi,m ) − EF (Δ)) . N
A central limit theorem for this non centered quantity looks much more convenient to consider the estimation of the parameter EF (Δ). Such results are not considered by Peligrad & Shao (1995) [142]. In order to derive them one still uses the Lindeberg technique with blocks. Hence, for some bounded and C 3 function f we bound again: |E(f (Tn ) − f (Z))| ≤ ≤
|E(f (Tn ) − f (Zn ))| + |E(f (Zn ) − f (Z))| √ N f ∞ |EF (Δ1,m ) − EF (Δ)| + |E(f (Zn ) − f (Z))|
with Z ∼ N (0, Var F (Δ)). This means that the previous convergence relies on the decay rate of the expression 1 |EF (Δ1,m ) − 2EF (Δ)|; as stressed in the if F (x) = (x a) and the previous quantity forthcoming remark, this is O m tends to zero under the assumption: limn→∞ N/m2 = 0. Examples of functions F
• Case D = 1. In this case for F such that |F (x)|/(1 + x2 ) dx < ∞, Petrov (1996) lemma 5.4 page 152 [144] implies |F (x)| dx |EF (Δ1,m ) − EF (Δ)| ≤ c(vn + δ) 1 + x2 with vm = supx∈R |P(Δ1,m ≤ x) − P(Δ ≤ x)| = O m−λ for some λ < 18 given in Bardet, Doukhan and Le´on (2005) [11] and δ = EΔ21,m − EΔ2 = 1 O m . This implies that the cases F (x) = |x|β for some β ≤ 2 is also obtained if, now, limn→∞ N/m2λ = 0; Peligrad and Shao (1995) [142] consider the special case β = 1. Here λ → 18 as a, b → ∞ if η(r) = O(r−a ) and E|X0 |b < ∞.
• Case D = 1. If now F (x) = xp then theorem 4.6 provides a bound O(1/m) for each integer p which resembles the Rosenthal inequality in the independent case (see Hall & Heyde, 1980 [100]).
13.3. LIMIT VARIANCE ESTIMATES
303
• Case D > 1. The special case F (x) = (x a)2 , for some a ∈ RD is the simplest multi-dimensional one. In this case, indeed setting ξt = Xt a, we still assume Eξt = 0 and one may write |s| EF (Δ1,m ) = 1− Eξ0 ξs , m
∞
EF (Δ) =
|s|<m
Eξ0 ξs ,
s=−∞
1 |s|Eξ0 ξs + Eξ0 ξs m |s|<m |s|≥m ∞ 1 |s| = |s|Eξ0 ξs + 1− Eξ0 ξs , m s=−∞ m
so that EF (Δ) − EF (Δ1,m ) =
|s|≥m
thus ∞ 1 |s|Eξ0 ξs ≤ |Eξ0 ξs | . EF (Δ) − EF (Δ1,m ) − m s=−∞
(13.3.10)
|s|≥m
Using proposition 13.1 now yields if a = (a(1) , . . . , a(D) ), 3b−1
1
|Eξ0 ξs | ≤ Da2 cX,2 (s) ≤ 2 b−1 Da2 μ b−1 η(s)1− b−1 . 1 This previous bound (13.3.10) is thus O Da2 η(s)1− b−1 . For fixed D this bound has order o(m−1 ) if
b
∞
s≥m 1
sη(s)1− b−1 < ∞, and thus
s=1
EF (Δ) = EF (Δ1,m ) +
∞ 1 1 |s|Eξ0 ξs + o . m s=−∞ m
Remark. The previous results given under η weak dependence will be extended in forthcoming papers under κ and λ-weak dependence.
Bibliography [1] Aldous, D. J., Eagleson, G. K. (1978) On mixing and stability of limit theorems. Ann. Probab. 6, 325-331. [2] Andrews, D. W. K. (1984) Non strong mixing autoregressive processes. J. Appl. Probab. 21, 930-934. [3] Andrews, D. W. K. (1988) Laws of large numbers for independent non identically distributed random variables. Econometric Theory 4, 458-467. [4] Andrews, D. W. K. (2002) Higher-order Improvements of a Computationally Attractive k-step Bootstrap for Extremum Estimators. Econometrica 70, 119-162. [5] Andrews, W. K., Pollard, D. (1994) An introduction to functional central limit theorems for dependent stochastic processes. Int. Statist. Rev. 62, 119-132. [6] Ango Nze, P., B¨ uhlman, P., Doukhan, P. (2002) Weak Dependence beyond Mixing and Asymptotics for Nonparametric Regression. Annals of Statistics, 30-2, 397-430. [7] Ango Nze, P., Doukhan, P. (2004) Weak dependence, models and applications to econometrics. Econometric Theory 20-6, 995-1045. [8] Bakhtin, Yu., Bulinski, A. V. (1997) Moment inequalities for sums of dependent multiindexed random variables. Fundamentalnaya i prikladnaya matematika 3-4, 1101-1108. [9] Barbour, A. D., Gerrard, R. M., Reinert, G. (2000) Iterates of expanding maps. Probab. Theory Rel. Fields 116 151-180. [10] Bardet, J. M., Doukhan, P., Lang, G., Ragache, N. (2006) A Lindeberg central limit theorem for dependent processes and its statistical applications. Submitted. [11] Bardet, J. M., Doukhan, P., Le´ on, J. R. (2006) A uniform central limit theorem for the periodogram and its applications to Whittle parametric estimation for weakly dependent time series. Submitted. [12] Barron, A. R., Birg´e, L., Massart, P. (1999) Risk bounds for model selection by penalization. Probab. Theory and Rel. Fields 113, 301-413. [13] Bernstein, S. (1939) Quelques remarques sur le th´eor`eme limite Liapounoff. C. R. (Dokl.) Acad. Sci. USSR, Ser. 24, 1939, 3-8. [14] Bena¨ım, M. 
(1996) A dynamical system approach to stochastic approximation. SIAM J. control and optimization 34-2, 437-472.
305
306
BIBLIOGRAPHY
[15] Benveniste, A., M´etivier, M. Priouret, P. (1987) Algorithmes adaptatifs et approximations stochastiques. Masson. [16] Berbee, H. C. P. (1979) Random walks with stationary increments and renewal theory, Math. centre tracts, Amsterdam. [17] Bickel, P., B¨ uhlmann, P. (1995) Mixing property and functional central limit theorems for a sieve bootstrap in times series. Techn. Rep. 440, Dept. Stat., U. C. Berkeley. [18] Bickel, P., B¨ uhlmann, P. (1999) A new mixing notion and functional central limit theorems for a sieve bootstrap in time series Bernoulli 5-3, 413-446. [19] Bickel, P. J., Wichura, M. J. (1971) Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Stat. 42, 1656-1670. [20] Billingsley, P. (1968) Convergence of Probability Measures. Wiley, New-York. [21] Billingsley, P. (1995) Probability and measure. Third edition. Wiley Series in Probability and Mathematical Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York. [22] Birkel, T. (1988) Moment bounds for associated sequences. Ann. Probab. 16, 1184-1193. [23] Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity J. of Econometrics, 31, 307-327. [24] Bolthausen, E. (1982) On the central limit theorem for stationary random fields. Ann. Probab. 10, 1047-1050. [25] Borovkova, S., Burton, R., Dehling, H. (2001). Limit theorems for functionals of mixing processes with application to U -statistics and dimension estimation. Trans. Amer. Math. Soc. 353, 4261-4318. [26] Bosq, D. (1993) Bernstein’s type large deviation inequalities for partial sums of strong mixing process. Statistics 24, 59-70. [27] Bougerol, P. (1993) Kalman filtering with random coefficients and contractions. SIAM J. Control Optim. 31, 942-959. [28] Brockwell, P., Davis, R. (1987) Time Series: Theory and Methods. SpringerVerlag. [29] Bradley, R. C. (1983) Approximations theorems for strongly mixing random variables. Michigan Math. J. 30, 69-81. 
[30] Bradley, R. C. (2002) Introduction to strong mixing conditions, Volume 1. Technical Report, Department of Mathematics, I.U. Bloomington.
[31] Broise, A. (1996) Transformations dilatantes de l'intervalle et théorèmes limites. Études spectrales d'opérateurs de transfert et applications, Astérisque 238, 1-109.
[32] Brandière, O., Doukhan, P. (2004) Dependent noise for stochastic algorithms. Probab. Math. Statist. 2-24, 381-399.
[33] Bulinski, A. V., Shashkin, A. P. (2005) Strong invariance principle for dependent multi-indexed random variables. Doklady Mathematics 72-11, 503-506, MAIK Nauka/Interperiodica.
[34] Carlstein, E. (1986) The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Annals of Statistics 14-3, 1171-1179.
[35] Castaing, C., Raynaud de Fitte, P., Valadier, M. (2004) Young Measures on Topological Spaces. Mathematics and its Applications, Kluwer.
[36] Chen, H-F. (1985) Stochastic Approximation and Its Applications. Nonconvex Optimization and Its Applications 64, Kluwer Academic Publishers.
[37] Chow, Y. S. (1966) Some convergence theorems for independent random variables. Ann. Math. Stat. 37, 1482-1493 and 36-4, 1293-1314.
[38] Collomb, G. (1984) Propriétés de convergence presque complète du prédicteur de noyau. Z. Wahrsch. Verw. Gebiete 66, 441-446.
[39] Comtet, L. (1970) Analyse Combinatoire. Tome 1. Coll. S.U.P. Presses Universitaires de France.
[40] Coulon-Prieur, C., Doukhan, P. (2000) A triangular CLT for weakly dependent sequences. Statist. Probab. Lett. 47, 61-68.
[41] Daubechies, I. (1988) Orthonormal bases of compactly supported wavelets. Comm. Pure & Appl. Math. 41, 909-996.
[42] Dedecker, J. (2001) Exponential inequalities and functional central limit theorems for random fields. ESAIM Probab. & Stat. 5, 77-104. http://www.emath.fr/ps/
[43] Dedecker, J., Doukhan, P. (2003) A new covariance inequality and applications. Stoch. Proc. Appl. 106-1, 63-80.
[44] Dedecker, J., Merlevède, F. (2002) Necessary and sufficient conditions for the conditional central limit theorem. Ann. Probab. 30, 1044-1081.
[45] Dedecker, J., Prieur, C. (2004a) Coupling for τ-dependent sequences and applications. Journal of Theor. Probab. 17-4, 861-885.
[46] Dedecker, J., Prieur, C. (2004b) Couplage pour la distance minimale. C. R. Acad. Sci. Paris Série 1 338-10, 805-808.
[47] Dedecker, J., Prieur, C., Raynaud de Fitte, P. (2006) Parametrized Kantorovich-Rubinstein theorem and application to the coupling of random variables. In Dependence in Probability and Statistics, Lecture Notes in Statist. 187, Springer, 105-121.
[48] Dedecker, J., Prieur, C. (2005) New dependence coefficients. Examples and applications to statistics. Probab. Theory Relat. Fields 132, 203-236.
[49] Dedecker, J., Prieur, C. (2006) An empirical central limit theorem for dependent sequences. Stoch. Proc. Appl. 117, 121-142.
[50] Dedecker, J., Rio, E. (2000) On the functional central limit theorem for stationary processes. Ann. I. H. P. ser. B 36-1, 1-34.
[51] Deheuvels, P. (1979) La fonction de dépendance empirique et ses propriétés. Acad. Roy. Belg., Bull. Cl. Sci. 5ème sér. 65, 274-292.
[52] Deheuvels, P. (1981a) A Kolmogorov-Smirnov type test for independence and multivariate samples. Rev. Roum. Math. Pures et Appl. 26-2, 213-226.
[53] Deheuvels, P. (1981b) A nonparametric test for independence. Pub. Inst. Stat. Univ. Paris 26-2, 29-50.
[54] Dehling, H., Philipp, W. (1982) Almost sure invariance principles for weakly dependent vector-valued random variables. Ann. Probab. 10, 689-701.
[55] Dehling, H., Taqqu, M. (1989) The empirical process of some long-range dependent sequences with applications to U-statistics. Ann. Statist. 17, 1767-1783.
[56] De Jong, R. M. (1996) A strong law of large numbers for triangular mixingale arrays. Statist. and Probab. Lett. 27, 1-9.
[57] Delyon, B. (1990) Limit theorem for mixing processes. Tech. Report IRISA, Rennes 1, 546.
[58] Delyon, B. (1996) General convergence result on stochastic approximation. IEEE Trans. Automatic Control 41, 9.
[59] DeVore, R., Lorentz, G. (1993) Constructive Approximation. Springer-Verlag, Berlin.
[60] Diaconis, P., Freedman, D. (1999) Iterated random functions. SIAM Rev. 41-1, 45-76.
[61] Doukhan, P. (1994) Mixing: Properties and Examples. Lecture Notes in Statistics 85, Springer Verlag.
[62] Doukhan, P. (2002) Models, inequalities and limit theorems for stationary sequences. In Theory and Applications of Long Range Dependence (Doukhan et alii ed.), Birkhäuser, 43-101.
[63] Doukhan, P., Lang, G. (2002) Rates of convergence in the weak invariance principle for the empirical repartition process of weakly dependent sequences. Stat. Inf. for Stoch. Proc. 5, 199-228.
[64] Doukhan, P., Lang, G., Louhichi, S., Ycart, B. (2005) A functional central limit theorem for interacting particle systems on transitive graphs. Available from http://arxiv.org/abs/math-ph/0509041.
[65] Doukhan, P., Latour, A., Oraichi, D. (2006) Simple integer-valued bilinear time series model. Adv. Appl. Probab. 38-2, 559-578.
[66] Doukhan, P., León, J. R. (1989) Cumulants for mixing sequences and applications to empirical spectral density. Probab. Math. Stat. 10-1, 11-26.
[67] Doukhan, P., Louhichi, S. (1999) A new weak dependence condition and applications to moment inequalities. Stoch. Proc. Appl. 84, 313-342.
[68] Doukhan, P., Louhichi, S. (2001) Functional estimation for weakly dependent stationary time series. Scand. J. Statist. 28, 325-342.
[69] Doukhan, P., Madre, H., Rosenbaum, M. (2006) Weak dependence beyond mixing for infinite ARCH-type bilinear models. Statistics.
[70] Doukhan, P., Massart, P., Rio, E. (1994) The functional central limit theorem for strongly mixing processes. Ann. I. H. P. ser. B 30-1, 63-82.
[71] Doukhan, P., Neumann, M. (2006) A Bernstein type inequality for time series. Submitted to Stoch. Proc. Appl.
[72] Doukhan, P., Oppenheim, G., Taqqu, M., editors (2003) Theory and Applications of Long-range Dependence. Birkhäuser, Boston.
[73] Doukhan, P., Portal, F. (1983) Moments de variables aléatoires mélangeantes. C. R. Acad. Sci. Paris Série 1 297, 129-132.
[74] Doukhan, P., Sifre, J.-C. (2001) Cours d'analyse - Analyse réelle et intégration. Dunod.
[75] Doukhan, P., Teyssière, G., Winant, P. (2006) Vector valued ARCH infinity processes. In Dependence in Probability and Statistics, Patrice Bertail, Paul Doukhan and Philippe Soulier editors, Springer, New York.
[76] Doukhan, P., Truquet, L. (2006) Weakly dependent random fields with infinite interactions. Preprint, http://samos.univ-paris1.fr/ppub2006.html.
[77] Doukhan, P., Wintenberger, O. (2005) An invariance principle for weakly dependent stationary general models. Preprint, http://samos.univ-paris1.fr/ppub2005.html.
[78] Doukhan, P., Wintenberger, O. (2006) Weakly dependent chains with infinite memory. Preprint, http://samos.univ-paris1.fr/ppub2005.html.
[79] Dudley, R. M. (1968) Distances of probability measures and random variables. Ann. Math. Statist. 39, 1563-1572.
[80] Dudley, R. M. (1989) Real Analysis and Probability. Wadsworth Inc., Belmont, California.
[81] Duflo, M. (1990) Méthodes récursives aléatoires. Masson, Paris (English edition, Springer, 1996).
[82] Duflo, M. (1996) Algorithmes Stochastiques. Collection Math. et Appl. 23, Springer.
[83] Eberlein, E., Taqqu, M., editors (1986) Dependence in Probability and Statistics. Birkhäuser, Boston.
[84] Engle, R. F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.
[85] Esary, J., Proschan, F., Walkup, D. (1967) Association of random variables with applications. Ann. Math. Statist. 38, 1466-1476.
[86] Fermanian, J-D., Radulovic, D., Wegkamp, M. (2004) Weak convergence of empirical copula processes. Bernoulli 10, 847-860.
[87] Fortuin, C. M., Kasteleyn, P. W., Ginibre, J. (1971) Correlation inequalities on some partially ordered sets. Comm. Math. Phys. 22, 89-103.
[88] Fréchet, M. (1957) Sur la distance de deux lois de probabilité. C. R. Acad. Sci. Paris 244-6, 689-692.
[89] Fuk, D. Kh., Nagaev, S. V. (1971) Probability inequalities for sums of independent random variables. Theory Probab. Appl. 16, 643-660.
[90] Garsia, A. M. (1965) A simple proof of E. Hopf's maximal ergodic theorem. J. of Maths and Mech. 14, 381-382.
[91] Ghosal, S., Chandra, T. K. (1998) Complete convergence of martingale arrays. J. Theor. Probab. 11-3, 621-631.
[92] Giraitis, L., Kokoszka, P., Leipus, R. (2000) Stationary ARCH models: dependence structure and central limit theorem. Econometric Theory 16, 3-22.
[93] Giraitis, L., Leipus, R., Surgailis, D. (2006) Recent advances in ARCH modelling. In Long Memory in Economics, Ed. Gilles Teyssière and Alan Kirman, Springer Verlag.
[94] Giraitis, L., Robinson, P. M. (2001) Whittle estimation of ARCH models. Econom. Theory 17-3, 608-631.
[95] Giraitis, L., Surgailis, D. (2002) ARCH-type bilinear models with double long memory. Stoch. Proc. and their Applic. 100, 275-300.
[96] Goldstein, S. (1979) Maximal coupling. Z. Wahrsch. Verw. Gebiete 46, 193-204.
[97] Gordin, M. I. (1969) The central limit theorem for stationary processes. Dokl. Akad. Nauk SSSR 188, 739-741.
[98] Gordin, M. I. (1973) Abstracts of Communication, T.1: A-K. International Conference on Probability Theory, Vilnius.
[99] Götze, F., Hipp, C. (1978) Asymptotic expansions in the central limit theorem under moment conditions. Z. Wahr. und Verw. Gebiete 42-1, 67-87.
[100] Hall, P., Heyde, C. C. (1980) Martingale Limit Theory and Its Applications. Academic Press.
[101] Hall, P., Horowitz, J. (1996) Bootstrap critical values for tests based on generalized method of moment estimators. Econometrica 64-4, 891-916.
[102] Hannan, E. J. (1973) The estimation of frequency. J. Appl. Probab. 10, 510-519.
[103] Heyde, C. C. (1974) On the central limit theorem for stationary processes. Z. Wahrsch. Verw. Gebiete 30, 315-320.
[104] Heyde, C. C., Scott, D. J. (1973) Invariance principles for the law of the iterated logarithm for martingales and processes with stationary increments. Ann. Probab. 1, 428-436.
[105] Heyde, C. C. (1975) On the central limit theorem and iterated logarithm law for stationary processes. Bull. Austral. Math. Soc. 12, 1-8.
[106] Hofbauer, F., Keller, G. (1982) Ergodic properties of invariant measures for piecewise monotonic transformations. Math. Z. 180, 119-140.
[107] Hoffmann-Jørgensen, J. (1974) Sums of independent Banach space valued random variables. Studia Math. 52, 159-186.
[108] Horváth, L., Shao, Q. M. (1999) Limit theorems for quadratic forms with applications to Whittle's estimate. Ann. Appl. Probab. 9-1, 146-187.
[109] Hsu, P. L., Robbins, H. (1947) Complete convergence and the strong law of large numbers. Proc. Nat. Acad. Sci. 33, 25-31.
[110] Ibragimov, I. A. (1962) Some limit theorems for stationary processes. Theory Probab. Appl. 7, 349-382.
[111] Kallenberg, O. (1997) Foundations of Modern Probability. Springer-Verlag, New York.
[112] Kolmogorov, A. N., Rozanov, Y. A. (1960) On the strong mixing conditions for stationary Gaussian sequences. Th. Probab. Appl. 5, 204-207.
[113] Künsch, H. R. (1989) The jackknife and bootstrap for general stationary observations. Ann. Statist. 17, 1217-1241.
[114] Kushner, H. J., Clark, D. S. (1978) Stochastic Approximation for Constrained and Unconstrained Systems. Applied Math. Science Series 26, Springer.
[115] Lasota, A., Yorke, J. A. (1974) On the existence of invariant measures for piecewise monotonic transformations. Trans. Amer. Math. Soc. 186, 481-488.
[116] Latour, A. (1997) The multivariate GINAR(p) process. Adv. Appl. Prob. 29, 228-247.
[117] Latour, A. (1998) Existence and spectral density of the GINAR(p) process. J. Time Ser. Anal. 19, 439-455.
[118] Lee, M. L. T., Rachev, S. T., Samorodnitsky, G. (1990) Association of stable random variables. Ann. Probab. 18-4, 1759-1764.
[119] Leonov, V. P., Shiryaev, A. N. (1959) On a method of calculation of semi-invariants. Th. Probab. Appl. 5, 460-464.
[120] Li, D., Rao, M. B., Jiang, T., Wang, X. (1995) Complete convergence and almost sure convergence of weighted sums of random variables. J. Theor. Prob. 8, 49-76.
[121] Liebscher, E. (1996) Strong convergence of sums of α-mixing random variables with applications to density estimation. Stoch. Proc. Appl. 65, 69-98.
[122] Liggett, T. M. (1985) Interacting Particle Systems. Springer-Verlag, New York.
[123] Liverani, C., Saussol, B., Vaienti, S. (1999) A probabilistic approach to intermittency. Ergodic Theory and Dynamical Systems 19, 671-685.
[124] Louhichi, S. (2000) Weak convergence for empirical processes of associated sequences. Ann. I. H. P. ser. B 36-5, 547-567.
[125] Louhichi, S. (2003) Moment inequalities for sums of certain dependent random variables. Theor. Prob. Appl. 47-4.
[126] Major, P. (1978) On the invariance principle for sums of identically distributed random variables. J. Multivariate Anal. 8, 487-517.
[127] McLeish, D. L. (1975) A generalization of martingales and mixing sequences. Adv. in Appl. Probab. 7-2, 247-258.
[128] McLeish, D. L. (1975a) A maximal inequality and dependent strong laws. Ann. Probab. 3, 829-839.
[129] McLeish, D. L. (1975b) Invariance principles and mixing random variables. Z. Wahrsch. Verw. Gebiete 32, 165-178.
[130] Merlevède, F., Peligrad, M. (2002) On the coupling of dependent random variables and applications. In Empirical Process Techniques for Dependent Data, 171-193, Birkhäuser.
[131] Mikosch, T., Straumann, D. (2002) Whittle estimation in a heavy-tailed GARCH(1,1) model. Stoch. Proc. Appl. 100, 187-222.
[132] Mokkadem, A. (1990) Propriétés de mélange des modèles autorégressifs polynomiaux. Ann. I. H. P. ser. B 26-2, 219-260.
[133] Móricz, F. A., Serfling, R. J., Stout, W. F. (1982) Moment and probability bounds with quasi-superadditive structure for the maximum partial sum. Ann. Probab. 10, 1032-1040.
[134] Nelsen, R. B. (1999) An Introduction to Copulas. Lecture Notes in Statistics 139, Springer.
[135] Newman, C. M. (1980) Normal fluctuations and the FKG inequalities. Comm. Math. Phys. 74-2, 119-128.
[136] Newman, C. M. (1984) Asymptotic independence and limit theorems for positively and negatively dependent random variables. In Y. L. Tong, Editor, Inequalities in Statistics and Probability, IMS Lecture Notes-Monograph Series 5, 127-140.
[137] Oodaira, H., Yoshihara, K. I. (1971) Note on the law of the iterated logarithm for stationary processes satisfying mixing conditions. Kodai Math. Sem. Rep. 23, 335-342.
[138] Ornstein, D. S. (1973) An example of a Kolmogorov automorphism which is not a Bernoulli shift. Adv. in Math. 10, 49-62.
[139] Peligrad, M. (1983) A note on two measures of dependence and mixing sequences. Adv. in Appl. Probab. 15-2, 461-464.
[140] Peligrad, M. (2002) Some remarks on coupling of dependent random variables. Stat. and Prob. Lett. 60, 201-209.
[141] Peligrad, M., Shao, Q. M. (1994) Self-normalizing central limit theorem for sums of weakly dependent random variables. J. Theoret. Probab. 7, 309-338.
[142] Peligrad, M., Shao, Q. M. (1995) Estimation of the variance for partial sums for ρ-mixing random variables. J. of Multiv. Analysis 52, 140-157.
[143] Peligrad, M., Utev, S. (1997) Central limit theorem for linear processes. Ann. Probab. 25-1, 443-456.
[144] Petrov, V. V. (1996) Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Clarendon Press, Oxford.
[145] Phillips, P. C. B. (1987) Time series regression with a unit root. Econometrica 55-2, 277-301.
[146] Pitt, L. (1982) Positively correlated normal variables are associated. Ann. Probab. 10, 496-499.
[147] Pham, T. D., Tran, L. T. (1985) Some mixing properties of time series models. Stoch. Process. Appl. 19, 297-303.
[148] Pham, T. D. (1986) Bilinear Markovian representation and bilinear models. Stoch. Proc. Appl. 20, 295-306.
[149] Phillips, P. C. B. (1987) Time series regression with a unit root. Econometrica 55, 277-301.
[150] Philipp, W. (1986) Invariance principles for independent and weakly dependent random variables. In Dependence in Probability and Statistics (Oberwolfach, 1985), 225-268. Prog. Probab. Statist. 11, Birkhäuser, Boston.
[151] Pollard, D. (1984) Convergence of Stochastic Processes. Springer-Verlag, Berlin.
[152] Pötscher, B. M., Prucha, I. R. (1991) Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part 1: consistency and approximation concepts, part 2: asymptotic normality. Econometric Reviews 10, 125-216 and 253-325.
[153] Prakasa Rao, B. L. S. (1983) Nonparametric Functional Estimation. Academic Press, New York.
[154] Prieur, C. (2001) Density estimation for one-dimensional dynamical systems. ESAIM Probab. & Statist. 5, 51-76. www.edpsciences.com/ps/
[155] Prieur, C. (2002) An empirical functional central limit theorem for weakly dependent sequences. Probab. Math. Statist. 22-2, 259-287.
[156] Rényi, A. (1963) On stable sequences of events. Sankhyā Ser. A 25, 293-302.
[157] Rio, E. (1993) Covariance inequalities for strongly mixing processes. Ann. I. H. P. ser. B 29-4, 587-597.
[158] Rio, E. (1995a) About the Lindeberg method for strongly mixing sequences. ESAIM-PS 1, 35-61.
[159] Rio, E. (1995b) The functional law of iterated logarithm for strongly mixing processes. Ann. Probab. 23, 1188-1203.
[160] Rio, E. (1996) Sur le théorème de Berry-Esseen pour les suites faiblement dépendantes. Probab. Theory Relat. Fields 104, 255-282.
[161] Rio, E. (2000) Théorie asymptotique pour des processus aléatoires faiblement dépendants. SMAI, Mathématiques et Applications 31, Springer.
[162] Rios, R. (1996) Utilisation des techniques non paramétriques et semi paramétriques en statistique des données dépendantes. Ph. D. Thesis, Université Paris-Sud, Paris.
[163] Robinson, P. M. (1983) Nonparametric estimators for time series. Journal of Time Series Analysis 4, 185-207.
[164] Robinson, P. M. (1989) Hypothesis testing in semiparametric and nonparametric models for econometric time series. The Review of Economic Studies 56-4, 511-534.
[165] Robinson, P. M. (1991) Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. Journal of Econometrics 47, 67-84.
[166] Rosenblatt, M. (1956) A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. U.S.A. 42, 43-47.
[167] Rosenblatt, M. (1961) Independence and dependence. In Proc. 4th Berkeley Symp. Math. Stat. Prob., 411-443. Berkeley University Press.
[168] Rosenblatt, M. (1985) Stationary Sequences and Random Fields. Birkhäuser.
[169] Rosenblatt, M. (1991) Stochastic Curve Estimation. NSF-CBMS Regional Conference Series in Probability and Statistics 3.
[170] Rosenthal, H. P. (1970) On the subspaces of Lp (p > 2) spanned by sequences of independent random variables. Israel Journal of Math. 8, 273-303.
[171] Rüschendorf, L. (1985) The Wasserstein distance and approximation theorems. Z. Wahrsch. Verw. Gebiete 70, 117-129.
[172] Samorodnitsky, G., Taqqu, M. S. (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York.
[173] Saulis, L., Statulevicius, V. A. (1991) Limit Theorems for Large Deviations. Birkhäuser.
[174] Shao, Q. M. (1993) Almost sure invariance principle for mixing sequences of random variables. Stoch. Proc. Appl. 48, 319-334.
[175] Shao, Q. M., Yu, H. (1996) Weak convergence for weighted empirical processes of dependent sequences. Ann. Probab. 24-4, 2098-2127.
[176] Shashkin, A. P. (2005) A weak dependence property of a spin system. Preprint.
[177] Shixin, E. G. (1999) On the convergence of weighted sums of Lq-mixingale arrays. Acta Math. Hungar. 82(1-2), 113-120.
[178] Sklar, A. (1959) Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8, 229-231.
[179] Skorohod, A. V. (1976) On a representation of random variables. Theory Probab. Appl. 21, 628-632.
[180] Strassen, V. (1964) An invariance principle for the law of the iterated logarithm. Z. Wahrsch. Verw. Gebiete 3, 211-226.
[181] Straumann, D. (2005) Estimation in Conditionally Heteroscedastic Time Series. Lecture Notes in Statistics 181, Springer, New York.
[182] Szegő, G. (1967) Orthogonal Polynomials. A. M. S. Colloquium Publications 23, New York.
[183] van der Vaart, A. W., Wellner, J. A. (1996) Weak Convergence and Empirical Processes. Springer, New York.
[184] Valadier, M. (1973) Désintégration d'une mesure sur un produit. C. R. Acad. Sc. Paris 276, 33-35.
[185] Viennet, G. (1997) Inequalities for absolutely regular sequences: application to density estimation. Prob. Th. Rel. Fields 107, 467-492.
[186] Withers, C. S. (1981) Central limit theorems for dependent variables. Z. Wahrsch. Verw. Gebiete 57, 509-534.
[187] Wolkonski, V. A., Rozanov, Y. A. (1959, 1961) Some limit theorems for random functions, Part I: Theory Probab. Appl. 4, 178-197; Part II: Theory Probab. Appl. 6, 186-198.
[188] Wu, W. B. (2005a) Nonlinear system theory: another look at dependence. Proc. Nat. Acad. Sci. USA 102, 14150-14154.
[189] Wu, W. B. (2005b) On the Bahadur representation of sample quantiles for dependent sequences. Ann. Statist. 33, 1934-1963.
[190] Wu, W. B. (2005c) Fourier transforms of stationary processes. Proc. Amer. Math. Soc. 133, 285-293.
[191] Wu, W. B. (2006) M-estimation of linear models with dependent errors. Ann. Statist., to appear.
[192] Wu, W. B., Zhao, Z. (2005) Inference of trends in time series. Under review, JRSSB.
[193] Yoshihara, K. (1973) The law of the iterated logarithm for the processes generated by mixing processes. Yokohama Math. J. 21, 67-72.
[194] Young, L. S. (1999) Recurrence times and rates of mixing. Israel Journal of Math. 110, 153-188.
[195] Yu, H. (1993) A Glivenko-Cantelli lemma and weak convergence for empirical processes of associated sequences. Probab. Th. Rel. Fields 95, 357-370.
[196] Yu, K. F. (1990) Complete convergence of weighted sums of martingale differences. J. Theor. Prob. 3, 339-347.
Index

algorithm
  Kiefer-Wolfowitz, 143
  linear regression, 145
Appell polynomial, 24
associated processes, 53
associated sequences, 225
association, 3, 6
autoregressive processes, 59
Bernoulli shift, 21, 35
  causal, 23, 31, 32
  noncausal, 24, 25
  Volterra, 22, 26
Bernstein block, 230, 242
bracketing number, 98
Catalan number, 80
chaining, 98
concentration, 67
convergence
  almost sure, 252
  complete, 144
  finite dimensional, 202, 207, 230
  stable, 205
copula, 234
covariogram, 268, 269
cumulant, 85, 88, 91, 269, 277
dependence I, 67
  coefficient, 11, 15
  near epoch, 6
  projective measure, 19
  vectors, 137
dependent noise, 137, 140
diagram, 280
  formula, 279
distribution
  Bernoulli, 2
  Gaussian, 2
  Rademacher, 206
dynamical system, 38
  β-transformation, 41
  expanding map, 41
  Gauss map, 41
  neutral fixed point, 41
  spectral gap, 38
empirical process, 98, 223
estimation
  bias, 249, 258, 259, 261, 262, 264
  conditional quantile, 248
  consistence, 248, 270
  density, 248, 254
  functional, 247
  kernel, 248
  MISE, 254, 259
  non parametric, 247
  projection, 255, 263
  regression, 247, 248
  spectral, 265
    density, 275
  volatility, 247
  Whittle, 274
function
  ρ-regular, 248
  bounded variation, 11
  Lipschitz, 10
  polynomial, 260
Gaussian processes, 55
generalized inverse, 18
Hahn-Jordan, 11
heredity, 12
independence, 2, 215
inequality
  (2 + δ)-order, 140
  Bennett, 120
  Burkholder, 123
  covariance, 38, 110, 111, 137
    association, 7
  Doob maximal, 211
  exponential, 82, 93, 239
  Fuk-Nagaev, 121, 217
  Marcinkiewicz-Zygmund, 77, 140
  maximal, 132, 134
  moment, 201
    (2 + δ)-order, 70
  Rosenthal, 79, 95, 126, 130, 131, 138, 139, 227, 237, 251
  variance, 254
interacting particle systems, 56
  configuration, 56
  Feller semigroup, 56, 57
  infinitesimal generator, 56
  smooth function, 57
invariance principle
  conditional, 205
  Donsker, 200
  strong, 215
  weak, 199
iterated random function, 33
kernel function, 249
  de la Vallée-Poussin, 257, 262
  Dirichlet, 261, 262
  Fejér, 262, 276
  wavelet, 263
law of the iterated logarithm, 213
  functional, 215
linear process
  dependent innovations, 269
  non-causal, 266
martingale difference, 6
measure (signed), 11
mixing, 4, 8
  absolute regularity, 4
  maximal correlation, 4
  strong, 4, 215
  uniform, 4, 215
mixingale, 6, 140
model
  ARCH process, 36, 43
  ARCH(∞), 43
  autoregressive process, 36
  bilinear, 48
  GARCH(p, q), 43
  infinite memory, 64
  integer valued, 61
  LARCH(∞) process, 42
    bilinear, 42
    random field, 62
  Markov chain, 33
    autoregressive, 36
    branching, 37
moment
  centered, 85
  cumulant, 85, 92
  sum, 91
ODE method, 135
orthogonality, 2, 3, 7
periodogram, 267, 269, 270
  integrated, 270
  matrix, 276
random field, 11, 62, 200, 236
relative compactness, 205, 208
spectral density, 265, 268, 270, 274
  matrix, 275
Steutel-van Harn operator, 61
tail function, 103
tightness, 98, 200, 205, 209, 224, 231, 232
total variation, 11
unconditional system, 258